Spatial and contextual relationship between a sound and its source

In 6DoF AAR, an essential narrative consideration is the relationship of virtual sounds to their physical counterparts. In other words, the narrative consequences differ depending on whether a sound is attached to a real-world object, whether it matches that object, and so on. Further, the acoustic space around the listener may or may not match the real one, which appears to be a powerful storytelling tool in this medium.

In the Full-AAR project, we also utilise the interactive capabilities that the tracking technology offers. Combined with virtual spatial audio, those possibilities bring a whole new subset of narrative techniques to explore.

Most of the narrative techniques presented here and tested in the project are based on the original identification by Matias Harju1).

Spatial relationship between sound and source


  • thanks to the ventriloquism effect, visual cues improve localisation performance 2), a phenomenon Michel Chion describes as 'magnetization' 3)
  • however, in our experience, if the user tracking in a headphone-based system is not stable and the sound source keeps moving, the visual cue may not help to magnetize the sound
  • in case of contextual mismatch (see below), perhaps try to sell the mismatch through the story
  • if the sound is within reach (see below), binaural 6DoF rendering requires accurate, low-latency head tracking
  • Härmä et al. (2003) labelled this approach as localized acoustic events connected to real world objects4)
  • Cliffe (2022) calls the physical object to which the virtual sound is attached an audio augmented object5)
  • the object can also be called 'sound object'6)


  • could be considered as a move from pure augmented reality towards virtual
  • potentially challenging to make believable; also difficult to spatialise so that the sound's position is clearly perceivable
  • on the other hand, acousmatic ('ghostlike') sounds may be more forgiving of positional inaccuracy than sounds that are supposed to be attached to an object
  • ways to potentially improve plausibility:
    • Foley sounds
    • believable acting
    • interaction with the player (asking questions, asking player to do something, etc.)
    • directional sound emitter ('mouth') moving around by means of either predefined motion capture animations or AI-based pathfinding navigation
    • high-quality virtual acoustic rendering
    • introducing the character first 'out-of-sight' (behind door, talking from another room), then revealing them as being invisible

Locative audio

Spatial focus

Spatial offset

  • risk of appearing as an error, if not well motivated narratively

Panning towards the target


  • sets high requirements for virtual audio
  • with binaural 6DoF, requires accurate and low-latency head tracking


  • possible to realise by using ambisonics instead of object-based audio

Near field


  • typically used for ambisonic ambiences, but can be applied to object-based audio sources, too
  • sometimes useful for narrative or sound design reasons, e.g. in memory scenes where the real environment can be contextually faded slightly into the background

Spatial (a)synchronisation

  • 3DoF technique would be one example (relative to the user)
  • in wilder scenarios sounds could be relative to another user
  • Härmä et al. called these freely-floating acoustic events10)
  • other than 3DoF, perhaps not the most usable technique…
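The difference between a sound anchored to the room and a sound that is relative to the user can be illustrated with a small geometric sketch. This is not the project's implementation; the function names, the 2D simplification and the coordinate convention (yaw 0 means facing +x, positive yaw turns left, head frame x is forward and y is left) are all assumptions made for illustration.

```python
import math

def to_head_frame(dx, dy, yaw):
    """Rotate a world-frame offset into the listener's head frame.

    Assumed convention: yaw in radians, counterclockwise positive;
    returned x is 'forward', y is 'left'.
    """
    c, s = math.cos(-yaw), math.sin(-yaw)
    return (c * dx - s * dy, s * dx + c * dy)

def world_anchored(source_xy, listener_xy, yaw):
    """6DoF-style: the source stays fixed in the room, so both walking
    and head rotation change where it is heard."""
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    return to_head_frame(dx, dy, yaw)

def listener_anchored(offset_xy, yaw):
    """3DoF-style: the source keeps a fixed offset from the listener,
    so walking has no effect; only head rotation does."""
    return to_head_frame(offset_xy[0], offset_xy[1], yaw)

# The listener walks one metre towards a source two metres ahead.
before = world_anchored((2.0, 0.0), (0.0, 0.0), 0.0)
after = world_anchored((2.0, 0.0), (1.0, 0.0), 0.0)
floating = listener_anchored((2.0, 0.0), 0.0)
```

In this sketch the world-anchored source gets closer as the listener walks, while the listener-anchored one stays two metres ahead regardless of position. Making a sound relative to another user would, under the same assumptions, amount to calling `world_anchored` with that user's tracked position plus an offset as the source.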

Contextual relationship between sound and source


  • Schraffenberger and Heide (2014) talk about virtual content relating to or becoming a part of a physical element or object11)
  • the sound would complement or reinforce the object

Alternative match


Congruence and divergence are, of course, highly contextual; e.g. a dog doesn't normally talk (dog + talk = mismatch), but in a story it can (dog + talk = match). Also, drawing the line may be difficult: e.g. the crackling sound of fire in a fireplace holding a pile of unlit wood would match and mismatch at the same time. Another example would be illustrative sound design and sounds produced by imagination, which would be congruent with the narrative and dream images but, at the same time, divergent from the real environment around the user.

In the context of museum items, Cliffe12) discusses the difference between augmenting silent objects (e.g. a photograph) and augmenting silenced objects (e.g. a radio receiver that no longer produces sound). This is a useful way to approach the issue.

When transforming the space into something else, the acoustic properties of the imagined new space won't necessarily match the real surroundings. E.g., the reverb decay may be much shorter. However, to embed or 'glue' the sounds of the new environment to the user's real environment, it may be necessary to apply some amount of the real-space acoustics to the new sounds.
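One way to think about this 'glue' is as a blend between two reverb sends: one modelling the imagined space and one modelling the real room. The following is a minimal sketch under that assumption, not a description of how the project does it. The `glue` parameter, the function names and the choice of an equal-power crossfade are all hypothetical.

```python
import math

def reverb_send_gains(glue):
    """Equal-power crossfade between two reverb sends.

    glue = 0.0 -> only the imagined space's reverb
    glue = 1.0 -> only the real room's reverb
    Returns (gain_virtual, gain_real).
    """
    glue = min(max(glue, 0.0), 1.0)  # clamp to the valid range
    angle = glue * math.pi / 2
    return math.cos(angle), math.sin(angle)

def mix_sample(dry, wet_virtual, wet_real, glue):
    """Combine one dry sample with the two reverberated versions of it."""
    g_virtual, g_real = reverb_send_gains(glue)
    return dry + g_virtual * wet_virtual + g_real * wet_real
```

A small `glue` value keeps the character of the imagined space while letting a little of the real room through; the appropriate amount would have to be found by listening in the actual space.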

First-person sounds

  • May be challenging to get working, unless introduced carefully at the beginning of the experience.
  • 'Augmented humans'13)


Additive enhancement





Evolvement and adaptation

