# Spatial and contextual relationship between a sound and its source

In 6DoF AAR, an essential narrative consideration is the relationship of virtual sounds to their physical counterparts. In other words, there may be many narrative consequences depending on whether a sound is attached to a real-world object or not, whether it matches the object or not, and so on. Further, the acoustic space around the listener may or may not match the real one, and that appears to be a powerful storytelling tool in this medium. In the Full-AAR project, we are also utilising the interactive capabilities the tracking technology offers us. Hence, those possibilities combined with virtual spatial audio bring a whole new subset of narrative techniques to be explored. Most of the narrative techniques presented here and tested in the project are based on the original identification by Matias Harju((Harju, Matias. //Exploring narrative possibilities of audio augmented reality with six degrees of freedom//, Aalto University, 2021, [[https://aaltodoc.aalto.fi/handle/123456789/108358]])).

:!: This page is still a work in progress. :!:

## Spatial relationship between sound and source

### Attachment

* virtual sound attached to a real-world object (visible/tangible/scentable)
  * character's voice emanating from a picture frame
  * sound heard behind a door
  * street sounds through a window... etc.
* thanks to the ventriloquism effect, visual cues improve localisation performance((Tabry, Vanessa, et al. 'The influence of vision on sound localization abilities in both the horizontal and vertical planes'. //Frontiers in Psychology//, 12 December 2013, [[https://doi.org/10.3389/fpsyg.2013.00932]])), a phenomenon Michel Chion describes as 'magnetization'((Chion, Michel. //Audio-Vision: Sound on Screen//, Columbia University Press, 1994))
  * however, in our experience, if the user tracking in a headphone-based system is not stable and the sound source keeps constantly moving, the visual cue may not help to magnetize the sound
* in case of //contextual mismatch// (see below), perhaps try to sell the mismatch through the story
* if the sound is //within-reach// (see below), binaural 6DoF rendering requires accurate and low-latency head tracking
* Härmä et al. (2003) labelled this approach //localized acoustic events connected to real world objects//((Härmä, Aki, et al. //Techniques and Applications of Wearable Augmented Reality Audio//. Audio Engineering Society, 2003, [[http://www.aes.org/e-lib/browse.cfm?elib=12495]]))
* Cliffe (2022) calls the physical object to which the virtual sound is attached an //audio augmented object//((Cliffe, Laurence. //Audio Augmented Objects and the Audio Augmented Reality Experience//, University of Nottingham, 2022, p. 12, [[https://eprints.nottingham.ac.uk/69795/]]))
* the object can also be called a 'sound object'(([[https://audioar.org/glossary/]]))
* a minimal code sketch of this kind of anchoring follows after this list
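As a rough illustration of the attachment technique, here is a minimal sketch using the Web Audio API in TypeScript. It is not the Full-AAR implementation: ''getTrackedPose()'' is a hypothetical stand-in for whatever 6DoF tracking system is available, and the object position and audio asset are purely illustrative.

```typescript
// Minimal sketch: a virtual sound anchored at a real-world object's
// position, rendered binaurally while the listener pose follows a tracker.

interface Pose {
  position: { x: number; y: number; z: number };
  forward: { x: number; y: number; z: number };
  up: { x: number; y: number; z: number };
}

declare function getTrackedPose(): Pose; // hypothetical external tracker API

const ctx = new AudioContext(); // note: browsers require a user gesture first

// World-space position of the physical object (e.g. a picture frame),
// expressed in the same coordinate frame the tracker reports.
const panner = new PannerNode(ctx, {
  panningModel: 'HRTF',     // binaural rendering
  distanceModel: 'inverse',
  positionX: 2.0,
  positionY: 1.5,
  positionZ: -3.0,
});

async function startAttachedSound(url: string): Promise<void> {
  const response = await fetch(url);
  const buffer = await ctx.decodeAudioData(await response.arrayBuffer());
  const source = new AudioBufferSourceNode(ctx, { buffer, loop: true });
  source.connect(panner).connect(ctx.destination);
  source.start();
}

// The sound stays 'magnetized' to the object only if the listener pose is
// updated with low latency; an unstable tracker breaks the effect.
function updateListenerPose(): void {
  const { position, forward, up } = getTrackedPose();
  const l = ctx.listener;
  l.positionX.value = position.x;
  l.positionY.value = position.y;
  l.positionZ.value = position.z;
  l.forwardX.value = forward.x;
  l.forwardY.value = forward.y;
  l.forwardZ.value = forward.z;
  l.upX.value = up.x;
  l.upY.value = up.y;
  l.upZ.value = up.z;
  requestAnimationFrame(updateListenerPose);
}

startAttachedSound('voice-from-picture-frame.ogg'); // illustrative asset name
requestAnimationFrame(updateListenerPose);
```

The essential point is simply that the panner's world position stays fixed at the object while the listener moves; in a real deployment the pose would be smoothed and updated at the tracker's native rate.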
### Detachment/Acousmêtre

* sound is 'acousmatic' and has no perceivable counterpart in the real environment, but is still positioned relative to the environment, not to the user's head
  * a ghost
  * imagined sounds, memories
  * acousmatic transformation of a place... etc.
* could be considered a move from pure //augmented// reality towards //virtual// reality
* potentially challenging to make believable; also difficult to spatialise so that the sound's position is clearly perceivable
* on the other hand, acousmatic ('ghostlike') sounds may be more forgiving of positional inaccuracy than sounds that are supposed to be attached to an object
* ways to potentially improve plausibility:
  * foleys
  * believable acting
  * interaction with the player (asking questions, asking the player to do something, etc.)
  * a directional sound emitter (a 'mouth') moved around by either predefined motion-capture animations or AI-based pathfinding navigation
  * high-quality virtual acoustic rendering
  * introducing the character first 'out of sight' (behind a door, talking from another room), then revealing them as being invisible

### Locative audio

* or 'locational audio'(([[https://audioar.org/glossary/]]))
* soundscapes audible inside certain areas or zones
* the basic principle of //locative audio// experiences such as audio walks and some museum audio tours
* possible to realise with proximity sensors without 6DoF tracking
* can utilise 6DoF, or be head-locked

### Spatial focus

* a concept by Laurence Cliffe((Cliffe, Laurence. //Audio Augmented Objects and the Audio Augmented Reality Experience//, University of Nottingham, 2022, p. 83, [[https://eprints.nottingham.ac.uk/69795/]])) whereby non-spatialised ambience may highlight the position of an augmented audio source more effectively than spatialised ambience

### Spatial offset

* e.g., the sound of an airplane appears to lag behind it
* e.g., 'zooming' into a distant sound source as if using audio binoculars
* risks being perceived as an error if not well motivated narratively

### Panning towards the target

* instead of full spatialisation, the virtual sound (e.g. a narrator's voice) is panned slightly towards the target (e.g. a museum exhibit); used by Scopeaudio in, e.g., the porcelain museum at Vienna Augarten

### Within-reach

* sound inside the 'play area'
* the user can walk around the sound and get very close to it
* sets high requirements for the virtual audio
* with binaural 6DoF, requires accurate and low-latency head tracking

### Out-of-reach

* sound outside of the 'play area'
* sounds leaking from adjacent rooms, from behind windows...
* ambience sounds
* possible to realise by using ambisonics instead of object-based audio

### Near field

* sounds very close (< 1 m) to the listener's head((Roginska, Agnieszka. 'Binaural audio through headphones'. //Immersive Sound: The Art and Science of Binaural and Multi-Channel Audio//, Routledge, 2017, p. 122, [[https://doi.org/10.4324/9781315707525]]))
* either head-locked or utilising head tracking
* a fly buzzing around the head, ASMR-style sounds... etc.

### 3DoF

* sounds relative or 'attached' to the user's location, i.e. they move with the user but still support head tracking (a minimal sketch follows at the end of this section)
* typically used for ambisonic ambiences, but can be applied to object-based audio sources, too
* sometimes useful for narrative or sound-design reasons, e.g. in memory scenes where the real environment can be contextually faded slightly into the background

### Spatial (a)synchronisation

* the coordinates of the sounds are relative to something other than the real environment
* the 3DoF technique is one example (relative to the user)
* in wilder scenarios, sounds could be relative to another user
* Härmä et al. called this //freely-floating acoustic events//((Härmä, Aki, et al. //Techniques and Applications of Wearable Augmented Reality Audio//. Audio Engineering Society, 2003, [[http://www.aes.org/e-lib/browse.cfm?elib=12495]]))
* other than 3DoF, perhaps not the most usable technique...
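A minimal sketch of the 3DoF idea above, again with the Web Audio API in TypeScript and the same hypothetical ''getTrackedPose()'' tracker stand-in: the source's world position is re-anchored to the listener's tracked position every frame, so it translates with the user while head rotation is still rendered through the AudioListener orientation (updated elsewhere, as in the earlier sketch). The offset and smoothing values are illustrative.

```typescript
// Hypothetical tracker stand-in (only position is needed here).
declare function getTrackedPose(): {
  position: { x: number; y: number; z: number };
};

// Fixed offset in world axes, e.g. an ambience element kept 2 m 'ahead'
// of the user regardless of where they walk.
const OFFSET = { x: 0, y: 0, z: -2 };

function follow3DoF(ctx: AudioContext, panner: PannerNode): void {
  const { position } = getTrackedPose();
  const t = ctx.currentTime;
  // setTargetAtTime smooths the discrete per-frame jumps and avoids
  // zipper noise in the rendered position.
  panner.positionX.setTargetAtTime(position.x + OFFSET.x, t, 0.02);
  panner.positionY.setTargetAtTime(position.y + OFFSET.y, t, 0.02);
  panner.positionZ.setTargetAtTime(position.z + OFFSET.z, t, 0.02);
  requestAnimationFrame(() => follow3DoF(ctx, panner));
}
```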
## Contextual relationship between sound and source

### Congruence

* sound matches the physical appearance of its real-world counterpart
  * a radio programme played from a radio receiver
  * environmental sounds match the real-life surroundings... etc.
* Schraffenberger and van der Heide (2014) talk about virtual content //relating// to or becoming a part of a physical element or object((Schraffenberger, Hanna, and Edwin van der Heide. 'The Real in Augmented Reality.' //xCoAx: Proceedings of the Conference on Computation, Communication, Aesthetics and X//, 2014, pp. 70–71))
* the sound would complement or reinforce the object

### Alternative match

* sound aligns with its real-world source but presents a different interpretation of the actual object or scenario
  * two users are viewing a documentary film; nevertheless, they perceive distinct voiceovers, each providing a unique interpretation of the visual events
  * a bad smell begins to permeate the room; one user hears a car engine idling outside, suggesting the smell comes from the tailpipe, while the other perceives a hiss as if a gas pipe were broken

### Divergence

* sound contrasts with the real-world object
  * dog sounds emanating from a person
  * acousmatic transformation, i.e. environmental sounds displaced from the real-life surroundings (you stand in a gallery space but hear a forest around you)

Congruence and divergence are, of course, highly contextual; e.g. a dog doesn't normally talk (dog + talk = mismatch), but in a story it can (dog + talk = match). Also, drawing the line may be difficult: e.g. the crackling sound of fire in a fireplace holding a pile of unlit wood would match and mismatch at the same time. Another example would be illustrative sound design and sounds produced by imagination, which would be congruent with the narrative and dream images but, at the same time, divergent from the real environment around the user.

In the context of museum items, Cliffe((Cliffe, Laurence. //Audio Augmented Objects and the Audio Augmented Reality Experience//, University of Nottingham, 2022, p. 67, [[https://eprints.nottingham.ac.uk/69795/]])) discusses the difference between augmenting //silent// objects (e.g. a photograph) and augmenting //silenced// objects (e.g. a radio receiver that no longer produces sound). This is one very nice way to approach the issue.

When transforming the space into something else, the acoustic properties of the imagined new space won't necessarily match the real surroundings; e.g., the reverb decay may be much shorter. However, to embed or 'glue' the sounds of the new environment to the user's real environment, it may be necessary to apply some amount of the real-space acoustics to the new sounds, as in the sketch below.
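One way to realise this 'gluing' is a convolution reverb loaded with an impulse response measured in the user's actual room, mixed in at a low level. The following Web Audio sketch assumes such an IR has already been captured; the file name ''real-room-ir.wav'' and the dry/wet gain values are placeholders.

```typescript
// Minimal sketch: embed virtual sounds in the real space by blending in
// a convolution reverb driven by a measured real-room impulse response.

async function addRoomGlue(
  ctx: AudioContext,
  virtualSource: AudioNode,
): Promise<{ dry: AudioParam; wet: AudioParam }> {
  const response = await fetch('real-room-ir.wav'); // placeholder IR file
  const ir = await ctx.decodeAudioData(await response.arrayBuffer());

  const reverb = new ConvolverNode(ctx, { buffer: ir });
  const dry = new GainNode(ctx, { gain: 0.8 }); // mostly the designed sound
  const wet = new GainNode(ctx, { gain: 0.2 }); // a little real-space acoustics

  virtualSource.connect(dry).connect(ctx.destination);
  virtualSource.connect(reverb).connect(wet).connect(ctx.destination);

  // Returning the gain params lets the narrative fade the real acoustics
  // in or out, e.g. when the space 'transforms' into an imagined one.
  return { dry: dry.gain, wet: wet.gain };
}
```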
### First-person sounds

* the user assumes a character's role: the character's speech, foleys, and other sounds are attached to the user
* may be challenging to get working unless introduced carefully at the beginning of the experience
* 'augmented humans'((Schraffenberger, Hanna, and Edwin van der Heide. 'The Real in Augmented Reality.' //xCoAx: Proceedings of the Conference on Computation, Communication, Aesthetics and X//, 2014, pp. 70–71))

### Extension

* the use of acousmatic sounds outside of the visual or otherwise perceivable space to create and suggest location, environment, time, weather, and other external parameters, or to set the narrative focus; a term by Michel Chion((Chion, Michel. //Audio-Vision: Sound on Screen//, Columbia University Press, 1994, pp. 86–89))

### Additive enhancement

* an additional sound or effect attached to a real-world sound
  * a healthy car sound is augmented with a squeaky belt sound
  * an old radio receiver humming in the real world, with a virtual radio programme augmented on top of that

### Synchresis

* precisely synchronising a virtual sound to a single, momentary real-world event in order to create a contextual fusion between the two((Chion, Michel. //Audio-Vision: Sound on Screen//, Columbia University Press, 1994))
* potentially most effective when the real-world event is unpredictable and caused by a real-world agent, such as other people, animals, nature, etc.

### Removing

* a real-world sound is removed
* requires an acoustically isolating auditory display (e.g. closed-back headphones) or sophisticated wavefield-cancellation technology

### Masking

* a real-world sound is masked by a virtual sound
* requires an acoustically isolating auditory display (e.g. closed-back headphones) or sophisticated wavefield-cancellation technology

### Manipulation

* a real-world sound is manipulated by replacement
* benefits from an acoustically isolating auditory display or sophisticated wavefield-cancellation technology

### Evolvement and adaptation

* constantly evolving or adapting steady sounds to help them remain distinct and prevent them from 'disappearing' as the user becomes accustomed to them((Dam, A., Lee, Y., Siddiqui, A., Lages, W. S., and Jeon, M. 'Audio augmented reality using sonification to enhance visual art experiences: Lessons learned'. //International Journal of Human-Computer Studies//, vol. 191, Nov. 2024, p. 9, [[https://doi.org/10.1016/j.ijhcs.2024.103329]])); one simple realisation is sketched below
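A minimal Web Audio sketch of one way to keep a steady sound evolving: a very slow LFO drifts a low-pass filter's cutoff so the ambience never settles into a fully static state. The rates and depths are illustrative assumptions, not values from the project or the cited study.

```typescript
// Minimal sketch: slow spectral drift on an otherwise steady ambience to
// resist listener habituation.

function addSlowEvolution(
  ctx: AudioContext,
  steadySource: AudioNode,
): BiquadFilterNode {
  const filter = new BiquadFilterNode(ctx, {
    type: 'lowpass',
    frequency: 1200, // centre cutoff in Hz
  });

  const lfo = new OscillatorNode(ctx, { frequency: 0.02 }); // ~50 s per cycle
  const depth = new GainNode(ctx, { gain: 600 });           // ±600 Hz drift

  lfo.connect(depth).connect(filter.frequency); // modulate the cutoff
  lfo.start();

  steadySource.connect(filter).connect(ctx.destination);
  return filter;
}
```

The same pattern works on gain, panning, or playback-rate parameters; the point is only that some parameter keeps moving on a timescale too slow to notice consciously.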