====== What is 6DoF AAR? ======
Audio augmented reality (AAR) is a subset of augmented reality (AR). Instead of overlaid visual images, it enhances the real world with virtual sounds. The sounds can convey useful information about the environment, or they can provide a totally new narrative or emotional layer on top of the real one, including storytelling and musical elements.
Some AAR applications attempt to manipulate the surrounding soundscape by removing or replacing certain sounds or acoustic properties. While this approach still presents technical challenges, such customisation of one's personal audio environment holds huge potential.
In an AAR experience with six degrees of freedom (6DoF), the user can move freely in a real-world space while the virtual sounds, played back binaurally through headphones, appear embedded in the environment. The user's head is tracked by sensors, and when the head turns or moves in the space, the computer counter-rotates and counter-translates the virtual soundscape to keep it spatially fixed to the real environment. This //6DoF dynamic binaural// approach potentially creates an illusion of another acoustic reality coexisting with the real world.
{{:listening_to_object_small.jpg?nolink|}}
The tracking system can also be used as an input method for interaction with the virtual world: audio events and dynamic narration can react to the user's location, head orientation, and movements, thus advancing the non-linear narrative.
In the Full-AAR project, we call this 6DoF AAR (Six Degrees of Freedom Audio Augmented Reality), although it could simply be labelled auditory //mixed reality//((Rauschnabel, et al. 'What is XR? Towards a Framework for Augmented and Virtual Reality', //Computers in Human Behavior//, 133, 2022, p. 124 [[https://doi.org/10.1016/j.chb.2022.107289]])).
===== Narrative possibilities =====
AAR carries many intriguing possibilities for storytelling and immersive experiences. For example, it can be used to convey an alternative narrative of a certain place through virtual sounds interplaying with the real world. It can reveal stories of the invisible. It can bring back extinct sounds, or suggest how the world around us would sound in the future.
The medium is also potentially powerful in creating plausible illusions of something happening out of the user's sight, for instance behind a wall or inside an object.
Unlike in traditional visual AR, in AAR the user's sight is not disrupted at all. Besides the artistic possibilities this opens, it may be beneficial in places where situational awareness is important, such as museums, shopping centres, and other urban environments.
AAR might also be an interesting medium for visually impaired people. In the Full-AAR project, we are inviting visually impaired users to test our demos and give feedback on the experience. We are curious to learn how they encounter the interplay between the real space and artificial acoustic elements, as well as how they judge the fidelity and plausibility of the virtual audio.
{{:phone_ringing_small.jpg?nolink&250|}}
//Photo: A blind individual testing a narrative 6DoF AAR demo at WHS//
The use of headphones enables customised audio material to be fed to each listener, making the experience collective and private at the same time. The material can be adapted to each user in both content and timing: audio information can be offered in different language, age-level, and difficulty versions, as a plain-language option, as individual narrative content, and so on.
===== Previous and existing 6DoF AAR projects =====
Here, we list some interesting narrative and artistic AAR projects that use binaural audio with 6DoF tracking and are //available to the public//. The list is still very short, but it will be expanded as we become aware of more experiences fitting the criteria. We may also add some research projects here, should they explore the narrative use of the medium.
In our list, indoor experiences are separated from outdoor ones because of the major difference in how positional tracking can be done. Outdoors, one can utilise the GNSS (satellite navigation) capabilities of mobile phones, which provide an easy and relatively accurate tracking method out of the box. In contrast, indoor positional tracking is currently much more challenging to realise, and it is what the Full-AAR project mainly concentrates on.
==== Indoor experiences ====
LISTEN (2001-2003) by the Fraunhofer Institute, IRCAM, TU Wien, AKG, and Kunstmuseum Bonn. A research project developing a 6DoF audio guide for exhibitions. Both optical, marker-based IR tracking and RF-based pose tracking were used. The system personalised the content based on the user's location, direction of gaze, and movement patterns. The project also experimented with attractor sounds to draw the visitors' attention to certain exhibits.((Eckel, G. (2001). Immersive audio-augmented environments: The LISTEN project. Proceedings Fifth International Conference on Information Visualisation, 571–573. [[https://doi.org/10.1109/IV.2001.942112]])) ((Zimmermann, A., & Lorenz, A. (2008). LISTEN: A user-adaptive audio-augmented museum guide. User Modeling and User-Adapted Interaction, 18(5), 389–416. [[https://doi.org/10.1007/s11257-008-9049-x]]))
[[https://foerterer.com/sound-of-things.html|Sound of Things]] (2013) by Holger Förterer: Two simultaneous users hearing virtual sounds produced by items on a table. Nice, poetic sound design with accurate 6DoF optical tracking using infrared LEDs mounted on wireless headphones.
[[https://usomo.de/en/projects/|Multiple exhibitions with the usomo system]] by FRAMED immersive projects. For instance, [[https://www.maisongainsbourg.fr/|Maison Gainsbourg]] in Paris and the [[https://zenehaza.hu/|House of Music]] in Budapest. In exhibitions using the //usomo// system, dozens of simultaneous users can move freely within the space wearing headphones and carrying a mobile device. Head orientation and location are tracked using UWB and IMU sensors, enabling versatile and interactive content with spatialised, environment-embedded augmented sounds.
[[https://www.augarten.com/de/content/museum.html|The museum of Wiener Porzellanmanufaktur Augarten]] with an audio AR guide and a musical piece realised by [[https://www.scopeaudio.com/|Scopeaudio]] using their proprietary software for iPhone, which utilises inside-out tracking and object recognition. The museum experience contains numerous augmented audio events across two floors. While the system does not support head tracking, the handheld operation still offers a surprisingly good sense of 6DoF.
==== Outdoor experiences ====
Growl Patrol by Queen's University in Ontario, Canada (2011): A geolocative audio game in a park utilising head tracking and spatialised audio.((John Kurczak, //The use of ambient audio to increase safety and immersion in location-based games// (Queen’s University, Ontario 2012), pp. 42-45 [[http://hdl.handle.net/1974/6997]]))
[[http://www.scopeaudio.com/portfolio/sonic-traces/|Sonic Traces]] (2020) by Scopeaudio at Heldenplatz in Vienna: A large-scale location-based AAR experience about the history and future of Vienna. The experience uses Bose AR headphones with head-tracking capability.
[[https://the-planets.app/|The Planets]] (2022) by Sofilab UG and the Münchner Philharmoniker: An interactive audio walk through the orchestral suite 'The Planets' by Gustav Holst. The app-based experience is available in multiple parks in European and US cities. Dramaturgy and sound design by Mathis Nitschke.
===== Technical requirements =====
:!: This section is still a work in progress. :!:
There are a few ways to embed virtual sounds in the user's real environment:
1. Direct augmentation((Normand, Jean-Marie, et al. ‘A New Typology of Augmented Reality Applications’. //Proceedings of the 3rd Augmented Human International Conference,// ACM, 2012, pp. 18:1–18:8. ACM Digital Library, [[https://doi.org/10.1145/2160125.2160143]].))
- loudspeakers hidden in the real-world environment \\
2. Binaural audio through a wearable
- headphones or another wearable listening device
- 6DoF tracking \\
3. Binaural audio through loudspeakers / Cross-talk cancellation (CTC)((Choueiri, Edgar. 'Binaural Audio Through Loudspeakers', //Immersive Sound: The Art and Science of Binaural and Multi-Channel Audio//, Routledge, 2017, p. 124))
- beamforming separate audio signals to user's left and right ear from a multi-speaker array \\
4. Wave field synthesis (WFS)((Daniel, Jérôme, et al. 'Further Investigations of High-Order Ambisonics and Wavefield Synthesis for Holophonic Sound Imaging'. //AES 114th Convention//, March 2003, p. 18, [[http://www.aes.org/e-lib/browse.cfm?elib=12567]]))
- a large number of loudspeakers used to produce ’holophonic’ sounds inside the listening space \\
5. Directional speakers
- narrow sound beam created with ultrasonic modulation and demodulation
- some designs based on conventional loudspeaker technology \\
6. Compensated amplitude panning (CAP)
- based on head-tracking data, accurately adjusting timings and amplitudes between loudspeakers to create stable virtual auditory images
Since the Full-AAR project concentrates on the second option, its technical requirements are discussed below. The principles of a 6DoF AAR system were described as early as the 1990s by Jens Blauert((Blauert, Jens. //Spatial hearing: The psychophysics of human sound localization (Rev. ed)//, MIT Press, 1997, p. 386)) and Benjamin Bederson((Bederson, Benjamin. 'Audio augmented reality: A prototype automated tour guide.' //CHI '95: Conference Companion on Human Factors in Computing Systems,// May 1995, pp. 210–211, [[https://doi.org/10.1145/223355.223526]])), without major differences from modern setups such as those presented by Hannes Gamper in 2014((Gamper, Hannes. //Enabling Technologies for Audio Augmented Reality Systems//, Aalto University, 2014, p. 24, [[http://urn.fi/URN:ISBN:978-952-60-5622-7]])), Robert Albrecht in 2016((Albrecht, Robert. //Methods and applications of mobile audio augmented reality//, Aalto University, 2016, pp. 22-23, [[https://aaltodoc.aalto.fi:443/handle/123456789/21198]])), or Naphtali and Rodkin in 2019((Naphtali, Dafna and Rodkin, Richard. 'Audio augmented reality for interactive soundwalks, sound art and music delivery.' //Foundations in sound design for interactive media: a multidisciplinary approach//, edited by Michael Filimowicz, Routledge, 2019, [[https://doi.org/10.4324/9781315106342]])). \\
Yang, et al. (2022)((Yang, J., Barde, A., & Billinghurst, M. (2022). Audio Augmented Reality: A Systematic Review of Technologies, Applications, and Future Research Directions. Journal of the Audio Engineering Society, 70(10), 788–809.)) have identified five key components required to technically realise AAR:
1. User-object pose tracking
2. Interaction technology
3. Display technology
4. Room acoustics modeling
5. Spatial sound synthesis
However, we will use slightly different categories below to describe the technical requirements of a headphone-based AAR experience.
==== Runtime engine ====
A key element of AAR is a system that controls and feeds audio to the user based on sensory data and interactional rules. Ronald Azuma called this a 'scene generator'((Azuma, Ronald. 'A Survey of Augmented Reality'. //Presence: Teleoperators and Virtual Environments, 6(4)//, August 1997, p. 363, [[https://doi.org/10.1162/pres.1997.6.4.355]])), although the term has not really been adopted, so we call it, boringly enough, 'runtime engine'. The runtime engine can be hosted by a computer or mobile device, or be purpose-built for the particular experience.
The runtime engine manages all the audio material, whether pre-recorded or generated at runtime, and plays it back according to the programmed logic. Sometimes a separate audio engine or audio middleware is used to handle the audio content, although in many wearable and mobile devices audio is handled directly within the hardware for efficient operation.
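To make this concrete, below is a minimal sketch of such a runtime loop in Python. All the names (''tracker'', ''audio_engine'', the scene objects) are hypothetical placeholders rather than any existing API: the engine polls the tracking data, updates the listener pose, and starts sound scenes when their trigger rules fire.
<code python>
import time

class RuntimeEngine:
    """Minimal sketch of an AAR runtime loop; the whole API is hypothetical."""

    def __init__(self, tracker, audio_engine, scenes):
        self.tracker = tracker      # supplies the user's 6DoF pose
        self.audio = audio_engine   # spatialises and plays back sounds
        self.scenes = scenes        # scenes with trigger zones and sounds

    def run(self, rate_hz=60):
        while True:
            pose = self.tracker.poll()           # position + orientation
            self.audio.set_listener_pose(pose)   # keep the soundscape world-fixed
            for scene in self.scenes:
                if not scene.started and scene.zone.contains(pose.position):
                    self.audio.play(scene.sounds)  # programmed interaction logic
                    scene.started = True
            time.sleep(1.0 / rate_hz)
</code>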
==== Sensors ====
Sensors are needed to track the user's location and head orientation in relation to the environment. A tracking system enables the virtual sounds to stay fixed in their three-dimensional positions regardless of the spatial location and orientation of the user. Following the head movements in the real world, the virtual environment is rotated and moved to keep it stationary relative to its physical counterpart.
[{{ ::6dof.png?300|Six degrees of freedom (6DoF)}}]When an object can move freely in three-dimensional space, it is said to have "six degrees of freedom" (6DoF): three position coordinates (x, y, z) and three orientation angles (yaw, pitch, roll). With 6DoF tracking, a fully immersive AAR experience can be created, allowing users to move freely while sounds are rendered from their intended positions.
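The core operation of 6DoF tracking can be illustrated with a short sketch: the tracked head pose is used to transform a world-fixed source position into head-relative coordinates by counter-translating and counter-rotating it, after which the rendering direction and distance follow directly. The sketch below uses NumPy and SciPy with made-up pose values and an x-forward, y-left coordinate convention of our own choosing.
<code python>
import numpy as np
from scipy.spatial.transform import Rotation as R

# A world-fixed virtual source and a tracked head pose (made-up values).
source_world = np.array([2.0, 0.0, 1.5])    # x, y, z in metres
head_position = np.array([1.0, 0.0, 1.7])
head_rotation = R.from_euler('ZYX', [90.0, 0.0, 0.0], degrees=True)  # yaw, pitch, roll

# The source stays fixed in the room, so in head coordinates it moves
# opposite to the head: subtract the position, apply the inverse rotation.
source_head = head_rotation.inv().apply(source_world - head_position)

azimuth = np.degrees(np.arctan2(source_head[1], source_head[0]))  # 0 = straight ahead
distance = np.linalg.norm(source_head)
print(f"azimuth {azimuth:.1f} deg, distance {distance:.2f} m")  # -90.0 deg, i.e. to the right
</code>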
A sensor system can also track the user's actions to trigger interactions. For example, a location tracker can detect when the user walks into a certain zone and signal the runtime engine to start a corresponding sound scene. By tracking both the user's location and head orientation, an interaction can be triggered when the user presumably gazes at an object. Further, the user's hand movements can be tracked, their voice analysed, or why not even their biosignals measured.
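For example, the two trigger conditions mentioned above can be sketched in a few lines of Python/NumPy; the planar zone geometry, the gaze cone, and the threshold values are our own assumptions rather than features of any particular system.
<code python>
import numpy as np

def inside_zone(position, zone_center, zone_radius):
    """Is the user inside a circular trigger zone on the floor plane?"""
    return np.linalg.norm(position[:2] - zone_center[:2]) < zone_radius

def gazing_at(head_position, head_forward, target, max_angle_deg=10.0):
    """Does the head's forward vector point within a cone around the target?"""
    to_target = target - head_position
    to_target = to_target / np.linalg.norm(to_target)
    return np.dot(head_forward, to_target) > np.cos(np.radians(max_angle_deg))
</code>
In practice, a dwell time or some hysteresis would typically be added on top of these tests, so that a passing glance or a noisy tracker reading does not fire the trigger.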
To enrich the narrative environment, the sensors can track other objects, too; for example, items that the user is allowed to carry around and that possibly emit an augmented sound of their own. Another example would be a door equipped with a sensor detecting how far it is open, consequently controlling an audio filter that simulates the occlusion of a sound source heard through the door opening (a sketch of such a mapping follows the photo below).
{{:object_tracking_with_zed_480x480.gif?nolink&480|}}
//Photo: Detecting and tracking an object using ZED SDK by Stereolabs//
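The door example above could be realised, for instance, by mapping the measured opening angle to the cutoff frequency of a low-pass filter on the occluded sound source. The mapping below is a made-up Python sketch, not taken from any existing system; the frequency range and the curve would need tuning by ear.
<code python>
import numpy as np

def door_occlusion_cutoff(opening_deg, f_min=300.0, f_max=16000.0):
    """Map a door-opening angle (0 = closed, 90 = fully open) to the
    cutoff of a low-pass filter simulating occlusion (values made up)."""
    t = np.clip(opening_deg / 90.0, 0.0, 1.0)
    # Interpolate on a logarithmic frequency scale so the change sounds even.
    return f_min * (f_max / f_min) ** t

print(f"{door_occlusion_cutoff(45.0):.0f} Hz")  # half-open door: ~2191 Hz
</code>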
==== Spatialiser ====
A spatialiser, or 'virtual audio engine', is a software plugin or package that makes the virtual sounds appear to emanate from the environment, outside of the user's head. It runs alongside the runtime engine, from which it receives the sounds together with information about their location and orientation. It then incorporates an acoustic model of the space to simulate reverberation and other reflections, and possibly attempts to simulate sound propagation as well. Finally, it creates a binaural render of the sounds and their reflections to be played back through headphones.
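At its simplest, the binaural rendering stage amounts to distance attenuation plus convolution with a head-related impulse response (HRIR) pair chosen for the source's head-relative direction. The Python sketch below illustrates only that core step; it assumes the HRIRs have already been looked up (for example from a SOFA file) and leaves out reflections, propagation delay, and HRIR interpolation, which a real spatialiser handles.
<code python>
import numpy as np
from scipy.signal import fftconvolve

def binaural_render(mono, hrir_left, hrir_right, distance, ref_distance=1.0):
    """Minimal binaural render: 1/r distance attenuation plus convolution
    with an HRIR pair for the source's head-relative direction."""
    gain = ref_distance / max(distance, ref_distance)
    left = fftconvolve(mono * gain, hrir_left)
    right = fftconvolve(mono * gain, hrir_right)
    return np.stack([left, right])  # 2 x N signal for headphone playback
</code>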
==== Headphones ====
Headphones -- or some other wearable listening device -- feed the binaurally rendered audio to the user's ears and enable a subjective auditory experience, in contrast to using loudspeakers. One of the key attributes of headphones is their acoustic transparency. In general, acoustically transparent headphones potentially contribute to a more plausible illusion of virtual sounds co-existing with the real world, whereas acoustically isolating headphones allow more control over the virtual soundscape.
Instead of traditional over-the-head headphones, audio can also be fed using other 'earables', such as open-ear or bone-conduction headphones.
:!: More detailed info coming up... :!: