Introduction to Ambisonics

The development of Ambisonic surround sound systems started in the 1970s. They were based on a model developed by Michael Gerzon, which described directional psychoacoustics in a mathematical form so that it could conveniently be used in calculations relating to surround sound systems (Mallham 10). Ambisonics can be defined as a method of recording information about a soundfield and reproducing it over some form of loudspeaker array so as to produce the impression of hearing a true three-dimensional sound image.

Simply put, Ambisonics is a system for recording and playing back sound fields. Gerzon’s model drew on several earlier theories of localization, and hence gave better directional performance than other systems. The resulting benefits include reduced listening fatigue and good inter-loudspeaker imaging with reduced susceptibility to the detent effect, the tendency for apparent source locations to be pulled towards the closest loudspeaker. This latter feature improves image stability (Cotterell ch-1 16).

Ambisonics is basically a two-part system of recording and playback, built upon strong mathematical foundations and theories of human hearing. The term soundfield, mentioned above, describes the sounds in an environment and usually implies three dimensions. For recording in Ambisonics, an array of microphone capsules is built into one microphone, called a Soundfield microphone. The Soundfield microphone recording is then processed and encoded into a special format, called B-format, which may be written to and distributed on many different types of media.
In the end, a decoder is used to process the B-format signals and recover the soundfield. This can be played back on many types of listening setup, with a unique output for every speaker available (Adams 2). As seen above, Ambisonics employs a signal set known as “B-format”. This is based on the principle of encoding direction without reference to the loudspeaker layout used for reproduction, which makes Ambisonic systems adaptable to multiple loudspeaker layouts. Using an Ambisonic decoder, appropriate loudspeaker feed signals can be derived from the transmitted B-format signals.
Generally the number of loudspeakers exceeds the number of B-format signals. This is done to obtain good performance from Ambisonic systems, since increasing the number of loudspeakers usually gives better results (Adams 2).
The brain can process soundfield information in several ways, but much of this processing relies on the differences between the sounds reaching the two ears. For example, if a sound is played on the listener’s right side, the right ear is closer to the source and receives the sound before the left ear. This is termed the Interaural Time Delay (ITD). The sound is also quieter in the left ear, not only because that ear is farther away but also because the body and head absorb some of the sound. This is termed the Interaural Level Difference (ILD). The brain also pays attention to the timing and relative loudness of sounds coming directly from the source compared with sounds reflected off walls and other surfaces; sources that are farther away tend to deliver more reflected energy than direct energy to the ears. These ear-brain interactions are called psychoacoustics (Adams 2).
When a soundfield recording is made, the aim is to gather and record as much information as possible about a very small volume of space. This information consists simply of measurements of air pressure and of air pressure changes. By adding an omnidirectional microphone to a pair of figure-eight units, all of this information can be captured with simple low-order microphones. The three capsules should be acoustically at exactly the same place in the soundfield, i.e. the capsules must be truly coincident, which is difficult to achieve. It becomes even more difficult when an up-down oriented figure-eight capsule is added in order to record height information. This problem has been overcome in the Soundfield microphone, which uses four small capsules situated on the surface of a notional sphere to sample the incoming sounds (Mallham 12).
Building upon the stereo recording techniques developed by Alan Blumlein in the 1930s, Ambisonics extends the recording of information into the third dimension. The idea is to record not only the pressure at a location, but also the pressure changes in different directions: front-back, left-right, and up-down. With proper decoding of this information using mathematics as well as psychoacoustics, speakers in a room can produce a soundfield that reconstructs what was going on in that small volume at the listener’s location (Adams 2). In Ambisonics the horizontal figure-eight units are mounted front-back and side-to-side rather than at 45 degrees (Adams 2).
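The four-capsule arrangement described above yields capsule signals that have to be combined into a pressure term and three directional terms. The following is a minimal sketch of that combination, not the processing of any particular microphone. It assumes a tetrahedral layout with capsules facing front-left-up, front-right-down, back-left-down and back-right-up (a common convention that the text does not state), uses an illustrative scaling of 0.5, and ignores the frequency-dependent correction filters that real microphones apply. The function name `a_to_b_format` is hypothetical.

```python
def a_to_b_format(flu, frd, bld, bru):
    """Combine four tetrahedral capsule samples into W, X, Y, Z.

    Assumed capsule naming (not from the text):
    flu = front-left-up, frd = front-right-down,
    bld = back-left-down, bru = back-right-up.
    The 0.5 scaling is illustrative only.
    """
    w = 0.5 * (flu + frd + bld + bru)  # sum of all capsules: pressure-like term
    x = 0.5 * (flu + frd - bld - bru)  # front capsules minus back capsules
    y = 0.5 * (flu - frd + bld - bru)  # left capsules minus right capsules
    z = 0.5 * (flu - frd - bld + bru)  # upper capsules minus lower capsules
    return w, x, y, z


# Identical signals on every capsule carry no directional information,
# so only the pressure-like term is non-zero.
w, x, y, z = a_to_b_format(1.0, 1.0, 1.0, 1.0)
```

The sums and differences mirror the idea in the text: one output tracks pressure, and the other three track pressure changes along the front-back, left-right and up-down axes.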
Basic Ambisonics Technology
The Ambisonic surround sound system can be considered a two-part technological solution to the problems of encoding sound directions and reproducing them over loudspeaker systems in such a way that the listener’s ears are under the impression that the sounds they are hearing are correctly located. This can take place over a 360-degree horizontal-only soundstage (pantophonic systems) or over the full sphere (periphonic systems) (Mallham 14).
Ambisonics offers a hierarchy of encoding schemes, starting from a stereo-compatible UHJ format (Sinclair 27). There is no need to consider the actual details of the reproduction system when doing the original recording or synthesis.
Encoding Equations – The position of a sound within a three-dimensional soundfield is encoded in the four signals which make up the B-format:
X = cos A · cos B (front-back)
Y = sin A · cos B (left-right)
Z = sin B (up-down)
W = 0.707 (pressure signal)
where A is the anti-clockwise angle from centre front and B is the elevation (Mallham 14).
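As an illustration of the encoding equations above, the following minimal sketch applies them to a single mono sample. The function name `encode_b_format` and the choice of degrees for the inputs are assumptions made for this example; the four gains are taken directly from the equations.

```python
import math


def encode_b_format(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    azimuth_deg is angle A, measured anti-clockwise from centre front.
    elevation_deg is angle B, measured upward from the horizontal.
    """
    a = math.radians(azimuth_deg)
    b = math.radians(elevation_deg)
    w = sample * 0.707                      # pressure signal
    x = sample * math.cos(a) * math.cos(b)  # front-back
    y = sample * math.sin(a) * math.cos(b)  # left-right
    z = sample * math.sin(b)                # up-down
    return w, x, y, z


# A source at centre front on the horizontal plane: only W and X are non-zero.
front = encode_b_format(1.0, 0.0)

# A source 90 degrees to the left: Y carries the directional information.
left = encode_b_format(1.0, 90.0)
```

Nothing in the function refers to a loudspeaker layout, which reflects the point that the encoding describes direction only.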
B-Format
A number of different signal sets may be used at some stage of an Ambisonic system:
• A-format signals – The output signals of the microphone capsules making up a soundfield microphone. This signal set is not available to the outside world; it is utilised only within the soundfield microphone itself.
• C-format signals – The signal set which is conveyed to the listener via a recording or transmission medium, when it differs from B-format, is termed C-format. The “C” is sometimes said to stand for “consumer”. The C-format signal sets that were proposed as part of the initial development of Ambisonics are together known as the UHJ hierarchy (Cotterell Ap-2 16).
With Ambisonics, a sound field is decomposed into spherical harmonic components, termed W, X, Y and Z. These are collectively called B-format (Leese 7). B-format is based on the principle of encoding direction without reference to the loudspeaker layout used for reproduction. As a result, Ambisonic systems are adaptable to multiple loudspeaker layouts; an Ambisonic decoder derives appropriate loudspeaker feed signals from the transmitted B-format signals.
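To make the decoding step concrete, the following is a minimal sketch of one simple way to derive loudspeaker feeds from horizontal B-format signals. It is a basic projection decoder for a regular ring of loudspeakers, not the decoder described by any of the cited authors; practical decoders also apply psychoacoustic filtering and layout-specific gains. The function name `decode_horizontal_ring`, the placement of speaker 0 at centre front, and the anti-clockwise numbering are assumptions of this example. It also assumes W carries the 0.707 weighting used in the B-format encoding equations.

```python
import math


def decode_horizontal_ring(w, x, y, num_speakers):
    """Derive feeds for a regular horizontal ring from W, X and Y.

    Assumptions of this sketch: speaker 0 is at centre front, speakers are
    evenly spaced and numbered anti-clockwise, and W was encoded with a
    gain of 0.707, which the sqrt(2) factor undoes.
    """
    feeds = []
    for i in range(num_speakers):
        theta = 2.0 * math.pi * i / num_speakers  # speaker azimuth in radians
        directional = x * math.cos(theta) + y * math.sin(theta)
        feeds.append((math.sqrt(2.0) * w + 2.0 * directional) / num_speakers)
    return feeds


# A unit source at centre front (W = 0.707, X = 1, Y = 0) on eight speakers:
# the front speaker receives the largest feed.
feeds = decode_horizontal_ring(0.707, 1.0, 0.0, 8)
loudest = feeds.index(max(feeds))
```

The same B-format input can be passed with a different `num_speakers`, which illustrates why the encoded signals do not need to know the reproduction layout in advance.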
To obtain good performance from Ambisonic systems, the number of loudspeakers should exceed the number of B-format signals used; a further increase in the number of loudspeakers will usually give improved results (Cotterell ch-1 18).
B-format signals can also be expressed in terms of derivatives of sound pressure. The nth-order B-format signal set consists of the signals obtained from coincident microphones having polar patterns corresponding to the (n + 1)² linearly independent spherical harmonics of all orders up to and including n.
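As a quick check on the size of these signal sets, the number of linearly independent spherical harmonics of all orders up to and including n is (n + 1)², which gives the four signals W, X, Y and Z at first order. The following minimal sketch computes that count; the function name `full_sphere_channel_count` is hypothetical.

```python
def full_sphere_channel_count(order):
    """Number of signals in a full-sphere B-format set of the given order."""
    if order < 0:
        raise ValueError("order must be zero or greater")
    return (order + 1) ** 2


# Orders 0 to 3 need 1, 4, 9 and 16 signals respectively.
counts = [full_sphere_channel_count(n) for n in range(4)]
```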
A signal set of any order can be extended to a higher order merely by augmenting it with additional signals; it is not necessary to change any of the existing signals. Thus, an nth-order soundfield microphone is distinguished by its ability to provide outputs which are the nth-order B-format signals (Cotterell ch-4 1).
B-format is the primary signal format for Ambisonic use. Ideally, the B-format signals would be communicated directly to the listener. Unfortunately, this has not always been possible. In particular, the need to distribute recordings via two-channel media, while retaining compatibility with existing stereo and mono equipment, led to alternative signal formats being employed (Cotterell Ap-2 16).

Enhanced B-Format; BE-, BF- & BEF-Format
Enhanced B-format signal sets have been proposed in connection with B-format decoders which are optimized specifically for use with HDTV, or more generally for use in support of visual media (Cotterell Ap-2 19).
The primary motivation is to produce a frontal sound stage which is more stable with respect to movement by the listener, and specifically to lock centre-front acoustic images in place with respect to a screen. The motivation is thus substantially the same as for the use of the centre channel in cinema-oriented surround sound formats. To this end, two extra signals are defined, denoted E and F, which have directional response patterns (Cotterell Ap-2 16).

Ambisonics and Stereo
The B-format signals are not at all stereo compatible. However, it is possible to combine the three components required for horizontal work (X, W, Y) in such a way that not only is a good stereo-compatible two-channel system produced, but much of the original surround sound image can also be recovered with a suitable decoder. The resulting soundfield is not perfect, but careful design of the encoding equations makes it possible to place the defects in areas where the ear is less susceptible (Mallham 10). This encoding method, called UHJ coding, is used to produce stereo-compatible Ambisonic records, tapes and broadcasts.
The X, Y and W signals are combined in the form of a two-channel compatible stereo signal (Brice 225). This is done using the following equations, in which j denotes a 90-degree phase shift:
Left = (0.0928 + 0.255j)X + (0.4699 - 0.171j)W + (0.3277)Y
Right = (0.0928 - 0.255j)X + (0.4699 + 0.171j)W - (0.3277)Y
The above equations are such that a decoder for any of the levels will always extract the correct information from higher-level inputs; in other words, the system is upward compatible (Mallham 10).

B+ format
Another extension of the B-format described above was developed by Dr. Thomas Chen.
He calls it “B+ format”, and it gives an optional enhancement to the Ambisonics listening experience.

Definition
B+ format is the standard four channels of B-format material, known as W, X, Y and Z, plus two channels of L/R dry stereo recording. This method of recording separates the dry/direct sound from the ambient/room sound in recording and reproduction. Each serves a distinct purpose and is ideally suited to it (Chen 1). The “+” in B+ format denotes the presence of a close-miked standard stereo recording made simultaneously with a soundfield recording. This is decoded to the front half of the rig to give the impression that there is a set of stereo speakers on its surface in front of the listener, with an angle of around 60 degrees between them (Adams 2).
Chen is the inventor of B+ format. Going by his definition, the format combines first-order Ambisonic B-format with two additional channels of L/R dry recording (Trond 2). One way to look at this idea is that the format separates the dry signal from the ambience.
This means that the dry and ambient signals can be mixed dynamically, depending on the space used for playback as well as the preferences of the listener (Trond 3).

Why use the B+ format
Chen considers that the sound system he has created would provide the following features:
• Ambient – The system would be ambient. This means that the whole acoustic event could be captured and regenerated so that it gives a convincing impression to the listener’s ears, which conventional stereo sound systems do not. A soundfield must be present on reproduction.
• Accurate – The system would be accurate. This means that the direction of the sounds must be exact, as per the specifications or requirements, i.e. front-stage, to the sides, above or below, behind.
• Sounds would be free from audible distortions of tone, timbre or position. The system must suit all types of singers equally, without any particular voice sounding good or bad.
• Scaleable – The system would support group listening no matter the size of the group, and the same system could be used by an individual without compromising on sound quality. The scaling factor would be adjustable.
• Approachable – The system would be easy to use without any complications, physical or otherwise, on the part of the listener: for example, no head clamps, enforced body positions, forbidden head directions, microscopic sweet spots or compulsory narrow listening seat.
• Practical – The system would be easily tailored to suit ordinary domestic listening environments, at least as well as conventional stereo is.
• Compatible – The system would be able to replay recordings of mono and stereo material satisfactorily, while sound recorded using the technology would replay satisfactorily on conventional stereo systems or could readily be converted to match them. An example of such compatibility is FM stereo being broadcast as a mono-compatible sum and difference rather than directly as left and right signals (Chen 2).
As mentioned above, B+ format consists of the B-format ambient-recorded signal plus two channels of L/R direct-recorded signal. It is recorded on six channels and, when reproduced, is decoded into as many channels as the listener wishes (Chen 3).
By using the two channels of L/R direct-recorded signal, the directional sound cues from the direct signal, as well as the sound of the instrument, are present only in the front, i.e. the instrument setting experience. There is no vertical height information in this pair, which is added to the frontal speaker array (Chen 3). By enabling the ambient and direct information to be stored separately, the format lets the listener choose the balance of direct and ambient information at the time of reproduction. This allows for reproduction spaces of many different sizes: a larger space will need less ambient information, while a smaller space will need more.
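The playback-time balance between direct and ambient sound described above can be pictured as a simple weighted mix on each front loudspeaker. The following is a minimal sketch under that assumption: the linear crossfade, the function name `mix_front_feed` and the parameter `ambience_balance` are all hypothetical and are not taken from Chen's decoders.

```python
def mix_front_feed(direct_sample, ambient_feed, ambience_balance):
    """Blend a dry stereo sample with a decoded ambient feed for one speaker.

    ambience_balance is a hypothetical listener control in the range 0 to 1:
    0 gives only the direct signal, 1 gives only the decoded ambience.
    """
    if not 0.0 <= ambience_balance <= 1.0:
        raise ValueError("ambience_balance must be between 0 and 1")
    dry_gain = 1.0 - ambience_balance
    return dry_gain * direct_sample + ambience_balance * ambient_feed


# A larger room might use a lower ambience setting than a smaller one.
large_room = mix_front_feed(1.0, 0.5, 0.25)
small_room = mix_front_feed(1.0, 0.5, 0.75)
```

Because the two signal groups are stored separately, a control of this kind can be changed at reproduction time without re-recording anything.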
In addition the listener can choose the preferred amount of ambience (Chen 3).

Time, Location and Spatial Representation
The direct sound gives the best directional and instrumental timbre information. Hence, the direct sound should arrive earlier than the ambient signal. The direct sound also lacks any spacious or enveloping qualities (Chen 4). The ambient sound is what best gives spaciousness, spatial depth and envelopment to the reproduction. B-format presentation is known to be inaccurate in presenting location, and many researchers are therefore trying to improve reproduction with higher orders of B-format decoding. First-order B-format is nevertheless considered satisfactory for presenting the ambient sound, because ambient sound is not well localized (Chen 4).

Storage Requirements
As mentioned above, B+ format recordings require six channels of storage. This can be obtained with either SACD or DVD-A using MLP compression. Chen says that the advantage of B+ format disks is that the listener is not limited to a speaker arrangement dictated by the format; instead, the decoder is set to suit the listening setup (Chen 5).

Recording Techniques for B+ Format
Direct Recording
• Stereo Mics – The direct channel is recorded by conventional means, using directional microphones. Stereo microphone techniques such as X/Y, M/S or OCT can be used, and the result is stored as L/R or M/S. The direct signal should contain little reverberant information.
• Spot Mics – Spot microphones or flanking microphones are used to balance the recorded sound. Ambience needs to be added to the spot and flanking microphones; it should be in B-format and added to the B-format channels.
• Multi Track Techniques – Multi-track techniques are used to generate the direct signal and to add ambience in B-format.

B-Format Recording
• Room Recording – Room ambience is recorded with the Soundfield microphone, generating a B-format output. The Soundfield microphone is typically placed at the location in the room where the direct and reverberant sounds are equal.
• B-Format ambience by convolution – This is a digital technique in which a room ambience is measured in B-format and then impressed upon another signal. B-format ambient information can also be obtained with multiple reverberation devices. At least three engines are needed to generate B-format reverberance, one for each axis (Chen 5).
Techniques for Monitoring B+ Format
Ideal decoding with 16 channels
Decoding of the B+ format is ideally presented as 16 channels, arranged as two rings of eight speakers with 45 degrees of separation between adjacent speakers. From stereo, it is known that the maximum separation between speakers without a hole in the middle is 60 degrees. By using an upper and a lower ring of eight speakers, vertical (height) information can thus be presented. The listener usually presents the L/R stereo on the front speakers without height information. In this case the listener should be able to adjust the balance of dry versus ambient sound. For this, Chen has developed decoders for 16, 12 or 10 channels of 3D surround, or for 8, 6 or 5 channels without height (Chen 6).

Decoding in the present control room
The listener can use an existing control room to work in B+ format. In this case the decoding is done with 60-degree speaker spacing, using two rings of six speakers. The front upper and lower speaker signals are summed, and the existing control room speakers are used to reproduce this sum. The listener can then use the existing console to obtain the L/R stereo mix and use a pair of auxiliary sends to feed the B-format ambience generator (Chen 6).

Software for the B+ format
Dr Thomas Chen has developed exciting and powerful software for the Creamware Pulsar system, providing multiple, configurable B-format and B+ format decoding, mixing and signal manipulation subsystems. He intends to make this software available commercially, either through Creamware or direct, in the near future.
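The two-ring monitoring layouts described in the monitoring section (eight speakers per ring at 45-degree spacing, or six per ring at 60-degree spacing) can be written down as lists of speaker directions. The following is a minimal sketch and is not related to Chen's software. The function names are hypothetical, the first speaker of each ring is assumed to sit at centre front, and the ring elevation is an assumed parameter because the text does not give one.

```python
def ring_azimuths(num_speakers):
    """Evenly spaced azimuths in degrees, starting at centre front (assumed)."""
    spacing = 360.0 / num_speakers
    return [i * spacing for i in range(num_speakers)]


def two_ring_layout(speakers_per_ring, elevation_deg):
    """Return (azimuth, elevation) pairs for an upper and a lower ring.

    elevation_deg is a hypothetical value; the text does not specify
    how far above and below the listener the rings are placed.
    """
    layout = []
    for elevation in (elevation_deg, -elevation_deg):
        for azimuth in ring_azimuths(speakers_per_ring):
            layout.append((azimuth, elevation))
    return layout


# Sixteen-channel case: two rings of eight speakers, 45 degrees apart.
ideal = two_ring_layout(8, 30.0)

# Control-room case: two rings of six speakers, 60 degrees apart.
control_room = two_ring_layout(6, 30.0)
```

A list of directions like this is the kind of input a decoder needs in order to compute one feed per loudspeaker.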
