BACCH 3D Audio

A New Era in Music Reproduction

The loudspeaker, as a transducer for music reproduction, has come a long way in its design. Yet current speaker designs, be they cone drivers, panels or even omnidirectional speakers, all suffer from a lack of realism. Yes, some are better than others, but if I blindfold you and drop you off in an unknown place and time with some music playing, you can tell for sure that you are not listening to the real thing! Closing that gap is the holy grail of all audiophiles, and it is yet to be reached. What is missing, and what can we do about it?

When we talk about speaker performance, we look at the frequency response, the waterfall plot, the transient response, etc. But then again, speakers that sound good may not look good on paper, and those that look bad on paper may not sound bad at all! Does that mean that what we measure does not really contribute to good sound? Then there is a whole lot of psychoacoustics and room acoustics that come into play as well.

As a scientist, I think all experience should stem from a solid foundation. If the basis is not correct, we have nothing to fall back on and we are just chasing after the wind. So, if you were to design a dream speaker, what qualities would you like to achieve so that it stands a better chance of sounding “real”? Well, I think I need to get it time and phase coherent, with a flat frequency response. So, I have to eliminate group delay distortion, that is, delay that varies with frequency.

If you follow my writings on the Acourate software, you will know that this program sets out to give you coherent, time-aligned speakers. By using FIR filters, it can eliminate the problem of group delay as well. This is also the fundamental reason why I choose an active rather than a passive crossover system. A passive crossover is always minimum phase and creates group delay. It is true that we can compensate for the time delay to a certain extent by placing the tweeter and woofers in different planes. But if you look at the time delay that is required, you will see that the so-called time coherent speakers with a slanted front baffle can never compensate for these time differences! Acourate can: the linear phase crossover it generates also corrects the phase distortion inherent in the crossover design. This is the basis of my Acourate system, and what I have achieved in the past 10 years.
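To make the group delay point concrete, here is a minimal sketch in Python with NumPy. It compares a symmetric (linear phase) FIR low-pass with a simple one-pole minimum phase low-pass. Both filters are illustrative assumptions of mine: the one-pole filter is only a stand-in for a minimum phase crossover section, and neither is what Acourate actually generates.

```python
import numpy as np

def group_delay(b, a, omega):
    """Group delay, in samples, of H(z) = B(z)/A(z) at frequencies omega (rad/sample)."""
    def weighted_ratio(coeffs):
        coeffs = np.asarray(coeffs, dtype=float)
        n = np.arange(len(coeffs))
        basis = np.exp(-1j * np.outer(omega, n))   # e^{-j*omega*n}
        return (basis @ (n * coeffs)) / (basis @ coeffs)
    # Delay of a ratio = delay of the numerator minus delay of the denominator.
    return np.real(weighted_ratio(b)) - np.real(weighted_ratio(a))

def linear_phase_lowpass(num_taps, cutoff):
    """Symmetric windowed-sinc low-pass; cutoff in cycles per sample (0 to 0.5)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2.0
    return 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(num_taps)

# Evaluate in the passband only, where the FIR response is far from zero.
omega = np.linspace(0.05 * np.pi, 0.4 * np.pi, 50)

fir = linear_phase_lowpass(101, 0.25)
gd_fir = group_delay(fir, [1.0], omega)

pole = 0.9  # hypothetical one-pole minimum phase low-pass
gd_iir = group_delay([1.0 - pole], [1.0, -pole], omega)

print("FIR group delay range (samples):", gd_fir.min(), gd_fir.max())
print("IIR group delay range (samples):", gd_iir.min(), gd_iir.max())
```

The symmetric FIR delays every frequency by the same (N - 1) / 2 samples, which is a pure latency and leaves the waveform shape intact. The minimum phase filter delays low frequencies more than high ones, and that frequency-dependent delay is the smearing discussed above.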

The same problem applies to speaker cabinet design. With a ported design, the port resonance contributes to the bass, and that contribution is always delayed. By the same token, a transmission line design produces even stronger bass, but at the cost of much more delay. I prefer a sealed box design. Many will criticize sealed boxes for sounding very dry, but then again, they are more accurate. I do not like slow bass.

What do I get using this kind of system? Excellent transient response and a very large sound stage with pinpoint imaging. I can clearly point to each instrument playing in an orchestra. Interaural phase and transient time-of-arrival cues are a huge part of direction sensing and make an enormous difference to the perception of space. Having a time and phase coherent setup can give you pinpoint localization. But then, the sound stage is still limited to the position of the speakers. What I also find lacking is image depth and image height. Don’t get me wrong, I do get image depth and height, but it is pretty much like looking at a 3D drawing on a piece of paper. Not the real kind of 3D, if you know what I mean! I was quite happy, but compared to the real thing I heard in a concert hall, there is still a big difference!

There are already many articles on the internet about speaker placement. There are also many articles about room acoustics and the use of various amounts of diffusers and absorbers to get the best room reverb and RT60 curve. So why bother writing more about these?

The reason is that all these articles do not make sense to me. In fact, it is the underlying principle of speaker placement that confuses me. I am a scientific person and everything has to make sense for me to buy into it. As an audiophile, I assume everyone is looking for a true reproduction, in the listening room, of what was recorded. This is the meaning of high fidelity or HiFi, right? So, logic tells me that if what is in the source (be it CD, LP, SACD...) can be heard directly without interference from the room boundaries, it stands the best chance of revealing what is recorded, pure, with high fidelity. It follows that the best place to listen to music is an open field (ideally without the floor) or an anechoic chamber.

Everything is very logical so far, but I am sure 100% of audiophiles will argue with me and tell me that a speaker placed in an anechoic chamber sounds awful!!! Yes, indeed, it sounds awful: very dry, with a loss of dynamics, and the tonal balance becomes very strange too.

So now, logic and real-life experience do not match. What should I do? 99.999% of audiophiles will start moving their speakers and adding diffusers and absorbers so that the reflections from the room create what seems to be a real-life reproduction of music. They are manipulating the listening room, and the interaction between room and speakers, to recreate the soundstage, the ambience and the tonal balance of what is recorded on the CD.

This is the reason why you need to move your speakers to create a more lifelike sound. Moving them closer to the wall or closer together, or toeing them in or out, all affect the reflections and room interaction, and therefore the ambience, the presentation and the soundstage. Yes, this sounds better than the anechoic chamber, but to me it is an artificial recreation of sound based on room reflections. Different music is recorded in different locations with different reverbs and room sizes, so you would need to move your speakers for every single CD to make it sound more like the original location; otherwise you only get an average outcome.

Unfortunately, I belong to the other 0.001% of audiophiles who do not believe that moving speakers is the ultimate solution for a real-life experience in music reproduction. But what makes speakers placed in an anechoic chamber sound awful? I am lucky to be living in this day and age, and the internet certainly provides plenty of information. Do you know that the current stereo reproduction of music is FLAWED????

So what is the missing link, and what can I do to make it real? To find out, I tried to understand more about how speakers give us an illusion of an image in space. If you read my web page, you will notice that the ITD (interaural time difference) and ILD (interaural level difference) are important elements in sound localization. The problem with current stereo playback is that some of the sound that is supposed to reach your right ear only now reaches your left ear at almost the same time. So, the cues that are meant for your right ear get heard by your left ear as well! This corruption of stereo information is called crosstalk.

Perhaps, it is much better explained by a video

What I have noticed is that the more accurate and coherent your speaker system is, the more “accurately” the crosstalk is presented to you, as it is now perfectly aligned with no smearing effect. If you have group delay or left-right imbalance, the crosstalk is less apparent, as it does not arrive at the same time. That is why my system, while it has good image width, lacks image depth. My search went on and I found ambiophonics!

Ambiophonics is a technique built around crosstalk cancellation. It uses an algorithm called RACE (Recursive Ambiophonic Crosstalk Elimination), which aims to provide a high level of crosstalk cancellation (XTC). It is good, but there is a lot of spectral coloration that results in tonal changes. Worse still, you have to place the speakers in a traditional stereo dipole configuration for it to work. But who wants a pair of speakers standing in the way in front of the TV? The end result is also very distorted, with a lot of “phasiness” in the music reproduction. You may still get away with it on orchestral music, but it sounds terrible with small groups and studio recordings. While ambiophonics is a giant step in the right direction, it is not there yet.
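For readers who want to see what recursive crosstalk cancellation does, here is a minimal sketch in the spirit of RACE. It is not the actual ambiophonics implementation. The gain and delay values, the function names and the idealized free-field "ears" model (each ear hears its own speaker plus an attenuated, delayed copy of the other one) are all assumptions for illustration.

```python
import numpy as np

def recursive_xtc(left, right, gain, delay):
    """Feed each output, inverted, attenuated and delayed, into the other channel."""
    out_l = np.zeros(len(left))
    out_r = np.zeros(len(right))
    for i in range(len(left)):
        prev_r = out_r[i - delay] if i >= delay else 0.0
        prev_l = out_l[i - delay] if i >= delay else 0.0
        out_l[i] = left[i] - gain * prev_r
        out_r[i] = right[i] - gain * prev_l
    return out_l, out_r

def free_field_ears(spk_l, spk_r, gain, delay):
    """Idealized acoustic path: direct sound plus attenuated, delayed crosstalk."""
    def delayed(x):
        y = np.zeros_like(x)
        y[delay:] = x[:-delay]
        return y
    return spk_l + gain * delayed(spk_r), spk_r + gain * delayed(spk_l)

# A single click in the left channel only (hypothetical test signal).
left = np.zeros(64)
left[0] = 1.0
right = np.zeros(64)
GAIN, DELAY = 0.8, 3   # assumed crosstalk attenuation and delay in samples

plain_l, plain_r = free_field_ears(left, right, GAIN, DELAY)
xtc_l, xtc_r = free_field_ears(*recursive_xtc(left, right, GAIN, DELAY), GAIN, DELAY)

print("right-ear peak without XTC:", np.abs(plain_r).max())
print("right-ear peak with XTC:   ", np.abs(xtc_r).max())
```

In this idealized model the click reaches the left ear only. In a real room the cancellation signal never matches the true crosstalk exactly, and the recursion boosts certain frequencies, which is one way to picture the coloration and "phasiness" described above.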

What about the room? All audiophiles know that in order to get good sound, you have to strike a balance between absorption and diffusion. It does not and will not sound good in an anechoic chamber. But if you really think about it, we also want true fidelity to what is recorded on the CD or SACD! Why would you want the influence of your room? A speaker placed in the middle of an open field or an anechoic chamber should sound best! What is recorded goes to your ears directly, nothing more, nothing less. Isn’t that contradictory? This is the biggest mystery I have been trying to solve for years, and I think I have found an answer. Many forum members think I am crazy, but then again, only crazy people get the true answer! And I do hold a few patents too and am responsible for a few medical breakthroughs!

Thanks to this very web page, a complete stranger contacted me and told me he had invited Prof Choueiri to come over to Hong Kong to set up his system. He asked me if I knew about him and his work. Of course! I have been following ambiophonics for a while and I even have Prof’s initial crosstalk cancellation files, which I downloaded from the ambiophonics site some time ago!

So, what is the difference between Prof’s technique (BACCH) and RACE? It is best explained in his own words: “The main impediment to the wide adoption of XTC-enabled BAL has been the huge spectral coloration that XTC filters inherently impose on the sound emitted by the loudspeakers. The fundamental nature of this spectral coloration, its basic features, its dependencies, and optimal methods to abate it with minimal adverse effects on XTC performance, are discussed in detail in this technical paper, which describes most basic aspects of BACCH filters (some aspects are not published for proprietary reasons). BACCH® Filters are optimized crosstalk cancellation filters that allow 3D audio reproduction over loudspeakers. They yield maximum crosstalk cancellation level without introducing any spectral coloration to the input signal. A detailed discussion of BACCH filters can be found in this technical paper.”

Here is some technical explanation as well; an excerpt is reposted here:

Sound coloration caused by an XTC filter consists of peaks in the frequency spectrum, typically exceeding 30 dB even in the sweet spot. Ideally, the crosstalk cancellation should be infinite: the sound pressure at each ear should come only from its respective source.

To derive formulas that allow optimal crosstalk cancellation, an idealized model of sound propagation was assumed, with no reflections or diffraction caused by the listener’s head and ears.

Assuming two point sources, their configuration is illustrated below.

To make the theoretical description of the BACCH filter easier to follow, the elements used in it are explained here:

DL and DR: audio signals

PL and PR: pressures at the left and right ears

l1 and l2: path lengths between each source and the ipsilateral and contralateral ears

VL and VR: source signals, collected in the vector v = [VL(iω), VR(iω)].

H: filter matrix

C: system transfer matrix

R: performance matrix

First, it is necessary to determine the mathematical formulation of the transformation. The audio signal is transformed by the filter H into the source signals v, which travel from the sources to the ears, where they produce the pressures p.

Here, R is the performance matrix.
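The equations of the excerpt did not survive the repost. As a hedged sketch, the relations implied by the symbol list can be written as below for the idealized two-point-source free-field model. The symbols g (ratio of the path lengths), τc (the extra travel time over Δl = l2 − l1), c_s (speed of sound) and the common factor α (delay and spreading over l1) are my notation for this sketch and may differ from the paper's.

```latex
\begin{align}
\mathbf{v} &= \mathbf{H}\,\mathbf{d}, &
\mathbf{p} &= \alpha\,\mathbf{C}\,\mathbf{v} = \alpha\,\mathbf{R}\,\mathbf{d}, &
\mathbf{R} &= \mathbf{C}\,\mathbf{H}, \\
\mathbf{d} &= \begin{bmatrix} D_L \\ D_R \end{bmatrix}, &
\mathbf{v} &= \begin{bmatrix} V_L \\ V_R \end{bmatrix}, &
\mathbf{p} &= \begin{bmatrix} P_L \\ P_R \end{bmatrix}, \\
\mathbf{C} &= \begin{bmatrix} 1 & g\,e^{-i\omega\tau_c} \\ g\,e^{-i\omega\tau_c} & 1 \end{bmatrix}, &
g &= \frac{l_1}{l_2}, &
\tau_c &= \frac{l_2 - l_1}{c_s}.
\end{align}

% Perfect crosstalk cancellation: choose H as the inverse of C, so that R = I.
\begin{equation}
\mathbf{H} = \mathbf{C}^{-1}
= \frac{1}{1 - g^{2} e^{-2 i \omega \tau_c}}
\begin{bmatrix} 1 & -g\,e^{-i\omega\tau_c} \\ -g\,e^{-i\omega\tau_c} & 1 \end{bmatrix},
\qquad \mathbf{R} = \mathbf{I}.
\end{equation}
```

With R equal to the identity, each ear receives only its own channel. The denominator becomes very small whenever ωτc is a multiple of π, because g is close to 1, and that is where the large spectral peaks of the perfect filter come from.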

Next, a series of matrices are defined and these allow the evaluation of the spectral colouration added by the XTC filters – amplitude, frequency spectrum and the system’s frequency response. Below is the matrix that compares different types of XTC filters, χ:

Esi = amplitude spectrum (to factor α) of a signal, coming from a loudspeaker, reaching the ipsilateral ear.

Si = lateral image, related to the incoming signal

|| = ipsilateral ear, related to the incoming signal

There are eight metrics that make up the matrix presented above:

Esi||, Esiχ, Eci, Ssi||, Ssiχ, Sci, S, χ

Each one is a function of frequency, and through them the spectral colouration and XTC performance are evaluated and compared.

Esiχ: lateral image frequency response at the contralateral ear.

Eci: frequency response of the system at each ear when the same audio signal is divided equally between the two inputs. “ci” refers to the central image.

S: source’s frequency response

Correction of the artifacts generated by XTC

Once the crosstalk cancellation filter has been established, corrections are applied in the time domain and in the frequency domain (constant-parameter regularization).

It was demonstrated that, for XTC systems based on the two-source free-field model or on HRTFs, this regularization reduces the peaks mentioned above but, on the other hand, produces a roll-off of the bass content and also generates high-frequency artifacts in the system’s response. It was also concluded that constant regularization acts at discrete, widely spaced frequencies in the spectrum.

The best optimization is achieved through frequency-dependent regularization, which requires the audio spectrum to be divided into a hierarchy of frequency bands. The mathematical analysis yields three groups of frequency bands, one of which uses the fully optimized filter. The analytical derivation of this solution relies on the most typical listening scenarios. Of the three groups of frequency bands, two are regularized and one is not (perfect filter). Above 6 kHz, XTC filtering is considered unnecessary, so that is where the filter is cut off [10].
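The trade-off described above can be sketched numerically. The following is a minimal example of constant-parameter regularization only, not of the band-by-band BACCH design, which is partly unpublished. It assumes the free-field transfer matrix with hypothetical values g = 0.985 and τc = 65 µs, and the usual regularized inverse H = (CᴴC + βI)⁻¹Cᴴ; the XTC level is taken as the ratio of ipsilateral to contralateral response, in dB.

```python
import numpy as np

def transfer_matrix(freq_hz, g, tau_c):
    """Free-field 2x2 transfer matrix: direct paths on the diagonal, crosstalk off it."""
    x = g * np.exp(-1j * 2.0 * np.pi * freq_hz * tau_c)
    return np.array([[1.0, x], [x, 1.0]])

def regularized_filter(C, beta):
    """Regularized inverse of C; beta = 0 gives the perfect XTC filter."""
    Ch = C.conj().T
    return np.linalg.solve(Ch @ C + beta * np.eye(2), Ch)

def evaluate(freqs_hz, g, tau_c, beta):
    """Return (filter peak level in dB, XTC level in dB) at each frequency."""
    peak_db, xtc_db = [], []
    for f in freqs_hz:
        C = transfer_matrix(f, g, tau_c)
        H = regularized_filter(C, beta)
        R = C @ H
        # Level sent to the loudspeakers for a signal on the left input only.
        peak_db.append(20.0 * np.log10(max(abs(H[0, 0]), abs(H[1, 0]))))
        # Wanted ear versus unwanted ear; floor the denominator to avoid log(0).
        xtc_db.append(20.0 * np.log10(abs(R[0, 0]) / max(abs(R[0, 1]), 1e-12)))
    return np.array(peak_db), np.array(xtc_db)

G, TAU_C = 0.985, 65e-6                      # assumed geometry parameters
freqs = np.linspace(20.0, 20000.0, 2000)

peak_perfect, xtc_perfect = evaluate(freqs, G, TAU_C, beta=0.0)
peak_reg, xtc_reg = evaluate(freqs, G, TAU_C, beta=0.05)

print("perfect filter:     max peak %.1f dB" % peak_perfect.max())
print("regularized filter: max peak %.1f dB, min XTC %.1f dB"
      % (peak_reg.max(), xtc_reg.min()))
```

Under these assumptions the unregularized filter shows the peaks of more than 30 dB mentioned in the excerpt, while the regularized one keeps the peaks small but gives up almost all cancellation at the frequencies where the peaks used to be. Making β depend on frequency is the idea behind regularizing only in the bands that need it.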

Filter application

The design strategy depends on the intended use of the filter.

The first consideration is the maximum level of colouration that can be tolerated (in dB). It takes into account the constraints of the reproduction: room reflections, loudspeaker characteristics and their distance from the listener. For audiophiles, it is estimated that this level should not exceed 5 dB. For a home theatre system, a higher level of colouration is tolerated, because more headroom is available for reproducing surround special effects with two loudspeakers.

Angular distance between the loudspeakers

The span, although usually quoted as 60º, is not the only possibility and can be treated as a variable in the filter design. Since tc (the time a sound wave takes to travel the path difference Δl) varies with the loudspeaker span, the limits of the frequency bands can be modified by changing this configuration. Thus, by choosing a particular value ϴ* of the span ϴ, the upper limit of the second frequency band (in the band hierarchy) can be made to coincide with the cut-off frequency above which the XTC filter is no longer necessary.

As already demonstrated by Kirkeby, the stereo dipole configuration, with a loudspeaker span of 10º, proves to be ideal because it is more robust to head movements and thus enlarges the sweet spot. This is easily shown, since with a narrow span the path difference Δl is comparatively insensitive to head and torso movements.
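To put numbers on how the span changes Δl and tc, here is a small geometric sketch. It assumes two point sources, ears modelled as two points 15 cm apart with no head in between, a listening distance of 1.6 m and a speed of sound of 343 m/s. All of these values are illustrative assumptions of mine.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def path_lengths(distance_m, span_deg, ear_spacing_m=0.15):
    """Return (l1, l2): speaker to near ear and speaker to far ear, in metres."""
    half_span = math.radians(span_deg) / 2.0
    base = distance_m ** 2 + (ear_spacing_m / 2.0) ** 2
    cross = ear_spacing_m * distance_m * math.sin(half_span)
    return math.sqrt(base - cross), math.sqrt(base + cross)

def crosstalk_delay(distance_m, span_deg, ear_spacing_m=0.15):
    """Extra travel time tc of the crosstalk path, in seconds."""
    l1, l2 = path_lengths(distance_m, span_deg, ear_spacing_m)
    return (l2 - l1) / SPEED_OF_SOUND

def first_peak_hz(distance_m, span_deg, ear_spacing_m=0.15):
    """Lowest frequency above DC where the perfect free-field filter peaks."""
    return 1.0 / (2.0 * crosstalk_delay(distance_m, span_deg, ear_spacing_m))

for span in (60.0, 10.0):
    l1, l2 = path_lengths(1.6, span)
    print("span %4.0f deg: delta_l = %.1f mm, tc = %.1f us, first peak near %.0f Hz"
          % (span, (l2 - l1) * 1e3, crosstalk_delay(1.6, span) * 1e6,
             first_peak_hz(1.6, span)))
```

Narrowing the span shrinks Δl and tc, which pushes the first troublesome frequency of the filter upwards. This is the sense in which the span can be used as a design variable for the frequency bands.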

I was invited to audition his system and within 1 hour, I made the most expensive investment in my audio hobby ever. I am a DIYer and I do not like to spend a lot of money on something that I can build myself, achieving 90% of the performance at less than 10% of the price. But this has no alternative.

The measurement involves the use of an omni microphone placed at the listening position to record a sweep. This gives the overall room acoustics for Prof to analyze, probably in order to do some kind of equalization in the end. Then you sit at the listening position wearing the in-ear microphones. Sweep recordings are then made, first with the left channel and then with the right channel. Each time the sweep is played, a stereo recording is made. The information is then sent to Prof for his analysis and processing. After all the maths is done, Prof sends back two filters, one with XTC for the full range, the other with XTC for 94 Hz and above only. Prof believes that it is better to use a 2.1 or 2.2 system, as sound reproduction below 94 Hz is omnidirectional anyway, with little information about localization.
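I do not know how Prof processes the recordings, but the general idea of turning a recorded sweep into an impulse response can be sketched as follows. This is a generic regularized deconvolution in Python with NumPy, with a simulated "recording" standing in for the in-ear microphone signal. The function names, the sweep range and the made-up acoustic path are all assumptions for illustration.

```python
import numpy as np

def log_sweep(f_start, f_stop, seconds, fs):
    """Exponential sine sweep from f_start to f_stop (Hz)."""
    t = np.arange(int(seconds * fs)) / fs
    rate = np.log(f_stop / f_start)
    phase = 2.0 * np.pi * f_start * seconds / rate * (np.exp(t * rate / seconds) - 1.0)
    return np.sin(phase)

def impulse_response(recorded, sweep, n_taps, eps=1e-6):
    """Estimate the speaker-to-microphone impulse response by spectral division."""
    n = len(recorded) + len(sweep)            # long enough to avoid wrap-around
    rec_f = np.fft.rfft(recorded, n)
    swp_f = np.fft.rfft(sweep, n)
    power = np.abs(swp_f) ** 2
    # The small eps term keeps the division stable outside the swept band.
    est = rec_f * np.conj(swp_f) / (power + eps * power.max())
    return np.fft.irfft(est, n)[:n_taps]

FS = 8000                                     # assumed sample rate for the demo
sweep = log_sweep(50.0, 3500.0, 1.0, FS)

# Simulated path from one speaker to one ear: direct sound plus one reflection.
true_path = np.zeros(64)
true_path[10] = 1.0
true_path[25] = -0.4
recorded = np.convolve(sweep, true_path)

estimate = impulse_response(recorded, sweep, 64)
print("strongest arrival at sample", int(np.argmax(np.abs(estimate))))
```

In the real measurement there are four such responses, since each speaker is recorded at both ears, and together they form the 2x2 transfer matrix that a crosstalk cancellation filter has to invert.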

My listening impressions more or less agree with this. With the full range filter, I get more layering of bass, but the drum beat is a bit diffuse and less solid. With XTC94, the deep drum beats are more solid, but the layering is lost. In theory, the full range XTC should be more “correct”, but since it is so hard to get rid of the side wall reflections in the lowest octave, the XTC is heavily affected there and the result is a diffuse and less solid sound. I suspect that if you have an anechoic chamber, full range XTC should be the way to go. The other thing I have noticed is that with XTC94, the bass is actually louder. Of course, because there is no XTC in the bass region. Since my sealed speakers roll off at about 50 Hz, I have asked Prof to kindly provide me with an XTC50 filter, which is the best compromise. The increase in gain in this region just compensates for the roll-off of my speakers’ frequency response!

The following is the spectrogram for a song with full range XTC showing the L&R channels of the song.

The following is the spectrogram for the same song with XTC94 showing the L&R channels of the same song.

The spectrogram of the song is the same except in the sub-100 Hz region. You can easily compare the bass response in the sub-100 Hz region: about 6-7 dB difference!

The sound coming out of my system now is the best I have ever heard: a huge soundstage, way beyond the speakers, and the depth of the image is unreal!! The tonal balance is much, much better than with the traditional stereo setup. Listening to binaural recordings, this setup is as close to the real world as you could possibly get!

If you have a friend with this setup, pay him, bribe him to give you a chance to listen to it, and you will be hooked! What about the room? Yes, you need your room to be devoid of any reflections to achieve the maximum XTC. It is best to listen in an anechoic chamber. Congratulations to myself, the last piece of the puzzle is now solved: the anechoic chamber is, indeed, the best place to listen to music! This is where you can get started

PS According to Prof, there are 5 levels of music reproduction realism

Level 1:

* The recording is made binaurally with microphones in the ears of the same person who will be doing the listening.

* The listening is done in an anechoic chamber with a pair of speakers (having good phase coherence and able to approximate an impulse response) using a c-BACCH filter designed using the same speakers and the same microphones in the ears of the same listener.

* Comments: The resulting reproduction is essentially indistinguishable from being at the recording where the listener was sitting. I have just done such a recording (using my own head) of a 3D audio piece played through 98 speakers at the New York Armory: The results are simply uncanny.

Level 2:

* The recording is made binaurally with microphones in the ears of a dummy. The rest is as in Level 1.

* Comments: This is what you experienced in my laboratory when I played for you binaural recordings. The price you pay (compared to Level 1) is in the accuracy of where the sound sources are reproduced. Since the recorded ILD and ITD cues are interpreted by the listener’s ear/brain system differently from the case where these cues were recorded using his head, the locations of the sound sources will be slightly off in the reproduction but the reproduction of the 3D space and reverb of that space will be practically as good as in Level 1.

The deader the listening room (an anechoic room is ideal) the higher is the crosstalk cancellation (XTC) level that can be achieved and the more accurate the reproduction of the 3D space. You are among the very few who have experienced this (although you were listening with a BACCH filter that I designed for those speakers using a dummy head as I did not get the chance to use your head to design the BACCH filter -therefore you were listening at a sub-level of Level 2). As you heard in my lab, the reverb of a church, for instance, is completely realistic and surrounds you from all directions despite the fact that (actually, because) you were in an anechoic room.

Level 3:

* The recording is made non-binaurally with standard microphone techniques (e.g. ORTF, Blumlein, coincident, spaced omni, etc.) and the listening is done in an anechoic chamber as in Level 2.

* Comments: The result is very much like Level 2, except with less accuracy to the reproduction of the placement of the sound sources. However, the 3D space reproduction can be as good as Level 2, especially if the recording mic technique is done correctly and without too many spot mics (a pair of spaced omnis is the best technique for getting the reverb of the hall coded well, Blumlein is also very good).

Level 4:

Same as Level 2 or Level 3 but the listening is done in a real listening room where reflections will act to decrease the achievable XTC level (which depends on both the amount of early reflections (up to 20 ms after the arrival of the direct sound from the speakers at the ears) and the directivity of the speakers). For highly directive speakers (e.g. ESL 57, Janszen electrostatic hybrid, Geldee horns, in that order) and/or equivalently well-treated rooms, you should be able to get an average XTC level at or above 8 dB, which gives you a far more 3D reproduction than any stereo system without BACCH. Unless the XTC level is above 15 dB, you won’t be getting full proximity effects (sounds reproduced at the head of the listener, such as a haircut) but that is not needed for practically all recorded music since even symphonic music does not contain ILD cues above 8 dB (unless the conductor decides to come whisper something in your ear!).

* Comments: Level 4 is the level your bespoke BACCH customers will be experiencing.

Level 5:

Same as Level 4 but the listening is done without a BACCH filter. There the XTC levels are too low (even for speakers that are widely separated) to reproduce the ILD and ITD cues even close to correctly. The 3D cues are severely corrupted by the crosstalk. You cannot rely anymore on reproducing the reverb of the hall that is in the recording, and the only way to avoid having a dry and lifeless reproduction is to liven up the listening room and rely on the listening room’s own reflections to make the reproduction less dry and harsh. This is the level at which audiophiles are without BACCH.

Floyd Toole’s book and its recommendations are written for audiophiles living in Level 5 and below. They are very good recommendations but that is the pre-BACCH era.

When you play, for instance, the binaural recording Dancing flute and Drum that I recorded binaurally with Chesky, there is no mistaking that you have a realistic reproduction of the ambiance in that church even at Level 4, which is what you have in your listening room now. Listening to the same recording at Level 2 (as you did in the anechoic room of my lab) proves that, unlike in the pre-BACCH era in which Toole’s book was written, the deader the room the more accurate the 3D reproduction is. Essentially the speakers disappear and you are in the reverberant acoustic environment where the recording was made. At Level 2 in my lab, I can get a cathedral reverb in my anechoic chamber that lasts (T60) for as long as 6 seconds. You are then in an incredibly reverberant space even though in reality you are sitting in an anechoic room!

It is difficult for some people to imagine listening at Levels 1 to 3, as, unlike you, they have not had the chance to experience it. Hopefully, more people can experience Level 4 as we spread custom-made BACCH 3D Sound technology.

Now that I need to vacate my equipment room for the maids to sleep in, I thought I might as well take this opportunity to do a bit of acoustic treatment. First, the front wall got fibreglass panels wrapped in Glad Wrap. The curtain was also changed to a “Quiet curtain”. Today, I have just turned on my system after 3 months and the results are stunning!!!!! I get much more focused bass, and the sound stage extension is now very pronounced on the left side, with the curtain. The openness of the right side is still causing some issues. I think acoustic treatment can improve the soundstage even further, but you really need directive speakers to achieve proximity effects.

p.s. There are of course sub-levels to each of the above 5 levels. For instance, when listeners in your listening room listen with a BACCH filter done with a head (such as yours) other than theirs, they are listening at a sub-level of Level 4, which can be improved if they listen to a BACCH filter designed with their own head.