In Part I of my conversation with Professor Edgar Choueiri, he laid out the basis of how we perceive a three-dimensional soundscape, and what the cues were that our ear/brain systems use to conjure up a 3D image. Let’s continue …
RM. So we are obviously receiving those cues from our loudspeakers in our listening rooms, because they are fully captured in a binaural recording, yet we only perceive a vague illusion of a sonic image. So why is it that these cues are insufficient to regenerate the original 3D audio sound field under normal stereo listening?
EC. These differential cues, the ITD, ILD, spectral cues and reverberant ratios, these are all fully captured in a binaural recording. But because both of the stereo speakers are radiating into the room, both of our ears receive sounds emitted from both of the speakers, whereas what we need is for the sound from the left speaker to be heard by our left ear only, and the sound from the right speaker to be heard by our right ear only.
In effect, the system suffers from crosstalk. Try this experiment. Place your speakers quite close together and angle them in towards your head. Now get a mattress and stand it vertically between the two speakers so that it butts up against your face. This will serve to eliminate a lot of the crosstalk, so that when you play a binaural recording the left ear will hear only the left speaker and the right ear will hear only the right speaker. With this peculiar setup you will hear a remarkably clear and precise 3D image. And, unlike with headphones, you can rotate your head and you won’t lose that image. Furthermore, this system will pass my proposed test, as we can reposition either of the speakers without affecting the image! There are actually a small number of enthusiasts around the world who fully understand this problem, and who have constructed listening rooms with a barrier! They sit there with a barrier down the middle so they can enjoy true 3D imaging.
RM. That conjures up quite a mental image!
EC. So the critical question is, can we do this crosstalk cancellation without having to erect a barrier? It is important to understand that this is a well-established challenge, and that research on crosstalk cancellation has been going on since as early as 1961. Initially it was done using all-analog circuitry, and some interesting results were obtained. More recently, digital audio has come along, and we have been able to construct cancellation filters in the digital domain, but even that has been going on for a long time before I got involved!
Imagine we record someone snapping their finger very close to your left ear. We record that on a binaural head, and play it back through a pair of loudspeakers. The sound comes almost exclusively out of the left speaker, but still the right ear manages to hear it, although it will be something like 4-5dB quieter than what the left ear hears. This is crosstalk, and we have designed a special filter to try to eliminate it.
How it works is that it first sends the sound of the finger snap out of the left speaker, and then after a slight delay, sends a negative image of the same sound out of the right speaker, but attenuated by 4-5dB and timed to reach your right ear at the exact same instant as the original finger snap sound from the left speaker. And because it is a negative image, it causes them both to cancel out. But that cancellation snap, having done its job at your right ear, will go on to be heard after another slight delay by your left ear, and that will upset the 3D image. So to deal with it, a third cancellation pulse, attenuated by a further 4-5dB, has to be sent out from the left speaker. This continues, back and forth, until the correction signal is no longer audible, and the net result is that the original sound of the finger snap was heard only by the left ear and not by the right ear. If it is done properly the whole process lasts no more than 300μs, and is quite seamless, and the ear/brain is fooled into hearing the original 3D sound field.
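The back-and-forth pulse sequence described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not the filter Professor Choueiri's team built: each cross path (left speaker to right ear, and vice versa) is taken to be a pure delay of one time step plus a fixed 4.5dB attenuation, the two paths are symmetric, and the pulses stop once they fall below an arbitrary -60dB floor. The function names and default values are made up for illustration.

```python
def cancellation_pulses(atten_db=4.5, floor_db=-60.0):
    """Pulse trains for the left and right speakers, one entry per time step.

    One time step is the (assumed) extra delay of the cross path.
    Returns (left, right, g), where g is the linear cross-path gain.
    """
    g = 10 ** (-atten_db / 20)     # cross-path gain relative to the direct path
    floor = 10 ** (floor_db / 20)  # stop once a pulse drops below this level
    left, right = [], []
    step, amp = 0, 1.0             # the original finger snap, left speaker
    while abs(amp) >= floor:
        # Even steps are sent from the left speaker, odd steps from the right.
        left.append(amp if step % 2 == 0 else 0.0)
        right.append(amp if step % 2 == 1 else 0.0)
        amp *= -g                  # next pulse: inverted and attenuated again
        step += 1
    return left, right, g


def ear_signals(left, right, g):
    """What each ear receives: its own speaker, plus the other speaker
    one step later and scaled by g (the crosstalk)."""
    n = len(left) + 1              # one extra step for the last leaked pulse
    l = left + [0.0]
    r = right + [0.0]
    left_ear = [l[t] + (g * r[t - 1] if t > 0 else 0.0) for t in range(n)]
    right_ear = [r[t] + (g * l[t - 1] if t > 0 else 0.0) for t in range(n)]
    return left_ear, right_ear


if __name__ == "__main__":
    left, right, g = cancellation_pulses()
    left_ear, right_ear = ear_signals(left, right, g)
    print("pulses sent:", len(left))
    print("left ear :", [round(x, 4) for x in left_ear])
    print("right ear:", [round(x, 4) for x in right_ear])
```

In this model the left ear receives the original snap at step 0 and the right ear receives nothing. The only leftover is the leak from the final, uncancelled pulse, which is below the chosen floor.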
RM. That sounds pretty incredible. Does it actually work?
EC. There are two problems with the crosstalk cancellation system I just described. Number one, with just a slight movement of the head to the left or right all bets are off, because the delays will all be calculated wrongly. So you would need to recalculate the filter for that new position. But this is just a technological problem – once the technology is in place to detect the head motion and to switch from one filter to another in real time, we would just incorporate those capabilities into the system. And today, that technology is in place – a laptop computer can comfortably handle it. So problem number one is effectively solved.
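To see why a small head movement upsets the timing, here is a rough geometric sketch. It assumes straight-line paths from a point-source speaker to two point ears and ignores the head's shadowing and diffraction entirely. The speaker position, ear spacing and the 2cm shift are illustrative values, not figures from the interview.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly dry air at 20 degrees C


def cross_delay_us(head_x, speaker_x=-0.3, speaker_dist=2.0, ear_spacing=0.15):
    """Extra time, in microseconds, for the LEFT speaker's sound to reach
    the right ear compared with the left ear.

    head_x is the sideways position of the head centre in metres;
    all default values are assumptions chosen for illustration.
    """
    speaker = (speaker_x, speaker_dist)
    left_ear = (head_x - ear_spacing / 2, 0.0)
    right_ear = (head_x + ear_spacing / 2, 0.0)
    path_difference = math.dist(speaker, right_ear) - math.dist(speaker, left_ear)
    return path_difference / SPEED_OF_SOUND * 1e6


if __name__ == "__main__":
    centred = cross_delay_us(0.0)
    shifted = cross_delay_us(0.02)  # head moved 2cm to the right
    print(f"centred: {centred:.1f} us, shifted: {shifted:.1f} us")
```

With these assumed numbers the cross-path delay changes by a few microseconds for a 2cm shift, so a cancellation pulse timed for the centred position no longer lines up with the sound it is supposed to cancel.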
Problem number two is more subtle, and a lot of people didn’t understand it very well, but it really is a major obstacle. These “perfect” cancellation filters, incorporating real-time head-tracking and multiple filters, they do an exceptional job of recreating a perfectly stable 3D sound field and behave very, very well from a crosstalk cancellation point of view. But the sound quality is awful! It suffers from dreadful tonal distortion – in other words the frequency response is extremely bad and cannot be simply corrected. No audiophile in their right mind will pay real money to listen to a perfect 3D image of a piano that sounds like a xylophone!
RM. They certainly wouldn’t! But are you saying those errors shouldn’t be there?
EC. This puzzled a lot of people, because on paper these crosstalk-correction systems should have a flat frequency response … the mathematics is quite simple really. But in practice they had spikes as high as 34dB, which not only sounded unacceptably bad, but had the capability of driving the associated electronics into clipping, and maybe blowing a drive unit. But the explanation also turned out to be quite simple. If you design the perfect filter, it provides perfect performance at the one point in space where it is asked to do that. But if you move even slightly out of position everything goes to hell. Including the frequency response. And this is what we were seeing. But it took many years to actually understand what was happening.
RM. But you eventually solved it.
EC. Yes. And this, finally, is where we come into the story. We developed a way to fix the frequency response problem by deciding to pay a price in terms of the amount of crosstalk-cancellation that we would deem acceptable, and that’s a lot more difficult and more complicated than it sounds. In summary, a perfect crosstalk cancellation filter will provide close to infinite attenuation, but in practical terms we don’t need that. Something like 20dB turns out to be more than enough, so by reducing the crosstalk cancellation requirement from infinity down to 25dB we found we could go from a frequency response with 34dB peaks to one that was flat – and I mean ruler flat. In effect we traded in a degree of crosstalk performance we didn’t need for a degree of tonal performance that we did. And that right there was our invention! It’s called the BACCH filter, and it was patented and trademarked by the University. In fact it has now become the third most lucrative patent in the history of Princeton University, if you can believe that!
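The direction of this trade can be illustrated with a generic textbook technique: constant-parameter (Tikhonov) regularization of a symmetric two-speaker, two-ear model. This is emphatically not the BACCH filter, whose design is not described in this interview. The cross-path gain g = 0.98 is an assumed value, picked only so that the toy model's ideal inverse peaks near the 34dB figure mentioned above, and the regularization parameter `beta` and the function name are likewise illustrative.

```python
import cmath
import math


def sweep(g=0.98, beta=0.0, steps=360):
    """Scan one period of theta = omega * tau for the toy model
    H = [[1, x], [x, 1]] with x = g * exp(-1j * theta).

    Returns (peak filter gain in dB, worst-case crosstalk cancellation in dB)
    for the regularized inverse (H^H H + beta I)^-1 H^H. beta = 0 is the
    perfect inverse.
    """
    peak = 0.0
    worst_xtc = math.inf
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        cross = g * cmath.exp(-1j * theta)
        rho = []
        # H is symmetric, so its eigenvalues are 1 + x and 1 - x.
        for lam in (1 + cross, 1 - cross):
            power = abs(lam) ** 2
            peak = max(peak, abs(lam) / (power + beta))  # filter gain, this mode
            rho.append(power / (power + beta))           # what survives at the ears
        same_side = (rho[0] + rho[1]) / 2      # wanted signal at each ear
        other_side = abs(rho[0] - rho[1]) / 2  # leftover crosstalk
        if other_side > 0:
            worst_xtc = min(worst_xtc, 20 * math.log10(same_side / other_side))
    return 20 * math.log10(peak), worst_xtc


if __name__ == "__main__":
    for beta in (0.0, 0.0004, 0.05):
        peak_db, xtc_db = sweep(beta=beta)
        print(f"beta={beta}: peak {peak_db:.1f} dB, worst-case cancellation {xtc_db:.1f} dB")
```

With `beta = 0` the cancellation is unlimited but the filter gain peaks are large. Raising `beta` lowers the peaks and makes the cancellation finite. A constant `beta` is the crudest possible choice and gives up most of the cancellation at exactly the frequencies where the peaks were, which hints at why getting a flat response while keeping useful cancellation is harder than it sounds.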
RM. It sounds a lot more appealing than an upright mattress down the middle of the listening room … but is it up to the demands of hard-core high-end audiophiles? In other words, how well does it work with the high-quality real-world recordings that we like to listen to?
EC. Our accomplishment is that we’ve made crosstalk cancellation tonally transparent, so you can take any album you have and listen to it in 3D without tonal distortion. And here’s the kicker – I said “any album you have” – I didn’t say “any binaural album you have”. I think I’ve made it clear now why a binaural album should work. But I haven’t suggested any reasons why any stereo album should work, and the answer is actually also very clear.
Any properly recorded stereo album has ITD and ILD cues embedded in the recording, and it is these cues that present the normal stereo image. If you record using Spaced Omnis, for example, these will capture a strong ITD signal, as well as the reverb of the recording environment. ORTF recordings use cardioid mikes, which tend to emphasize the ILD cues, since they are typically too close together for a strong ITD signal. So with most acoustic recordings you will hear an extraordinarily impressive and satisfying 3D spatial image, and not the normal stereo image locked to the speakers. It just won’t necessarily be spatially accurate. [For that you’d need a binaural recording, recorded with a dummy head with your own personal HRTF … and then you would be able to recreate the exact original 3D acoustical image – RM.]
The question remains as to how pop music, or studio-generated music, will work. These are often assembled from individual tracks recorded with mono microphones, or generated by electronic instruments. But a good recording engineer is trying to construct a good stereo image using panning, reverb, and so on, and these techniques effectively add the very cues which allow a well-defined 3D image to develop outside of the speakers and in a consistent 3D space. That image can still be very satisfying, but it is not real. But neither is the conventional stereo image – that isn’t real either, it’s just the construct of the recording engineer. By the way, as a user of the BACCH system, you can just bypass the crosstalk cancellation at the touch of a button and listen in normal stereo whenever you want, but we don’t know any customers who prefer to do that.
RM. What about headphones?
EC. We have developed a new patented technology, called BACCH-HP, that emulates BACCH-filtered speakers through headphones. The result is that you would hear a fully head-externalized 3D sound field from your headphones that is virtually indistinguishable from what you would hear if your speakers were playing. It’s all done in software, apart from a camera for head tracking. Essentially we use the headphones to emulate the loudspeakers. It works so well because headphones are so much more accurate than loudspeakers. We can simulate the most expensive loudspeakers in the world over a $100 set of headphones, and you won’t be able to tell the difference. [That is possibly the most remarkable claim I have ever heard made in the history of high-end audio – RM]
RM. Who knew such amazing things were happening in the field of audio research! Can we expect the world of high-end audio to be turned upside-down by an onslaught of new developments?
EC. I record orchestras for fun, and I’ve been doing that as a hobby since high school and college. I’ve been recording my university orchestra for many years. And a year and a half ago I was invited to Berlin to record my favourite orchestra, the Berlin Philharmonic [invitations to record don’t come any more prestigious than that – RM], and for that I developed a special 3D mixer. So now you can navigate your way through the 3D sound space. For instance, if you want to listen to the timpani you can “walk over” and position yourself next to them!
A lot of the breakthroughs that are happening right now in audio research, especially over the last five years, waaaaay overwhelm all that happened in the previous 20 or more years. And a lot of the young PhD candidates and researchers doing this don’t give a damn about tubes or cables. They are dealing with much tougher problems, but only a few of these problems have direct relevance to the high-end audio field. I’m really an outlier here.
These efforts are all driven by AR/VR research (Augmented Reality / Virtual Reality), where the challenges are not only greater, the requirements are much tougher. For example, in AR one of the present challenges is to have the voice of a virtually added person in a real room sound as realistic as the sound of a real person in the same room (which requires the listener’s HRTF, on-the-fly modeling of the room’s acoustics, and more). Another example we’re working on is a system for cars where the driver and the passengers are simultaneously listening to different music of their choice, played through the same set of speakers! Think about that …
RM. I will indeed try to think about that, and all the other things you have described. But I have to tell you, my head is spinning … and I just hope your head-tracking algorithms will be able to keep up with it! Thank you so much for taking the time to talk with me, and for sharing both your insights and your remarkable developments with Copper’s readers.