Quibbles and Bits
Information
6: “What do you want?”
2: “Information.”
6: “You won’t get it.”
2: “By hook or by crook, we will.”
6: “Who are you?”
2: “The new number 2. You are number 6.”
6: “I am not a number. I am a free man!”
2: “Ha ha ha ha ha ha haaaa!…. ”
Suppose you phone me up and dictate a written message to me. I write it down and save it in a text file. Beyond certain obvious (and maybe not-so-obvious) limitations, once I have written it down I have fully and completely captured that message. The resultant text file then fully and completely captures the entirety of your message to me.
Now what about those limitations? Well, these include the following:
- I cannot capture the tone and inflection of your voice.
- What you said may not be what was written.
- What I write may not be what you said.
- You must limit what you say to things you know I can write down.
So, if it is important that the resultant text file accurately conveys your message, then it follows that the process of delivering that message starts with ensuring that the message itself is unambiguously phrased. Those of you who read my column in Copper #63 will understand the above in the context of the famous cold war “Hot Line,” and why it was implemented as a text messaging system and not as a person-to-person voice telephone link.
Anyhow, given that you have taken great pains to phrase your message in clear and unambiguous terms, my text file can now be read over and over, copied to different recipients, and incorporated into further messages, all without any loss or change to the original message. The text of the original message is the “information” contained within it. When it comes to the notion of “information,” digital representations are the easiest way to conceptualize the idea, since there is no nuance to numbers. The information is either there or it isn’t, and mathematics gives you the tools to determine one or the other categorically. [We’re ignoring a third option, which is that the information could be there, but you’d need additional information in order to enable you to find it.]
The obvious take-away is the direct parallel with digital audio. It reflects (i) the ability of a digital audio stream to fully represent the information comprising the original sound, and (ii) the fact that this information has to be accurately and unambiguously transcribed in the first place for this to be the case. The first part is by far the simplest to deal with, as it is now well understood that while 16/44.1 PCM is marginally capable of representing the information contained in acoustic music, formats like 24/192 and DSD are capable of fully and completely representing it. Therefore, the second part represents the bigger challenge – digital audio is, in practical terms, more fundamentally limited by our ability to accurately transcribe it, both in the A-to-D and D-to-A stages.
That being said, this column is about the fact that once we have our audio data encoded in digital form, it is possible to be totally precise as to the impact of any processing that we subject it to. By this, I mean that we can stipulate exactly what that impact will be. We can (with a couple of important qualifiers that I don’t have space to go into here), quantify it in great detail, and in its entirety. This contrasts starkly with the situation in analog space. Once a signal is represented in analog form, we can only describe any signal processing that we might perform in gross terms. And there are aspects to those analog processes that continue to elude us, not only in terms of measuring or quantifying them, but also in terms of understanding how they operate and what their mechanisms of impact might be… or even, sometimes, whether they exist at all! Well-known examples include the sonic impact of individual components such as capacitors and resistors, or that huge bugaboo, interconnects. This isn’t the case with digital audio data. In many cases – if not in most cases – digital audio allows us to be remarkably precise and specific about the impact of any digital process upon audio data.
This brings us nicely to my key point here, which is that once you capture it accurately in that form, digital audio is arguably in a perfect place, where if we need to perform an operation on the signal, we can (in principle at least) find tools to perform that operation with a guaranteed preservation of information. Virtually any DSP operation can be analyzed in terms of its impact on frequency response and phase response, and we can stipulate what degree of data precision would be needed to implement it losslessly. Sometimes the mathematics of such an analysis would be stupefyingly complex, and not worth the effort given the purpose of the operation (an example might be a dynamic range compressor), but at least it can in principle be performed. In any case, for everyday operations such as filters, DSP can be a perfectible process.
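To make that concrete, here is a minimal Python sketch (using NumPy and SciPy, with an arbitrary illustrative low-pass filter of my own choosing rather than anything from a real product) showing just how completely a digital filter’s behavior can be stipulated: its frequency and phase responses can be computed exactly, at any resolution we care to ask for, with no measurement involved.

```python
import numpy as np
from scipy.signal import firwin, freqz

# A hypothetical 127-tap linear-phase low-pass FIR filter with a
# 20 kHz cutoff, operating on a 44.1 kHz stream.
fs = 44100
taps = firwin(127, 20000, fs=fs)

# Its frequency and phase response can be stipulated exactly, to any
# resolution we care to compute.
w, h = freqz(taps, worN=8192, fs=fs)
magnitude_db = 20 * np.log10(np.abs(h) + 1e-12)
phase_rad = np.unwrap(np.angle(h))

# For example, the attenuation at 21 kHz, read straight off the computed response:
print(f"{magnitude_db[np.searchsorted(w, 21000)]:.1f} dB at 21 kHz")
```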
I want to illustrate that with a trivial example. Let me take a 16-bit 44.1kHz audio stream. If I take a DFT (Discrete Fourier Transform) of it, that transform is typically massaged to generate the frequency spectrum of the signal. But at its core, the Fourier Transform is actually a complex mathematical formula that tells you how to reconstruct the exact original audio signal at any point in time. This means we can map out the waveform continuously, anywhere between the sampled values of the 44.1kHz data stream. So, we can use it to calculate, for example, what the data points would have been if the same audio stream had originally been sampled at 88.2kHz. In other words, we can use it to perfectly upsample the signal to 88.2kHz…so let’s do that. My new 88.2kHz data stream comprises my original 44.1kHz data stream, interleaved with a bunch of new data points positioned exactly half way between them.
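For anyone who wants to see this in action, here is a small Python sketch. It leans on SciPy’s FFT-based resample function as a stand-in for the exact DFT procedure I describe, and it assumes a neatly periodic test tone rather than real music. It doubles the sample rate and confirms that the original 44.1kHz samples reappear, untouched, at the even-numbered positions of the 88.2kHz stream.

```python
import numpy as np
from scipy.signal import resample

# A short test signal: one second of a 1 kHz tone sampled at 44.1 kHz.
fs = 44100
t = np.arange(fs) / fs
x44 = np.sin(2 * np.pi * 1000 * t)

# Fourier-method resampling: the DFT is evaluated "between the samples",
# giving the data points the stream would have had at 88.2 kHz.
x88 = resample(x44, 2 * len(x44))

# The even-indexed samples of the 88.2 kHz stream are the original
# 44.1 kHz samples; the odd-indexed ones are the new interleaved points.
print(np.max(np.abs(x88[0::2] - x44)))   # effectively zero (floating-point precision)
```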
At this point, from a perspective of information theory, even though I have upsampled my original audio stream from 44.1kHz to 88.2kHz, there is not a jot of additional information that has been added to the picture. I have twice as much data, but it is conveying exactly the same amount of information. Let’s do something to emphasize that.
As I mentioned, my 88.2kHz data stream comprises my original 44.1kHz data stream, precisely interleaved with a bunch of new data points. If I then extracted the original 44.1kHz data points and put them into a file, that file would obviously be a copy of my original 44.1kHz data file. This is blindingly obvious, but I’m stating it anyway. And I can do exactly the same thing with the remaining ‘interleaved’ data points. I could extract those and put them in a file of their own. This would be another 44.1kHz data file, and it would contain the exact same music as its companion, the original 44.1kHz data file, because it was created using the exact mathematical formula for the music.
Just think about that. I now have two files, both of them containing identical information, that information being the music contained in the original 44.1kHz file. But none of the numbers in the two files are the same. None of them. Exact same information. Different numbers. Cool, eh? [Well, strictly speaking, not the exact same information. Each contains the exact same information as the other, plus a time offset corresponding to the interval between consecutive samples at 88.2kHz sample rate. Somebody is bound to point that out.]
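If you would like to verify that claim for yourself, the following Python sketch (again using SciPy’s FFT-based resample, with a single test tone standing in for the music) de-interleaves the upsampled stream into two 44.1kHz “files” and shows that their sample values differ throughout, while their magnitude spectra agree to numerical precision. The half-sample time offset shows up purely in the phase.

```python
import numpy as np
from scipy.signal import resample

fs = 44100
t = np.arange(fs) / fs
x44 = np.sin(2 * np.pi * 1000 * t + 0.3)      # the "music": a 1 kHz tone
x88 = resample(x44, 2 * len(x44))             # exact 2x upsample, as above

file_a = x88[0::2]    # the original 44.1 kHz samples
file_b = x88[1::2]    # the newly interpolated samples: another 44.1 kHz stream

# The sample values differ throughout (bar the odd numerical coincidence)...
print(np.mean(np.isclose(file_a, file_b)))    # close to 0

# ...yet both streams carry exactly the same 1 kHz tone: their magnitude
# spectra agree to numerical precision.
spec_a = np.abs(np.fft.rfft(file_a))
spec_b = np.abs(np.fft.rfft(file_b))
print(np.max(np.abs(spec_a - spec_b)) / np.max(spec_a))   # effectively zero
```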
Upsampling PCM digital audio can always (in principle, at least) be performed perfectly, because upsampling requires no loss of information. In practice it rarely is performed perfectly, partly because of the processing complexity, and partly because doing so presents some additional difficulties if it needs to be done in real time. All information that is contained within a signal at one sample rate can be faithfully preserved at a higher sample rate. The converse is not true. The extra data space in a higher sample rate signal means that it can contain information that cannot be represented at the lower sample rate. Therefore, when downsampling, this additional information – if present – must be filtered out (and therefore lost) as part of the downsampling process.
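Here is a small Python sketch illustrating that asymmetry (again via SciPy’s FFT-based resample, with simple test tones rather than music): a 44.1kHz signal survives a trip up to 88.2kHz and back untouched, whereas a 30kHz tone that is perfectly representable at 88.2kHz does not survive the trip down to 44.1kHz and back.

```python
import numpy as np
from scipy.signal import resample

fs44, fs88 = 44100, 88200
t44 = np.arange(fs44) / fs44
t88 = np.arange(fs88) / fs88

# 1) A 44.1 kHz signal (tones at 1, 5 and 15 kHz) survives a round trip
#    up to 88.2 kHz and back: upsampling adds data but loses no information.
x44 = sum(np.sin(2 * np.pi * f * t44) for f in (1000, 5000, 15000))
round_trip = resample(resample(x44, fs88), fs44)
print(np.max(np.abs(round_trip - x44)))   # effectively zero: recovered exactly

# 2) A 30 kHz tone, representable at 88.2 kHz but not at 44.1 kHz,
#    cannot survive the opposite trip: it is filtered out on the way down.
y88 = np.sin(2 * np.pi * 30000 * t88)
back_up = resample(resample(y88, fs44), fs88)
print(np.max(np.abs(back_up)))            # effectively zero: the tone is gone for good
```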
Information theory also tells us some important things about what can and cannot be done using DSD. For example, DSD contains huge amounts of ultrasonic noise mixed in with the audio data. For DSD64, this ultrasonic noise starts to rise measurably at about 20kHz, and by about 50kHz it totally subsumes any signal that might be present in the clean audio signal. It is interesting that the signal bandwidth that DSD encoding is able to capture is quite colossal, extending up into the hundreds and hundreds of kHz. Unlike PCM, which simply cannot encode any signals above its Nyquist frequency, DSD will faithfully capture them, but then adds in a whole bunch of noise that drowns them out.
Information theory tells us that it is not possible to unconditionally separate signal from noise (otherwise it wouldn’t be noise). However, if the noise exists predominantly in one frequency band, you have the possibility of eliminating it by removing that frequency band with a filter. Your problem is that in doing so you must also remove any of the original audio data that was present in that frequency band. DSD works so well because, to a first approximation, the noise is all above 20kHz, and the audio signal is all below 20kHz. One of the intriguing aspects of DSD playback is that it leaves the designer with a choice about what type of filter they wish to implement. You can preserve more of the high-bandwidth end of the original audio signal if you prefer, but at the expense of retaining some of the unwanted ultrasonic noise as well. At BitPerfect, for example, we definitely obtain a cleaner sound by following this approach, although many purists argue against it.
Another important point about DSD is regrettably lost – quite irretrievably apparently – on some of its strongest adherents. And there are two aspects to it. The first is that, while conversion from DSD to PCM can be performed with a minimal loss of information (and therefore fidelity), the opposite is not the case, for reasons I don’t have space to go into. Suffice to say that PCM-to-DSD conversions suffer from distortions and other sonic deficiencies which technology has not yet found ways to eliminate. To be quite fair, these are not grave failings – DSD can sound quite magnificent – but when taking the state-of-the-art to its extremes, DSD-to-PCM conversions are far superior to their PCM-to-DSD counterparts, and at their best are (to my ears) flawless.
Where I diverge in opinion from some of DSD’s strongest proponents is that DSD-to-DSD sample rate conversions inherently require a three-step process that involves (i) DSD-to-PCM conversion, (ii) PCM resampling, and (iii) PCM-to-DSD conversion. Therefore, if (as is the case with most DSD studios these days) a recording was originally made in DSD256, versions converted to DSD128 or DSD64 can be expected to sound slightly inferior to versions converted directly to 24/352.8 or 24/176.4 PCM, provided that in each case the best possible algorithms were used in the conversions. Which is not what some people want to hear.
If I am sounding a little controversial here, you must bear two things in mind. First, the foregoing is based primarily on technical considerations rather than on exhaustive, thorough, and comprehensive listening tests, although my own personal experiences do tend to bear it out. Second, when playing back high-end audio, whether DSD or PCM, once the audio data goes into your DAC the digital massaging is far from over, and you have no control over (nor, to be honest, much knowledge of) what that massaging entails. Therefore, if you wished to make serious comparisons of the sound quality of, say, DSD vs PCM, your choice of DAC is likely to have a dominant impact on the outcome.
I strongly doubt that digitally sampling music will catch each peak of the original analog and mostly transient (not sinusoidal) music signal. The probability that a signal peak will coincide with a sampling point given by the sampling interval/frequency is near zero. Thus the captured dynamic range must inherently be reduced when recording digitally. Reducing the sampling interval might increase the chance of hitting the peak exactly, but still far from 100%. I guess one could calculate the probability of getting the peak level only if one defines an interval of amplitude difference from the peak point. This interval is given by the bit depth used (16, 24 or 32 bits). It would be interesting to see how the probability changes with specific sampling frequencies. Any idea?
This is something we see clearly at BitPerfect when we do DSD-to-PCM conversions. The first stage of the conversion is to run the native 1-bit DSD datastream through an appropriate low-pass filter. The result is a PCM datastream at the original DSD sample rate (2.8MHz for DSD64). If we choose our filter correctly (which we do!) then we can simply decimate this PCM datastream down to the target sample rate by the simple expedient of throwing away the excess sample data (essentially as described in the column above).
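For the curious, here is a rough Python sketch of those two stages. To be clear, the bitstream, the tap count, and the cutoff below are illustrative placeholders (a random ±1 stream standing in for real DSD64 data), not BitPerfect’s actual filter or code.

```python
import numpy as np
from scipy.signal import firwin, fftconvolve

# A stand-in for one second of a DSD64 bitstream: +/-1 values at 2.8224 MHz.
# (Random data purely for illustration; real use would read a DSD file.)
fs_dsd = 64 * 44100
rng = np.random.default_rng(0)
dsd_bits = np.where(rng.random(fs_dsd) > 0.5, 1.0, -1.0)

# Stage 1: low-pass filter the 1-bit stream at its native rate. The 4095-tap
# length and 22.05 kHz cutoff are illustrative choices only.
lp = firwin(4095, 22050, fs=fs_dsd)
pcm_hi = fftconvolve(dsd_bits, lp, mode='same')   # multi-bit PCM at 2.8224 MHz

# Stage 2: decimate to 44.1 kHz simply by keeping every 64th sample.
pcm_44 = pcm_hi[::64]
print(pcm_44.shape)    # (44100,) - one second of 44.1 kHz PCM
```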
As you point out, this means that for each individual signal peak the chances of the retained sample coinciding with the actual signal peak are quite low. But this shouldn’t matter at all, because the value of that peak is implicitly encoded in the remaining data, and will be exactly reconstructed in the DAC (providing the DAC does a theoretically perfect job).
Where it is interesting is when you choose to “normalize” the level of the PCM datastream by re-scaling the data so that the largest individual data point in the entire stream is re-scaled to the maximum encodable value (the 0dB point). If you do this strictly, then the resultant signal will actually be encoded with one or more “implied peaks” above the 0dB level. Some standards explicitly require that this does not happen (Apple Music is one of them). And in fact, some DACs can fail to handle it properly, depending on how they process the incoming digital data. At BitPerfect, for this reason, we normalize not to the actual peak value, but to the “implied peak” value. We call this “analog normalization”.
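Here is a small Python sketch of the idea (my own illustration, not BitPerfect’s actual code): a tone whose reconstructed peaks fall between the samples, normalized first the naive way and then the “analog” way, with the implied peak estimated by oversampling.

```python
import numpy as np
from scipy.signal import resample_poly

fs = 44100
t = np.arange(fs) / fs
# A tone at fs/4 with a 45-degree phase offset: every sample lands at
# about +/-0.64, but the reconstructed waveform peaks at 0.9.
x = 0.9 * np.sin(2 * np.pi * (fs / 4) * t + np.pi / 4)

def implied_peak(signal, oversample=8):
    """Estimate the reconstructed ("implied") peak by oversampling.
    A sketch of the idea only, not BitPerfect's implementation."""
    return np.max(np.abs(resample_poly(signal, oversample, 1)))

print(np.max(np.abs(x)))        # sample peak  ~0.64
print(implied_peak(x))          # implied peak ~0.9

# Naive normalization to the sample peak pushes the implied peak past 0 dB:
naive = x / np.max(np.abs(x))
print(implied_peak(naive))      # ~1.41: "implied peaks" above full scale

# "Analog normalization": scale to the implied peak instead.
analog = x / implied_peak(x)
print(implied_peak(analog))     # ~1.0: nothing above full scale
```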
I just went back to reread your Copper #63 column. Readers who were insufficiently scared should take a look at David Hoffman’s “The Dead Hand: The Untold Story of the Cold War Arms Race and its Dangerous Legacy”. Who knew that the Russians actually did build a Doomsday Machine?
You’re talking about the “Perimeter” system (also referred to as “Dead Hand”)? Many believe that system is still in operation in Putin’s Russia. I’ll look out for that book. But Perimeter was/is a system, rather than a weapon per se.
In 1961 the Russians also built and detonated the largest ever nuclear weapon, the “Tsar Bomba”. Technical details are sketchy, but it seems that the device was designed as a 200Mt bomb, but the thought of it was so scary that they only ever built a 100Mt version … and the one they tested was a 50Mt version. By contrast, the largest bomb ever built by the USA was the 25Mt “B41”. The principles behind multi-stage nuclear weapons are frighteningly simple, and quite literally present the possibility of generating energy densities so intense that physicists remain uncertain of what the outcome would be.
Question: comparing DSD to 24/192 PCM, is the latter able to capture and deliver a more theoretically “usable” frequency response to 96kHz, free of ultrasonic noise, unlike DSD? If I understand correctly, DSD would have no issue capturing 96kHz or even much higher, but doing so increases noise with ascending frequency that is not present in 24/192 PCM.
I understand that transducers at both recording and playback may not exist to capture or play back 96kHz; I am just pondering which system could capture and render the most accurate, extended, and distortion-free upper frequency signal.
You are quite correct to observe that 24/192 can encode a noise-free audio signal with a bandwidth all the way out to 96kHz, far outperforming DSD in that regard. However, I would also point out that with DSD128 the ultrasonic noise does not begin to rise until ~40kHz, and with DSD256 it does not rise until ~80kHz. And as I mentioned, DSD recording these days is done more and more using DSD256. And I know some people who are actively checking out DSD512.
By far the biggest difference between the high end PCM and DSD formats is the technical challenge of building a native transducer. Recording natively and directly to 24/192 PCM means sampling the music at a rate of one sample every 5.2µs. A bullet will travel about a quarter of an inch in 5.2µs. Sampling an analog signal to 24-bit accuracy in 5.2µs is essentially impossible. Furthermore, theoretically speaking, Nyquist-Shannon sampling theory requires an instantaneous measurement, and the best you can hope for with an analog sampler is to measure the average signal over the sampling period, which isn’t necessarily the same thing.
In practice, the only real method of recording signals to formats such as 24/192 is to record them with a DSD-like SDM-based circuit, and convert the output on the fly to your desired PCM format. This is how just about every serious ADC transducer operates these days.
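As a toy illustration of what such a front end does, here is a first-order sigma-delta modulator in Python. Real converters use higher-order modulators, so treat this strictly as a sketch of the principle rather than a description of how any actual ADC is built.

```python
import numpy as np

def sigma_delta_1st_order(x):
    """Toy first-order sigma-delta modulator: turns a -1..+1 input into a
    +/-1 bitstream whose low-frequency content tracks the input, while the
    quantization noise is pushed up in frequency."""
    bits = np.empty_like(x)
    integrator = 0.0
    feedback = 0.0
    for n, sample in enumerate(x):
        integrator += sample - feedback               # accumulate the error
        feedback = 1.0 if integrator >= 0.0 else -1.0 # 1-bit quantizer
        bits[n] = feedback
    return bits

# 10 ms of a 1 kHz tone "sampled" at DSD64's rate of 2.8224 MHz.
fs_mod = 64 * 44100
t = np.arange(fs_mod // 100) / fs_mod
bits = sigma_delta_1st_order(0.5 * np.sin(2 * np.pi * 1000 * t))

# Even a crude 64-sample moving average roughly recovers the tone: this is
# the "convert on the fly to PCM" step, done properly by a filter-and-decimate
# stage like the one sketched a little earlier.
recovered = np.convolve(bits, np.ones(64) / 64, mode='same')[::64]
print(recovered.shape)    # (441,) - 10 ms of 44.1 kHz PCM carrying the tone
```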
Excellent! I appreciate your thorough explanation, even though it is likely above my intellectual pay grade as well! The good news is that my personal high-frequency rolloff is only slightly down at 80kHz!
Your interview a few Copper issues ago with Dr. Choueiri and his BACCH-SP was quite interesting. Here is a reviewer who was caught quite off guard and very pleasantly surprised by a demo of the system, both on headphones and speakers. I thought you would enjoy reading the relatively brief report from the California Audio Fest: https://parttimeaudiophile.com/2018/11/24/caf-2018-bacch-sp-genesis-advanced-technologies-vpi-and-a-mind-bending-demo/
Paul McGowan seems 100% a DSD evangelist https://youtu.be/AsjIS9BKIfY
I’m very much a DSD fan myself. I have hundreds of DSD albums. My position is that hi-res PCM can be every bit as good as DSD, but rarely rises to those levels.
I am convinced that the main reason is that the PCM format permits almost unlimited digital manipulation possibilities in the studio, whereas DSD allows none whatsoever. So DSD has to be captured right first time, straight from the mic feed, and that places much greater emphasis on the skill and talent of the recording engineer. Furthermore, even the best recording engineers, when faced with a goody-bag such as Pro-Tools just can’t help themselves, and are inevitably drawn into it for just one hit. And then another. And another. It is just the nature of the beast.
The whole thing seems oddly reminiscent of Betamax/VHS. The format with the inferior performance wins again…
With most every article you write, I find myself challenged and nearly always learn something.
It’s nice to read someone who is passionate about music and its reproduction, but with technical acumen and an intellectual curiosity to forge ahead.
That is sometimes lacking in the high end press!
Keep up the good work.
Thanks for your kind comments 🙂
I’m old enough to remember VHS vs Betamax. We all understand it as a victory of marketing over technology but it actually went deeper than that. Sony’s product managers believed that the primary market for VCRs was recording live TV for time shift viewing. That being the case, they focussed their marketing efforts on emphasizing the superior technical performance, which is what the customers would value most for off-air recording. JVC’s product managers believed that distribution of pre-recorded content – primarily movies – would be the killer app, so they focussed on a cassette standard that would support movie-length content (Betamax was limited to 1hr), as well as making a serious effort to line up big-hitting movie studios to release on VHS.
Ultimately, the battle was won primarily because pre-recorded content was decisively the “killer app”, and VHS had that covered hands down. But because Sony had put all their eggs in the technical superiority basket, that is what we end up remembering about Betamax. So their marketing was very successful at getting that message across, but, alas, it was the wrong message.
Fortunately or unfortunately, I too am old enough to remember the format wars, the first of many. I was a manufacturer’s rep in the mid-to-high-end industry for many years and I always looked to that as an example of how not to introduce a new format.