6: “What do you want?”
6: “You won’t get it.”
2: “By hook or by crook, we will.”
6: “Who are you?”
2: “The new number 2. You are number 6.”
6: “I am not a number. I am a free man!”
2: “Ha ha ha ha ha ha haaaa!…”
Suppose you phone me up and dictate a written message to me. I write it down and save it in a text file. Beyond certain obvious (and maybe not-so-obvious) limitations, once I have written it down, the resultant text file fully and completely captures the entirety of your message to me.
Now what about those limitations? Well, these include the following:
- I cannot capture the tone and inflection of your voice.
- What you said may not be what was written.
- What I write may not be what you said.
- You must limit what you say to things you know I can write down.
So, if it is important that the resultant text file accurately conveys your message, then it follows that the process of delivering that message starts with ensuring that the message itself is unambiguously phrased. Those of you who read my column in Copper #63 will understand the above in the context of the famous cold war “Hot Line,” and why it was implemented as a text messaging system and not as a person-to-person voice telephone link.
Anyhow, given that you have taken great pains to phrase your message in clear and unambiguous terms, my text file can now be read over and over, copied to different recipients, and incorporated into further messages, all without any loss or change to the original message. The text of the original message is the “information” contained within it. When it comes to the notion of “information,” digital representations are the easiest way to conceptualize the idea, since there is no nuance to numbers. The information is either there or it isn’t, and mathematics gives you the tools to determine one or the other categorically. [We’re ignoring a third option, which is that the information could be there, but you’d need additional information in order to enable you to find it.]
The obvious take-away is the direct parallel with digital audio. It reflects (i) the ability of a digital audio stream to fully represent the information comprising the original sound, and (ii) the fact that this information has to be accurately and unambiguously transcribed in the first place for this to be the case. The first part is by far the simpler to deal with, as it is now well understood that while 16/44.1 PCM is marginally capable of representing the information contained in acoustic music, formats like 24/192 and DSD are capable of fully and completely representing it. Therefore, the second part represents the bigger challenge – digital audio is, in practical terms, more fundamentally limited by our ability to accurately transcribe it, both in the A-to-D and D-to-A stages.
That being said, this column is about the fact that once we have our audio data encoded in digital form, it is possible to be totally precise as to the impact of any processing that we subject it to. By this, I mean that we can stipulate exactly what that impact will be. We can (with a couple of important qualifiers that I don’t have space to go into here) quantify it in great detail, and in its entirety. This contrasts starkly with the situation in analog space. Once a signal is represented in analog form, we can only describe any signal processing that we might perform in gross terms. And there are aspects to those analog processes that continue to elude us, not only in terms of measuring or quantifying them, but also in terms of understanding how they operate and what their mechanisms of impact might be… or even, sometimes, whether they exist at all! Well-known examples include the sonic impact of individual components such as capacitors and resistors, or that huge bugaboo, interconnects. This isn’t the case with digital audio data. In many cases – if not in most cases – digital audio allows us to be remarkably precise and specific about the impact of any digital process upon audio data.
This brings us nicely to my key point here, which is that once you capture it accurately in that form, digital audio is arguably in a perfect place, where if we need to perform an operation on the signal, we can (in principle at least) find tools to perform that operation with a guaranteed preservation of information. Virtually any DSP operation can be analyzed in terms of its impact on frequency response and phase response, and we can stipulate what degree of data precision would be needed to implement it losslessly. Sometimes the mathematics of such an analysis would be stupefyingly complex, and not worth the effort given the purpose of the operation (an example might be a dynamic range compressor), but at least it can in principle be performed. In any case, for everyday operations such as filters, DSP can be a perfectible process.
I want to illustrate that with a trivial example. Let me take a 16-bit 44.1kHz audio stream. If I take a DFT (Discrete Fourier Transform) of it, that transform is typically massaged to generate the frequency spectrum of the signal. But at its core, the Fourier Transform is actually a complex mathematical formula that tells you how to reconstruct the exact original audio signal at any point in time. This means we can map out the waveform continuously, anywhere between the sampled values of the 44.1kHz data stream. So, we can use it to calculate, for example, what the data points would have been if the same audio stream had originally been sampled at 88.2kHz. In other words, we can use it to perfectly upsample the signal to 88.2kHz…so let’s do that. My new 88.2kHz data stream comprises my original 44.1kHz data stream, interleaved with a bunch of new data points positioned exactly halfway between them.
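For readers who like to see this in code, here is a minimal sketch of the idea, not the author's or BitPerfect's implementation. It assumes a short block of floating-point samples (rather than 16-bit integers) whose content repeats exactly within the block, so that the DFT formula reconstructs it exactly. The names `signal`, `dft` and `upsample_2x` are invented for illustration, and only Python's standard library is used.

```python
import cmath
import math

N = 32  # number of samples in the block (an illustrative choice)

def signal(t):
    """A band-limited test waveform; t is in units of the original sample period."""
    return (0.6 * math.sin(2 * math.pi * 3 * t / N)
            + 0.3 * math.cos(2 * math.pi * 7 * t / N + 0.4))

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def upsample_2x(x):
    """Evaluate the DFT reconstruction formula at twice the original density."""
    n = len(x)
    spectrum = dft(x)
    out = []
    for m in range(2 * n):
        t = m / 2  # every second output point falls halfway between two inputs
        total = 0j
        for k in range(n):
            # Bins in the upper half of the DFT represent negative frequencies.
            f = k if k < n / 2 else k - n
            total += spectrum[k] * cmath.exp(2j * cmath.pi * f * t / n)
        out.append((total / n).real)
    return out

original = [signal(t) for t in range(N)]   # stands in for the 44.1kHz stream
upsampled = upsample_2x(original)          # stands in for the 88.2kHz stream

# Even-numbered outputs reproduce the input; odd-numbered outputs are the new points.
worst = max(abs(upsampled[m] - signal(m / 2)) for m in range(2 * N))
print("largest deviation from the true waveform:", worst)
```

The deviation printed at the end should be down at the level of floating-point rounding. A practical upsampler would work on long, non-repeating streams and would normally use an FFT or a carefully designed interpolation filter instead of this brute-force sum.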
At this point, from a perspective of information theory, even though I have upsampled my original audio stream from 44.1kHz to 88.2kHz, there is not a jot of additional information that has been added to the picture. I have twice as much data, but it is conveying exactly the same amount of information. Let’s do something to emphasize that.
As I mentioned, my 88.2kHz data stream comprises my original 44.1kHz data stream, precisely interleaved with a bunch of new data points. If I then extracted the original 44.1kHz data points and put them into a file, that file would obviously be a copy of my original 44.1kHz data file. This is blindingly obvious, but I’m stating it anyway. And I can do exactly the same thing with the remaining ‘interleaved’ data points. I could extract those and put them in a file of their own. This would be another 44.1kHz data file, and it would contain the exact same music as its companion, the original 44.1kHz data file, because it was created using the exact mathematical formula for the music.
Just think about that. I now have two files, both of them containing identical information, that information being the music contained in the original 44.1kHz file. But none of the numbers in the two files are the same. None of them. Exact same information. Different numbers. Cool, eh? [Well, strictly speaking, not the exact same information. Each contains the exact same information as the other, plus a time offset corresponding to the interval between consecutive samples at 88.2kHz sample rate. Somebody is bound to point that out.]
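The "same information, different numbers" claim can be checked numerically. The sketch below is an illustration under simplifying assumptions: the two "files" are built directly from a known band-limited, block-periodic test waveform (the hypothetical `signal` function) rather than from real music, and the names `file_a` and `file_b` are invented. It compares the two sample sets bin by bin in the frequency domain.

```python
import cmath
import math

N = 32  # samples per file (an illustrative choice)

def signal(t):
    """A band-limited test waveform; t is in units of the 44.1kHz sample period."""
    return (0.6 * math.sin(2 * math.pi * 3 * t / N)
            + 0.3 * math.cos(2 * math.pi * 7 * t / N + 0.4))

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

file_a = [signal(t) for t in range(N)]        # the original sample points
file_b = [signal(t + 0.5) for t in range(N)]  # the interleaved sample points

spec_a = dft(file_a)
spec_b = dft(file_b)

# The sample values themselves differ...
largest_sample_gap = max(abs(a - b) for a, b in zip(file_a, file_b))

# ...but every frequency bin has the same magnitude in both files.
largest_magnitude_gap = max(abs(abs(a) - abs(b)) for a, b in zip(spec_a, spec_b))

def offset_in_samples(k):
    """Time offset implied by the phase difference between the files at bin k."""
    return cmath.phase(spec_b[k] / spec_a[k]) * N / (2 * math.pi * k)

print("largest difference between sample values:", largest_sample_gap)
print("largest difference between bin magnitudes:", largest_magnitude_gap)
print("offset at bin 3:", offset_in_samples(3), "at bin 7:", offset_in_samples(7))
```

The only thing separating the two spectra is a phase term that grows linearly with frequency, which is the signature of a pure time shift. Here it works out to half of a 44.1kHz sample period, that is, one sample interval at 88.2kHz.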
Upsampling PCM digital audio can always (in principle, at least) be performed perfectly, although it rarely is (partly because of the processing complexity, and partly because doing so presents some additional difficulties if it needs to be done in real time). This is because upsampling requires no loss of information. All information that is contained within a signal at one sample rate can be faithfully preserved at a higher sample rate. The converse is not true. The extra data space in a higher sample rate signal means that it can contain information that cannot be represented at the lower sample rate. Therefore, when downsampling, this additional information – if present – must be filtered out (and therefore lost) as part of the downsampling process.
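To see why the extra content has to be filtered out rather than simply dropped, consider what happens if an 88.2kHz stream is downsampled by naively discarding every second sample. The sketch below uses an assumed 30kHz test tone (legal at 88.2kHz, but above the 22.05kHz Nyquist limit of a 44.1kHz stream); the variable names are illustrative only.

```python
import math

FS_HIGH = 88200.0   # source sample rate in Hz
FS_LOW = 44100.0    # target sample rate in Hz
TONE_HZ = 30000.0   # representable at 88.2kHz, not at 44.1kHz

high_rate = [math.sin(2 * math.pi * TONE_HZ * n / FS_HIGH) for n in range(64)]

# Naive downsampling: keep every second sample, with no low-pass filter first.
decimated = high_rate[::2]

# The result is indistinguishable from an inverted tone at 44.1kHz - 30kHz.
alias_hz = FS_LOW - TONE_HZ
alias = [-math.sin(2 * math.pi * alias_hz * n / FS_LOW)
         for n in range(len(decimated))]

max_diff = max(abs(d - a) for d, a in zip(decimated, alias))
print("alias frequency in Hz:", alias_hz)
print("largest difference from the alias tone:", max_diff)
```

The ultrasonic tone does not vanish; it reappears as a spurious 14.1kHz tone squarely inside the audio band. That is why a proper downsampler removes everything above the new Nyquist frequency before discarding samples, and why that content is necessarily lost.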
Information theory also tells us some important things about what can and cannot be done using DSD. For example, DSD contains huge amounts of ultrasonic noise mixed in with the audio data. For DSD64, this ultrasonic noise begins to rise measurably at about 20kHz, and by about 50kHz it totally subsumes any signal that might be present in the clean audio signal. It is interesting that the signal bandwidth that DSD encoding is able to capture is quite colossal, extending up into the hundreds and hundreds of kHz. Unlike PCM, which simply cannot encode any signals above its Nyquist frequency, DSD will faithfully capture them, but then adds in a whole bunch of noise that drowns them out.
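The way a 1-bit stream pushes its noise up to high frequencies can be illustrated with a toy modulator. This is a deliberately simplified sketch: it uses a first-order delta-sigma loop, whereas real DSD encoders use considerably more elaborate higher-order designs, and the moving-average "filter" is only a crude stand-in for a real playback filter. The names `delta_sigma_1bit`, `boxcar` and `worst_error` are invented for illustration.

```python
import math

DSD64_RATE = 64 * 44100  # 2,822,400 one-bit samples per second

def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: returns a stream of +1.0 / -1.0 values."""
    integrator = 0.0
    bits = []
    for x in samples:
        out = 1.0 if integrator >= 0.0 else -1.0
        bits.append(out)
        integrator += x - out  # accumulate the error made so far
    return bits

def boxcar(values, width):
    """Moving average over `width` samples, a deliberately crude low-pass filter."""
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return [(prefix[i + width] - prefix[i]) / width
            for i in range(len(values) - width + 1)]

# A 1kHz tone at half of full scale, sampled at the DSD64 rate.
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / DSD64_RATE) for n in range(6000)]
bits = delta_sigma_1bit(tone)

def worst_error(width):
    """Largest gap between the filtered bitstream and the equally filtered tone."""
    return max(abs(b - t) for b, t in zip(boxcar(bits, width), boxcar(tone, width)))

print("worst error with a 16-sample average:", worst_error(16))
print("worst error with a 128-sample average:", worst_error(128))
```

Each output value is as coarse as it could possibly be, yet the error is a running difference of a bounded quantity, so it concentrates at high frequencies and its worst case shrinks as the averaging window widens. In this toy model the worst-case error is guaranteed to stay below 4 divided by the window length.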
Information theory tells us that it is not possible to unconditionally separate signal from noise (otherwise it wouldn’t be noise). However, if the noise exists predominantly in one frequency band, you have the possibility to eliminate it by removing that frequency band with a filter. Your problem is that in doing so you must also remove any of the original audio data that was also present in that frequency band. DSD works so well because, to a first approximation, the noise is all above 20kHz, and the audio signal is all below 20kHz. One of the intriguing aspects of DSD playback is that it leaves the designer with a choice about what type of filter they wish to implement. You can preserve more of the high-bandwidth end of the original audio signal if you prefer, but at the expense of retaining some of the unwanted ultrasonic noise as well. At BitPerfect, for example, we definitely obtain a cleaner sound by following this approach, although many purists argue against it.
Another important point about DSD is regrettably lost – quite irretrievably apparently – on some of its strongest adherents. And there are two aspects to it. The first is that, while conversion from DSD to PCM can be performed with a minimal loss of information (and therefore fidelity), the opposite is not the case, for reasons I don’t have space to go into. Suffice to say that PCM-to-DSD conversions suffer from distortions and other sonic deficiencies which technology has not yet found ways to eliminate. To be quite fair, these are not grave failings – DSD can sound quite magnificent – but when taking the state-of-the-art to its extremes, DSD-to-PCM conversions are far superior to their PCM-to-DSD counterparts, and at their best are (to my ears) flawless.
Where I diverge in opinion from some of DSD’s strongest proponents is that DSD-to-DSD conversions inherently require a three-step process that involves (i) DSD-to-PCM conversion, (ii) PCM resampling, and (iii) PCM-to-DSD conversion. Therefore, if (as is the case with most DSD studios these days) a recording was originally made in DSD256, versions converted to DSD128 or DSD64 can be expected to sound slightly inferior to versions converted directly to 24/352.8 or 24/176.4 PCM, provided that in each case the best possible algorithms were used in the conversions. Which is not what some people want to hear.
If I am sounding a little controversial here, you must bear two things in mind. First, the foregoing is primarily based on technical considerations, rather than exhaustive, thorough, and comprehensive listening tests, although my own personal experiences do tend to bear them out. Second, when playing back high-end audio, whether DSD or PCM, once the audio data goes into your DAC the digital massaging is far from over, and you have no control over (nor, to be honest, much knowledge of) what that massaging entails. Therefore, if you wished to make serious comparisons of the sound quality of, say, DSD vs PCM, your choice of DAC is likely to have a dominant impact on the outcome.