
Oops. Yesterday I proclaimed that the 26 letters of the English language were a type of code restricted to a finite number of possibilities based on 2^26, and this is simply wrong, as many of you pointed out to me. Thanks for all the feedback. I wrote the post late at night and my attention was probably focused more on the waiting beer than the post. Sorry.

In fact, the number I gave you would be true only for a binary scheme with a fixed word length of 26 bits. In the case of language, each letter position has 26 possibilities, so a word of n letters can be spelled 26^n ways, and the total is finite only if there is a limit on the number of letters allowed in any one word. But that's OK because it will lead us into our subject, PCM or Pulse Code Modulation.

The challenge I brought up yesterday was how to translate a complex flowing event like speech or music into a binary representation. There are a number of ways to do this, PCM and DSD being only two of the many schemes possible. We'll start out by focusing on PCM and then move on to the very different architecture of DSD.

What's the easiest way to solve a complex problem? Break it down into smaller discrete parts, and that is exactly what we do when we encode analog to PCM. Let's turn to an example of how this works and why it is needed.

If you were to sit in one spot and observe a fast moving train with hundreds of different cars, each packed full of passengers, and you wanted to count the number of people in each car, it'd be a near impossible task. But if you took a snapshot picture of each car and later went through each picture, you could get an accurate count. In fact, if you took enough pictures in rapid succession and then played them back at the speed you shot them, you'd have what appears to be a moving train - this is how a movie camera works - and this is what we do with PCM audio.

Remember that our audio signal has already been encoded once. This event happened when we converted sound pressure into electrical signals with a microphone.

We now have a continuous flow of changing AC voltage that is directly related to changes in sound pressure from the musical source we are recording. Our digital encoder takes snapshots of this changing voltage, and each snapshot gives us a single fixed voltage measurement. It is then a fairly simple matter to assign a numeric value that represents this voltage level with a binary code. Each time you take a snapshot of the voltage and assign a measurement number representing that voltage, you create what is known as a word. String all the words together and you have a digital data stream.

Remember our moving train analogy? What happens if the passengers in each car move from one car to another between the time you take each picture? In that case your final passenger count will be incorrect because you are taking too few snapshots and events are happening between each picture. You can fix this by taking more photos and capturing these changes as quickly as they are happening.

We have the same problem in audio. We call these snapshots samples, and we define the speed at which we take them as the Sample Rate. So when you see someone referring to a sample rate of 44.1kHz, that means they are taking 44,100 snapshots of the voltage every second. Theoretically, anything moving at less than half that speed (22.05kHz, which sits just above the roughly 20kHz limit of human hearing) can be sampled and recorded. Anything moving quicker cannot be captured accurately.

Tomorrow: bit depth.
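
For readers who like to see the snapshot idea in code, here is a minimal sketch of it in Python. It is only an illustration under stated assumptions, not how any particular encoder is built: the sine wave stands in for the microphone voltage, the 16-bit word length is an assumption (bit depth has not been covered yet), and the function names `sample_and_encode` and `nyquist_limit` are made up for this example.

```python
import math

SAMPLE_RATE = 44_100  # snapshots per second, the figure used in the post
BIT_DEPTH = 16        # ASSUMPTION: bits per word, chosen only for illustration


def sample_and_encode(freq_hz, n_samples, sample_rate=SAMPLE_RATE, bits=BIT_DEPTH):
    """Take n_samples snapshots of a sine wave and turn each one into a binary word.

    Hypothetical helper: the sine wave is a stand-in for the microphone
    voltage, scaled to the range -1.0 .. +1.0.
    """
    max_code = 2 ** (bits - 1) - 1  # largest positive value a word can hold
    mask = 2 ** bits - 1            # keeps the two's complement bit pattern
    words = []
    for n in range(n_samples):
        t = n / sample_rate                            # time of this snapshot
        voltage = math.sin(2 * math.pi * freq_hz * t)  # the "snapshot" itself
        code = round(voltage * max_code)               # assign a numeric value
        words.append(format(code & mask, f"0{bits}b")) # write it as a binary word
    return words


def nyquist_limit(sample_rate=SAMPLE_RATE):
    """Half the sample rate: signals must stay below this to be captured."""
    return sample_rate / 2


# Four snapshots of a 1kHz tone, strung together into one data stream.
words = sample_and_encode(1_000, 4)
stream = "".join(words)
print(words)
print(len(stream))      # 4 words x 16 bits each
print(nyquist_limit())  # 22050.0
```

Each entry in `words` is one snapshot, and joining them gives the data stream described above. Raising `SAMPLE_RATE` is the equivalent of taking more photos of the train.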
Paul McGowan

Founder & CEO
