Dr. AIX's POSTS — 01 September 2015


What is the simplest way to describe to someone the benefits of adding more and more bits to a PCM digital audio system? What if you were challenged to prepare an informational guide for newbies to high-resolution audio? How would you try to get a group of newcomers up to speed on the potential benefits of high-resolution audio/music? There are lots of websites, articles, and posts that have attempted this trick, but most of them seem to mess up the theory while attempting to explain the central point. One of my favorites is the “pixel” mistake.

You may have already heard this one. It goes something like this. Imagine a photograph taken with a camera that has about 65,000 pixels because the camera that took the image has “16-bits of resolution”. If you switch to a camera that has 24-bit resolution, then the number of pixels will dramatically increase to almost 17,000,000 pixels. The realism and detail of the new “higher resolution” image will be much better than the previous one, right? In the world of digital photography…yes, it will, if all you’re doing is using 16 or 24 to generate a number of pixels. But the analogy fails miserably when it comes to audio resolution.

Talking about pixel count is different than “bit depth”, which is where increasing the number of bits really matters. So the issue isn’t really the number of pixels that are present; the more important aspect is the number of colors that can be individually associated with each pixel. Think of it this way. If I have a field of pixels and each one has a bit depth of 1-bit, then each pixel can only display one of two values, such as black or white. The original Macs had 512 x 342 pixels with a bit depth of 1-bit…each pixel could be either black or white.

When the number of bits is increased from 1-bit to 8-bits, the number of discrete values (think colors or shades of a grayscale) increases to 256 (2 to the 8th power). Every subsequent increase in bit depth allows each pixel to display a wider array of colors. Confusing pixel count with bit depth doesn’t help understand how increased word length helps audio quality. It might be simple to grasp but it’s just plain wrong. Apples and oranges.
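For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The function name discrete_values is my own invention for illustration; all it does is raise 2 to the number of bits.

```python
def discrete_values(bits: int) -> int:
    """Return how many distinct values a word of the given bit depth can hold."""
    return 2 ** bits


# Bit depths mentioned in this post: 1-bit, 8-bit, 16-bit, and 24-bit.
for bits in (1, 8, 16, 24):
    print(f"{bits:>2}-bit: {discrete_values(bits):,} discrete values")
```

The 16 and 24-bit results are 65,536 and 16,777,216, the same figures that the pixel analogy mistakes for pixel counts. They describe how many values each sample (or pixel) can take, not how many samples (or pixels) there are.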

Moving from 16 to 24-bits in audio allows the system to identify more unique amplitude levels. Each bit that you add provides roughly an additional 6 dB of potential dynamic range and, as a result, potentially a lower noise floor. For a recording engineer, the increased number of bits gives your recording system more “headroom”, which means that you can capture a much wider range of volumes. In the old days, we had to be careful not to overmodulate the recording system. If you inadvertently pushed too hard into the “red”, distortion would result. Whoops.
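The 6 dB per bit figure can be checked with a rough sketch like the one below. Everything in it is an assumption made for illustration: the function names are hypothetical, the test tone is a full-scale 997 Hz sine sampled at 48 kHz, and the quantizer simply rounds to the nearest level with no dither, which is not how a real converter or mastering chain behaves.

```python
import math


def theoretical_dynamic_range_db(bits: int) -> float:
    """Ratio of the largest value to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)


def measured_snr_db(bits: int, freq: float = 997.0, rate: float = 48000.0,
                    num_samples: int = 48000) -> float:
    """Quantize a full-scale sine to `bits` (plain rounding, no dither) and
    return the ratio of signal power to quantization-error power in dB."""
    scale = 2 ** (bits - 1) - 1  # largest positive integer code
    signal_power = 0.0
    error_power = 0.0
    for n in range(num_samples):
        x = math.sin(2 * math.pi * freq * n / rate)
        q = round(x * scale) / scale  # nearest available amplitude level
        signal_power += x * x
        error_power += (q - x) ** 2
    return 10 * math.log10(signal_power / error_power)


for bits in (16, 24):
    print(f"{bits}-bit: theory {theoretical_dynamic_range_db(bits):.1f} dB, "
          f"measured sine SNR {measured_snr_db(bits):.1f} dB")
```

The theoretical figures come out to about 96.3 dB for 16 bits and 144.5 dB for 24 bits. The measured number for a sine wave should land slightly higher, near the textbook 6.02 x bits + 1.76 dB, because it compares signal power against error power rather than peak value against step size.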

To the uninitiated audio enthusiast, it’s enough to say that moving from 16-bits to 24-bits makes it possible to capture wider dynamic ranges. The added dynamic range is great for classical music and jazz but few, if any, pop, rock, or country recordings ever exceed the dynamic range of a compact disc. And let’s be honest with these newbies, they aren’t going to hear the potential improvement because there’s no content that exhibits the increased dynamic range AND their systems couldn’t reproduce it even if there were.

Forget about pixels when it comes to audio discussions.

Forward this post to a friend and help us spread the word about HD-Audio

About Author

Dr. AIX

Mark Waldrep, aka Dr. AIX, has been producing and engineering music for over 40 years. He learned electronics as a teenager from his HAM radio father while learning to play the guitar. Mark received the first doctorate in music composition from UCLA in 1986 for a "binaural" electronic music composition. Other advanced degrees include an MS in computer science, an MFA/MA in music, BM in music and a BA in art. As an engineer and producer, Mark has worked on projects for the Rolling Stones, 311, Tool, KISS, Blink 182, Blues Traveler, Britney Spears, the San Francisco Symphony, The Dover Quartet, Willie Nelson, Paul Williams, The Allman Brothers, Bad Company and many more. Dr. Waldrep has been an innovator when it comes to multimedia and music. He created the first enhanced CDs in the 90s, the first DVD-Videos released in the U.S., the first web-connected DVD, the first DVD-Audio title, the first music Blu-ray disc and the first 3D Music Album. Additionally, he launched the first High Definition Music Download site in 2007 called iTrax.com. A frequent speaker at audio events and author of numerous articles, Dr. Waldrep is currently writing a book on the production and reproduction of high-end music called "High-End Audio: A Practical Guide to Production and Playback". The book should be completed in the fall of 2013.

(24) Readers Comments

  1. Your later comments in this piece should be reassuring to those of us with large CD collections Mark. Contrary to the hipster trendiness of vinyl, I’m more in love with my CDs than ever. Now that the bugs have been ironed out of digital, I’m still learning how important the rest of the sound chain is to getting CD sounding at its best – a quest I’ve been on since 1985, one way or another.

    The big lesson is that we often blame CD for other weak links. Another is that system synergy is way more important than spending mega bucks too. And when that stuff is right you don’t need to stress over things like cables either.

    For those of us who most value legacy recordings from the 70s backwards, I truly believe that, when done right, the CD is pretty much as good as it gets.

    • Chris, CDs can sound absolutely great…and actually much better than tape or vinyl LPs in absolute terms of fidelity. But people will choose the recordings they like based on their own sonic preferences.

      • Hi Mark, yes of course I agree people will choose on preferences. However I also think a lot of people are tending to believe that CD is a flawed format, which I think we would both agree it absolutely isn’t.

        • CDs are not flawed…in fact, they can be fabulous if done with care.

  2. I think this analogy is quite poor on a number of levels. (sorry, pun)

    – increasing audio bit depth is like using bigger pixels in a camera, not more pixels. You can get more signal into each piece of information (byte).

    – cameras don’t have a DAC process, so you are literally looking at the bytes, whereas digital audio recreates the analog signal and is fully analog when you experience it. There are no steps or chunks or joints in the user experience of digital audio.

    – when you write ‘colors’ you should be writing ‘shades’ or ‘tones’ for the sake of the analogy. Audio has no equivalent of the way colours are compiled and filtered in a digital camera.

    – “Moving from 16 to 24-bits in audio allows the system to identify more unique amplitude levels.” This quote is not part of the camera analogy, nor is it untrue, but you are perpetuating an anti-digital myth with this statement. Remember that the analog waveform post-DAC from a 16 bit waveform is an *exact* replica of the input waveform up to half the sampling frequency, plus noise. Therefore, all the ‘more unique amplitude levels’ in the 24-bit process are all lying below the noise floor of the 16-bit process, once you look at the final analog output. That’s an important qualifier to your statement, especially for playback purposes.

    • Grant…I’m not with you on this. It’s not bigger pixels that you get with more bits, you get more discrete values for each pixel. As for the DAC, you’re right there isn’t a DAC per se, but there is a monitor that converts the digital information into something that can be viewed…and I see those as similar.

      The discrete values can be shades or colors. The point of the article is that using images and photographic resolution doesn’t get us to a better understanding of digital audio resolution.

      The increase from 16 to 24-bits does provide more discrete levels to use in the digitization process. The practical result is a lower noise floor…more dynamic range. They don’t lie below the noise floor…they establish it.

      • Actually, the monitor doesn’t convert; that’s done by the video DAC inside the graphics card.

        • OK, then the video monitor is the amplifier and speaker?

      • I’m happy to be wrong, but I think you might be ‘with me on this’ soon, Mark, heh heh.

        16 pixels of a light sensor are not analogous to a 16-bit byte of audio (and 24 pixels a 24-bit byte of audio), because a pixel is not an on-off state, it is an analog bucket that fills with light. The bucket has an electrical noise level fixed by technology, and the max signal is fixed by the size of the opening on the top of the bucket, i.e. the pixel area, where photons pour in. Therefore, when you increase the pixel area (size), you increase the available SNR, which is analogous to more bits in an audio byte.

        I agree that 24 bits provides more discrete levels to use in the *digitization* process, but the point you want to make isn’t about capturing the sound in the studio with 16-bit technology (I mean, who does that anyway?), it is about packaging the media for distribution in 16 bits or 24 bits. The question becomes, what form do the ‘more unique amplitude levels’ take, when we look at the dithered post-DAC analog signal of a 16/96 download vs a 24/96 download? And the answer is that both have the same peak level, both have perfectly smooth analog waves with all the original information below 48 kHz and above -93 dB, but the 24-bit signal has further musical information below -93 dB and all the way down to -141 dB. So that is why I wrote that “all the ‘more unique amplitude levels’ in the 24-bit process are all lying below the noise floor of the 16-bit process”.

  3. What a shame it’s always been that those who make such decisions serve up the biggest bunch of crap to the most popular selling styles of music. Rock and country have supported the music industry (and high end audio) for the last 50 some years yet have always been treated like redheaded step children when it comes to giving them quality.

    • Interesting how fast things changed in a decade.
      Aaron Tippin – Read Between The Lines, 1992, RCA Nashville, DR 14 DR Max 15 Pretty Darn Good
      Aaron Tippin – People Like Us, 2000, Lyric Street Records, DR 8 DR Max 9 SAD

  4. Mark wrote: “You may have already heard this one. It goes something like this. Imagine a photograph taken with a camera that has about 65,000 pixels because the camera that took the image has “16-bits of resolution”. If you switch to a camera that has 24-bit resolution, then the number of pixels will dramatically increase to almost 17,000,000 pixels. The realism and detail of the new “higher resolution” image will be much better than the previous one, right? In the world of digital photography…yes, it will, if all you’re doing is using 16 or 24 to generate a number of pixels.”

    There’s confusion here between resolution and colour depth: resolution (the number of pixels) is independent of colour depth; 65000 pixels at 16-bit will still be 65000 pixels at 24-bit, only the number of colours per pixel will change. This is accurately mentioned later on, but the above paragraph would be confusing to the uninitiated.

    • Your point is exactly why I wrote the piece. The person that pitches the pixel analogy confuses the number of pixels with the bit depth behind each pixel.

    • Video resolution is determined by the frame rate, not the number of pixels, which is just zoom/pan. Video frame rate is exactly the same as audio [up/over]sampling rate!

      • I don’t think you’re going to get a lot of agreement on your assertion that video resolution is determined by the frame rate. When the CE companies pitch increased resolution of their new televisions, they brag about moving to 4K not 48 fps.

        • And that’s the whole misconception: the real difference between HDTV and DVD-Video is not that 1080 is bigger than 576, but that MPEG4 is qualitatively superior to MPEG2. That’s it.

          • At a sufficiently high bitrate, MPEG2 and MPEG4 are indistinguishable. Many early BluRay discs are actually encoded in MPEG2 with bitrates up to 40Mbps (the maximum bitrate regardless of codec). DVDs are limited to about 10Mbps including audio (most commercial DVDs use 6-7Mbps for video), so the number of compressed bits per pixel is roughly the same between the two formats. The improved picture quality of such a BluRay is thus entirely a result of having more pixels. Using a more advanced codec like H.264 (aka MPEG4 part 10) raises the picture quality at a given bitrate (or lowers the bitrate for a given quality), which of course also contributes to the improvement over DVD, though at the bitrates used on BluRay (typically 20-30Mbps), the differences are subtle and generally only visible in particularly challenging scenes.

  5. There are a number of analogies one can make between audio and imaging.

    To begin with, let us compare a monophonic audio recording with a monochrome image. A digital audio recording is made by sampling the continuous analogue waveform at fixed time intervals. Similarly, a digital image is made by sampling the continuous analogue image at fixed space intervals (pixels).

    For each sample, the audio recording stores the air pressure level at the corresponding point in time, while the image samples store the light intensity at the corresponding points in space. In both cases, the number of bits per sample determines the accuracy with which we can record the air pressure and light intensity, respectively.

    Increasing the audio sample rate allows us to record higher-frequency sounds. The image equivalent is to increase the number of pixels per unit length, allowing us to capture higher spatial frequencies (smaller details). Just as an audio ADC requires an anti-aliasing filter, so does a digital camera (and it has the same name). In front of the image sensor, there is a filter which blurs the image ever so slightly. Without it, spatial frequencies in the captured image higher than half the pixel sampling rate (the Nyquist limit) would show up as low-frequency aliases known as moire patterns.

    In audio, we can use a stereo pair to reproduce an illusion of sound emanating from anywhere between the speakers. In imaging, a small number of colours (red, green, blue) provide an illusion of a full spectrum. Here a correspondence can be seen between the terms sound stage and colour gamut.

    On the reproduction end, an audio DAC converts a digital signal back to an analogue electrical waveform, which is then turned into pressure waves by a speaker. A display device, similarly, converts the digital image into electrical levels which in turn produce light of the desired intensity (whether directly or by attenuating a backlight). VGA graphics cards actually have a part called a RAMDAC which converts a digital image to analogue electrical signals one pixel at a time.

    Also in processing, many algorithms are shared across the domains. To resize an image, one uses a resampling filter very similar in design to audio resampling filters. If the bit depth is constrained, both audio and image processing use dithering and noise shaping to produce the best possible approximation.

    This brings us to DSD. This manner of representing audio is in fact a close relative of the halftone printing technique. In the latter, the lightness of an area in the image is determined by the density of tiny ink dots present there. In other words, it is a form of pulse density modulation. A variant of halftone uses differently sized dots, corresponding to pulse width modulation in audio. As a reproduction method, halftone printing enables the printing of pretty good images using cheap equipment. However, nobody in their right mind would suggest such a representation for a primary storage format, let alone try to use it in image editing.

    Similarities can further be found in lossy compression methods. Both in audio and imaging, these are based on the Fourier transform (usually in its cosine transform variant). Heavy JPEG compression introduces artefacts, ringing, especially around sharp edges (transients in audio).

    This is the beauty of digital signal processing. The algorithms see only a progression of numbers with complete disregard to what those numbers represent in the material world.

    Finally, audio/image analogies are not limited to the digital domain. Tape hiss and film grain are in fact very similar in nature, the former the result of magnetic domains in the tape, and the latter arising from crystals of photosensitive chemicals.

    • You make some very good points and thanks for taking the time to lay them out. My primary reason for the post I wrote was to hopefully point out that comparing a fuzzy, compressed, lo-res image to an MP3 file is a bad analogy. And then to say that increasing the number of bits to 24-bits to improve the picture is analogous to improving the “resolution” of an audio recording fails on almost all counts. Bit depth vs. pixel count vs. amplitude levels and audio digitization don’t operate in the same space. I believe there is a better way.

      • As you can both hear and see, transient timing characteristics dominate both audio and video. The number of pixels per unit length in video is equivalent to the audio soundstage. And more colours in video is the same as more frequencies in audio. Audio resolution depends on sampling only!

    • Mans, as a digital photographer – who started out with film – and digital audio fan who started out with LPs and tape – I see your analogies as more intriguing than anything I’ve run across. Your comparison of SACD encoding to halftone printing is fascinating. And this is also the first time I’ve ever seen the analogy made between the anti-aliasing filter in DSLRs and the low-pass filters needed to comply with the Nyquist limit.

      Mindblowing.

    • Mans – With some of the latest very high resolution cameras, such as the 50.6 MP Canon 5DS R, low-pass filter cancellation is used to trade off higher resolution against a low but real risk of moire. With digital audio I doubt that completely bypassing the low-pass filter would ever make sense.

      • Most definitely not.

      • Omitting the anti-aliasing filter in a high-resolution digital camera is feasible because the types of patterns that would cause visible problems (finely spaced lines/grids) are rare in nature. Brick walls would be a problem for a low-resolution camera, but at 50MP that is not the case. Aliasing from irregular patterns, such as vegetation, doesn’t tend to be visually very noticeable.

        In audio, frequency aliasing is much more disturbing, so the acceptable amount is much lower (to the extent such a comparison is meaningful at all). Now the microphone itself acts as a lowpass filter at some frequency determined by various factors including the mass of its moving parts, impedance of analogue signal wires, etc. I don’t know what a typical figure might be, but I doubt it’s more than 1MHz. Supposing we could make an infinitely fast 24-bit flash converter, operating it at a frequency twice as high as the upper limit of the microphone, we would not need any additional analogue filters. In practice, we use oversampling low bit-depth sigma-delta converters, which increases the required physical sampling rate accordingly, so if we could sample at, say, 100MHz, there would be no need for explicit lowpass filters. As we’d obviously still need a digital filter to convert the raw samples into useful PCM data, there is nothing to be gained from going to such extremes.
