Dr. AIX's POSTS — 07 February 2015


I’ve made a large number of audio industry friends over the past few years of attending trade shows. Many of these people are in the consumer sales and marketing areas of their companies…large and small. They aren’t audio engineers and they don’t have the background to be conversant on the merits of high-resolution audio from a technical point of view. I got an email today from one of these individuals. He wrote:

“It’s been interesting following what’s been going on around high-resolution audio lately, more specifically the Pogue article.

In the Pogue article he says ‘The songs you buy from Pono, on the other hand, go as high as 24 bit/192kHz. That means more bits of data per instant of sound, and more (smaller) instants per time period: higher resolution. It’s like having more color data and more pixels per inch in a photo.’ And then in one of your blog posts you mention that the analogy that PCM digital audio is similar to digital imagery simply doesn’t hold.

My question is what simple analogy would you use to explain hi-res audio to the everyday consumer who doesn’t know a thing about hi-res audio?

I’m just wondering how to break it down, as simply as possible.

Let me know your thoughts.”

I got thinking about how to arm a consumer sales and marketing person with simple analogies and information in order to empower their sales pitch. When I’m standing at my tables at the various trade shows that I attend, I have to deliver a quick and simple explanation as well. I know that I’m overly technical and usually say too much but today’s email got me thinking about the best way to give the elevator pitch about high-resolution audio.

First, we have to lose the analogy to digital images. I’ve read this from David Pogue, Brent Butterworth, and a dozen other online reviewers. Digital audio resolution bears no relationship to digital imagery. Increasing the number of pixels and the bit depth of each pixel does result in a “higher-resolution” digital image. But increasing the number of samples and the length of digital words in a PCM system doesn’t work the same way. It’s tempting to make this analogy but while one makes better pictures, the audio system produces lower noise levels (more dynamic range) and wider frequency response (or better response within the audio band). I’ve written about this a number of times and even produced some illustrations that help make the point.
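To put rough numbers on that distinction, here is a minimal Python sketch. It assumes an ideal PCM converter and uses the standard textbook approximation for a full-scale sine wave (about 6.02 dB per bit plus 1.76 dB). The function names are illustrative only and don't come from any library.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency an ideal PCM system can represent at this sample rate."""
    return sample_rate_hz / 2

# Compare the CD format with 96 kHz/24-bit PCM.
for name, bits, rate in [("CD", 16, 44_100), ("Hi-res", 24, 96_000)]:
    print(f"{name}: {dynamic_range_db(bits):.1f} dB, up to {nyquist_hz(rate) / 1000:.2f} kHz")
```

Longer words lower the noise floor and more samples per second raise the upper frequency limit. Those are the two things the format parameters control.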

If the digital image analogy doesn’t work, then what would do the trick? I usually start my pitch by avoiding numbers and specifications. I simply point out that throughout the history of recorded sound, recording technology has never been able to match human hearing…until now. High-resolution PCM digital audio at 96 kHz/24-bits accomplishes that for the first time. Lacquer 78s, analog tape, vinyl LPs, cassettes, and even CDs have failed to match our ears. The march of technology and innovation has proceeded from the time of the first Edison cylinders to the latest high-resolution PCM digital recorders and playback equipment. But today…in 2015 (and for the past 15 or so years)…musicians, producers, and engineers have equipment that has the POTENTIAL to eclipse any recording system that came before. That’s what high-resolution audio can offer and that’s why it is such a monumental move forward…for those who reach for the rarefied air of uncompromised quality, it’s available.

Unfortunately, that group is exceedingly small. There are a handful of labels that strive for the very best. Among them are 2L, Naxos, Linn, Chesky, Pentatone, Channel Classics, and of course, AIX Records. High-resolution audio is not for everyone. Most people won’t experience any difference between high-resolution audio and a standard definition vinyl LP or CD because they don’t have the equipment to produce fidelity that’s better than compact discs and they aren’t playing recordings that have audible fidelity better than what they’ve already heard (that’s where David Pogue and the Meyer/Moran guys failed). And that’s perfectly fine.

The best way to impress anyone about the merits of high-resolution audio is to play them an example of a REAL high-resolution file. It works every time.

Forward this post to a friend and help us spread the word about HD-Audio


About Author


Mark Waldrep, aka Dr. AIX, has been producing and engineering music for over 40 years. He learned electronics as a teenager from his HAM radio father while learning to play the guitar. Mark received the first doctorate in music composition from UCLA in 1986 for a "binaural" electronic music composition. Other advanced degrees include an MS in computer science, an MFA/MA in music, BM in music and a BA in art. As an engineer and producer, Mark has worked on projects for the Rolling Stones, 311, Tool, KISS, Blink 182, Blues Traveler, Britney Spears, the San Francisco Symphony, The Dover Quartet, Willie Nelson, Paul Williams, The Allman Brothers, Bad Company and many more. Dr. Waldrep has been an innovator when it comes to multimedia and music. He created the first enhanced CDs in the 90s, the first DVD-Videos released in the U.S., the first web-connected DVD, the first DVD-Audio title, the first music Blu-ray disc and the first 3D Music Album. Additionally, he launched the first High Definition Music Download site in 2007 called iTrax.com. A frequent speaker at audio events, author of numerous articles, Dr. Waldrep is currently writing a book on the production and reproduction of high-end music called, "High-End Audio: A Practical Guide to Production and Playback". The book should be completed in the fall of 2013.

(30) Readers Comments

  1. Re the issue of a lower resolution file being delivered in a high resolution bucket, the analogy I like to use is: If you feed a high definition TV with a standard definition signal, what you see is still standard definition. In this case, the video analogy seems useful.

    Keep up the good work!

  2. Having read today’s post, I do think that we can use a digital image analogy.
    Let’s say that one takes an incredibly detailed photograph with a camera having a medium format sensor– more light gathering surface area and greater dynamic range as compared to a small point and shoot camera. There is a much greater signal to noise ratio from the image of the large sensor camera.
    Now if either photo is printed to 4×6″, both pictures may look similar. Perhaps this may be analogous to playing a hi res music file on a car radio. CD quality or compressed MP3 files may not sound appreciably different on the car radio.
    On the other hand, let’s say we want to look at that hi res photo printed to a size of 40×60″. The hi res file may look great printed at 240 dpi but the smaller sensor image will not show the same detail. Even if printed at 300 dpi or 480 dpi, the smaller sensor image will not get better because the data isn’t in the file. (Unfortunately, the dynamic range of the printed image will not approach the dynamic range of a modern photo sensor.) If the audio file is played back in a high quality system with a good DAC, the hi res file will clearly sound better than the low res file even if the low res file is reproduced at a higher sampling rate.
    Hence the overused axiom: Garbage in, garbage out!

    • I’m not comfortable equating “resolution” in photographs of any kind with audio resolution. The descriptors may work in both arenas but the technology is distinctly different.

  3. While I agree that the digital image analogy is often misused, it’s incorrect to say that “Digital audio resolution bears no relationship to digital imagery.” In fact, they are very much related.

    Consider a single line of grayscale pixels. The number of bits gives the number of discrete values into which the quantity of light falling on that pixel can be divided. There is a certain amount of light beyond which the system records no increase – clipping –, and a quantity of light below which the pixel will register zero. When the dynamic range is increased in an imaging system, a greater variation in light intensity can be captured between the very bright and the very dark. The number of pixels in that line can be considered equivalent to sample rate. The only difference is that they are dividing a spatial dimension rather than a temporal dimension.

    If you want to look for frequency content in an image, you do the same FFT – just this time in two dimensions, rather than one. Where using imaging as an analogy runs into problems is that, in our everyday experience of looking at images, we are rarely interested in the frequency content. Instead, we are often most concerned with the abrupt changes in an image. Another problem is that, until relatively recently, digital imaging was so crude that it was easy for most people to see the improvement with finer resolution. Digital cameras have now reached a level of performance at which it’s difficult for most consumers to discriminate, and we may be seeing the same thing with Ultra HD TVs. That puts the imaging and video industry into much the same position as the audio industry. That is, there are measurable differences in both the hardware and the content, but only a fraction of consumers will notice, much less care, about the difference – they are too busy snapping pictures with their cell phones and watching YouTube videos on the 5 inch display.

    • A spatial dimension and a temporal dimension are distinctly different…that’s my point. Time is not the same thing as static pixel information.

      • Actually, a spatial dimension and a temporal dimension are very much analogous from the standpoint of signal theory. I could take a one second sample of digital audio at 16 bits and 44.1 kHz sampling frequency, and represent it as a line 44,100 pixels long with the brightness of each pixel corresponding to the amplitude at that sample. Design of the circuits also has some striking similarities. It’s just the type of signals that we are trying to capture in each medium, and the difference in how the two senses – auditory and visual – interpret the results that leads to misuse of the analogy.
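The mapping described in this reply can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 440 Hz tone and the choice of unsigned 16-bit grayscale values are hypothetical, picked to make the example concrete.

```python
import math

SAMPLE_RATE = 44_100  # samples per second, as in the comment above
FREQ_HZ = 440.0       # assumed test tone

# One second of a sine wave as signed 16-bit samples (-32767..32767).
samples = [round(32767 * math.sin(2 * math.pi * FREQ_HZ * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]

# Map each sample to an unsigned 16-bit grayscale value (0 = black, 65535 = white).
pixels = [s + 32768 for s in samples]

# The mapping loses nothing: the audio can be recovered from the pixel row.
recovered = [p - 32768 for p in pixels]
print(len(pixels), min(pixels), max(pixels), recovered == samples)
```

The numbers are interchangeable. What differs is how the eye and the ear interpret them, which is where the two sides of this thread part ways.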

  4. Dr. AIX,

    Though not an exact analog, digital audio does bear a close relationship to digital images. Again it’s that word provenance. If an image of 512 pixels is then transferred to an image field of 4096 pixels no additional resolution has been gained, much in the same way you’ve indicated that an old analog recording being released in 192/24 digital gains no additional resolution. There is post processing that can be done to both the low res image and the low res sound that can improve the apparent resolution.

    • Agreed…but there are ways to make them seem like the same type of improvements etc. But in the end time is lacking in photos.

  5. I think the photo analogy is valid if you put it like this: a 50-megapixel photo using the best digital SLR has the potential to capture more total resolution and dynamic range with lower noise of a LIVE SCENE, than a piece of 35mm print film. A 96/24 digital recorder has the potential of capturing more resolution, dynamic range with lower noise of LIVE MUSIC, than analog tape. If you take a 50Mp picture of a print made from 35mm film, you have no more resolution or dynamic range, and no less noise, than the print had. Likewise with sampling analog audio with higher resolution digital recording media. Any average person can pull out an old print and take a picture of it with their iPhone and satisfy themselves that the iPhone picture looks no better, even at 8Mp.

    • Based on today’s responses, I can see that a lot of you like the comfort of images, pixels, and audio resolution. There are ways to talk about digital (and even analog) photography in ways that may clarify things to casual consumers about audio resolution. But the basic idea about sample rates and pixel density are unique to each format.

  6. For me, the simple explanation would be to fall back on your provenance criterion. If the provenance was hi-res, then delivering it in hi-res captures the full hi-res content. If the provenance was not hi-res or if the audio is compressed, then upscaling it to “hi-res” does not provide the full hi-res content of the performance, but merely adds a lot of zeros (more bits) and packs the rubbish closely together (higher frequency).

    You can use a similar picture analogy. If the scene is captured in hi-res (provenance), the hi-res reproduction will provide the full hi-res content. If the scene is captured in low-res, (or if the data is compressed) and the file is then upscaled by interpolating the pixels, and just boosting the color depth artificially, then it is not a hi-res picture as claimed.
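The "adds a lot of zeros" point can be shown directly. The sketch below is a simplified illustration, not how any particular mastering tool works. The sample values are made up, and real upscaling may also involve resampling or dither.

```python
# Hypothetical 16-bit samples standing in for a CD-quality master.
samples_16 = [-32768, -1200, -1, 0, 1, 2500, 32767]

# "Upscale" to a 24-bit container by appending eight zero bits to each sample.
samples_24 = [s << 8 for s in samples_16]

# Every padded sample has an empty low byte: the extra bits carry no information.
low_bytes = [s & 0xFF for s in samples_24]

# Shifting back recovers the original exactly, so nothing was gained either.
restored = [s >> 8 for s in samples_24]
print(low_bytes, restored == samples_16)
```

The file gets bigger and the label says 24-bit, but the content is still the 16-bit original.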

  7. Unless people accept the Shannon-Nyquist theorem as a dogma (with no need to understand the maths underlying it), they will never be able to intuitively understand what these samples per second are supposed to do. That’s just it. People should accept that (sounds of) frequencies lower than half of the sampling frequency are represented fully and theoretically accurately. The word length thing is even more difficult to understand intuitively but here people’s general exposure to digital stuff may help eventually. People accept the basics of the theories of relativity as such, maybe they will accept this as well in time. I have repeatedly written here that the choice of the term ‘resolution’ by the music / sound industry has been most unfortunate and I suspect has been made on purpose. It is the industry that has been intentionally misleading its customers. They could properly educate their salesmen and marketing people but they choose not to because the incorrect analogy to digital imaging is convenient for them.
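For readers who would rather see the theorem at work than take it on faith, here is a small Python sketch. It rebuilds the value of a tone halfway between two sample instants using Whittaker-Shannon (sinc) interpolation. The sample rate, tone frequency and window length are assumptions chosen for the demonstration. Because the ideal sum is infinite and this one is truncated, the match is close rather than exact.

```python
import math

FS = 8_000.0         # assumed sample rate in Hz
F = 1_000.0          # assumed tone, well below FS / 2
HALF_WINDOW = 2_000  # samples used on each side of the reconstruction point

def sample(n: int) -> float:
    """The nth sample of the tone."""
    return math.sin(2 * math.pi * F * n / FS)

def sinc(x: float) -> float:
    """Normalized sinc, the interpolation kernel of the sampling theorem."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(t: float) -> float:
    """Whittaker-Shannon interpolation, truncated to a finite window of samples."""
    centre = round(t * FS)
    return sum(sample(n) * sinc(t * FS - n)
               for n in range(centre - HALF_WINDOW, centre + HALF_WINDOW + 1))

# Evaluate halfway between two sample instants, where no sample was taken.
t = 100.5 / FS
true_value = math.sin(2 * math.pi * F * t)
print(f"reconstructed={reconstruct(t):.4f} true={true_value:.4f}")
```

The samples are not "dots" with gaps between them. For a band-limited signal they pin down the waveform everywhere in between.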

  8. “Increasing the number of pixels and the bit depth of each pixel does result in a “higher-resolution” digital image.”

    First of all, the bit rate must be full {that is: not reduced}. The analogy between the video & audio is certainly in sampling: increase in frames per second exactly reflects oversampling. In other words, the more samples/frames there are in a time unit, the more natural & ‘3D’ becomes the audio/video. Cause: impulse response.

    Word length in audio means nothing special. The sole requirement is that it must be >1-bit, i.e. 2 bits is sufficient for delivering the whole {or whatever is needed} dynamic range {should not be confused with SNR} provided the sampling frequency is high enough. Any kind of compression/limiting {as with CD} will be unnecessary since it is well-known a signal drowned in noise can be completely recovered.

    Now you see that sampling rate is the highly important parameter in audio because the still bigger figures will import: 1} less & less quantization noise across the audio band; 2} higher location of aliasing {anti-alias filtering is detrimental to sound quality}; 3} fuller a square wave.

    Hence, ’tis arch-important to create a non-anti-alias-filtering 2-bit THz-region {probably ECL-based} audio analog-to-digital converter with subtractive dither & 9th {at very least} order noise shaper + many times oversampling {to further improve the impulse response}.

  9. In digital photography/video higher pixel counts and bit depth = NEW information. Each pixel represents something unique that may or may not of necessity be related/correlated to its neighbor. We can easily/instantly see more pixels and more color.
    In digital audio, bit depth/sample rate are conveying information that is of necessity “related/correlated”. The “NEW” information is not about filling in “missing dots” in the musical event but about increasing the size of the box that contains it (e.g. height=dynamic range, width=frequency response).

    • Very nicely put.

    • Increasing bit depth in an image gives a greater number of available colours over the chosen colour gamut (the gradations, or steps, between the colours get smaller and the gamut is better represented). Increasing bit depth in audio increases the number of steps (but the steps stay the same size, so the signal-to-noise ratio increases).

      Increasing the number of pixels in an image increases the resolution up to the point where we can no longer tell the difference (we’re there now with “retina” displays and 4K Video). Increasing sampling frequency in audio increases the frequency response. Sampling frequencies in excess of 48 kHz give upper frequency responses in excess of 24 kHz – plenty of headroom at the production stage, but arguable whether necessary at the end product stage since no one can “hear” in excess of 20kHz. Similarly 24 bits give plenty of headroom at the production stage, but few home systems can actually aspire to a 24 bit S/N.

      It’s better to concentrate on producing recordings that can do full justice to 16 bit 44 kHz rather than placing recordings with limited frequency and dynamic range into a larger container.
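The link between the number of steps and signal-to-noise ratio can be checked empirically. The following Python sketch quantizes a full-scale sine by plain rounding (no dither) and measures the result. The 997 Hz tone and 48 kHz rate are assumed test values. Whichever way one pictures the steps, it is the ratio between the largest signal and the rounding error that grows with word length.

```python
import math

FS = 48_000   # assumed sample rate in Hz
F = 997.0     # assumed test tone, chosen so it does not repeat within the window

def measured_snr_db(bits: int, n_samples: int = FS) -> float:
    """Quantize a full-scale sine to `bits` bits and measure signal-to-noise ratio."""
    scale = 2 ** (bits - 1) - 1          # largest positive integer code
    signal_power = 0.0
    noise_power = 0.0
    for n in range(n_samples):
        x = math.sin(2 * math.pi * F * n / FS)
        q = round(x * scale) / scale     # snap to the nearest available step
        signal_power += x * x
        noise_power += (x - q) ** 2
    return 10 * math.log10(signal_power / noise_power)

for bits in (8, 16):
    print(f"{bits}-bit: about {measured_snr_db(bits):.1f} dB")
```

Each extra bit buys roughly 6 dB, which is why 16 bits used well already covers a very wide dynamic range.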

      • You’re right on David…good explanation. Let’s leave digital images out of the comparison.

      • VHS is still a livelier picture compared to your 4K Video just because it has better impulse response.

        24fps in video & 384 kHz in audio — these are not enough in terms of impulse response thereby causing ‘flatness’ .

        • Jay…VHS???

          • While the analogy doesn’t exactly explain the how or why of audio sounding better with more samples, it is only an analogy which makes a similar point. People can wrangle over the language if they want, but this is the analogy that makes it clearer for the less technically oriented like myself.

            good call on your part.

  10. Went over to the Linn site, talk about more confusion. Explanations, many correlating to in-house componentry, have little bearing to the masses. Anyone care to explain HDCD? Linn really has nothing quantitative to say. As long as there are sales, provenance hasn’t a prayer.

    • HDCD, developed by Pacific Microsonics many years ago for CDs, is akin to MQA. I’ll have to look up my post on it and provide the link…but it uses the low order bits to get additional usable dynamic range.

  11. Well for one thing, I believe you will find most people don’t understand the high resolution in digital pictures either. So you haven’t made anything easier for them with picture analogies. Most just equate more pixels with better.

    More bit depth (16 vs 24) is less noise. Pretty simple and most can get that.

    Higher sample rate (96 vs 44) is more bandwidth or wider frequency response. Most get that, enough response to cover all that can be heard with a bit of margin to boot.

    Pretty simple and even accurate.

  12. “There are a handful of labels that strive for the very best. Among them are 2L, Naxos, Linn, Chesky, Pentatone, Channel Classics, and of course, AIX Records.”

    I’d add the recordings on the British Chandos label to the honor roll – they still produce SACDs but most of their masters are 24/96, and they’ve done a great job making their catalog available (even some in multi-channel) as 24/96 downloads (where you avoid the additional conversion to DSD, with the additional noise, for the sake of the SACD). Also, the BIS label from Sweden has been recording for the last couple of years in 24/96, with 24/96 downloads available (no multi-channel downloads yet however).

    • Thanks for the additions…I’m not in favor of some of these labels transcoding to DSD back and forth.

  13. I believe another large part of the confusion when comparing photography and audio involves “Aliasing”. In digital imaging it is impossible to get rid of aliasing – but it does get less noticeable with increased “resolution”. Vinylphiles like to focus on the “smoothness” of analog compared with digital. It seems their primary focus is on the reconstruction of an analog wave form from an “incomplete” digital picture of it (Aliasing)…
    I don’t think they realize that aliasing truly isn’t a significant problem in a properly recorded/played modern audio system (with CD resolution or better).
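For anyone unsure what aliasing means in audio, here is a brief Python sketch with assumed values (a 48 kHz sample rate and a 30 kHz tone). It shows that a tone above half the sample rate yields the same samples as a lower "folded" tone. That ambiguity is why a properly designed converter band-limits the signal before sampling, after which aliasing isn't an issue.

```python
import math

FS = 48_000             # assumed sample rate in Hz
F_ABOVE = 30_000        # tone above the Nyquist limit of FS / 2 = 24 kHz
F_ALIAS = FS - F_ABOVE  # 18 kHz: where the tone folds back to

above = [math.cos(2 * math.pi * F_ABOVE * n / FS) for n in range(1_000)]
alias = [math.cos(2 * math.pi * F_ALIAS * n / FS) for n in range(1_000)]

# The two tones produce the same samples, so the converter cannot tell them apart.
max_diff = max(abs(a - b) for a, b in zip(above, alias))
print(f"largest sample difference: {max_diff:.2e}")
```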

  14. Here is where I think your being a technical professor where precision is important is getting in the way of your being able to get the hoi polloi to understand the point you are trying to make. When dealing with the non technical public you must KISS, or you lose them; absolute accuracy or completeness is not required, or even desired. Explanations must be kept short and closely related to what the public already believes that it understands.

    I think that Provenance is something the public can understand. Tell them simply that all of the fidelity of the master tapes and master digital files of the older music that they love will be maintained through the high resolution process, but they won’t be any higher resolution than the original master files; period. Only music files recorded and processed using the complete high resolution process will be the high resolution files that they purchase.

    • What’s simpler than saying that high-resolution audio provides recording and playback as good as your ears?

      • Because you haven’t given them anything higher to aim for. The earbud public already believes that MP3 gives them reproductions as good as their ears. The ideal, but totally impractical solution would be to run them all through your studio as a comparison and to recalibrate their knowledge of what’s possible. About the best that can be practically done is to give them a straightforward but not too technical explanation of the primary expected improvements, then encourage them to seek out sources of this latest technology in use. They can’t deal with the whole truth all at once; it’s almost as if they have to be tricked into seeking more knowledge.

        • This is why I do so many trade shows and demos. You can’t know if you don’t hear it.
