Mark Waldrep, aka Dr. AIX, has been producing and engineering music for over 40 years. He learned electronics as a teenager from his HAM radio father while learning to play the guitar. Mark received the first doctorate in music composition from UCLA in 1986 for a "binaural" electronic music composition. Other advanced degrees include an MS in computer science, an MFA/MA in music, a BM in music and a BA in art. As an engineer and producer, Mark has worked on projects for the Rolling Stones, 311, Tool, KISS, Blink 182, Blues Traveler, Britney Spears, the San Francisco Symphony, The Dover Quartet, Willie Nelson, Paul Williams, The Allman Brothers, Bad Company and many more. Dr. Waldrep has been an innovator when it comes to multimedia and music. He created the first enhanced CDs in the 90s, the first DVD-Videos released in the U.S., the first web-connected DVD, the first DVD-Audio title, the first music Blu-ray disc and the first 3D Music Album. Additionally, he launched the first High Definition Music Download site, iTrax.com, in 2007. A frequent speaker at audio events and author of numerous articles, Dr. Waldrep is currently writing a book on the production and reproduction of high-end music called "High-End Audio: A Practical Guide to Production and Playback". The book should be completed in the fall of 2013.

18 thoughts on “Time Aligning Waveforms”

  • Mark,

    Clearly the person who was objecting to your recordings being “phase-compromised” (not a term I’ve heard before either in my 40+ years in pro audio) has very little if any experience in recording. Your method of aligning arrival times should be just fine, better than most, so long as the correct attacks are used, and I’m sure they are. Any time there are multiple mics in a sound field there will be multiple arrivals, and in some cases, those are very useful in creating a believable recording. Believable, but…ahem…phase compromised. Sheesh.

    We could also talk about the fact that above a certain mid-band transitional frequency human hearing localizes sources based on the arrival time and intensity differential portions of the HRTF because the wavelengths of the frequencies involved are far shorter than the interaural distance, so phase would be ambiguous, even in real life with just two ears. So our own human hearing is at some point “phase-compromised”. Your goal is to capture that phase-compromised audio with all possible accuracy. Seems that goal has been met.
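The wavelength argument in the paragraph above can be checked with a little arithmetic. This is only a rough sketch: the speed of sound (343 m/s) and the 0.18 m ear spacing are assumed round numbers, and the function names are made up for illustration. It estimates the frequency at which half a wavelength fits between the ears, which is roughly where phase stops being an unambiguous cue.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C


def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Wavelength in metres of a tone at the given frequency."""
    return speed_m_s / frequency_hz


def phase_ambiguity_onset_hz(ear_spacing_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Frequency at which half a wavelength equals the ear spacing.

    Above this point a sound arriving from the side can differ by more than
    half a cycle between the ears, so phase alone no longer says which ear led.
    """
    return speed_m_s / (2.0 * ear_spacing_m)


if __name__ == "__main__":
    ear_spacing_m = 0.18  # hypothetical head width, not a measured value
    print(f"ambiguity onset: {phase_ambiguity_onset_hz(ear_spacing_m):.0f} Hz")
    for f in (200, 1000, 5000):
        print(f"{f} Hz -> wavelength {wavelength_m(f):.3f} m")
```

A real head is not two points in free space, so the actual transition is gradual and varies from person to person; the numbers only show the order of magnitude.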

    However, I do think the experiment was interesting to a point. I’m not sure it proved much because of the lack of control, and the “gaming” issues, and lack of a universal and properly done ABX comparator.

    I do think the one result that surfaced above all others was the fact that the “AVS” in AVS Forum stands for “Arguing Very Strongly”. A fact that has been tested and confirmed countless times before, and has just been again.

    • I was quite surprised by this assessment as well. I capture the signals coming in from a bunch of carefully placed microphones and move ahead. I don’t change the phase of anything. The AVS Forum loves to argue and post…I’m looking forward to being on Scott’s show in September to talk about the “tests”.

  • Dave Griffin

    How do the musicians stay in time with each other if they are an appreciable distance apart and have no local monitoring (i.e., IEMs, headphones, etc.)? I notice on some of your recordings there can be quite a large ensemble that is spaced out quite a bit (to avoid microphone bleed, I assume) yet appears to have no local monitoring.

    • I don’t deliberately space the musicians far apart…the space and the ensemble sort of manage that themselves. I do try to place the rhythm section close to each other for the best musical results…getting the groove is critically important. The rest of the musicians just listen. I have on some occasions used IEMs.

  • Vince Stone

    Consider the difficulty of recording an artist live (or maybe in the studio). If we are to get overly concerned with phase alignment, what are we to do about pitch as a saxophone great moves his wailin’ sax about? Should we correct for the Doppler effect? Does it add or subtract to the performance? Is it even noticeable? Correct just for the lead or for all acoustic instruments moving about relative to their mike?

    “Sometimes better is the enemy of good enough.”
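To put a number on the Doppler question raised above, here is a back-of-the-envelope sketch. It assumes a stationary microphone, a speed of sound of 343 m/s, and a made-up horn speed of 1 m/s; the function names are illustrative only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C


def doppler_shifted_hz(source_hz, source_speed_m_s, speed_m_s=SPEED_OF_SOUND_M_S):
    """Frequency heard at a stationary mic; positive speed means moving toward it."""
    return source_hz * speed_m_s / (speed_m_s - source_speed_m_s)


def interval_cents(reference_hz, shifted_hz):
    """Pitch interval in cents (100 cents = one equal-tempered semitone)."""
    return 1200.0 * math.log2(shifted_hz / reference_hz)


if __name__ == "__main__":
    note_hz = 440.0
    horn_speed_m_s = 1.0  # hypothetical speed of the sax bell toward the mic
    heard_hz = doppler_shifted_hz(note_hz, horn_speed_m_s)
    print(f"{note_hz:.1f} Hz is heard as {heard_hz:.2f} Hz "
          f"({interval_cents(note_hz, heard_hz):.1f} cents sharp)")
```

Under these assumptions the shift works out to roughly 5 cents, and it lasts only while the instrument is actually moving.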

  • Seems like a red herring argument from the emailer for this case. Time alignment needs to be the same between differently sampled versions of a song if one is trying to compare them so that differences in timing don’t give away which song is which in a blind test, but that’s about it as far as I understand time alignment in this situation.

    But in general, “phase” has somewhat baffled me, so I have some sympathy for the emailer in not understanding something. In my simple way I thought “phase” was another way to refer to timing differences between sinusoidal signals (for example, in AC power calculations), but when Ethan Winer and JJ Johnston stated that our ears are relatively insensitive to phase differences, I frankly never spent the time to pay attention to what they were stating and understand it. There is also the aspect that speaker manufacturers sometimes refer to as “phase coherence”, but rightly or wrongly I thought that just meant ensuring that the combo of the speaker enclosure and crossover shouldn’t create timing problems reproducing the different frequency components of an instrument. That stated, given the bouncing around of sounds in a room and frequently not sitting exactly between my speakers and still hearing things OK (some losses to stereo imaging, but no muddiness to the music), the problem of a lack of perfect “phase coherence” and “phase” in general are things not high on my worry radar.

    So for me and perhaps others, understanding “phase” could be mostly laziness (definitely for me anyway), but it could also be that “phase” is a term used a bit differently by audio industry players in different situations or perhaps the same way but with different assumptions that are not clearly stated. Whatever the case, if the music sounds really good, and yours always does, I stop worrying about how you’re doing it.

    • The two versions of the songs…A and B…are the same stereo mix with only the sample rate and word length changed. The timing “compromise” is indeed a red herring.
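The kind of time alignment discussed in this thread, lining up two versions of the same mix so that a timing offset does not give one away in a blind test, could be sketched with a brute-force cross-correlation. This is not a description of how the test files were prepared. It assumes both versions have already been converted to a common sample rate and are available as plain lists of samples, and `best_lag` is a hypothetical helper name.

```python
def best_lag(reference, other, max_lag):
    """Return the lag in samples (positive = `other` is late) that best matches.

    Tries every lag in [-max_lag, max_lag] and keeps the one with the highest
    cross-correlation score. Fine for short excerpts; slow for whole songs.
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, sample in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += sample * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    version_a = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
    version_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]  # same attack, 2 samples late
    lag = best_lag(version_a, version_b, max_lag=4)
    print(f"version B lags version A by {lag} samples")
    aligned_b = version_b[lag:] if lag >= 0 else [0.0] * (-lag) + version_b
    print(aligned_b[:len(version_a)])
```

Trimming or padding one version by the detected lag puts the attacks at the same sample position without changing the audio itself.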

  • If I properly understand the description of your technique, it’s almost equivalent in result to lining up the impulses from each channel, in that the time for the beginning of a note to hit the microphone will be about the same for each instrument. Is this the correct interpretation so far?

    If so, the phase relationships of those signals will be different than for a listener at the conductor or audience position. The natural setting would have a delay between, for example, the first violins and the woodwinds, with an even longer delay for the brass and the percussion section. If you do not correct for these delays, the time, and therefore phase, relationships between instruments will be different than those experienced by a listener in the hall.

    The corrections would, of course, be different for your various mixes. Maybe you are making similar adjustments, but would describe them differently.
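The section-to-section delays described above follow directly from distance divided by the speed of sound. The sketch below uses invented distances from a listening position and an assumed 96 kHz sample rate purely to show the scale of the numbers; none of these values come from an actual session.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C


def delay_ms(distance_m, speed_m_s=SPEED_OF_SOUND_M_S):
    """Time in milliseconds for sound to travel the given distance."""
    return distance_m / speed_m_s * 1000.0


def delay_samples(distance_m, sample_rate_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """The same delay expressed as a whole number of samples."""
    return round(distance_m / speed_m_s * sample_rate_hz)


if __name__ == "__main__":
    # Hypothetical distances from a listener to each orchestral section.
    sections_m = {"first violins": 3.0, "woodwinds": 6.0, "brass": 9.0, "percussion": 11.0}
    nearest_m = min(sections_m.values())
    for name, distance_m in sections_m.items():
        extra_ms = delay_ms(distance_m - nearest_m)
        extra_samples = delay_samples(distance_m - nearest_m, 96_000)
        print(f"{name:>13}: +{extra_ms:5.1f} ms (+{extra_samples} samples at 96 kHz)")
```

With close microphones on every section those extra milliseconds are largely absent from the individual tracks, which is the difference from the hall perspective that the comment is pointing at.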

    I only ever bought one AIX DVD-Audio — a couple of Mozart symphonies, I believe. The soundstage seemed rather flat to me, which would be consistent with altering the timing/phase relationships, as I just described.

    I’m almost exclusively a stereo listener, so soundstage depth is important to me. The phase relationships would be both less important, and more similar to the tracks you are laying down, with the middle-of-the-action perspective you seem to favor.

    Better-preserved timing and phase relationships are among the theoretical benefits of higher sample rates, so that may be what the commenter had in mind. Still, he should get his facts straight about your process. And, ultimately, your recording, mixing, and mastering techniques are a matter of artistic expression. To that extent, the way you want them is the way they should be.

    In any case, it’s great that the test generated so much interest.

    • Andrea…I think your description of my recording technique is reasonable enough. If you’d had a chance to view some of the videos of my work, you would see the microphones placed all around the ensemble…usually in stereo pairs and close to the instruments. This minimizes the effect of timing or phase problems, but it does mean that the actual experience of the recorded sound is different from what a conductor would hear. I make no adjustments for distance as alleged by the author of the email.

      It’s interesting that you have the Mozart recording and find it flat. A review of the Bach Brandenburg from the same sessions was lauded for being the “first time I’ve heard great depth into the sections” because of the multiple stereo mikes that I employed. Having not altered the timing/phase relationships makes this possible.

    • When I say it sounded flat, I mean that the various sections of the orchestra sounded as though they were roughly the same distance away from me, rather than one being behind the other. I see how the enhanced detail of your close-miking approach could be described as “hearing great depth into the sections” — one could, for instance, presumably hear more of what the violas were doing than on a recording using only a few microphones further back. Here, I’m using depth in the context of spatial perception, which may be different from what that reviewer intended.

  • Andrea

    As some of the other commenters have pointed out, the extent to which phase relationships are audible is another one of those open issues. Most research indicates it’s not terribly important. That doesn’t mean it’s not worth thinking about.

    • Phase is another area of audio and hearing that could use some additional research. I agree.

  • Human hearing’s sensitivity to arrival phase quickly diminishes once the phase differential is enough that determining lead/lag becomes ambiguous. That differential is a function of frequency and arrival angle relative to directly in front. However, as phase becomes less of a factor in determining source location, arrival time itself, determined by the differential between attack arrivals, takes over, and the difference in the frequency response part of the HRTF is dominant already at that point.

    It seems pretty clear that once any arrival time anomalies caused by physical mic spacing have been corrected for (and the need for that would be somewhat subjective), the job’s been done.

    The idea that something is “phase-compromised” is simply the result of someone’s lack of understanding of phase and human hearing.

    • Very nicely put. Thanks.

  • The “phase-compromised” guy could also be a binaural guy thinking that binaural is the only pure way to capture natural phase. Ignoring, of course, the big problems in binaural, in particular the fact that the HRTF is unique to the individual, and can only be generalized for everyone, thus making every binaural recording a “compromise”.

    • My gut reaction is that the guy was looking for an excuse not to participate out of fear that he wouldn’t be able to tell the CD vs. HD files apart. I’m just guessing, however.

  • andrea (it)

    Hello Mr. Mark,

    I’m quite happy because I was able to “solve” the test straight away on the first run, which means:
    I’ve got good ears 🙂
    and a good system, although a very modest one (but I live happily with it).
    For those who say there’s no noticeable difference, may I offer a couple of suggestions:
    – volume pretty high (but not too much)
    – the first 30 seconds are enough; if you can’t hear the difference at the start, then you won’t be able to later either
    – don’t concentrate on the notes; instead, close your eyes and let the music come to you
    Once you reach this state of mind, the difference becomes more obvious.
    Of course, if you’re not listening carefully (say you’re browsing the net), then it is almost impossible,
    because the notes are the same and the overall impact is really close, but…
    the HD version has:
    “On the Street”: more energy
    “Mosaic”: shakers more natural and “alive”
    “Just My Imagination”: just more musical, in a vinyl kind of way
    The CD version at first does seem to be the same, but it’s a colder presentation.
    For me, HD makes a lot of sense!
    Sorry for my poor English (Italy…) and thanks again to Mr. Mark for the effortless contribution to this amazing hobby. Keep going!
    Best regards

    my system
    laptop with tons of tweaks applied for maximum music enjoyment
    HRT II+ dac
    old pioneer SA-710 with some tweaks applied
    twisted OCF cables and no brand signal ones
    blacknoise filters and a dedicated power line

    • Thanks for the observations…and positive feedback.

