Update 2003-01-09: I have written a brief executive summary of the skew issues in the ICSI Meeting data. You can read it here.
In March 2001 I was looking at timing delays between the same voice being picked up by different channels in the meeting recordings. Some of the numbers I was getting made no sense: I was getting negative delays to distant microphones, and some of the delays suggested path differences far larger than the room used for the recordings. Reluctantly I realized that our recording software had introduced systematic delays between the recorded channels. These delays are consistent with a fixed skew introduced when the individual channels are initialized at the start of a recording session, and should remain absolutely fixed throughout the session. However, they are likely to vary slightly between different sessions. They appear quantized at multiples of 2.67 ms (42.667 samples at 16 kHz).
Consider the following plot:
The lower half shows waveforms from four channels. Chan A (A is for Adam) is active, the others are picking it up as crosstalk. Now, chans 0 and 3 are showing delayed versions of chan A, which you'd expect. Except that the delay in chan 0 is around 800 samples, or 50 ms at 16 kHz sampling, or a straight-line delay of 17 meters, which is a bit difficult in room 6A.
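The arithmetic behind those figures can be checked with a short sketch. The speed of sound (343 m/s, roughly dry air at 20 °C) is an assumption here, and the function names are made up for illustration:

```python
SAMPLE_RATE_HZ = 16_000       # sampling rate of the recordings discussed above
SPEED_OF_SOUND_M_S = 343.0    # assumption: dry air at about 20 degrees C


def samples_to_ms(n_samples, rate_hz=SAMPLE_RATE_HZ):
    """Convert a delay in samples to milliseconds."""
    return 1000.0 * n_samples / rate_hz


def samples_to_path_m(n_samples, rate_hz=SAMPLE_RATE_HZ, c=SPEED_OF_SOUND_M_S):
    """Straight-line acoustic path that would produce this delay."""
    return c * n_samples / rate_hz


if __name__ == "__main__":
    # The roughly 800-sample delay seen on chan 0
    print(f"{samples_to_ms(800):.1f} ms, about {samples_to_path_m(800):.0f} m")
```

An 800-sample delay comes out at 50 ms and a path of about 17 m, which is why it cannot be a genuine acoustic delay in a meeting room.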
But channel F is even stranger. It's a PZM, so the background noise pickup makes it a little hard to see, but perhaps you can believe that it's showing an advance of around 300 samples.
The top half shows the (normalized) cross-correlations between all the channels and channel A (I didn't have the audio for chans 4, 5, 6, 7 or E, so they are blank). Chan A shows a peak at lag = 0, of course. But the overall trend is to have peaks approximately at lags of -5 ms * (channel number - 10), counting chan A as channel 10. Now this will include some genuine path delays, but I think they should be under 15-20 ms tops.
These are correlations against channel A, but I think it doesn't matter - I'd get similar pictures regardless of active channel.
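For anyone wanting to repeat this kind of measurement, here is a minimal pure-Python sketch of picking the lag at the peak of a normalized cross-correlation. It is not the code used for the plot; the function name, the sign convention (positive lag = the other channel is delayed) and the synthetic test signal are all assumptions for illustration:

```python
import math
import random


def best_lag(ref, other, max_lag):
    """Return (lag, score) at the peak of the normalized cross-correlation.

    A positive lag means `other` looks like a delayed copy of `ref`.
    """
    def norm(x):
        return math.sqrt(sum(v * v for v in x)) or 1.0

    scale = norm(ref) * norm(other)
    best_score, best = float("-inf"), 0
    for lag in range(-max_lag, max_lag + 1):
        # Sum ref[i] * other[i + lag] over indices valid in both signals
        lo = max(0, -lag)
        hi = min(len(ref), len(other) - lag)
        score = sum(ref[i] * other[i + lag] for i in range(lo, hi)) / scale
        if score > best_score:
            best_score, best = score, lag
    return best, best_score


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for the active channel: white noise
    ref = [random.gauss(0.0, 1.0) for _ in range(400)]
    # Crosstalk channel: attenuated copy arriving 30 samples late
    delayed = [0.0] * 30 + [0.3 * v for v in ref[:-30]]
    print(best_lag(ref, delayed, max_lag=60))
```

On real meeting audio the peak will be much broader than for white noise, so the lag is only good to within a handful of samples.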
So, I have a theory. Basically, we appear to have a fixed delay applied to each channel that is highly correlated with the channel number. And when the audio card drivers are opened, the channels are opened one at a time in a loop, starting at channel 0 and going up from there. I bet it takes about 5 ms to open each channel. I thought the channels were all being synchronized later on in the opening code, but it looks like one way or another, they're not.
So I'm afraid our 'sample synchronous' recordings probably all have an indeterminate fixed skew between channels (even between the PDA L and R). And it's probably slightly different every time we do a recording. However, it doesn't drift within a recording, so actually given the uncertainty over where the microphones are anyway, I'm not sure it's such a huge loss.
But it does mean that you can't assume that timing skews between channels will be bounded by a few hundred samples. Channel 0 may be up to 1100-1200 samples delayed relative to chan F.
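As a rough check on that bound, the linear trend read off the plot (about 5 ms per channel, zero at chan A) can be written out as below. This is only the eyeballed trend, not a measured model; it assumes channel labels are single hex digits so that chan A is 10 and chan F is 15, and the function names are hypothetical:

```python
SAMPLE_RATE_HZ = 16_000
MS_PER_CHANNEL = 5.0   # rough per-channel step estimated from the plot
REF_CHANNEL = "A"      # cross-correlations were taken against chan A


def predicted_lag_ms(channel, ref=REF_CHANNEL):
    """Trend-line lag relative to the reference channel, in ms.

    Assumes channel labels are single hex digits: "0".."9", "A".."F".
    """
    return -MS_PER_CHANNEL * (int(channel, 16) - int(ref, 16))


def relative_skew_samples(chan_a, chan_b):
    """Predicted delay of chan_a relative to chan_b, in samples at 16 kHz."""
    diff_ms = predicted_lag_ms(chan_a) - predicted_lag_ms(chan_b)
    return round(diff_ms * SAMPLE_RATE_HZ / 1000.0)


if __name__ == "__main__":
    print(predicted_lag_ms("0"), predicted_lag_ms("F"))
    print(relative_skew_samples("0", "F"))
```

The trend line puts chan 0 at +50 ms and chan F at -25 ms, a spread of 75 ms or 1200 samples, in line with the 1100-1200 sample figure from the waveforms.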
Caveat: I've only looked at small portions of this one recording, but I can't think why this one should be special.
Why didn't we notice this before? It's hard to do, since of course we expect the signal to be different in each channel, and indeed to see timing skews between channels. And it's only now that we've looked into wholesale fine-scale timing comparisons between channels. However, it should have shown up in my Feb 2000 preliminary experiments with cross-correlation of the PDA channels. I just looked at that signal again, and it really is synchronized to within a fraction of a millisecond. So maybe the delays are quantized, which would make sense in view of the double-buffering scheme that must exist within the soundcard (the dropout problems we used to get resulted in timing shifts that were multiples of 128 samples at the 48 kHz native sampling rate, or 2.67 ms quantum).
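The quantum itself is easy to work out, and a measured lag can be compared against the nearest multiple of it. The sketch below assumes the 128-sample buffer at 48 kHz mentioned above is the relevant unit (that is still a guess), and the helper name is invented for illustration:

```python
from fractions import Fraction

NATIVE_RATE_HZ = 48_000   # soundcard native rate
RECORD_RATE_HZ = 16_000   # rate of the stored recordings
BUFFER_SAMPLES = 128      # buffer size suggested by the old dropout shifts

QUANTUM_S = Fraction(BUFFER_SAMPLES, NATIVE_RATE_HZ)   # exactly 1/375 s
QUANTUM_MS = float(QUANTUM_S * 1000)                   # about 2.667 ms
QUANTUM_AT_16K = float(QUANTUM_S * RECORD_RATE_HZ)     # about 42.667 samples


def nearest_quantum_multiple(lag_samples, rate_hz=RECORD_RATE_HZ):
    """Return (k, residual) where k quanta is the closest fit to the lag.

    `residual` is the leftover in samples (lag minus k quanta), which
    would be the genuine acoustic delay if the quantization guess holds.
    """
    q = QUANTUM_S * rate_hz
    k = round(Fraction(lag_samples) / q)
    return k, float(lag_samples - k * q)


if __name__ == "__main__":
    print(QUANTUM_MS, QUANTUM_AT_16K)
    print(nearest_quantum_multiple(800))
```

If the skews really are quantized this way, the residuals across channels should all fall within the range of plausible acoustic delays for the room.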
I hope I'm not crying wolf. The more I think about it, the harder I find it to believe that this hasn't come up before. But the cross correlations I've just done seem pretty unambiguous.
This report was originally distributed as an email message, with the following header:
Date: Thu, 15 Mar 2001 19:53:17 -0500
To: email@example.com
cc: firstname.lastname@example.org
X-Phys-location: at IDIAP, Martigny, Switzerland
From: "Dan Ellis"
Subject: [mtgrcdr] Uh-oh: strange channel time skews