Fixed between-channel timing skews in the ICSI Meeting data
Dan Ellis <dpwe@ee.columbia.edu> 2003-01-09


Summary: A bug in the recording system introduced a fixed timing skew
between the simultaneously-recorded channels in the ICSI meeting
data.  In many cases, this skew amounts to an advance of 64 samples at
16 kHz (4 ms) between successive channels, which accumulates to a skew
often exceeding 50 ms between low-numbered and high-numbered channels.
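
As a rough illustration of how the per-channel advance accumulates,
the following Python sketch computes each channel's offset relative to
channel 0.  It assumes, as a simplification, that every adjacent pair
of channels has the typical 64-sample advance at 16 kHz and that the
channels are numbered 0 to 15.  Because the gap between the lowest
channels is sometimes larger, real offsets can exceed these figures.
The function name is hypothetical.

```python
FS = 16000              # sample rate of the released waveforms (Hz)
SKEW_PER_CHANNEL = 64   # assumed typical advance between adjacent channels (samples)


def cumulative_skew(channel):
    """Return the offset of `channel` relative to channel 0 as (samples, ms).

    Assumes every adjacent pair has the typical 64-sample advance; a
    negative value means events appear earlier in this channel's file.
    """
    samples = -channel * SKEW_PER_CHANNEL
    return samples, 1000.0 * samples / FS


print(cumulative_skew(1))    # → (-64, -4.0)
print(cumulative_skew(15))   # → (-960, -60.0)
```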

We are confident that the skew remains fixed within each recording.
However, it is possible that different recording sessions will have
different skews, since the skew appears to be related to the
initialization of the soundcard.  
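
Because the skew is fixed for the whole of a recording, compensating
for it reduces to a constant shift of each channel.  The sketch below
(Python with NumPy) assumes that the per-channel skews are already
known as whole sample counts at 16 kHz, relative to channel 0.  The
function name align_channels and the input layout are illustrative
assumptions, not part of any released tool.  Fractional skews (such as
21.33 samples) would need rounding or resampling, which is not handled
here.

```python
import numpy as np


def align_channels(channels, skews):
    """Undo a fixed per-channel skew (hypothetical helper).

    channels: array of shape (n_channels, n_samples), 16 kHz audio.
    skews:    advance of each channel relative to channel 0, in whole
              samples (e.g. -64 for a 4 ms advance).  Fractional skews
              are rejected rather than silently rounded.
    Returns the channels shifted onto a common timeline and trimmed to
    the span that every channel covers.
    """
    channels = np.asarray(channels)
    if any(int(s) != s for s in skews):
        raise ValueError("skews must be whole numbers of samples")
    n_samples = channels.shape[1]
    delays = [-int(s) for s in skews]      # delay that cancels each skew
    start = max(delays)                    # first common index on the shifted timeline
    stop = n_samples + min(delays)         # one past the last common index
    return np.stack([channels[ch, start - d:stop - d]
                     for ch, d in enumerate(delays)])


# Synthetic example: one impulse, with channel 1 advanced by 64 samples.
rec = np.zeros((2, 2000))
rec[0, 1000] = 1.0
rec[1, 936] = 1.0
aligned = align_channels(rec, skews=[0, -64])
print(aligned.shape, np.argmax(aligned, axis=1))   # → (2, 1936) [936 936]
```

Trimming to the common span keeps every output channel the same length
without inventing samples; zero-padding would be an equally valid
choice if the full duration must be preserved.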

Because the skew is confounded by actual acoustic-path-length timing
differences, it is difficult to come up with an automatic method to
measure the skew for all recordings, and thus we are releasing the raw
data as recorded.  We hope to release details of the 'reference'
timing skews for each recording once they can be verified.


Details:  The ICSI Meeting data was recorded with a Sonorus AUDI/O
16-channel digital audio PCI interface card.  The Linux driver we used
for the recordings was a beta release, and although it appeared to
operate quite reliably, we later discovered a major bug:  initializing
the card for multichannel recording introduced a fixed timing skew
between the recorded channels.  This skew is generally too short to be
perceived, but it became very significant when we started looking at
the cross-correlation between channels for the purposes of spatial
location estimation.

As far as we can tell, the problem arises because the channels are
initialized sequentially, and a certain number of samples accumulate
in lower-numbered channels before the higher-numbered channels are
opened.  This appears as a *delay* in the lower-numbered channels
relative to the higher-numbered channels, since each lower-numbered
channel effectively starts recording at an earlier time.  Thus, an
impulse that arrived simultaneously at two microphones on
adjacently numbered channels would appear at a lower sample index in
the sound file recorded from the higher-numbered channel.  An
illustration of this negative trend between timing and channel number
can be seen in the cross-correlation image on the web page:
http://www.icsi.berkeley.edu/~dpwe/research/mtgrcdr/chanskew.html
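
To make the sign convention concrete, the sketch below simulates the
mechanism described above with a synthetic impulse.  It assumes each
channel is opened exactly 64 samples (at 16 kHz) after the previous
one, then measures the lag between adjacent channels from the peak of
their cross-correlation using NumPy's correlate.  The helper names are
hypothetical, and real meeting audio would add acoustic-path delays on
top of these idealized lags.

```python
import numpy as np

SKEW = 64  # assumed opening gap between successive channels (samples at 16 kHz)


def simulate_recording(n_channels, impulse_time, length):
    """Record one impulse at absolute time impulse_time on every channel,
    where channel ch starts recording ch * SKEW samples after channel 0."""
    rec = np.zeros((n_channels, length))
    for ch in range(n_channels):
        start = ch * SKEW                  # later start, so fewer samples precede the impulse
        rec[ch, impulse_time - start] = 1.0
    return rec


def lag_of(a, v):
    """Number of samples by which signal a lags signal v (negative = leads),
    taken from the peak of their full cross-correlation."""
    c = np.correlate(a, v, mode="full")
    return int(np.argmax(c)) - (len(v) - 1)


rec = simulate_recording(4, impulse_time=1000, length=2000)
lags = [lag_of(rec[ch + 1], rec[ch]) for ch in range(3)]
print(lags)   # → [-64, -64, -64]
```

Each higher-numbered channel leads its neighbour by 64 samples, which
is the negative trend between timing and channel number seen in the
cross-correlation image.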

We believe that, at a low level, the card buffers samples in blocks of
64 (at 48 kHz), so the skews are quantized to this amount.  Of course,
the recorded data has been downsampled by a factor of 3 to 16 kHz, so
the quantization of the skews appears at multiples of 21.33 samples
(1.33 ms) in the final waveforms.  In practice, the skews very often
appear to be exactly 64 samples at 16 kHz (three buffers of 64 samples
at 48 kHz), i.e. an advance of 4.0 ms per channel.  However, the delay
between channels 0 and 1, and less often between channels 1 and 2, is
sometimes larger than this, e.g. 5.33 ms (four buffers) or more.  The
skew will not change within a single recording session, but may be
different if the soundcard is reinitialized.  Thus, in general, the
skews in separate meeting recordings will not be the same.
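
The quantization arithmetic can be checked with a few lines of Python.
This sketch takes the believed 64-sample buffer at 48 kHz and the
factor-of-3 downsampling as given, and converts a whole number of
buffers into samples at 16 kHz and milliseconds.  The function name is
hypothetical.

```python
BUFFER_48K = 64    # believed buffer size of the card (samples at 48 kHz)
FS_IN = 48000      # recording rate (Hz)
DECIMATION = 3     # downsampling factor to the released 16 kHz files


def skew_for_buffers(k):
    """Skew produced by k whole buffers, as (samples at 16 kHz, ms)."""
    samples_16k = k * BUFFER_48K / DECIMATION
    ms = 1000.0 * k * BUFFER_48K / FS_IN
    return samples_16k, ms


for k in (1, 3, 4):
    s, ms = skew_for_buffers(k)
    print(f"{k} buffer(s): {s:.2f} samples at 16 kHz = {ms:.2f} ms")
# 1 buffer(s): 21.33 samples at 16 kHz = 1.33 ms
# 3 buffer(s): 64.00 samples at 16 kHz = 4.00 ms
# 4 buffer(s): 85.33 samples at 16 kHz = 5.33 ms
```

Only multiples of three buffers land on whole samples at 16 kHz, which
is why the common three-buffer skew shows up as exactly 64 samples.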

These figures come from a series of test recordings made by Thilo Pfau
in September 2001 in which the microphones were arranged to receive
approximately simultaneous wavefronts.  In the actual meeting
recordings, the spatial variety of microphone and sound source
positions introduces acoustic-path-related delays that confound the
fixed skews.  For this reason, it is difficult to measure the precise
skew between a pair of channels in a particular recording.  In
situations where this number has mattered (e.g. our research into
recovering the location of each speaker from the tabletop mic
recordings), our approach has been to try adding fixed multiples of
the 1.33 ms quantum to each measured delay until the inferred positions
become plausible.  This approach is generally unambiguous, and it has
confirmed the direct measurements showing that the skew between
adjacent channels is usually -4.0 ms.
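
For a single pair of channels, the trial-and-error correction
described above can be approximated by a simple plausibility check.
In this sketch the test is a simplified stand-in, not the procedure
actually used in the localization work: a corrected delay is accepted
only if its magnitude does not exceed the acoustic travel time across
the microphone spacing.  The 343 m/s speed of sound, the 0.2 m spacing
in the example, and the function name are all assumptions.

```python
QUANTUM_MS = 1000.0 * 64 / 48000   # one 64-sample buffer at 48 kHz, about 1.33 ms
SPEED_OF_SOUND = 343.0             # m/s, assumed room-temperature value


def candidate_corrections(measured_ms, mic_spacing_m, max_buffers=6):
    """Return the integers k for which measured_ms + k * QUANTUM_MS is a
    physically possible delay between two mics mic_spacing_m apart.

    This bound-based test is a simplified, hypothetical stand-in for
    checking whether inferred source positions are plausible.
    """
    limit_ms = 1000.0 * mic_spacing_m / SPEED_OF_SOUND
    return [k for k in range(-max_buffers, max_buffers + 1)
            if abs(measured_ms + k * QUANTUM_MS) <= limit_ms]


# A measured -4.2 ms delay between mics 0.2 m apart is only plausible
# after adding three quanta (4.0 ms), i.e. undoing a -4.0 ms skew.
print(candidate_corrections(-4.2, 0.2))   # → [3]
```

With a spacing below roughly 0.23 m, the window of plausible delays
(twice limit_ms) is narrower than one quantum, so at most one k can
pass; wider spacings can admit several candidates, which is where
checking the full inferred positions becomes necessary.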

We had hoped to develop an automatic technique to go through all the
recordings and measure the skews, either to correct them before
release or at least to publish reference figures.  Unfortunately, we
have yet to come up with a satisfactory technique with which to
calculate these numbers.  If and when this analysis is performed, the
results will be made available.  

* end *