Fixed between-channel timing skews in the ICSI Meeting data

Dan Ellis 2003-01-09

Summary:

A bug in the recording system introduced a fixed timing skew between the simultaneously-recorded channels in the ICSI meeting data. In many cases, this skew amounts to an advance of 64 samples (4 ms) between successive channels, which accumulates to a skew often exceeding 50 ms between low-numbered and high-numbered channels. We are confident that the skew remains fixed within each recording. However, different recording sessions may have different skews, since the skew appears to be related to the initialization of the soundcard. Because the skew is confounded by actual acoustic-path-length timing differences, it is difficult to come up with an automatic method to measure the skew for all recordings, and thus we are releasing the raw data as recorded. We hope to release details of the 'reference' timing skews for each recording once they can be verified.

Details:

The ICSI Meeting data was recorded with a Sonoros AUDI/O 16-channel digital audio PCI card interface. The Linux driver we used for the recordings was a beta release, and although it appeared to operate quite reliably, we later discovered a major bug: initializing the card for multichannel recording introduced a fixed timing skew between the recorded channels. This skew is generally too short to be perceived, but became very significant when we started looking at the cross-correlation between channels for the purposes of spatial location estimation.

As far as we can tell, the problem arises because the channels are initialized sequentially, and a certain number of samples accumulate in lower-numbered channels before the higher-numbered channels are opened. This appears as a *delay* in the lower-numbered channels relative to the higher-numbered channels, since the lower-numbered channel essentially starts recording at an earlier time.
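
The direction of this skew can be illustrated with a small simulation. The sketch below is not the recording software; it assumes a simplified model in which channel k is opened exactly k * 64 samples (at 16 kHz) after channel 0, and the names record_channels and best_lag are hypothetical helpers introduced only for this illustration.

```python
import random

SAMPLE_RATE_HZ = 16000   # sample rate of the released waveforms
SKEW_SAMPLES = 64        # assumed per-channel offset (4 ms), the commonly observed value


def record_channels(source, n_channels, skew=SKEW_SAMPLES):
    """Model sequential opening (simplified assumption): the file for channel k
    begins k * skew samples further into the shared source than channel 0."""
    usable = len(source) - (n_channels - 1) * skew
    return [source[k * skew:k * skew + usable] for k in range(n_channels)]


def best_lag(a, b, max_lag):
    """Return the lag d that maximizes sum(a[n] * b[n + d]) over the overlap.

    A negative d means the same event sits at a lower index in b than in a.
    """
    n_common = min(len(a), len(b))
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        lo = max(0, -lag)
        hi = min(n_common, n_common - lag)
        score = sum(a[n] * b[n + lag] for n in range(lo, hi))
        if score > best_score:
            best, best_score = lag, score
    return best


# A click that reaches every microphone at the same instant (source index 5000).
click = [0.0] * 8000
click[5000] = 1.0
positions = [ch.index(1.0) for ch in record_channels(click, 16)]
print(positions[:3])  # [5000, 4936, 4872]
print(1000.0 * (positions[0] - positions[15]) / SAMPLE_RATE_HZ)  # 60.0 (ms)

# The same offset recovered by cross-correlating two channels of white noise.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ch0, ch1 = record_channels(noise, 2)
print(best_lag(ch0, ch1, max_lag=100))
```

Under these assumptions the click lands 64 samples earlier in each successive file, and the cross-correlation peak between channels 0 and 1 should fall at a lag of -64 samples. In real meeting recordings the acoustic path differences between microphones would shift this peak, which the model here deliberately leaves out.
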
Thus, an impulse that arrived simultaneously at two microphones on adjacently-numbered channels would appear at a lower sample index in the sound file recorded from the higher-numbered channel. An illustration of this negative trend between timing and channel number can be seen in the cross-correlation image on the web page:

http://www.icsi.berkeley.edu/~dpwe/research/mtgrcdr/chanskew.html

We believe that the low-level organization of the card buffers samples in blocks of 64 (at 48 kHz), so the skews are quantized to this amount. Of course, the recorded data has been downsampled by a factor of 3 to 16 kHz, so the quantization of the skews appears at multiples of 21.33 samples (1.33 ms) in the final waveforms. In practice, the skews appear very often to be exactly 64 samples at 16 kHz (or three buffers of 64 samples at 48 kHz), i.e. an advance of 4.0 ms per channel. However, sometimes the delay between channels 0 and 1, and less often between channels 1 and 2, can be larger than this, e.g. 5.33 ms (four buffers) or more.

The skew will not change within a single recording session, but may be different if the soundcard is reinitialized. Thus, in general, the skews in separate meeting recordings will not be the same.

These numbers come from a number of test recordings made by Thilo Pfau in September 2001 in which the microphones were arranged to receive approximately simultaneous wavefronts. In the actual meeting recordings, the spatial variety of microphone and sound source positions introduces acoustic-path-related delays that confound the fixed skews. For this reason, it is difficult to measure the precise skew between a pair of channels in a particular recording. In situations where this number has been of concern (e.g. research we are conducting into recovering the location of each speaker from the tabletop mic recordings), our approach has been to try adding fixed multiples of the 1.33 ms quantum to each delay until the inferred positions become plausible. This approach is generally unambiguous, and it confirmed the results from the direct measurements: the skews between adjacent channels are usually -4.0 ms.

We had hoped to develop an automatic technique to go through all the recordings and measure the skews, either to correct them before release or at least to publish reference figures. Unfortunately, we have yet to come up with a satisfactory technique with which to calculate these numbers. If and when this analysis is performed, the results will be made available.

* end *