ICSI Speech FAQ:
3.2 What are the wavfile data formats, and how can I manipulate wavfiles?

Answer by: dpwe, gelbart - 2000-05-26, 2002-05-07

Wavfile, wavefile and soundfile are all names for a file containing the original waveform of the speech or sound data. At some level, this is just a series of numbers corresponding to the voltage measured (or 'sampled') across a microphone at a regular, short interval (the sampling period). When converted back to a voltage and fed to a speaker, the original sound is played back. The majority of soundfiles we deal with store samples as 16-bit signed integers, sampled at either 8 or 16 kHz (i.e. a sampling interval of 125 µs or 62.5 µs). The differences between the file formats are merely in the header, i.e. the extra information that may be included to describe the format of the bulk of the file.

For information about how to view and listen to wavfiles, see sndplay, xwaves, and wavesurfer.tcl under the "What are the principal programs used to do speech recognition research at ICSI?" FAQ question.

Soundfiles can vary across the following dimensions. Often, this information (or part of it) is stored in the header (depending on the format):

Sampling rate, i.e. the number of samples per second (the reciprocal of the interval between successive samples)
Channels - typically mono or stereo, though sometimes more. In most cases, data in multichannel soundfiles is stored interleaved, so for a stereo file the sample sequence is left0, right0, left1, right1, left2 ... where the numbers are time indexes.
Sample format - although 16-bit linear samples are most common, almost any numerical representation is possible. 8-bit "mu-law" encoding is quite common, as are 8-bit and 32-bit linear encodings. 32-bit floats are not unknown.
Byte order (for formats that use more than 8 bits per sample) i.e. is the data in little-endian (Intel, least-significant-byte first) or big-endian (Sun/Macintosh, most-significant-byte first)?
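As a rough illustration of the channel and byte-order points above, here is a minimal Python sketch that decodes headerless 16-bit PCM and splits interleaved channels apart. The function name decode_pcm16 and its parameters are made up for this example; it is not part of the sndutils.

```python
import struct

def decode_pcm16(data: bytes, channels: int = 1, big_endian: bool = False):
    """Decode headerless 16-bit signed PCM into one list of samples per channel."""
    count = len(data) // 2                       # two bytes per 16-bit sample
    fmt = (">" if big_endian else "<") + f"{count}h"
    samples = struct.unpack(fmt, data[:count * 2])
    # Interleaved layout: left0, right0, left1, right1, ...
    return [list(samples[ch::channels]) for ch in range(channels)]

# Two stereo frames written little-endian: (1, -1) and (2, -2)
raw = struct.pack("<4h", 1, -1, 2, -2)
left, right = decode_pcm16(raw, channels=2)
print(left, right)    # [1, 2] [-1, -2]

# Reading the same bytes with the wrong byte order gives nonsense values
wrong = decode_pcm16(raw, channels=2, big_endian=True)
print(wrong[0])       # [256, 512]
```

Large, noise-like sample values where speech was expected are a common sign that the byte order was guessed wrongly.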

Here are the most common soundfile formats encountered at ICSI:

Format	SNDF tag	Filename	Description
NIST/Sphere	NIST	*.wav, *.nist, *.sph	Headered soundfile format defined by NIST and used in many of the LDC and other standard speech databases. Can contain compressed data, for instance by holding embedded shorten data. Header format is mainly ASCII and thus easily extensible, with the result that previously-unknown formats sometimes appear and break the sndutils.
Microsoft WAVE	MSWAVE	*.wav	Microsoft's basic sampled sound format, used in Windows.
AIFF	AIFF	*.aif, *.aiff	Audio Interchange File Format, a fully-fledged file format able to hold sampled waveforms plus ancillary data. Popular on the Mac and SGI platforms.
Raw PCM	PCM	*.raw, *.pcm	Headerless sound samples, usually 16-bit shorts, byte order unspecified.
ESPS	ESPS	*.sd	The wavefile format used by the xwaves/ESPS packages. This is a pretty complicated header that I haven't fully decoded, and is probably best avoided (xwaves now accepts NIST format as an alternative). Unfortunately, some of the databases at ICSI (e.g. BeRP) have wavfiles still in this format.
Shorten	-	*.shn	Shorten is not a wavfile format per se, but a compression algorithm specifically designed for audio data. It is implemented by Tony Robinson's shorten program, which is invoked transparently by the sndutil programs (sndcat, sndplay, etc.) for any filename ending .shn. Shorten compression is usually lossless (this is an option at compress time) and typically saves 50-80% of the original file size, depending on the amount of energy in the original file. Shorten compression can be applied to data in any wavfile format; since it is lossless, it doesn't have to specifically avoid the header data (which will be exactly reconstructed), although the compression will be very poor if the format, alignment or endianness of the actual sound data is not correctly matched. Alternatively, shorten can skip a specified number of bytes at the beginning of the file to leave an initial header unmodified (shorten -a 1024). Compressed wavfiles conventionally have two extensions e.g. chan1.wav.shn is a shorten-compressed NIST wavfile.

The SNDF tag in the table above is the shorthand used to refer to that format by the various sndutils tools (sndcat, sndplay, etc.; any program based on the sndf library) as well as some scripts that use them.
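Since the formats differ mainly in their headers, the first few bytes of a file are often enough to tell them apart. The sketch below is only an illustration, not how the sndf library does it: guess_format is a hypothetical helper that checks the well-known leading signatures of NIST/Sphere, Microsoft WAVE and AIFF files and returns the matching SNDF tag. ESPS and shorten files are not handled, and raw PCM cannot be detected because it has no header.

```python
def guess_format(header: bytes):
    """Return the SNDF tag suggested by the first 12 bytes of a file, or None."""
    if header.startswith(b"NIST_1A"):
        return "NIST"       # Sphere headers begin with the ASCII text NIST_1A
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "MSWAVE"     # RIFF container holding WAVE data
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "AIFF"       # IFF container holding AIFF (or AIFF-C) data
    return None             # unknown: possibly raw PCM, ESPS or compressed

def guess_file_format(path: str):
    """Read the start of the file at `path` and guess its format."""
    with open(path, "rb") as f:
        return guess_format(f.read(12))

print(guess_format(b"NIST_1A\n   1024\n"))    # NIST
```

Note that a *.wav filename alone is not decisive, since both NIST/Sphere and Microsoft WAVE files use that extension.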

How do I extract portions of wavfiles?

Use sndcat's -k, -K, -d, -D, -e, or -E options.
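The meaning of each sndcat option is not spelled out here, so check the sndcat documentation for those. For headerless PCM data the underlying arithmetic is simple, and can be sketched in Python as below. The function slice_pcm and its defaults (16 kHz, mono, 16-bit) are assumptions made for this example; it is not a substitute for sndcat and it does not understand headers.

```python
def slice_pcm(data: bytes, start_s: float, end_s: float,
              rate: int = 16000, channels: int = 1,
              bytes_per_sample: int = 2) -> bytes:
    """Return the bytes of headerless PCM between two times given in seconds."""
    frame = channels * bytes_per_sample          # bytes per sampling instant
    first = int(round(start_s * rate)) * frame   # offset of the first frame kept
    last = int(round(end_s * rate)) * frame      # offset just past the last frame
    return data[first:last]

# One second of silence at 8 kHz, mono, 16-bit = 16000 bytes
one_second = bytes(8000 * 2)
chunk = slice_pcm(one_second, 0.25, 0.5, rate=8000)
print(len(chunk))    # 4000 bytes = 2000 samples = 0.25 s
```

Cutting on whole frames matters for multichannel data: slicing at an arbitrary byte offset would swap the channels or split a sample in half.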

How do I convert between formats?

The sndcat tool allows you to convert between many different file formats by specifying the input format with the -S option and the output format with the -T option. To convert between different encoding types, use the -f option. The NIST w_decode and w_encode tools can also be useful for converting between encoding types. They may be able to deal with NIST-format files whose headers confuse sndcat (this may happen, for example, with the embedded-shorten encoding type).

There are also the sox and sph2pipe tools:

Date: Thu, 28 Jun 2001 12:37:49 PDT
From: Andreas Stolcke 
To: speech@ICSI.Berkeley.EDU
Cc: sysadmin@ICSI.Berkeley.EDU
Subject: New audio file utilities

I updated the sox (Sound Exchange) tool to the latest version.
It lives in /usr/local/bin/.  A big bonus is that this version
supports NIST SPHERE wav files, so you can convert them directly from/to
all the other formats supported by sox.

Another tool is sph2pipe (installed in /u/drspeech/*/bin) which 
extracts channels or time ranges from SPHERE files, and writes
either SPHERE or Windows .wav format.
Run sph2pipe -h for more info.

sph2pipe also comes in a Windows version, and can be used to play
SPHERE files there.  Just grab /u/drspeech/src/sph2pipe_v2.1/sph2pipe.exe .

--Andreas
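If none of these tools is to hand, the simplest conversion (headerless 16-bit PCM to Microsoft WAVE) can be sketched with Python's standard wave module. This is only an illustration of what a format conversion involves; pcm16_to_wav is a name invented for this example, and the sample rate, channel count and byte order must be supplied by the caller because raw PCM does not record them.

```python
import io
import struct
import wave

def pcm16_to_wav(pcm: bytes, rate: int = 16000, channels: int = 1,
                 big_endian: bool = False) -> bytes:
    """Wrap headerless 16-bit PCM samples in a Microsoft WAVE header."""
    if big_endian:
        # WAVE stores 16-bit samples little-endian, so swap each byte pair
        count = len(pcm) // 2
        values = struct.unpack(f">{count}h", pcm[:count * 2])
        pcm = struct.pack(f"<{count}h", *values)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)          # bytes per sample
        w.setframerate(rate)
        w.writeframes(pcm)
    return buf.getvalue()

# Convert two big-endian samples recorded at 8 kHz
wav_bytes = pcm16_to_wav(struct.pack(">2h", 1, 2), rate=8000, big_endian=True)
print(wav_bytes[:4], wav_bytes[8:12])    # b'RIFF' b'WAVE'
```

To write the result to disk, open the output file in binary mode and write wav_bytes to it.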

How do I concatenate wavfiles?

This can be done with sndcat. Merging soundfiles with differing sample rates will not resample any of them, but will write a concatenated soundfile tagged with the sampling rate of the first input.
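The consequence of not resampling can be seen with a little arithmetic, sketched here in Python (playback_seconds is a hypothetical helper, not an sndcat feature). Samples from an 8 kHz file that end up in a file tagged as 16 kHz play back twice as fast as intended.

```python
def playback_seconds(n_samples: int, tagged_rate: int) -> float:
    """Duration of n_samples when played at the rate recorded in the header."""
    return n_samples / tagged_rate

first = 2 * 16000     # 2 seconds recorded at 16 kHz
second = 2 * 8000     # 2 seconds recorded at 8 kHz

# Output is tagged with the rate of the first input (16 kHz)
total = playback_seconds(first + second, 16000)
print(total)                               # 3.0, not the 4.0 seconds recorded
print(playback_seconds(second, 16000))     # 1.0: the second part is sped up 2x
```

If the inputs have different rates, resample them to a common rate before concatenating.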


Generated by build-faq-index on Tue Mar 24 16:18:14 PDT 2009
