Audio Signal Processing in Humans and Machines

EECS 225d (3 units)
Initial class (Jan. 22): 405 Davis, MWF 3-4
Afterwards: ICSI, 1947 Center Street, 6th floor, MWF 3-4
Spring Semester, 2003
Professor Morgan

The focus of the course is on engineering models for speech and audio processing. These models are used to design systems for analysis, synthesis, and recognition. For each of these topics there will be an emphasis on physiological and psychoacoustic properties of the human auditory and speech generation systems, particularly from an engineer's perspective: how can we make use of knowledge about these natural systems when we design artificial ones?

Topics include: an introduction to pattern recognition; speech coding, synthesis, and recognition; models of speech and music production and perception; signal processing for speech analysis; pitch perception and auditory spectral analysis with applications to speech and music; a historical survey of speech synthesizers from the 18th century to the present; robustness to environmental variation in speech recognizers; vocoders and music synthesizers; and statistical speech recognition, including an introduction to Hidden Markov Model and neural network approaches.

This semester, in addition to Professor Morgan, lectures will be given by Prof. Hynek Hermansky, inventor of some of the leading signal processing methods used in speech recognition, such as PLP and RASTA. Dr. Barbara Peskin, formerly head research scientist at Dragon Systems, will also lecture, and Dr. John Lazzaro will contribute new material on internet audio and the MPEG standards. Additional expertise will come from Prof. David Wessel, Director of the Center for New Music and Audio Technologies (CNMAT), and from Prof. John Ohala of the Linguistics Department.

Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

Required text: B. Gold and N. Morgan, ``Speech and Audio Signal Processing,'' Wiley, 1999.

Supplementary reading:

R. Duda, P. Hart, and D. Stork, ``Pattern Classification,'' Wiley-Interscience, 2001 edition. (Note: the 1973 edition, titled ``Pattern Classification and Scene Analysis'' and without Stork as co-author, is still useful.)
J.L. Flanagan, ``Speech Analysis, Synthesis and Perception,'' Springer-Verlag, 1972.
B. Moore, ``An Introduction to the Psychology of Hearing,'' Academic Press, 1989.
D. O'Shaughnessy, ``Speech Communication,'' IEEE Press, 2000 (the 1987 edition was published by Addison-Wesley).
L. Rabiner and B.-H. Juang, ``Fundamentals of Speech Recognition,'' Prentice Hall, 1993.
A. Waibel and K.F. Lee (eds.), ``Readings in Speech Recognition,'' Morgan Kaufmann, 1990.
A number of recent papers

EECS 225d Tentative Schedule, Spring 2003

HISTORICAL BACKGROUND
WEEK 1:
1. Overall Introduction (Morgan) Jan 22
2. Early History of Synthetic Audio (Morgan) Jan 24
WEEK 2:
3. Speech Analysis/Synthesis Overview (Morgan) Jan 27
4. Brief History of Automatic Speech Recognition (ASR) (Morgan) Jan 29
5. Speech Recognition Overview, continued (Morgan) Jan 31

MATHEMATICAL BACKGROUND
WEEK 3:
6. DSP Refresher (Morgan) Feb 3
7. DSP Refresher continued (Morgan) Feb 5
8. Pattern Classification (Morgan) Feb 7
WEEK 4:
9. Statistical Pattern Classification (Morgan) Feb 10
10. Expectation Maximization (EM) (Morgan) Feb 12
11. Wave Basics (Morgan) Feb 14

ACOUSTICS
WEEK 5:
Presidents' Day Holiday Feb 17
12. Room Acoustics (Morgan) Feb 19
13. Speech Production Models (Hermansky) Feb 21
WEEK 6:
14. Music Production Models (Wessel) Feb 24
AUDITORY PERCEPTION
15. Ear Physiology (Lazzaro) Feb 26
16. Psychoacoustics (Hermansky) Feb 28
WEEK 7:
17. The Auditory System as a Filter Bank (Hermansky) Mar 3
18. Models of Speech Perception (Hermansky) Mar 5
19. Models of Pitch Perception (Lazzaro) Mar 7

SPEECH FEATURES
WEEK 8:
20. Human Speech Recognition (Hermansky) Mar 10
21. Filter Banks and Cepstral Analysis (Morgan) Mar 12
22. LPC for Speech Analysis (Morgan) Mar 14

AUTOMATIC SPEECH AND SPEAKER RECOGNITION
WEEK 9:
23. Discussion of Projects (Morgan/Gelbart) Mar 17
24. Acoustic Phonetics: A Brief Introduction (Ohala) Mar 19
25. Feature Extraction for ASR (Morgan) Mar 21

WEEK 10:
Spring Break Mar 24-28

WEEK 11:
26. Deterministic Sequence Recognition (Morgan) Mar 31
27. Statistical Sequence Recognition (Peskin) Apr 2
28. Statistical Model Training 1 (Peskin) Apr 4

WEEK 12:
29. Statistical Model Training 2 (Peskin) Apr 7
30. Probability Estimation (Peskin) Apr 9
31. Complete ASR Systems (Peskin) Apr 11

WEEK 13:
32. Adaptation in ASR (Peskin) Apr 14
33. Speaker Verification (Peskin) Apr 16
34. Discriminant Training (Morgan) Apr 18

SYNTHESIS AND CODING
WEEK 14:
35. Speech Synthesis (Silverman) Apr 21
36. Speech Coding I (Hermansky) Apr 23
37. Speech Coding II (Hermansky) Apr 25

WEEK 15:
38. Pitch Detection of Speech and Music (Lazzaro) Apr 28
39. MPEG Audio (Lazzaro) Apr 30
40. Audio and Internetworking (Lazzaro) May 2

WEEK 16:
41. Some Current Speech Research Topics (Morgan) May 5
42. Music Synthesis (Wessel) May 7
43. More Current Research Topics - Prosody (Shriberg) May 9

WEEK 17:
44. Student Presentations May 12

Maintained by:
N. Morgan
morgan@ICSI.Berkeley.EDU
Last updated: 2002/11/08 18:05:47