Audio Signal Processing in Humans and Machines

EECS 225d (3 units)
203 McLaughlin, MWF 1-2
Spring Semester, 1999
Professors Morgan and Gold

The focus of the course is on engineering models for speech and audio processing. These models are used to design systems for analysis, synthesis, and recognition. For each of these topics there will be an emphasis on physiological and psychoacoustic properties of the human auditory and speech generation systems, particularly from an engineer's perspective: how can we make use of knowledge about these natural systems when we design artificial ones?

Topics include: an introduction to pattern recognition; speech coding, synthesis, and recognition; models of speech and music production and perception; signal processing for speech analysis; pitch perception and auditory spectral analysis with applications to speech and music; a historical survey of speech synthesizers from the 18th century to the present; robustness to environmental variation in speech recognizers; vocoders and music synthesizers; statistical speech recognition, including an introduction to Hidden Markov Model and Neural Network approaches.

Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor.

Required text: B. Gold and N. Morgan, ``Speech and Audio Signal Processing,'' Wiley Press, 1999. The book is not yet published, so a preprint will be distributed in class.

Supplementary reading:

R. Duda and P. Hart, ``Pattern Classification and Scene Analysis,'' Wiley Interscience, 1973 (new version coming out soon).
J.L. Flanagan, ``Speech Analysis, Synthesis and Perception,'' Springer-Verlag, 1972.
B. Moore, ``An Introduction to the Psychology of Hearing,'' Academic Press, 1989.
D. O'Shaughnessy, ``Speech Communication,'' Addison-Wesley, 1987 (new version coming out this year).
L. Rabiner and B.-H. Juang, ``Fundamentals of Speech Recognition,'' Prentice Hall, 1993.
A. Waibel and K.F. Lee (eds.), ``Readings in Speech Recognition,'' Morgan Kaufmann, 1990.

EECS 225d Tentative Schedule, Spring 1999

HISTORICAL BACKGROUND
WEEK 1:
1. Overall Introduction (Morgan) Jan 20
2. Early History of Synthetic Audio (Gold) Jan 22
WEEK 2:
3. Speech Analysis/Synthesis Overview (Gold) Jan 25
4. Brief History of Automatic Speech Recognition (ASR) (Morgan) Jan 27
5. Speech Recognition Overview (Morgan) Jan 29

MATHEMATICAL BACKGROUND
WEEK 3:
6. Digital Signal Processing (Gold) Feb 1
7. Digital Filters (Gold) Feb 3
8. Pattern Classification (Morgan) Feb 5
WEEK 4:
9. Statistical Pattern Classification (Morgan) Feb 8
10. Expectation Maximization (EM) (Morgan) Feb 10

ACOUSTICS
11. Wave Basics (Gold) Feb 12
WEEK 5:
President's Day Holiday Feb 15
12. Room Acoustics (Morgan) Feb 17
13. Speech Production Models (Gold) Feb 19
WEEK 6:
14. Music Production Models (Gold) Feb 22

AUDITORY PERCEPTION
15. Ear Physiology (Gold) Feb 24
16. Psychoacoustics (Gold) Feb 26
WEEK 7:
17. Models of Pitch Perception (Gold) Mar 1
18. Models of Speech Perception (Gold) Mar 3
19. Human Speech Recognition (Morgan) Mar 5

SPEECH FEATURES
WEEK 8:
20. The Auditory System as a Filter Bank (Gold) Mar 8
21. Filter Banks and Cepstral Analysis (Gold) Mar 10
22. LPC for Speech Analysis (Morgan) Mar 12

SYNTHESIS AND CODING
WEEK 9:
23. Speech Synthesis (Gold) Mar 15
24. Pitch Detection of Speech and Music (Gold) Mar 17
25. MIDTERM (on WEEKS 1-8) Mar 19

WEEK 10:
Spring Break Mar 22-26

WEEK 11:
26. Channel Vocoders and Predictive Coding (Gold) Mar 29
27. Low Rate Coding (Gold) Mar 31
28. High Rate Coding (CELP, STC, etc.) (Gold) Apr 2

AUTOMATIC SPEECH RECOGNITION
WEEK 12:
29. Feature Extraction for ASR (Morgan) Apr 5
30. Acoustic Phonetics: A Brief Introduction (Fosler-Lussier) Apr 7
31. Deterministic Sequence Recognition (Morgan) Apr 9
WEEK 13:
32. Statistical Sequence Recognition (Morgan) Apr 12
33. HMM Training 1 (Morgan) Apr 14
34. HMM Training 2 (Morgan) Apr 16
WEEK 14:
35. Discriminant Training (Morgan) Apr 19
36. Complete ASR Systems 1 (Fosler-Lussier) Apr 21
37. Complete ASR Systems 2 (Fosler-Lussier) Apr 23

OTHER APPLICATIONS
WEEK 15:
38. Speaker Verification (Genoud) Apr 26
39. Audio Transformations (Gold) Apr 28
Charter Anniversary (Holiday) Apr 30
WEEK 16:
40. Music Synthesis (Gold) May 3
41. Student presentations May 5
42. Student presentations May 7
WEEK 17:
43. Q and A, Some Current Research Topics (Morgan) May 10

Maintained by:
N. Morgan
morgan@ICSI.Berkeley.EDU
Last modified: 1999/01/19 00:03:03