EECS 225d (3 units)
Audio Signal Processing in Humans and Machines
Initial class (Jan. 21): 212 Cory, Wed 3:30-5
Afterwards: ICSI, 1947 Center Street, 6th floor, MW 3:30-5
Spring Semester, 2009
Professor Morgan

The focus of the course is on engineering models for speech and audio processing. These models are used to design systems for analysis, synthesis, and recognition. For each of these topics we will discuss not only the engineering methods, but also some of the physiological and psychoacoustic properties of the human auditory and speech generation systems. This latter information can provide an important perspective: how can we make use of knowledge about these natural systems when we design artificial ones?

Topics typically include: an introduction to pattern recognition; speech coding, synthesis, and recognition; models of speech production and perception; signal processing for speech analysis; pitch perception and auditory spectral analysis with applications to speech processing; a historical survey of speech synthesizers from the 18th century to the present; robustness to environmental variation in speech recognizers; vocoders; statistical speech recognition, including an introduction to Hidden Markov Model and Neural Network approaches.

Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

Required text: B. Gold and N. Morgan, "Speech and Audio Signal Processing", Wiley Press, 1999.

Supplementary reading:

EECS 225d Tentative Schedule, Spring 2009

HISTORICAL BACKGROUND
WEEK 1:
1. Overall Introduction: What's the big idea? Jan 21 (Morgan)
WEEK 2:
2. Brief History of Synthetic Audio/speech analysis and synthesis Jan 26 (Morgan)
3. Brief History of Automatic Speech Recognition Jan 28 (Morgan)
WEEK 3:
4. Speech Recognition Overview Feb 2 (Morgan)
5. Human Speech Recognition Feb 4 (Morgan)

MATHEMATICAL BACKGROUND
WEEK 4:
6. Pattern Classification Feb 9 (Morgan)
7. Statistical Pattern Classification Feb 11 (Morgan)
WEEK 5:
Holiday Feb 16
8. Signal Processing review Feb 18 (Morgan)
WEEK 6:
9. Acoustical Basics Feb 23 (Morgan)
10. Room acoustics Feb 25 (Morgan)

ENGINEERING APPLICATIONS
WEEK 7:
11. Filter Bank, Cepstral and LPC Analysis for ASR March 2 (Morgan)
12. Feature Extraction for ASR March 4 (Morgan)
WEEK 8:
13. Linguistic Categories March 9 (Johnson)
14. Deterministic Sequence Recognition March 11 (Morgan)
WEEK 9:
15. Statistical Sequence Recognition March 16 (Morgan)
16. Statistical Model Training March 18 (Morgan)
WEEK 10:
Spring Break March 23-27
WEEK 11:
17. Complete ASR Systems March 30 (Stolcke, SRI)
18. Discriminant Acoustic Probability Estimation April 1 (Morgan)
WEEK 12:
19. Speech Understanding Apr 6 (Hakkani-Tur)
20. Spoken Dialog Systems Apr 8 (Hakkani-Tur)
WEEK 13:
21. Speaker Verification April 13 (Lei)
22. Speech Synthesis Apr 15 (Silverman, Apple)
WEEK 14:
23. Speech Coding April 20 (Lopez)
24. Music/audio processing April 22 (Lazzaro)

MAMMALIAN PROCESSING
WEEK 15:
25. Auditory Pathway (chapter 14) Apr 27 (Ghitza)
26. Psychophysics (chapter 15) Apr 29 (Ghitza)
WEEK 16:
27. Pitch Perception (chapter 16) May 4 (Ghitza)
28. Speech Perception (chapter 17) May 6 (Ghitza)
WEEK 17:
29. Student presentations (extended session) May 11



Maintained by:
N. Morgan
morgan@ICSI.Berkeley.EDU
$Date: 2008/11/3 16:00:00 $