EECS 225d (3 units)
Audio Signal Processing in Humans and Machines
Initial class (Jan. 18): Cory Hall 531 (Wang Room), Wed 4:00-5:30
Afterwards: ICSI, 1947 Center Street, 6th floor, MW 4:00-5:30
Spring Semester, 2012
Professor Morgan (with contributions from several area experts)

The focus of this recently modified course is on engineering models for speech and audio processing. Mirroring the new edition of the associated textbook, the class will include more material about current audio processing methods such as psychoacoustic audio coding (e.g., MP3) and sound source separation. The methods discussed are used to design systems for analysis, synthesis, and recognition of speech and music. For many of these topics we will discuss not only the engineering methods, but also some of the physiological and psychoacoustic properties of the human auditory and speech generation systems. This latter information can provide an important perspective: how can we make use of knowledge about these natural systems when we design artificial ones?

Topics will include, among others: auditory physiology; room acoustics; music signal analysis; speech synthesis; models of speech production and perception; signal processing for speech analysis; robustness to environmental variation in speech recognizers; and statistical speech recognition, including an introduction to Hidden Markov Models and discriminative approaches.

Previous classes have included students from both EE and CS, whose varied backgrounds have enriched the course.

Prerequisites: EE123 or equivalent, and Stat 200A or equivalent; or grad standing and consent of instructor

Grading: 70% for the final project, 30% for homework and quizzes in the first half of the semester

Required text: B. Gold, N. Morgan, and D. Ellis, "Speech and Audio Signal Processing", 2nd edition, Wiley Press, 2011.

Supplementary reading:

EECS 225d Tentative Schedule, Spring 2012

1. Overall Introduction: What's the big idea? Jan 18 (Morgan)
2. Brief History of Synthetic Audio/speech analysis and synthesis Jan 23 (Morgan)
3. Brief History of Automatic Speech Recognition Jan 25 (Morgan)
4. Speech Recognition Overview Jan 30 (Morgan)
5. Human Speech Recognition Feb 1 (Morgan)

6. Pattern Classification Feb 6 (Morgan)
7. Statistical Pattern Classification Feb 8 (Morgan)
8. Deep Learning + Quiz Feb 13 (Vinyals)
9. Ear Physiology Feb 15 (Mesgarani, UCSF)
Holiday Feb 20
10. Acoustical Basics Feb 22 (Morgan)
11. Room acoustics Feb 27 (Morgan)
12. Linguistic sound categories Feb 29 (Johnson, Linguistics UCB)

13. Feature Extraction 1 March 5 (Morgan)
14. Feature Extraction 2 March 7 (Morgan)
15. Pitch + Quiz March 12 (Lazzaro)
16. Perceptual Audio Coding March 14 (Lazzaro)
WEEK 10:
17. Music Signal Analysis March 19 (Lazzaro)
18. Source Separation March 21 (Lazzaro)
WEEK 11:
Spring Break March 26-30
WEEK 12:
19. Deterministic Sequence Recognition April 2 (Morgan)
20. Statistical Sequence Recognition April 4 (Wegmann)
WEEK 13:
21. Statistical Model Training April 9 (Wegmann)
22. Adaptation and Discriminant Acoustic Probability Estimation April 11 (Wegmann)
WEEK 14:
23. Speech Synthesis April 16 (Silverman, Apple)
24. Language modeling for ASR April 18 (Stolcke, Microsoft)
WEEK 15:
25. Speaker Recognition April 23 (Mirghafori)
26. Speaker Diarization April 25 (Friedland)
WEEK 16:
No class, April 30
27-28. Student presentations (extended session, recitation week) May 2

Maintained by:
N. Morgan
$Date: 2008/11/3 16:00:00 $