Slides from my talks
NB: This page has been superseded by the version at
Columbia; please go
there for more recent talks.
This page points to the slide packs for talks I've given since the start
of 1997. They are almost all in Acrobat PDF format, derived from my
FrameMaker originals. Many of them have links to sound files, but I'm
afraid they won't work (except where noted).
- Recognition and Organization of Speech and Audio
(hosted by Brian Whitman at NEC Research Institute, Princeton NJ, 2001-08-16)
- Latest version of my "get to know me" talk with more emphasis on the non-speech-recognition aspects of LabROSA's work.
- Computational Models of Auditory Organization
(At the EU Advanced Course in Computational Neuroscience, Abdus Salam Center for Theoretical Physics, Trieste, Italy, 2001-08-09)
Also available as a ZIP file including all the linked sound examples.
- A talk on high level auditory perception and efforts to model it. The audience were experts in neural system modeling, but the talk barely touches that field - a case of "and now for something completely different..."
- Recognition and Organization of Speech and Audio
(hosted by Jont Allen at AT&T Shannon Labs, Florham Park NJ, 2001-06-21)
- Updated version of talk introducing my new lab at Columbia and some of the current and planned projects. Includes slides on new projects such as multisource decoding, lyrics recognition and acoustic detection of meeting participant motion.
- Tandem modeling investigations
(at the Respite project meeting, Saillon, Switzerland, 2001-01-25)
- Brief slide pack updating my colleagues on the Respite project
of recent work by myself and collaborators on Tandem acoustic modeling.
- Recognition and Organization of Speech and Audio
(at Chin Lee's group, Lucent, Murray Hill, 2001-01-12)
- A slightly improved version of my LabROSA introductory talk, including detail on the Tandem modeling approach. (I reused these slides for a talk at IDIAP in Martigny, Switzerland on 2001-02-01.)
- Recognition and Organization of Speech and Audio
(at the Columbia Univ. EE dept., 2000-10-13)
- This is a combined tutorial and seminar on the speech and audio
processing themes that I will be researching in my new lab, dubbed
LabROSA.
- RESPITE: Tandem & multistream research
(at the Sphear/RESPITE research workshop, Mons, Belgium, 2000-09-16)
- Review of work performed at ICSI (and OGI, CMU, Columbia...) in the preceding 6 months
relevant to the RESPITE multistream speech recognition theme.
Covers the latest experiments with Tandem modeling for the
large-vocabulary SPINE task, as well as online normalization and
foreign languages. Also mentions Barry Chen's work on multistream
mixtures-of-experts, and Mike Shire's work on multicondition
feature design.
- Tandem Acoustic Modeling: Neural nets for mainstream ASR?
(at Sheffield University Speech & Hearing Group, 2000-06-20, then at ICSI Real lunch, 2000-06-27)
- A discussion of the 'Tandem modeling' approach (feeding neural network
outputs as features into HTK to do better than either approach alone).
This is based on my
ICASSP-2000 poster
on the same topic, but has some new figures, partly in response
to comments received during the poster session.
- Improved recognition by combining different features and different systems
(at AVIOS 2000, San Jose, 2000-05-24)
- This was meant to be a relatively general-interest talk on
the various ways that speech recognition can be improved
by combining different approaches to the same problems.
AVIOS (the American Voice Input Output Society) is a very
applications-focussed conference.
- Content-based analysis and indexing for speech, sound & multimedia
(at the ICSI Real Lunch meeting, 2000-04-04)
- I hadn't given a talk about my own work to my own group for a long
time, so this was meant as an overview of the things I have been
thinking about for the past year or so, and the direction in
which I plan to go. Specifically: applying information retrieval
to multimedia content, particularly sound mixtures that are
broken up into objects using computational auditory scene
analysis.
- Speech Interfaces
(at the Human Centered Computing retreat, UCB, 2000-02-24)
- John Canny of Berkeley CS has been organizing an initiative in
Human Centered Computing -- roughly, the intersection of computer
science, social sciences and design. This talk was to provide
an overview of the state of speech recognition and some current
projects at ICSI, emphasizing our highly collaborative nature.
(n.b. it opens in full-screen mode - Ctl-L (or something like
that) returns you to normal window view).
- Sound Content Analysis
(at Shih-Fu Chang's group, Columbia University, 2000-02-08)
- I was visiting this group at Columbia who are working on
content-based indexing and retrieval based on image and video cues;
it's an obvious match to my interest in audio content-based
retrieval. This talk spanned speech recognition, auditory
scene analysis, and my ideas for content-based analysis.
(Be prepared to hit Ctl-L to get out of full-screen mode).
- Jan 2000 European tour review
(at the ICSI real lunch meeting, 2000-02-01)
- Before you know it, I'm back to Europe, attending the final
meeting of the Thisl project and the end-of-year-one meeting
for the Respite project. These slides provide some overview and
updating of these meetings as I presented to the rest of the
home team; they are based on
the slides I used at the
meetings.
- European tour review
(at the ICSI real lunch meeting, 1999oct05)
- This was my brief slide pack reviewing the parts that I found
interesting at Eurospeech and the two EU project meetings, as
well as updating my colleagues on what we will be doing in
those projects.
- Thisl update
(at the Thisl meeting, Les Marecottes, Switzerland, 1999sep20)
- My second meeting in Switzerland was a brief progress review of
the
Thisl project on spoken document retrieval. This is a very brief
slide pack summarizing work at ICSI since the last meeting in June.
- AURORA with a neural net etc.
(at the RESPITE/SPHEAR workshop, Les Marecottes, Switzerland, 1999sep13)
- This was a private workshop for participants in the two European
projects being managed by Phil Green of Sheffield. I am involved
in RESPITE, and this brief talk described the work I've recently
been doing on addressing the AURORA noisy digits task with
neural net acoustic models, as well as a couple of other multistream-
related projects going on at ICSI.
- An overview of speech recognition research at ICSI
(at the Tampere University of Technology, 1999sep02)
- My second talk at TUT gave a little background to ICSI and
the realization group, a brief introduction to connectionist
speech recognition, and a lightning tour of some research
projects in speech recognition currently happening within
the group.
- CASA: Principles, practice & applications
(at the Tampere University of Technology, 1999sep01)
- As the guest of Anssi Klapuri and the Tampere International
Center for Signal Processing, I spent a few days at TUT and
gave a couple of talks. This one is intended as an introduction
to auditory scene analysis, computational modeling thereof,
and some applications - including some speculation about
content-based retrieval for nonspeech audio.
- ICSI/Thisl progress report
(at the Thisl meeting, Sheffield UK, 1999jun24)
- This brief report summarized the work at ICSI on the THISL project
since the previous report in February. Specifically, the Thomson
NLP parser was integrated into the GUI, we trained an MSG acoustic
model on the BBC data, and I reported on some related projects
and developments.
- European projects update
(at ICSI Real Lunch Meeting, 1999feb11)
- On my return from the European trip described below, I gave
a lunch talk describing the meetings, and what I and others
had said at them.
- Current work at ICSI
(BBC R&D, London, and ICP Grenoble, France, 1999feb03-08)
- I spent ten days in Europe attending meetings of the THISL
and RESPITE projects - EU funded collaborations with European
labs and ICSI - and another meeting to discuss a possible future
project proposal involving many of the same partners. These
slides were the ones I used when presenting our work at these
meetings. I called it 'current work at ICSI', but of course
it was a very limited subset, just the work related to those
projects.
- Broadcast News: Features & acoustic modelling
(SPRACH final review meeting, INESC Lisbon, Portugal, 1998dec15)
- The 3-year EU collaborative SPRACH project ended in December 1998;
our final review meeting was mainly taken up with a description of the
system we had collectively submitted to the Broadcast News evaluation.
I was describing just ICSI's contribution in the acoustic modelling
(modulation-filtered spectrogram features and very large
multi-layer perceptron classifiers).
- Some aspects of the ICSI 1998 Broadcast News effort
(Part of the BN overview Real Lunch, 1998nov25)
- After the crazy rush to fulfill our part in submitting a full LVCSR
system to the 1998 NIST/DARPA Broadcast News evaluation, Morgan,
Eric, Adam and I gave a lunch talk to the rest of the group to explain
what all the fuss had been about. My part was about feature choice,
large nets, and some preliminary work on whole-utterance filters
(i.e. nonlinear segment normalization) and gender-dependence.
- Speech Recognition at ICSI: Broadcast News and Beyond
(at Erv Hafter's Ear Club, UC Berkeley Psychology, 1998sep21)
- Erv Hafter runs a seminar series as part of his UCB psychoacoustics
group which I agreed to address. In the event, it turned out to be
a fairly general talk about the Broadcast News task, our efforts
(in conjunction with our European partners) to field a system in
this year's evaluations, and other aspects of speech recognition that
I thought would interest hearing scientists.
- Review of September SPRACH/Thisl meetings
(at the Realization Group Lunch Meeting, ICSI, 1998sep09)
- Morgan and I went to another pair of meetings in Cambridge, UK,
for these two EU-funded projects. On return, I gave a brief review
of what had been discussed and the projects' status at our lunch
meeting, using these eight slides.
- Auditory Scene Analysis: Phenomena, theories and computational models
(at the NATO Advanced Studies Institute on Computational Hearing,
Il Ciocco, Italy, 1998jul11)
- NATO has a fund to support scientific meetings with a 'tutorial'
aspect. My colleague Steve Greenberg organized this 12-day meeting
on hearing which ranged from anatomy and physiology through to
speech recognition and auditory organization. I gave a 90-minute
talk on auditory scene analysis on the last full day.
- SPRACH/ThisL review
(parts of the EC project review meetings, 1998mar24/25, Mons, Belgium)
- Projects funded by the European Commission `Framework' program
have annual progress review meetings with external reviewers. We
are currently subcontractors on two, SPRACH and ThisL, and we had
back-to-back reviews for them. These are the few slides I
contributed to each day's proceedings covering aspects of the
work done at ICSI under these grants, and the single slide
I used to summarize the ThisL project when making a trip
report to the rest of our group on return.
- ICSI Speech Technology
(at Randy Katz's group meeting, UCB CS dept, 1998feb26)
- This group on campus is interested in using speech recognition
in some demo applications for their work in scalable and mobile
networking. I presented this one-page summary of what we do
at ICSI and the tools we could share with them.
- Visualization tools & demos and the ICSI Realization group
(ICSI Real Lunch, 1998feb12)
- One of my projects since being at ICSI has been to encourage and
support the proliferation of accessible demos of the research we
do. To this end, I've developed a number of specific visualization
tools within a Tcl/Tk + extensions framework. This talk served to
publicize these tools, and to share my vision of "a demo on every
desk".
- Automatic audio analysis for content description & indexing
(MPEG-7 Symposium, San Jose, 1998feb04)
- MPEG-7 is to be a new standard for the description and indexing of the
content of multimedia 'assets' such as video and audio. I was invited
to talk about my work in computational auditory scene analysis as
one approach to extracting the kind of information that the standard
might want to cover. You can learn more about MPEGs 1, 2, 4 and 7 at
the MPEG home page.
- ICSI/ThisL status report
(IDIAP, Switzerland 1997dec11)
- The ThisL project (Thematic Indexing of Spoken Language) had an informal
meeting. I went representing ICSI, and I gave a very brief presentation
of some relevant work at ICSI: visualization/user-interface tools,
recognizing in reverb by combining information at different time scales,
and my speech-mixtures stuff.
- Problems and future work for ASR-in-CASA
(Stanford Hearing Seminar 1997nov20 /
Berkeley Ear Club 1997nov24)
- A replacement for section 5 of the original Mohonk '97 talk, for
the extended version I gave of that talk when I got back to
California.
- On the importance of illusions for artificial listeners
(Haskins 1997oct24 / NUWC Newport 1997oct25)
- Forgive the title trying to be cute. This pack (again in Acrobat PDF)
comprises slides for two talks I gave while 'out east' for Mohonk,
basically just introducing my work. One talk was to Haskins Lab in
New Haven - a bunch of very serious speech, hearing and language
scientists who probably see this stuff as too applied, and the next
was to a group of Navy sonar researchers who probably think this
hearing modelling is extremely left-field, blue sky stuff. Most
slides were the same in both talks, although the voice over differed!
- Computational Auditory Scene Analysis exploiting Speech Recognizer knowledge
(Mohonk 1997oct22)
- This is the actual presentation I made at the 1997 IEEE Mohonk Audio
Workshop. The slides are in Acrobat PDF format - I got sick of
translating them to HTML; hope that's OK.
- Exploiting ASR in CASA
(ICSI 1997may21)
- This was a lunchtime talk I gave to my colleagues at ICSI describing
the paper I had submitted to IEEE WASPAA'97
on an idea for integrating a speech recognition engine into a
computational auditory scene analysis system that is anticipating
a mixture of speech and nonspeech sounds. The first big problem
that came up was working the speech recognizer 'backwards' to recover
an estimate of the speech spectrum from the recognized phoneme labels;
the talk focusses mainly on this aspect.
- Digital Audio
(Lego 1997may06)
- This was a talk I gave at a mini workshop on digital audio
hosted by Lego (the plastic brick people) at their headquarters
in Billund, Denmark. They are looking into future generations
of computer-based toys, and brought together a collection of
researchers from industry and academia to brainstorm about
audio in toys. My talk was supposed to provide an introduction
and framework, focussing on synthesis.
- Divisive issues in Computational Auditory Scene Analysis
(Stanford Hearing Seminar, 1997mar06)
- This was a talk I gave at Malcolm Slaney's Stanford Hearing Seminar.
It was intended to be a brief overview of research into computational
models of auditory scene analysis, focussing on the distinctions
between the different projects in this field.
Updated: $Date: 2001/08/09 14:02:11 $
Dan Ellis <dpwe@icsi.berkeley.edu>
International Computer Science Institute, Berkeley CA