Dan Ellis's ICSI homepage
Until August 2000, I was a
senior research scientist working on speech and audio processing and
recognition within the Realization group at the International Computer
Science Institute in Berkeley, California.
This is basically a frozen version of the web site from that
phase of my career, though there are certain parts that I am
keeping up-to-date (such as my talks and publications lists).
I am now an assistant professor at Columbia University in the City of
New York. For more details, see
my Columbia website.
You can also see my even older web site,
with some personal and research-related information, at
the M.I.T. Media Lab.
It hasn't been updated since April 1996.
.. wherein may be found several pages relating to my research.
Again, I apologize for the absence of an overview or roadmap.
My bookmarks file.
I finally created a web page to describe the
SPRACHcore connectionist speech recognition software release.
Put up a page of my
notes on some papers
at ICASSP-2000 complete with links to the ICSI-local online
copy of the
Also put up the PDF of my
talk on Tandem Acoustic
Modeling, as given at Sheffield last week, and re-used for the
Realization lunch this week.
My ICASSP 2000 poster
on Tandem connectionist modeling is now up.
I have started to collect the results from my experiments with
this year's Aurora noisy digits task in the
Aurora 2000 page.
I gave a talk at the AVIOS conference in San Jose on
through combination - a pretty high-level survey of
combination strategies in recognition.
Also added a page on
the recording control software
for the Meeting Recorder project.
Created a page describing the
Videoconferencing Room Audio Setup
being used for data collection in the Meeting Recorder project.
I added this within my
directory of Meeting Recorder Project information.
Added slides for my latest talk,
Sound content-based analysis
which was an update to the members of the group on what I'm thinking about.
Created a page describing
the use of cross correlation for
with the new meeting recorder data we are collecting.
Added the slides from my quick overview talk on
that I gave at a recent mini-workshop on campus.
Added the slide pack describing my most recent
meetings to the Realization group lunch.
Finally finished a page sketching the tortuous path by which the
original training targets for the AURORA noisy digits task
Added my slide pack from the ICSI lunch talk reviewing
I've started assembling some information on my new project,
General Audio Mixture Analysis and Retrieval,
which is concerned with using information retrieval, computational
auditory scene analysis and machine learning to provide content-based
access to real audio, particularly that which does not contain speech.
Currently, there's just the PDF of the project summary and description
that I submitted to the NSF as part of a grant application.
My European trip continued with a poster on speech/music discrimination at Eurospeech-99, then talks at meetings of the RESPITE and THISL projects, both held in Switzerland.
I have been in Finland visiting the Tampere University of Technology.
I gave two talks, one on
and one on
both available in Acrobat PDF from my
- New details continue to appear on the AURORA page, including
a comparison of
the statistics of MSG and MFCC features, which would
appear to account for the poor performance of the HTK/MSG system.
- I've started a page describing my experiments with the
AURORA noisy digits task
as part of the RESPITE project.
- I reproduced some results presented to us by Hynek Hermansky of
showing that the
mean spectral features
around a given label have
significant structure out to several hundred milliseconds, i.e., far
beyond the duration of that single phone.
is a speech-input spoken-document-retrieval system that we
are developing in conjunction with several European partners.
I've been working on a Tcl/Tk front-end, as described in this page.
- I did an investigation of the
temporal structure of modulation-filtered spectrogram features
just to see what these excellent features of Brian's really look like.
- I'm trying to finish a detailed description of the Spam+AntiSpam (a/k/a spamnotspam) project I worked on a little before the ICSLP submission deadline. You can see the beginning of it at The spamnotspam report.
- I've copied a bunch of proceedings CD-ROMs onto my scratch disk
(for now); you can find them at the Realization Group Online Proceedings Page.
- I have some pages relating to the ICSI Broadcast News (HUB4) effort: Results summary table, Feature comparison matrix, and training and decode machine usage.
- Put up the slides from my Realization Group Meeting talk last week on Visualization tools. Also added a second image to my (private) RESPITE site.
- Added the slides from my talk on automatic audio content analysis at the recent MPEG-7 symposium in San Jose.
- Put up the
companion page for my submission to
the Speech Communication special issue on CASA, which includes
client-side image maps to play the different sound examples.
- Added improved figures and some new placeholder text to my
introduction to the ICSI speech recognition software.
- Put together a bunch of hints and tips
relating to the Mohonk workshop from myself and other committee
members. It includes a bunch of general notes on publishing
electronic documents via PostScript and PDF.
- Made a page for my older but possibly more useful
pfile viewer pfview.
Also, put up the page for my
- Started a page of noise-excited speech examples to illustrate the use
of my surfsynth tool, as well as the insight
it can provide about what's really happening in our feature sets.
- Added links to the slides of my Mohonk WASPAA talk and Haskins/NUWC talk from my recent trip out east to the talks page.
- The beginnings of my introduction to the ICSI speech recognition software. This is so vestigial that I feel ashamed even putting it on the site, but maybe the shame will encourage me to work on it.
- A page describing the Speech recognition visualization front-end I recently constructed as a demo of our work, but which ended up looking like it might make a useful research diagnostic tool.
- A page documenting the ICSI AppleTalk network.
- My package of sound file utilities, dpwelib/sndutils, now has its
own home page.
Click here to go to a page of things available
only from within ICSI.
My web-based browser for speech files generates spectrograms on the
fly using a Tcl version of Boutell's "gd" library. See it live in
The home page for the 1997 IEEE Mohonk
Workshop on Applications of Signal Processing to Acoustics and Audio.
I'm on the committee and am responsible for the web pages.
There's a fun web-site introducing sinewave speech here at Haskins
Labs. I put together some MATLAB code to allow you to play with the
data they provide - you can download it here.
STP: I have a
personal STP homepage, including my notes on
how to verify the phoneme labels produced by
the transcribers working on this project.
You will also find the latest version of the bestiary there.
I've produced an HTML filter for stripping annoying ad banners
from some common web index sites. Read about it
My demonstration of on-line
soundfile submission and processing (adding reverb in this case) runs
CD-ROM server runs on MONTOYA where the CD-ROM drive and
SGI binaries are.
Some pages of 3rd party information
that I have copied onto my server to be accessible even when the net is
down (surely never!).
I have made a spreadsheet of all the disks
I could find around the Realization group.
Here's my Change of address notification
(print your own version from the PostScript here,
preferably on a modern laser printer like zp2).
Updated: $Date: 2000/09/19 19:35:50 $
International Computer Science Institute, Berkeley CA
(510) 666-2940 fax: (510) 666-2956