Date: Wed, 8 May 2002 09:29:07 -0700 (PDT)
From: Renate and Thilo Weller and Pfau <renate_thilo@yahoo.com>
Subject: Re: presegmentation
To: Panu Somervuo <panus@ICSI.Berkeley.EDU>
MIME-Version: 1.0
X-Keywords: 

Hi Panu,

it has been a long time since I gave anything
directly to Jane, since the last meetings were
transcribed by 'tigerfish'. In that case you have to
convert the trs-file (the one without the trs-ending)
into the linearized version!
To do that, you first have to run trs2list and
then some of Adam's tools (makewaveseg ..., best ask
him about that) to convert the multichannel output
into the linearized wavefile, which will then be sent
to tigerfish for transcription.
When transcription is done in-house, first do a manual
check by loading the trs file (*-AfterCorrelAndPzmCorrel)
into the multichannel transcriber to see if the
quality is reasonable. If not, some parameters can be
tuned:
-speech and nonspeech priors in the cfg-files
-correlation thresholds for the last postprocessing
step; or check whether the quality before correl and
pzmcorrel seems to be better!

Hope this helps

Thilo


--- Panu Somervuo <panus@ICSI.Berkeley.EDU> wrote:
> Hi Thilo,
> 
> I ran your segmenter on the btr001 and btr002 data.
> Could you tell me what you usually gave to Jane or
> the transcribers (there is more than one trs file in
> the output directory)? And did you do any manual
> corrections or parameter tuning afterwards (or manual
> checking/correction of the automatic segmentation)?
> All information is welcome :)
> 
> Panu
> 
> 
> 
===========================================================
Date: Tue, 4 Jun 2002 02:12:31 -0700 (PDT)
From: Renate and Thilo Weller and Pfau <renate_thilo@yahoo.com>
Subject: Re: ICSI and SRI data
To: Panu Somervuo <panus@ICSI.Berkeley.EDU>
MIME-Version: 1.0

Hi Panu,

sorry again for the late response. Have been quite
busy around here.

--- Panu Somervuo <panus@ICSI.Berkeley.EDU> wrote:
> Hi Thilo,
> 
> ookayy, now the questions start again.
> But first: your programs have been running smoothly,
> so I have nothing to complain about.
>
Good to hear! ;-) 
> Now we have data from SRI, and I was told there are
> no time skews in the channels of their recordings.
> I ran the segmenter with and without skew compensation
> and, surprisingly, got the same segmentations. Do you
> remember whether you benefited much from the skew
> compensation? The quality seems to be OK, although
> there are some inserted and some deleted speech
> segments every now and then, but there were probably
> those in the ICSI segmentations too. Do you have any
> suggestions on what to do when using data from another
> site, besides running the segmenter as it is?
> 
What exactly did you change to run the segmenter
without skew correction??? I am asking because several
tools have to be set to run without time skew
correction (at least the segmenter itself and the
feature extractor, if I remember correctly). I am not
sure whether there is an option for the feature
extractor to run it without time skew correction or
whether there was a separate version that just ignores
time skews in the acoustic preprocessing. If I remember
correctly, there is a function 'start' in the class
CAkupreSimple which can be given an optional argument
that defines the skew. If you do not want to correct
for the skew, you just have to set this argument to zero.
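To make the zero-skew case concrete, here is a minimal Python sketch of what per-channel skew compensation amounts to, assuming the skews are known per channel as non-negative offsets in samples. The helper name `align_channels` is hypothetical and not taken from CAkupreSimple; the actual tools work with byte offsets (ByteSkips) into the wavefiles, which for single-channel 16-bit files would be twice the sample offset. Setting every skew to zero leaves the channels untouched, which is what "no skew correction" means in this sketch.

```python
from typing import List, Sequence


def align_channels(channels: Sequence[Sequence[float]],
                   skews: Sequence[int]) -> List[List[float]]:
    """Shift each channel by its skew so that all channels start together.

    Hypothetical helper: skews[i] is the number of leading samples to drop
    from channel i. A skew of 0 leaves that channel unchanged.
    """
    if len(channels) != len(skews):
        raise ValueError("need exactly one skew per channel")
    if any(s < 0 for s in skews):
        raise ValueError("skews must be non-negative sample offsets")
    if not channels:
        return []
    # Drop the leading samples of each channel according to its skew.
    shifted = [list(ch[s:]) for ch, s in zip(channels, skews)]
    # Trim all channels to the shortest one so that index t refers to the
    # same moment in time on every channel.
    n = min(len(ch) for ch in shifted)
    return [ch[:n] for ch in shifted]


if __name__ == "__main__":
    close_talk = [0.0, 0.5, 0.9, 0.2]
    pzm = [0.0, 0.0, 0.5, 0.9, 0.2]  # same signal, arriving one sample later
    print(align_channels([close_talk, pzm], [0, 1]))  # skew-compensated
    print(align_channels([close_talk, pzm], [0, 0]))  # no skew correction
```

Because the features and the correlations are both computed from the shifted samples, a skew set to zero in one tool but not in another would leave the tools working on differently aligned data.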

> Another thing is trying to develop the segmenter. I
> have read your ASRU2001 paper. How did you end up
> with the current approach?
The problem, on the one hand, was that I did not like
the results of the Gaussian system too much (too many
insertions, mostly due to crosstalk).
On the other hand, I did not have a good tool for
training (or clustering) the Gaussians in the case of
mixture models, and the ANN tools are quite simple to
configure.


> Hmm, maybe this is too broad a question, but I ask
> because I would not like to repeat all the stuff you
> have already done. On the other hand, it can be
> difficult to just continue from where the segmenter
> is now without knowing its history. Well, let's see.
>
Let me know if you have more questions!!!

 
> Best,
> Panu
> 
Hope to hear from you soon.

Thilo
===========================================================
Date: Wed, 5 Jun 2002 06:48:15 -0700 (PDT)
From: Renate and Thilo Weller and Pfau <renate_thilo@yahoo.com>
Subject: Re: ICSI and SRI data
To: Panu Somervuo <panus@ICSI.Berkeley.EDU>
MIME-Version: 1.0

Hi Panu,

--- Panu Somervuo <panus@ICSI.Berkeley.EDU> wrote:
> >Hi Panu,
> >
> >sorry again for the late response. Have been quite
> >busy around here.
> >
> 
> Hi,
> 
> I appreciate it if you have time to answer. But of
> course I'm aware that you have your own things to do,
> and it's really not your job to be my teacher all the
> time. Anyway, thanks for replying.
>
Don't worry, it really is no hassle at all, and I
appreciate your efforts to get the segmentation
running again!
 
> >What exactly did you change to run the segmenter
> >without skew correction??? I am asking because several
> >tools have to be set to run without time skew
> >correction (at least the segmenter itself and the
> >feature extractor, if I remember correctly). I am not
> >sure whether there is an option for the feature
> >extractor to run it without time skew correction or
> >whether there was a separate version that just ignores
> >time skews in the acoustic preprocessing. If I remember
> >correctly, there is a function 'start' in the class
> >CAkupreSimple which can be given an optional argument
> >that defines the skew. If you do not want to correct
> >for the skew, you just have to set this argument to zero.
> >
> >
> 1) For CreateFeatAndLab, in the .cfg file I set
>    Tools.CorrectForSkew = No
>    I also recompiled the program and checked that this
>    flag has an effect (for some reason I felt that
>    without recompilation the flag had no effect; at
>    least the program output skews that were not zero).
> 
> 2) For correlations, I used calcmaxcorreltopzm-noskew,
>    ...then qnsfwd to get apost-files.
> 
> 3) For the sns-detector (spdtest...), I explicitly set
>    the ByteSkips of all channels to 0 and recompiled
>    the program.
> 
> What I compared were the results of the sns-detector
> with and without skew compensation. Since steps 1) and
> 2) were the same for the two experiments, the apost
> files were the same for both runs.

Oh, I understand. You should have created new
aposteriors and used those!

> But since the sns-detector will add the skews (if
> not explicitly set to zero), I thought there would
> be some difference between the performances.
>
The only thing for which the segmenter needs to know
whether there are skews or not is the (optional)
thresholding step, which uses not only the correlations
between close-talking and pzm channels but also the
correlations between different close-talking channels.
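A rough Python sketch of such a correlation-based check, under stated assumptions: the rule below (drop a segment if its correlation with the pzm channel is too low, or if it correlates strongly with a louder close-talking channel and is therefore probably crosstalk) is an illustration, not necessarily the rule the segmenter applies, and the function names and threshold values are hypothetical. Skews matter here because they move the lag at which the correlation peaks; with a small lag window, an uncompensated skew can push the peak outside the searched range.

```python
import math
from typing import Iterable, Sequence


def max_norm_xcorr(a: Sequence[float], b: Sequence[float], max_lag: int) -> float:
    """Largest |normalized cross-correlation| of a and b over lags -max_lag..max_lag."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        # Positive lag pairs a[t + lag] with b[t]; negative lag pairs a[t] with b[t - lag].
        if lag >= 0:
            x, y = a[lag:], b
        else:
            x, y = a, b[-lag:]
        n = min(len(x), len(y))
        x, y = x[:n], y[:n]
        num = sum(p * q for p, q in zip(x, y))
        den = math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))
        if den > 0.0:
            best = max(best, abs(num) / den)
    return best


def energy(x: Sequence[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def keep_segment(own: Sequence[float], pzm: Sequence[float],
                 other_close_talk: Iterable[Sequence[float]],
                 pzm_threshold: float = 0.3, ct_threshold: float = 0.6,
                 max_lag: int = 8) -> bool:
    """Hypothetical rule: does a detected speech segment on one close-talking
    channel survive the correlation thresholding?

    All inputs are waveform samples of the same time span; the threshold
    defaults are made up for illustration.
    """
    # Speech from the wearer should also be picked up by the room (pzm) mic.
    if max_norm_xcorr(own, pzm, max_lag) < pzm_threshold:
        return False
    # If another close-talking channel carries the same signal more loudly,
    # this segment is probably crosstalk from that channel's speaker.
    own_energy = energy(own)
    for other in other_close_talk:
        if (max_norm_xcorr(own, other, max_lag) > ct_threshold
                and energy(other) > own_energy):
            return False
    return True
```

This also shows why the waveforms still have to be read even when the speech/nonspeech decisions come from precomputed posteriors: the correlations are computed on the raw samples.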
 
> Maybe I should confirm the very fundamental thing:
> So the sns-detector reads the pfiles (feature files),
> and not the waveform files anymore?
It does read the waveform, but just to calculate
correlations!!!! The hybrid system only uses the
aposteriors, which are stored in the pfiles!!!

> So even if you can define the ByteSkips, does it have
> any effect anymore? Or does the sns-detector do the
> entire feature computation all over again (or just
> the normalization for the existing pfiles, as I
> thought)? I guess the latter, but I just want to
> confirm. I probably should read the code line by line
> to really know what's going on.
> 
> Also a very fundamental question:
> After the qnsfwd recognition stage, is the role of
> the HMM-based sns-detector more to eliminate false
> alarms than to add new segments? It apparently also
> fine-tunes the existing segment borders.
It does so by using the priors, which are defined in
the cfg files.
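As an illustration of how priors can steer such an HMM, here is a minimal Python sketch of the standard hybrid ANN/HMM approach: the network's per-frame speech posterior is divided by the class prior to give a scaled likelihood, and a two-state (nonspeech/speech) Viterbi pass with a high self-loop probability smooths the frame decisions, removing short blips and placing the segment borders. Whether the sns-detector uses the priors from its cfg files exactly this way is an assumption; the function name `viterbi_sns` and the `self_loop` parameter are hypothetical.

```python
import math
from typing import List, Sequence


def viterbi_sns(p_speech: Sequence[float], prior_speech: float,
                self_loop: float = 0.9) -> List[int]:
    """Return a label per frame: 0 = nonspeech, 1 = speech.

    p_speech[t] is the network's posterior P(speech | frame t). Emission
    scores are log scaled likelihoods, log(posterior) - log(prior).
    """
    if not 0.0 < prior_speech < 1.0 or not 0.0 < self_loop < 1.0:
        raise ValueError("prior_speech and self_loop must lie in (0, 1)")
    n = len(p_speech)
    if n == 0:
        return []
    eps = 1e-10
    priors = (1.0 - prior_speech, prior_speech)
    log_stay, log_switch = math.log(self_loop), math.log(1.0 - self_loop)

    def emit(p: float, s: int) -> float:
        post = p if s == 1 else 1.0 - p
        return math.log(max(post, eps)) - math.log(priors[s])

    # Uniform initial state distribution (a constant, so it is omitted).
    score = [emit(p_speech[0], s) for s in (0, 1)]
    back: List[List[int]] = []
    for t in range(1, n):
        new_score, ptr = [], []
        for s in (0, 1):
            cand = [score[r] + (log_stay if r == s else log_switch) for r in (0, 1)]
            best = 0 if cand[0] >= cand[1] else 1
            new_score.append(cand[best] + emit(p_speech[t], s))
            ptr.append(best)
        score = new_score
        back.append(ptr)
    # Trace back the best state sequence.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path
```

A single frame with a high speech posterior does not pay for the two state switches it would need, so it is absorbed into nonspeech, while a longer run survives with its borders at the posterior changes. Raising the speech prior lowers the scaled speech likelihood, so the decoder labels fewer frames as speech, which is one way the priors act as a tuning knob against insertions.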

> There are also some parameters for breath detection;
> is this more like fine-tuning or really an essential
> part of the segmentation?
This is just fine-tuning, and I am not sure if it has
any effect in the current version anymore, sorry.
> I thought it could be possible to train a separate
> breath model (but most likely what you have done
> effectively does the same).
> 
What I did was just to find temporal patterns which
look like consecutive breaths. I guess it would be
better to use additional features to characterize
breaths?!


> >Let me know if you have more questions!!!
> 
> I already feel guilty asking too many questions :)
> But do you know whether you are going to write some
> kind of tech report of all your stuff in the near or
> far future?
> 
I should and I will!


> Despite my mails, enjoy the summer!
> 
> Panu
> 
> 
Thilo
===========================================================
Date: Wed, 5 Jun 2002 06:51:09 -0700 (PDT)
From: Renate and Thilo Weller and Pfau <renate_thilo@yahoo.com>
Subject: Re: ICSI and SRI data
To: Panu Somervuo <panus@ICSI.Berkeley.EDU>
MIME-Version: 1.0


--- Panu Somervuo <panus@ICSI.Berkeley.EDU> wrote:
> 
> I wrote:
> > Maybe I should confirm the very fundamental thing:
> > So sns-detector reads the pfiles (feature files),
> > not anymore the waveform files? So even if you can
> > define the ByteSkips, does it have any effect
> > anymore? Or does the sns-detector do the entire
> > feature computation all over again (or just the
> > normalization for the existing pfiles I thought). I
> > guess
> 
> Ok, reading the code, it looks like it computes the
> features all over again...
> 
It does, but it only uses the correlations, if I
remember correctly. In principle, all the feature
normalization stuff and related things can be thrown
out, and you can use the features from the pfiles,
which are already normalized. In fact, this would save
time, since the feature extraction can be quite lengthy
if done remotely (not on the machine on which the wav
files are stored).

 
> -Panu
> 
>

Thilo
===========================================================