Sign Test

The sign test, first suggested for use in speech recognition benchmark tests by Makhoul ([1], p. 12), compares word error rates on the different speakers, on the different conversation sides, or on other prespecified subsets of a test set. It looks only at which system performs better on each such subset. If there is systematic evidence of differences in a consistent direction, this may prove significant even if the magnitudes of the differences are small.

If the measure used as the basis for deciding better performance is continuous, then the probability of exactly equal performance would be zero. In practice, the possibility of equality must be allowed for, generally by dropping such subsets (speakers) from the collection considered. This introduces only slight theoretical difficulties (see [2], p. 855), and is standard practice.

If the null hypothesis holds, then the probability is 1/2 that either system performs better on any given subset. Let CA be the number of subsets on which system A performs better and CB the number on which system B performs better, so that N = CA + CB after ties are dropped. Under the null hypothesis, the distribution of the statistic CA is the binomial B(N,1/2).
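
For illustration, the following Python sketch carries out the per-subset comparison for two hypothetical systems A and B; the per-speaker word error rates are made up for this example, and tied subsets are dropped as described above.

    # Hypothetical per-speaker word error rates (%) for systems A and B.
    wer_a = [18.2, 21.5, 15.0, 19.9, 17.3, 20.1, 16.8, 22.4, 18.0, 19.5]
    wer_b = [19.0, 21.5, 16.2, 20.4, 18.1, 19.8, 17.5, 23.0, 18.9, 20.2]

    # Count the subsets (speakers) on which each system has the lower WER.
    c_a = sum(a < b for a, b in zip(wer_a, wer_b))   # system A better
    c_b = sum(b < a for a, b in zip(wer_a, wer_b))   # system B better
    n = c_a + c_b                                    # ties are dropped

    # With these made-up values: N = 9, CA = 8, CB = 1.
    print(f"N = {n}, CA = {c_a}, CB = {c_b}")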

Let C = min(CA, CB), and let c, cA, and cB be the measured values of C, CA, and CB, respectively. The null hypothesis is rejected if

Prob(C <= c) = Prob(min(CA, CB) <= c) = Prob(CA <= c) + Prob(CB <= c)

= 2 * Prob(B(N,1/2) <= c) <= 0.05 (two-tailed)

or

Prob(CA <= cA) = Prob(B(N,1/2) <= cA) <= 0.05 (one-tailed)

These probabilities may be found directly from tables for the binomial distribution or, for large N (> 10), from the normal approximation. Table 1 lists critical values, i.e., upper bounds on C for significance at p=0.05, for a range of values of N. (See [3] for one source of this data.)
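
As a concrete illustration, the sketch below evaluates these probabilities exactly from the binomial distribution in Python; the counts cA and cB are hypothetical, chosen only to show the computation.

    from math import comb

    def binom_cdf(c, n):
        # Prob(B(n, 1/2) <= c): lower tail of the binomial with p = 1/2.
        return sum(comb(n, k) for k in range(c + 1)) / 2 ** n

    c_a, c_b = 2, 10            # hypothetical counts; ties already dropped
    n = c_a + c_b               # N = 12
    c = min(c_a, c_b)

    p_one_tailed = binom_cdf(c_a, n)     # Prob(CA <= cA)
    p_two_tailed = 2 * binom_cdf(c, n)   # 2 * Prob(B(N,1/2) <= c)

    print(f"one-tailed p = {p_one_tailed:.4f}, two-tailed p = {p_two_tailed:.4f}")
    # Here: one-tailed p ~ 0.0193 and two-tailed p ~ 0.0386, both <= 0.05,
    # so the null hypothesis would be rejected at p = 0.05.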

The sign test is generally less powerful than the Wilcoxon test, described next, which applies in similar evaluation situations. It is, however, simple and easy to use, and is thus regularly used by NIST in evaluation reports.

Sign Test Critical Values, p=0.05

Number of Subsets (N)    Two-Tailed    One-Tailed
          5                  ---            0
          6                   0             0
          7                   0             0
          8                   0             1
          9                   1             1
         10                   1             1
         11                   1             2
         12                   2             2
         13                   2             3
         14                   2             3
         15                   3             3
         16                   3             4
         17                   4             4
         18                   4             5
         19                   4             5
         20                   5             5
         21                   5             6
         22                   5             6
         23                   6             7
         24                   6             7
         25                   7             7

Table 1: Critical values for sign test for different numbers of subsets at significance p=0.05. For significance, the test statistic must be less than or equal to the critical value.
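
Table 1 can be reproduced directly from the binomial distribution. The Python sketch below does so, assuming the critical value is the largest c for which the corresponding one- or two-tailed probability does not exceed 0.05, with "---" meaning that no such c exists.

    from math import comb

    def binom_cdf(c, n):
        # Prob(B(n, 1/2) <= c).
        return sum(comb(n, k) for k in range(c + 1)) / 2 ** n

    def critical_value(n, alpha=0.05, two_tailed=True):
        # Largest c with (2 *) Prob(B(n, 1/2) <= c) <= alpha, or None if none.
        factor = 2 if two_tailed else 1
        best = None
        for c in range(n + 1):
            if factor * binom_cdf(c, n) <= alpha:
                best = c
            else:
                break        # the tail probability only grows with c
        return best

    print(f"{'N':>2}   {'Two-Tailed':>10}   {'One-Tailed':>10}")
    for n in range(5, 26):
        two = critical_value(n, two_tailed=True)
        one = critical_value(n, two_tailed=False)
        two_s = "---" if two is None else str(two)
        print(f"{n:>2}   {two_s:>10}   {one:>10}")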


References

[1] D. Pallett, J. Fiscus, and J. Garofolo, "Resource Management Corpus: September 1992 Test Set Benchmark Test Results", Proceedings of ARPA Microelectronics Technology Office Continuous Speech Recognition Workshop, Stanford, CA, September 21-22, 1992.

[2] R. Winkler and W. Hays, Statistics: Probability, Inference and Decision, second edition, Holt, Rinehart, and Winston, 1975.

[3] G. Kanji, 100 Statistical Tests, SAGE Publications, 1994.

 
Created: 04-Oct-2000
Last updated: 20-Apr-2001