Transcription Performance Metrics

Modified on Fri, Nov 10, 2023 at 6:45 AM

Below are the metric details for ASR benchmarking:

1. Word Error Rate (WER)

Summary: Computes the number of word errors (insertions, deletions, and substitutions) relative to the length of the reference transcription.

Description: This is the most popular automatic metric for Speech Recognition evaluation. To compute it, the automatically transcribed word sequence is aligned with the reference (spoken) word sequence using dynamic string alignment. Given word insertions (I), deletions (D), and substitutions (S), the word error rate is computed as

WER = (I + D + S)/N (Where N is the number of words in the reference)

Range of values: >=0%; there is no upper limit for this metric.

Direction of improvement: Lower is better.

Human manual transcription has an estimated WER slightly higher than 5%.
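The alignment and the formula above can be illustrated with a minimal Python sketch. It assumes whitespace tokenization and no text normalization (casing and punctuation are compared as-is); the function names align_counts and wer are hypothetical and not part of any particular toolkit.

```python
def align_counts(ref, hyp):
    """Return (H, S, D, I) for a minimum-edit alignment of two token lists."""
    # dp[i][j] holds (edits, H, S, D, I) for the best alignment of ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, h, s, d, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, h + 1, s, d, ins)      # hit
            else:
                diag = (e + 1, h, s + 1, d, ins)  # substitution
            e, h, s, d, ins = dp[i - 1][j]
            delete = (e + 1, h, s, d + 1, ins)    # reference word missing
            e, h, s, d, ins = dp[i][j - 1]
            insert = (e + 1, h, s, d, ins + 1)    # extra hypothesis word
            dp[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    return dp[-1][-1][1:]


def wer(reference, hypothesis):
    """WER = (I + D + S) / N, with N the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    h, s, d, i = align_counts(ref, hyp)
    return (i + d + s) / len(ref)


# One substitution (sat -> sit) and one deletion (the): 2 errors / 6 reference words
print(wer("the cat sat on the mat", "the cat sit on mat"))
# Three insertions against a one-word reference: WER exceeds 100%
print(wer("hello", "hello there my friend"))
```

The second call shows why the metric has no upper limit: insertions add to the numerator but not to N.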

2. Character Error Rate (CER)

Summary: Identical to WER except that it is computed on characters instead of words. This is more suitable for character-based languages such as Chinese.

Range of values: >= 0%; there is no upper limit for this metric.

Direction of improvement: Lower is better.
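As a rough illustration, CER can be sketched as the character-level edit distance divided by the number of reference characters. The sketch below assumes every character, including spaces, counts as a token; whether whitespace and punctuation are stripped first is a convention that varies between toolkits. The names edit_distance and cer are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """CER = character edits / number of reference characters."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character out of six
print(cer("今天天气很好", "今天天汽很好"))
```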

3. Match Error Rate (MER)

Summary: MER is the proportion of I/O word matches which are errors.

Description: MER is similar to WER but is normalized by the number of words in the reference (N) plus the number of insertions (I), instead of by N only. To compute it, the automatically transcribed word sequence is aligned with the reference (spoken) word sequence using dynamic string alignment. Given word hits (H, matches), insertions (I), deletions (D), and substitutions (S), where N = H + S + D, the match error rate is computed as:

MER = (I + D + S)/(N + I) (Where N is the number of words in the reference)

Range of values: 0 <= MER <= 1. Unlike WER, MER cannot exceed 1, because insertions are also counted in the denominator.

Direction of improvement: Lower is better.
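A small worked comparison may help show how the extra I in the denominator changes the result. The sketch below assumes the alignment counts H, S, D, and I are already available, and uses N = H + S + D; the function names are hypothetical.

```python
def wer_from_counts(h, s, d, i):
    """WER = (I + D + S) / N, with N = H + S + D reference words."""
    n = h + s + d
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return (i + d + s) / n


def mer_from_counts(h, s, d, i):
    """MER = (I + D + S) / (N + I), i.e. errors over all aligned word pairs."""
    total = h + s + d + i
    if total == 0:
        raise ValueError("alignment must contain at least one word")
    return (i + d + s) / total


# One hit and three inserted words
print(wer_from_counts(h=1, s=0, d=0, i=3))  # 3.0
print(mer_from_counts(h=1, s=0, d=0, i=3))  # 0.75
```

With the same counts, WER grows past 1 while MER stays below it, since every error in the numerator is also counted in the denominator.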

4. Word Information Lost (WIL)

Summary: Approximates Relative Information Lost (RIL) using hits, insertions, deletions, and substitutions.

Description: Given the number of word hits (H), insertions (I), deletions (D), and substitutions (S), it is computed as:

WIL = 1 - H^2 / ((H + S + D)(H + S + I))

Range of values: 0 <= WIL <= 1

Direction of improvement: Lower is better.
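The WIL formula can be checked numerically with a short sketch. It assumes the counts H, S, D, and I come from a prior word alignment; returning 1.0 when there are no hits is a convention assumed here to avoid a 0/0 division, and the name wil_from_counts is hypothetical.

```python
def wil_from_counts(h, s, d, i):
    """WIL = 1 - H^2 / ((H + S + D) * (H + S + I))."""
    if h == 0:
        # No matched words: nothing is preserved (assumed convention; also avoids 0/0)
        return 1.0
    n_ref = h + s + d  # words in the reference
    n_hyp = h + s + i  # words in the hypothesis
    return 1.0 - (h * h) / (n_ref * n_hyp)


# 4 hits, 1 substitution, 1 deletion: 1 - 16 / (6 * 5)
print(wil_from_counts(h=4, s=1, d=1, i=0))
# Perfect transcription
print(wil_from_counts(h=5, s=0, d=0, i=0))  # 0.0
```

The complement of this value, 1 - WIL, is the Word Information Preserved score.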

5. Word Information Preserved (WIP)

Summary: WIP approximates the proportion of the information about the true (reference) sequence that is preserved in the recognized sequence. It is comparably simple to WER but avoids its disadvantages, such as the lack of an upper bound.

Description: Given the number of word hits (H), insertions (I), deletions (D), and substitutions (S), it is computed as:

WIP = H^2 / ((H + S + D)(H + S + I))

WIP is the complement of WIL: WIP = 1 - WIL.

Range of values: 0 <= WIP <= 1

Direction of improvement: Higher is better.

6. Exact Match (for functions with text output)

Summary: A predicted string's exact match score is 1 if it is identical to its reference string, and 0 otherwise.

Description: If the characters of the model's prediction exactly match the characters of the reference answer (or of one of the reference answers, when several are given), the score is 1; otherwise the score is 0. This is a strict all-or-nothing metric; being off by a single character results in a score of 0.

Range of values: 0 or 1

Direction of improvement: Higher is better.
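A minimal sketch of the scoring rule, assuming no normalization is applied before comparison (case, whitespace, and punctuation all count); the function name exact_match is hypothetical.

```python
def exact_match(prediction, references):
    """Return 1 if the prediction equals any reference string exactly, else 0."""
    if isinstance(references, str):
        references = [references]  # allow a single reference string
    return int(any(prediction == ref for ref in references))


print(exact_match("Paris", "Paris"))                    # 1
print(exact_match("Paris.", "Paris"))                   # 0, off by one character
print(exact_match("NYC", ["New York City", "NYC"]))     # 1, matches one of the references
```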