Below are the metric details for ASR benchmarking:
Word Error Rate (WER)
Summary: Computes the number of word errors (insertions, deletions, and substitutions) relative to the length of the reference transcription.
Description: This is the most popular automatic metric for Speech Recognition evaluation. To compute it, the automatically transcribed word sequence is aligned with the reference (spoken) word sequence using dynamic string alignment. Given word insertions (I), deletions (D), and substitutions (S), the word error rate is computed as
WER = (I + D + S)/N (where N is the number of words in the reference)
Range of values:
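>= 0%, there is no upper limit for this metric: a hypothesis with many insertions can contain more errors than the reference has words, so WER can exceed 100%.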
Direction of improvement:
Lower is better.
Human manual transcription has an estimated WER slightly higher than 5%.
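The alignment and counting described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `edit_counts` and `wer` are hypothetical, tokenization is assumed to be a plain whitespace split, and no text normalization (casing, punctuation) is applied.

```python
def edit_counts(ref, hyp):
    """Return (hits, substitutions, deletions, insertions) for one
    minimum-edit alignment of two token sequences."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete every reference token
    for j in range(1, m + 1):
        cost[0][j] = j  # insert every hypothesis token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the bottom-right corner to count each edit type.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def wer(reference, hypothesis):
    """WER = (I + D + S) / N, with N the number of reference words."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    _, subs, dels, ins = edit_counts(ref, hyp)
    return (ins + dels + subs) / len(ref)


print(wer("the cat sat on the mat", "the cat sit on mat"))
print(wer("hello", "hello there big world"))
```

The first call has one substitution and one deletion against six reference words, giving 2/6. The second has three insertions against a single reference word, giving 3.0, which shows how WER can exceed 100%.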
Character Error Rate (CER)
Summary: Identical to WER except that it is computed on characters instead of words. This makes it more suitable for character-based languages such as Chinese, where word boundaries are not marked in the text.
Range of values:
>= 0%, there is no upper limit for this metric.
Direction of improvement:
Lower is better
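A character-level variant might look like the sketch below. The function names `char_edit_distance` and `cer` are hypothetical, and dropping whitespace before comparison is an assumption made here for illustration; some evaluations count spaces as characters instead.

```python
def char_edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, keeping only two rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j - 1] + (r != h),  # match or substitution
                            prev[j] + 1,             # deletion
                            curr[j - 1] + 1))        # insertion
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """CER = character edits / number of reference characters."""
    # Assumption: whitespace is removed so only visible characters count.
    ref = [c for c in reference if not c.isspace()]
    hyp = [c for c in hypothesis if not c.isspace()]
    if not ref:
        raise ValueError("reference must contain at least one character")
    return char_edit_distance(ref, hyp) / len(ref)


# One wrong character out of six reference characters.
print(cer("今天天气很好", "今天天汽很好"))
```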
Match Error Rate (MER)
Summary: MER is the proportion of input/output (I/O) word matches that are errors.
Description: MER is similar to WER, but is normalized by the number of words in the reference (N) plus the number of insertions (instead of N only). To compute it, the automatically transcribed word sequence is aligned with the reference (spoken) word sequence using dynamic string alignment. Given word insertions (I), deletions (D), substitutions (S), and hits (H, i.e. matches), the match error rate is computed as:
MER = (I + D + S)/(N + I) (where N = H + S + D is the number of words in the reference)
Range of Values:
0 <= MER <= 1. Since N + I = H + S + D + I, the number of errors can never exceed the denominator.
Direction of improvement:
Lower is better
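Given alignment counts, the computation is a one-liner. The sketch below assumes the counts H, S, D, and I have already been obtained from an alignment; the function name `match_error_rate` is hypothetical.

```python
def match_error_rate(hits, substitutions, deletions, insertions):
    """MER = (I + D + S) / (N + I), where N = H + S + D."""
    errors = insertions + deletions + substitutions
    n_ref = hits + substitutions + deletions  # N, words in the reference
    denominator = n_ref + insertions          # same as H + S + D + I
    if denominator == 0:
        raise ValueError("at least one aligned word is required")
    return errors / denominator


# Counts for reference "hello" vs. hypothesis "hello there big world":
# one hit and three insertions.
print(match_error_rate(hits=1, substitutions=0, deletions=0, insertions=3))
```

For these counts the result is 3/4 = 0.75, whereas the WER formula (I + D + S)/N applied to the same counts gives 3/1 = 3.0.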
Word Information Lost (WIL)
Summary: Approximates RIL (Relative Information Lost) using hits, insertions, deletions, and substitutions.
Description: Given the number of word hits (H), insertions (I), deletions (D), and substitutions (S), it is computed as:
WIL = 1 - H^2/((H + S + D)(H + S + I))
Range of Values:
0<= WIL <= 1
Direction of improvement:
Lower is better
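As a rough sketch, WIL can be computed directly from the four counts. The function name `word_information_lost` is hypothetical, and returning 1.0 when either the reference or the hypothesis is empty is an assumption made here to avoid dividing by zero.

```python
def word_information_lost(hits, substitutions, deletions, insertions):
    """WIL = 1 - H^2 / ((H + S + D) * (H + S + I))."""
    n_ref = hits + substitutions + deletions   # words in the reference
    n_hyp = hits + substitutions + insertions  # words in the hypothesis
    if n_ref == 0 or n_hyp == 0:
        # Assumption: nothing aligned on one side means all information is lost.
        return 1.0
    return 1.0 - (hits * hits) / (n_ref * n_hyp)


# 4 hits, 1 substitution, 1 deletion, 0 insertions:
# 1 - 16 / (6 * 5) = 14/30
print(word_information_lost(4, 1, 1, 0))
```

A perfect transcription (only hits) gives 0, and a transcription with no hits gives 1, matching the stated range.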
Word Information Preserved (WIP)
Summary: The WIP measure results from an approximation to the proportion of the information about the true sequence which is preserved in the recognized sequence. It has comparable simplicity to WER but neither of its disadvantages.
Description: Given the number of word hits (H), insertions (I), deletions (D), and substitutions (S), it is computed as:
WIP = H^2/((H + S + D)(H + S + I))
This is the complement of WIL, i.e. WIP = 1 - WIL.
Range of Values:
0 <= WIP <= 1
Direction of improvement:
Higher is better