2013-12-27

AM development

Sparse DNN

  • Optimal Brain Damage (OBD).
  1. Online OBD held.
  2. Investigation of OBD + L1 norm has started.
  • Efficient computing
  1. Rearranging the matrix structure so that zeros are grouped into blocks, which should lead to faster computation.
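
As a rough illustration of the pruning criterion behind OBD, the sketch below ranks weights by the saliency s_k = h_kk * w_k^2 / 2 and zeroes the least salient ones. The function names, the toy weights, and the diagonal Hessian values are assumptions for illustration only; this is not the implementation used in the experiments.

```python
def obd_saliencies(weights, hessian_diag):
    """OBD saliency of each weight: s_k = 0.5 * h_kk * w_k**2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def obd_prune(weights, hessian_diag, n_prune):
    """Return a copy of weights with the n_prune least salient entries set to 0."""
    sal = obd_saliencies(weights, hessian_diag)
    # Indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda k: sal[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned


# Toy example (hypothetical values, not taken from the experiments).
weights = [0.5, -0.1, 2.0, 0.05]
hessian_diag = [1.0, 4.0, 0.1, 2.0]
pruned = obd_prune(weights, hessian_diag, n_prune=2)
print(pruned)
```

Unlike plain magnitude pruning, the ranking depends on the curvature term h_kk as well as on the weight value, so a small weight on a sharply curved direction can outrank a larger weight on a flat one.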


Efficient DNN training

  1. Momentum-based training. Momentum m=0.2 performs best on WER overall, but the results are not consistent across test sets: for 1900, m=0.2 is the best; for online1 and online2, m=0 is the best.
  2. Asymmetric window: large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set, possibly due to overfitting.
  3. Frame-skipping. Skipping 1 frame consistently speeds up decoding while largely retaining accuracy. Skipping more frames leads to unacceptable performance degradation.
                       mom0.05  mom0.1   mom0.3   mom0.4   mom0.5   mom0.6   mom0.8   fs_1     fs_2     fs_3
            ------------------------------------------------------------------------------------------------
            avg_time | 4500     4175     3380     3460     3448     3521     4212     3149     2692     2716
            RT       | 1.52     1.44     1.12     1.14     1.14     1.16     1.38     1.04     0.90     0.92
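
The frame-skipping idea in item 3 can be sketched as follows: the network is evaluated only on every (skip + 1)-th frame, and its output is reused for the skipped frames. Whether the experiments above reuse outputs in exactly this way is an assumption, and `score_frame` is a hypothetical stand-in for the DNN forward pass.

```python
def frame_skip_scores(frames, score_frame, skip=1):
    """Evaluate score_frame on every (skip + 1)-th frame and reuse the
    result for the skipped frames that follow it."""
    scores = []
    last = None
    for t, frame in enumerate(frames):
        if t % (skip + 1) == 0:
            last = score_frame(frame)  # the expensive forward pass
        scores.append(last)
    return scores


# Toy scorer that records how often it is called (hypothetical).
calls = []


def toy_score(frame):
    calls.append(frame)
    return frame * 10


out = frame_skip_scores([1, 2, 3, 4, 5], toy_score, skip=1)
print(out, len(calls))
```

With skip=1 the forward pass runs on roughly half of the frames, which is where the decoding speed-up would come from in this sketch.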

Optimal phoneset

  • Experimented with 3 phone sets: Tencent, CSLT, PQ.
  • The CSLT and PQ sets are similar (both initial-final based), with a minor difference on Ri. The Tencent set is phone based.
  • All sets were tested on the same NN structure.
  • CSLT and PQ obtain similar performance, and both are better than the Tencent set in most test cases.
  • On online1 and online2, the Tencent set is a little better.
  • We therefore prefer a phone set based on initial-finals.


 

CSLT:

map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]

2044 7: %WER 22.63 [ 5259 / 23241, 396 ins, 615 del, 4248 sub ]

notetp3 7: %WER 15.76 [ 292 / 1853, 19 ins, 30 del, 243 sub ]

record1900 11: %WER 5.98 [ 711 / 11888, 37 ins, 270 del, 404 sub ]

general 7: %WER 36.21 [ 13622 / 37619, 543 ins, 1085 del, 11994 sub ]

online1 12: %WER 37.73 [ 10729 / 28433, 634 ins, 2229 del, 7866 sub ]

online2 13: %WER 28.95 [ 17112 / 59101, 1113 ins, 3015 del, 12984 sub ]

speedup 9: %WER 25.71 [ 1351 / 5255, 49 ins, 276 del, 1026 sub ]

 

PQ:

map 9: %WER 24.25 [ 3547 / 14628, 115 ins, 428 del, 3004 sub ]

2044 8: %WER 22.80 [ 5300 / 23241, 425 ins, 665 del, 4210 sub ]

notetp3 9: %WER 16.73 [ 310 / 1853, 34 ins, 28 del, 248 sub ]

record1900 11: %WER 5.88 [ 699 / 11888, 54 ins, 257 del, 388 sub ]

general 8: %WER 36.80 [ 13844 / 37619, 636 ins, 1102 del, 12106 sub ]

online1 14: %WER 37.77 [ 10739 / 28433, 592 ins, 2401 del, 7746 sub ]

online2 13: %WER 28.65 [ 16932 / 59101, 1136 ins, 2965 del, 12831 sub ]

speedup 9: %WER 26.32 [ 1383 / 5255, 66 ins, 273 del, 1044 sub ]

 

Tencent:

map 8: %WER 25.83 [ 3778 / 14628, 157 ins, 486 del, 3135 sub ]

2044 8: %WER 24.51 [ 5697 / 23241, 502 ins, 765 del, 4430 sub ]

notetp3 10: %WER 19.86 [ 368 / 1853, 36 ins, 45 del, 287 sub ]

record1900 12: %WER 7.96 [ 946 / 11888, 50 ins, 378 del, 518 sub ]

general 7: %WER 37.94 [ 14274 / 37619, 537 ins, 1270 del, 12467 sub ]

online1 12: %WER 36.36 [ 10337 / 28433, 495 ins, 2082 del, 7760 sub ]

online2 13: %WER 28.46 [ 16822 / 59101, 893 ins, 2940 del, 12989 sub ]

speedup 10: %WER 28.53 [ 1499 / 5255, 62 ins, 349 del, 1088 sub ]
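
Each result line above reports %WER as (ins + del + sub) divided by the number of reference words. The sketch below parses a line in this format and recomputes the figure as a sanity check; the regular expression and the function name are assumptions for illustration.

```python
import re

# Matches e.g. "%WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]"
LINE_RE = re.compile(
    r"%WER\s+([\d.]+)\s+\[\s*(\d+)\s*/\s*(\d+),\s*(\d+)\s+ins,"
    r"\s*(\d+)\s+del,\s*(\d+)\s+sub\s*\]"
)


def check_wer_line(line):
    """Parse one result line and recompute WER from its error counts."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a WER line: %r" % line)
    reported = float(m.group(1))
    errors, words, ins, dele, sub = (int(g) for g in m.groups()[1:])
    return {
        "reported": reported,
        "recomputed": 100.0 * errors / words,
        # The total error count should equal ins + del + sub.
        "counts_consistent": errors == ins + dele + sub,
    }


result = check_wer_line(
    "map 8: %WER 25.76 [ 3768 / 14628, 131 ins, 436 del, 3201 sub ]"
)
print(result)
```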

Engine optimization

  • Investigating LOUDS FST. In progress.
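
LOUDS (level-order unary degree sequence) encodes the shape of an ordered tree with n nodes in 2n + 1 bits. A minimal sketch of the encoding for a generic tree is given here; how it would map onto FST states and arcs in the engine is not described in these notes, and the dict-based tree layout is an assumption.

```python
from collections import deque


def louds_bits(children, root):
    """Return the LOUDS bit string of a tree.

    children maps a node to the ordered list of its children.
    Each node, visited in breadth-first order, contributes one '1'
    per child followed by a terminating '0'.
    """
    bits = ["10"]  # conventional super-root with a single child (the root)
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        bits.append("1" * len(kids) + "0")
        queue.extend(kids)
    return "".join(bits)


# Toy tree (hypothetical): r has children a and b; a has child c.
tree = {"r": ["a", "b"], "a": ["c"]}
encoded = louds_bits(tree, "r")
print(encoded)
```

Navigation on such a bit string is normally done with rank/select queries, which are omitted here.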


LM development

NN LM

  • Collecting a bigger lexicon: 40k words related to music, 56k words from an official dictionary.
  • Working on an NN LM based on word2vec.

Embedded development

  • A narrow and deep small-scale NN has been trained. Investigating some bugs.
  • Embedded stream mode is in progress.
  • On-the-fly grammar compiler
  • LG compilation is fine
  • CLG compilation is fine
  • HCLG compilation is slow
  • Working on a speed-up method.

Speech QA

  • Use N-best to expand matching in QA. Better performance was obtained.
  • 1-best matches 96/121
  • 10-best matches 102/121
  • Use N-best to recover errors in entity check. In progress.
  • Use Pinyin to recover errors in entity check. Future work.
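
N-best expansion of QA matching can be sketched as trying the recognition hypotheses in rank order and accepting the first one that matches a known question. The exact-string lookup, the toy QA table, and the function name are assumptions; the actual matcher is not described in these notes. In the figures above, the match rate rises from 96/121 (about 79.3%) with 1-best to 102/121 (about 84.3%) with 10-best.

```python
def match_nbest(hypotheses, qa_table, n=10):
    """Return (rank, answer) for the first of the top-n hypotheses
    found in qa_table, or None if none of them matches."""
    for rank, hyp in enumerate(hypotheses[:n], start=1):
        if hyp in qa_table:
            return rank, qa_table[hyp]
    return None


# Toy data (hypothetical): the 1-best contains a recognition error.
qa_table = {"what is the weather today": "weather_query"}
nbest = ["what is the whether today", "what is the weather today"]

one_best = match_nbest(nbest, qa_table, n=1)
ten_best = match_nbest(nbest, qa_table, n=10)
print(one_best, ten_best)
```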