From cslt Wiki

Resource Building

  • Maxi onboard
  • Release management should be started: Zhiyong (+)
  • Blaster 0.1 & vivian 0.0 system release

Leftover questions

  • Asymmetric window: great improvement on the training set (WER 34% to 24%); however, the improvement is lost on the test set. Overfitting?
  • Multi-GPU training: error encountered
  • Multilingual training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
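A note on the asymmetric window item: the window is taken here to be the context of neighbouring frames spliced onto each DNN input frame. The sketch below assumes Kaldi-style splicing with the edge frames repeated. `splice_frames` and the left/right sizes used with it are illustrative only, not the configuration used in the experiment.

```python
import numpy as np


def splice_frames(feats: np.ndarray, left: int, right: int) -> np.ndarray:
    """Concatenate each frame with `left` past frames and `right` future frames.

    Assumption: edge frames are padded by repeating the first/last frame.
    feats has shape (num_frames, dim); the result has shape
    (num_frames, (left + 1 + right) * dim).
    """
    num_frames, _ = feats.shape
    padded = np.concatenate(
        [
            np.repeat(feats[:1], left, axis=0),    # repeat first frame `left` times
            feats,
            np.repeat(feats[-1:], right, axis=0),  # repeat last frame `right` times
        ],
        axis=0,
    )
    # Output frame t covers padded rows t .. t + left + right,
    # i.e. original frames t - left .. t + right.
    windows = [padded[t : t + left + right + 1].reshape(-1) for t in range(num_frames)]
    return np.stack(windows)


feats = np.arange(10, dtype=float).reshape(5, 2)  # 5 toy frames of dimension 2
spliced = splice_frames(feats, left=3, right=1)   # asymmetric: 3 past, 1 future
```

With left=3 and right=1, each output row holds five frames, and the current frame sits at block index 3. Having fewer future frames than past frames is what makes the window asymmetric.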

AM development

Sparse DNN

  • GA-based block sparsity (++)
  • Found a paper in 2000 with similar ideas.
  • Try to recruit a student working on high-performance computing to do the optimization
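The GA search decides which weight blocks survive and is not reproduced here. The minimal NumPy sketch below shows what the optimization has to exploit: given a block mask, a matrix-vector product touches only the retained blocks. The function name, the boolean mask layout, and the assumption that the matrix dimensions are multiples of the block size are all hypothetical.

```python
import numpy as np


def block_sparse_matvec(weight: np.ndarray, mask: np.ndarray, x: np.ndarray, block: int) -> np.ndarray:
    """Compute y = W x using only the blocks where mask[bi, bj] is True.

    Assumption: weight.shape is divisible by `block` in both dimensions and
    mask has shape (rows // block, cols // block).
    """
    rows, cols = weight.shape
    if rows % block or cols % block:
        raise ValueError("matrix dimensions must be multiples of the block size")
    y = np.zeros(rows, dtype=weight.dtype)
    for bi in range(rows // block):
        r = slice(bi * block, (bi + 1) * block)
        for bj in range(cols // block):
            if not mask[bi, bj]:
                continue  # pruned block: no memory access, no multiply
            c = slice(bj * block, (bj + 1) * block)
            y[r] += weight[r, c] @ x[c]
    return y
```

The Python loops only illustrate the access pattern. An optimized kernel would store the retained blocks contiguously and run in C or CUDA.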

Noise training

  • Training with clean data added is done; much better on clean test data
  • Experiments done. Prepare paper.
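The notes do not say how the noisy training data was produced. The sketch below assumes the common recipe of adding recorded noise to clean utterances at a chosen SNR. It reads "with-clean training" as keeping the clean originals in the training set. `add_noise`, `build_training_set` and the SNR values are hypothetical.

```python
import numpy as np


def add_noise(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `clean` so the result has the requested SNR in dB."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[: len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale so that p_clean / (scale**2 * p_noise) == 10 ** (snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise


def build_training_set(clean_utts, noise, snrs, with_clean=True):
    """Noisy copies of every utterance, optionally keeping the clean original too."""
    out = []
    for utt in clean_utts:
        if with_clean:
            out.append(utt)
        for snr in snrs:
            out.append(add_noise(utt, noise, snr))
    return out
```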


  • GFBank sinovoice 1400 MPE stream
  • GFBank sinovoice 6000 MPE stream

Multilingual ASR

  • MPE-based training is not very sensitive to data imbalance for English & Chinese
  • Data duplication can trade off performance between the two languages
  • Test sharing schemes
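The data-duplication item can be sketched as repeating one language's utterance list a whole number of times before mixing, which raises that language's share of the training data. The function name and the factor values are illustrative assumptions, not the settings used in the experiments.

```python
from typing import Dict, List


def duplicate_for_balance(utts_by_lang: Dict[str, List[str]], factors: Dict[str, int]) -> List[str]:
    """Repeat each language's utterance list factors[lang] times (default 1) and concatenate."""
    mixed: List[str] = []
    for lang, utts in utts_by_lang.items():
        mixed.extend(utts * factors.get(lang, 1))
    return mixed


data = {"zh": ["zh1", "zh2", "zh3", "zh4"], "en": ["en1"]}
mixed = duplicate_for_balance(data, {"en": 4})  # English share goes from 1/5 to 4/8
```

Raising the factor for one language shifts the balance toward it, which is the trade-off between the two languages noted above. In practice the mixed list would also be shuffled before training.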

Denoising & Farfield ASR

  • Baseline: close-talk model decoding far-field speech: 92.65
  • Will investigate DAE model.


  • VAD bug fixed? (to be confirmed)
  • Test frame-level VAD accuracy
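To measure frame-level VAD accuracy, compare the per-frame speech/non-speech decisions with reference labels. The sketch below assumes one boolean label per frame on both sides. It also reports miss and false-alarm rates, and the metric names are this sketch's own choices.

```python
from typing import Dict, Sequence


def frame_vad_metrics(ref: Sequence[bool], hyp: Sequence[bool]) -> Dict[str, float]:
    """Frame-level VAD scores; True means the frame is labelled as speech."""
    if len(ref) != len(hyp) or not ref:
        raise ValueError("ref and hyp must be non-empty and of equal length")
    pairs = [(bool(r), bool(h)) for r, h in zip(ref, hyp)]
    tp = sum(1 for r, h in pairs if r and h)          # speech kept
    tn = sum(1 for r, h in pairs if not r and not h)  # silence rejected
    fn = sum(1 for r, h in pairs if r and not h)      # speech missed
    fp = sum(1 for r, h in pairs if not r and h)      # silence accepted
    speech, nonspeech = tp + fn, tn + fp
    return {
        "accuracy": (tp + tn) / len(pairs),
        "miss_rate": fn / speech if speech else 0.0,
        "false_alarm_rate": fp / nonspeech if nonspeech else 0.0,
    }


ref = [1, 1, 1, 0, 0, 0, 1, 0]
hyp = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = frame_vad_metrics(ref, hyp)
```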


  • Phone-sequence based graph decoding done
  • Online scoring ongoing

Word to Vector

  • Paper writing

LM development


  • Character-based NNLM (6700 chars, 7-gram), training on 500M data done.
  • Inconsistent WER patterns were found on the Tencent test sets.
  • Probably need another test set for the investigation.
  • Investigate MS RNN LM training
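For a 7-gram character NNLM, each training example pairs the six preceding characters with the next one. The minimal sketch below covers only that example extraction. It assumes sentence-start padding with `<s>` and an `<unk>` entry for characters outside the 6700-character vocabulary. The network itself is not shown, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def char_ngram_examples(
    text: str, vocab: Dict[str, int], order: int = 7, unk: str = "<unk>", bos: str = "<s>"
) -> List[Tuple[List[int], int]]:
    """Return (history ids, target id) pairs with order - 1 characters of history."""
    ids = [vocab.get(ch, vocab[unk]) for ch in text]  # map OOV characters to <unk>
    history = order - 1
    padded = [vocab[bos]] * history + ids             # pad the start with <s>
    return [(padded[i : i + history], padded[i + history]) for i in range(len(ids))]


vocab = {"<s>": 0, "<unk>": 1, "语": 2, "音": 3, "识": 4, "别": 5}
examples = char_ngram_examples("语音识别", vocab)
```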


FST-based matching

  • Word-based FST: 1-2 seconds with 1600 patterns; Huilan's implementation takes <1 second.
  • Thrax toolkit for grammar-to-FST compilation
  • Investigate determinization of G embedding
  • Refer to the new Kaldi code
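Word-based pattern matching can be pictured as a deterministic word-level acceptor in which patterns with a common prefix share states. The sketch below is a plain-Python prefix tree standing in for an OpenFst/Thrax-compiled FST. It assumes patterns are literal word sequences and scans every start position. Class and pattern names are hypothetical, and the sketch makes no claim about the timings above.

```python
from typing import Dict, List, Sequence, Tuple


class WordAcceptor:
    """Deterministic word-level acceptor built as a prefix tree (state 0 is the start)."""

    def __init__(self) -> None:
        self.arcs: List[Dict[str, int]] = [{}]  # arcs[state] maps word -> next state
        self.final: Dict[int, str] = {}         # final state -> pattern id

    def add_pattern(self, words: Sequence[str], pattern_id: str) -> None:
        state = 0
        for w in words:
            nxt = self.arcs[state].get(w)
            if nxt is None:                      # no arc yet: create a new state
                nxt = len(self.arcs)
                self.arcs.append({})
                self.arcs[state][w] = nxt
            state = nxt
        self.final[state] = pattern_id

    def find_all(self, words: Sequence[str]) -> List[Tuple[int, int, str]]:
        """Return (start, end, pattern_id) for every pattern occurrence in `words`."""
        matches = []
        for start in range(len(words)):
            state = 0
            for end in range(start, len(words)):
                state = self.arcs[state].get(words[end], -1)
                if state < 0:                    # no arc for this word: stop this start
                    break
                if state in self.final:
                    matches.append((start, end + 1, self.final[state]))
        return matches


fsa = WordAcceptor()
fsa.add_pattern(["turn", "on", "light"], "LIGHT_ON")
fsa.add_pattern(["turn", "off", "light"], "LIGHT_OFF")  # shares the "turn" state
fsa.add_pattern(["light"], "LIGHT")
found = fsa.find_all("please turn on light now".split())
```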