
Resource Building

  • Maxi onboard
  • Release management should be started: Zhiyong (+)
  • Blaster 0.1 & vivian 0.0 system release

Leftover questions

  • Asymmetric window: Great improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
  • Multi GPU training: Error encountered
  • Multilanguage training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
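
For the asymmetric window item above, the following is a minimal sketch of frame splicing with different amounts of left and right context. The notes do not state the window sizes or the padding scheme, so `left=2`, `right=1`, the edge-frame repetition, and the function name `splice_frames` are all assumptions made for illustration.

```python
def splice_frames(frames, left, right):
    """Concatenate each frame with `left` past and `right` future frames.

    Frames beyond the utterance boundary are replaced by the nearest
    edge frame (an assumed padding scheme, not taken from the notes).
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp to valid range
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy 1-dimensional "features" so the context is easy to read.
frames = [[0.0], [1.0], [2.0], [3.0]]
asym = splice_frames(frames, left=2, right=1)   # more past than future context
sym = splice_frames(frames, left=1, right=1)    # symmetric window for comparison
```

A wider window increases the input dimension of the network, which is one possible reason for a training-set gain that does not carry over to the test set.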

AM development

Sparse DNN

  • GA-based block sparsity (+)
  • Found a paper in 2000 with similar ideas.
  • Try to get a student working on high performance computing to do the optimization
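
As a rough illustration of what block sparsity means for a weight matrix, the sketch below zeroes out whole blocks according to a binary block mask. The notes do not describe the GA encoding or the block size; treating the block mask as the object a genetic algorithm would search over is an assumption, and `apply_block_mask` is a hypothetical helper.

```python
def apply_block_mask(weights, block_mask, block_size):
    """Keep only the weights whose block is switched on in `block_mask`.

    weights:    list of rows (rows and cols divisible by block_size)
    block_mask: one 0/1 entry per block_size x block_size block
    """
    rows, cols = len(weights), len(weights[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if block_mask[r // block_size][c // block_size]:
                out[r][c] = weights[r][c]
    return out


weights = [[1.0] * 4 for _ in range(4)]
block_mask = [[1, 0],
              [0, 1]]          # keep only the two diagonal 2x2 blocks
sparse = apply_block_mask(weights, block_mask, block_size=2)
```

Because entire blocks are zero, a high performance implementation can skip them completely, which is where the optimization work mentioned above would come in.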

Noise training

  • More experiments with no-noise (+)
  • More experiments with additional noise types (+)
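
Noisy training data is commonly built by adding a noise signal to clean speech at a chosen signal-to-noise ratio. The sketch below shows that scaling step only. The notes do not say how noise was mixed in these experiments, so the SNR-based mixing and the function name `mix_at_snr` are assumptions.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so that the result has the requested SNR in dB."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0:
        raise ValueError("noise signal is silent")
    # SNR_dB = 10 * log10(speech_power / scaled_noise_power)
    target_noise_power = speech_power / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(speech, noise)]


speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(speech, noise, snr_db=20)
```

Running the same function over several noise recordings and SNR levels would give one noisy copy of the training set per noise type.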

AMR compression re-training

  • Stream model delivery to WeChat server (Mengyuan + Liuchao)


  • GFBank Sinovoice test on 1700 MPE (10.34-10.14)
  • GFBank Sinovoice 1700 MPE stream

Multilingual ASR

  • all phone strategy baseline done
  • Some strange behavior observed when fixing the early layers

Denoising & Farfield ASR

  • Baseline: close-talk model decoding far-field speech: 92.65
  • MPE1: 92.78
  • MPE2: 91.15
  • MPE3: 91.21
  • MPE4: 91.51
  • Will test the result on the dev set


  • A bug was found in the VAD smoothing approach.


  • A speaker identification system based on ivector was delivered
  • Male/female identification based on UBM was delivered
  • Phone-sequence based graph decoding was delivered

Word to Vector

  • Experiments with the low-dimensional space varying from 10 to 100 dimensions are done. Expand to 200 dimensions. Some strange behavior was found with w2v. Try on daily people data.
  • Test multi-class classification with 2 to 9 classes. w2v done. Working on LDA.
  • Test various w2v window sizes, n=3-15. Strange behavior at n=9.
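
To make the window-size parameter concrete, here is a minimal sketch of how a context window of size n determines the (center, context) training pairs in a skip-gram style model. It assumes n counts words on each side of the center word, which the notes do not specify, and it uses a fixed window, whereas word2vec implementations may also vary the effective window per position. `context_pairs` is a hypothetical helper.

```python
def context_pairs(tokens, window):
    """Return (center, context) pairs using up to `window` words on each side."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = ["a", "b", "c", "d"]
narrow = context_pairs(tokens, window=1)
wide = context_pairs(tokens, window=3)
```

Larger windows pull in more distant words as context, so the number of training pairs and the kind of similarity captured both change with n.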

LM development


  • Character-based NNLM (6700 chars, 7gram), 500M data training done.
  • Overflow found. Code change done. Training has reached 6 iterations.
  • Investigate MS RNN LM training


FST-based matching

  • Word-based FST: 1-2 seconds with 1600 patterns. Huilan's implementation: <1 second.
  • THRAX toolkit for grammar to FST
  • Investigate determinization of G embedding
  • Refer to Kaldi new code