
AM development

Sparse DNN

  • Optimal Brain Damage (OBD).
  • GA-based block sparsity.
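OBD ranks each weight by a saliency estimated from the diagonal of the Hessian, s_k = 0.5 * h_kk * w_k^2, and removes the least salient ones. A minimal sketch (assuming the Hessian diagonal has already been computed; with a unit Hessian this reduces to magnitude pruning):

```python
import numpy as np

def obd_prune(weights, hess_diag, sparsity):
    """Zero out the fraction `sparsity` of weights with the lowest
    OBD saliency s_k = 0.5 * h_kk * w_k^2."""
    saliency = 0.5 * hess_diag * weights ** 2
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    # indices of the k least-salient weights
    idx = np.argsort(saliency, axis=None)[:k]
    pruned = weights.copy()
    pruned.flat[idx] = 0.0
    return pruned

# toy example: 6 weights, unit Hessian diagonal -> magnitude pruning
w = np.array([0.5, -0.01, 2.0, 0.03, -1.0, 0.002])
h = np.ones_like(w)
print(obd_prune(w, h, 0.5))  # the three smallest-magnitude weights become 0
```

In a real DNN the pruning would be applied per weight matrix and followed by retraining to recover accuracy.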

Efficient DNN training

  1. Asymmetric window: great improvement on the training set (WER from 34% to 24%); however, the improvement is lost on the test set. Possibly overfitting.
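An asymmetric window splices more past than future frames into the DNN input. A minimal sketch of such frame splicing (the 10/5 frame counts are illustrative, not the configuration used above):

```python
import numpy as np

def splice_asymmetric(feats, left=10, right=5):
    """Splice each frame with `left` past and `right` future frames;
    edge frames are padded by repetition. feats has shape (T, D)."""
    T, D = feats.shape
    padded = np.concatenate([np.repeat(feats[:1], left, axis=0),
                             feats,
                             np.repeat(feats[-1:], right, axis=0)])
    return np.stack([padded[t:t + left + 1 + right].reshape(-1)
                     for t in range(T)])

x = np.random.randn(100, 40)         # 100 frames of 40-dim features
y = splice_asymmetric(x, left=10, right=5)
print(y.shape)                       # (100, 640): 16 frames x 40 dims
```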

Multilingual training

  1. Pure Chinese training reached 4.9% WER.
  2. Chinese + English training resulted in 7.9%.
  3. The English phone set should discriminate word-initial and word-final phones.
  4. Set up a multilingual network structure that shares the low layers but separates the languages at the high layers.
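The shared-low-layer idea in item 4 can be sketched as shared hidden layers feeding language-specific output layers. A minimal numpy forward pass (layer sizes and senone counts are illustrative assumptions, and the weights here are random rather than trained):

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    return rng.standard_normal((n_in, n_out)) * 0.01, np.zeros(n_out)

# shared low layers, trained on all languages
shared = [layer(640, 1024), layer(1024, 1024)]
# language-specific output layers (senone counts are illustrative)
heads = {"zh": layer(1024, 3000), "en": layer(1024, 2500)}

def forward(x, lang):
    h = x
    for W, b in shared:                  # shared feature extraction
        h = np.maximum(h @ W + b, 0.0)   # ReLU hidden layers
    W, b = heads[lang]                   # language-dependent softmax layer
    z = h @ W + b
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

x = rng.standard_normal((1, 640))
print(forward(x, "zh").shape)  # (1, 3000)
print(forward(x, "en").shape)  # (1, 2500)
```

During training, gradients from both languages would update the shared layers, while each head is updated only by its own language's data.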

Engine optimization

  • Decoder real-time factor dropped below 0.2 with HCLG + MKL + icc.
  • Investigating LOUDS FST.


  • Using a linear hidden transform reduced WER from 14% to 11%.

Word to Vector

  • Tested a training toolkit from Stanford University which can incorporate global co-occurrence information into word2vec-style training.
  • C++ implementation (instead of Python) for data pre-processing.
  • Ready for training on 100M data.
  • Ready for training word senses.
  • Investigating the SENNA toolkit from NEC. Intending to implement POS tagging based on word vectors.
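POS tagging from word vectors is typically done with a window classifier: concatenate the vectors of the words around the target and feed them to a small network. A minimal untrained sketch (the vocabulary, tag set, and random embeddings are all hypothetical placeholders):

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical pretrained 50-dim word vectors and a small tag set
vocab = {"the": 0, "cat": 1, "sat": 2, "<pad>": 3}
E = rng.standard_normal((len(vocab), 50))
tags = ["DET", "NOUN", "VERB"]

# window classifier: concatenate the vectors of a 3-word window,
# then one linear layer + softmax (weights are random here, untrained)
W = rng.standard_normal((150, len(tags))) * 0.01

def tag_word(words, i):
    """Predict a POS tag for words[i] from a +/-1 word window."""
    window = [words[i - 1] if i > 0 else "<pad>",
              words[i],
              words[i + 1] if i + 1 < len(words) else "<pad>"]
    x = np.concatenate([E[vocab[w]] for w in window])
    z = x @ W
    e = np.exp(z - z.max())
    return tags[int(np.argmax(e / e.sum()))]

print(tag_word(["the", "cat", "sat"], 1))
```

A real system would train W (and possibly fine-tune E) on labeled data and use a wider window, as SENNA does.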

LM development


  • Word-based and character-based NNLMs using Google word2vec completed.
  • Character-based NNLM completed (6,000 characters, 7-gram).
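A 7-gram character NNLM predicts each character from the previous six. A minimal sketch of extracting the (context, target) training pairs such a model consumes:

```python
def ngram_pairs(text, order=7, pad="<s>"):
    """Yield (context, target) pairs for an order-n character NNLM:
    with order=7, six previous characters predict the next one."""
    chars = [pad] * (order - 1) + list(text)
    for i in range(order - 1, len(chars)):
        yield tuple(chars[i - order + 1:i]), chars[i]

pairs = list(ngram_pairs("语音识别", order=7))
print(len(pairs))   # 4 training pairs, one per character
print(pairs[0])     # context of six <s> pads, target '语'
```

Each context character would then be mapped to its embedding and the concatenation fed to the network's hidden layers.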

3T Sogou LM

  • Split the data into 24 subsets and trained a 3-gram model for each, pruned with threshold 1e-9.
  • Merging completed with equal weights.
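Equal-weight merging amounts to linear interpolation of the sub-models' probabilities. A minimal sketch over toy probability tables (a real n-gram merge must also reconcile backoff weights, which is omitted here):

```python
def interpolate(lms, weights=None):
    """Linearly interpolate n-gram probability tables (dicts mapping
    n-gram tuple -> probability); weights default to equal."""
    if weights is None:
        weights = [1.0 / len(lms)] * len(lms)
    merged = {}
    for lm, w in zip(lms, weights):
        for ngram, p in lm.items():
            merged[ngram] = merged.get(ngram, 0.0) + w * p
    return merged

# two toy sub-models trained on different data splits
lm_a = {("我", "们"): 0.6, ("我", "的"): 0.4}
lm_b = {("我", "们"): 0.2, ("我", "是"): 0.8}
m = interpolate([lm_a, lm_b])
print(m[("我", "们")])   # 0.5 * 0.6 + 0.5 * 0.2 ≈ 0.4
```

With 24 sub-models the same call applies with 24 tables and weight 1/24 each; in practice this is usually done with an LM toolkit rather than by hand.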

Embedded development

  • The CLG embedded decoder is almost done; the online compiler is in progress.
  • Zhiyong is working on layer-by-layer DNN training.

Speech QA

  • Used N-best lists to expand matching in QA; better performance was obtained.
      • 1-best matches 96/121.
      • 10-best matches 102/121.
  • Use N-best to recover errors in entity checking.
      • Design a non-entity pattern to discover the possible position of an entity.
      • Within this position range, search for entities in the N-best results.
  • Use Pinyin to recover errors in entity checking. Future work.
      • Design a non-entity pattern to discover the possible position of an entity (as above).
      • Generate Pinyin strings for all the entities, then match the recognized Pinyin string against the entity Pinyin.
      • Keep the best-matched entity based on Pinyin, subject to a threshold.
      • Results are a bit worse than the original test.
      • A possible problem is that the LM is too strong, leading to unmatched Pinyin strings in the acoustic space.
      • Liu Rong will provide a weak LM to support this research.
  • Investigated some errors in the entity-based LM.
      • Some errors still remain.
      • Running the entity-based LM with a small entity list.
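The Pinyin recovery step above can be sketched as fuzzy matching of Pinyin strings under an edit-distance threshold. A minimal sketch (the character-to-Pinyin table here is a tiny hypothetical stand-in for a real Pinyin dictionary, and the threshold is illustrative):

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (rolling 1-D table)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]

# hypothetical character-to-Pinyin table; a real system needs a full
# Pinyin dictionary with tone/polyphone handling
PINYIN = {"北": "bei", "京": "jing", "背": "bei", "经": "jing", "南": "nan"}

def to_pinyin(word):
    return "".join(PINYIN.get(c, c) for c in word)

def recover_entity(span, entities, threshold=1):
    """Return the entity whose Pinyin best matches the recognized span,
    or None if even the best match exceeds the distance threshold."""
    best = min(entities, key=lambda e: edit_distance(to_pinyin(span),
                                                     to_pinyin(e)))
    d = edit_distance(to_pinyin(span), to_pinyin(best))
    return best if d <= threshold else None

# "背经" (a recognition error) shares Pinyin with the entity "北京"
print(recover_entity("背经", ["北京", "南京"]))  # 北京
```

In the pipeline above, `span` would come from the position range found by the non-entity pattern, searched over the N-best hypotheses.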