From cslt Wiki

Resource Building

  • The current text resources have been rearranged and listed.

AM development

Sparse DNN

  • Optimal Brain Damage (OBD).
  1. GA-based block sparsity
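
For reference, the OBD criterion named above can be illustrated with a minimal sketch. It assumes the standard diagonal-Hessian saliency s_k = 0.5 * h_kk * w_k^2. The function name, the toy weights and the Hessian values are hypothetical and are not taken from the experiments reported here.

```python
def obd_prune(weights, hessian_diag, keep_ratio):
    """Zero out the weights with the smallest OBD saliency (toy sketch)."""
    # OBD saliency under a diagonal Hessian approximation:
    # s_k = 0.5 * h_kk * w_k^2
    saliency = [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]
    n_keep = max(1, round(keep_ratio * len(weights)))
    # Indices ordered from most to least salient.
    order = sorted(range(len(weights)), key=lambda k: saliency[k], reverse=True)
    keep = set(order[:n_keep])
    return [w if k in keep else 0.0 for k, w in enumerate(weights)]


# Hypothetical toy values, for illustration only.
weights = [0.8, -0.05, 0.3, -1.2, 0.01]
hessian_diag = [1.0, 2.0, 0.1, 0.5, 4.0]
pruned = obd_prune(weights, hessian_diag, keep_ratio=0.4)
print(pruned)
```

Unlike plain magnitude pruning, a weight with a small second derivative can be removed even when its magnitude is moderate.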

Efficient DNN training

  1. Asymmetric window: large improvement on the training set (WER from 34% to 24%), but the improvement is lost on the test set. Overfitting?
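
As an illustration only, and assuming that "asymmetric window" refers to the context window of spliced input frames (more past frames than future frames), a minimal sketch could look like this. The function name and the 3-left/1-right setting are hypothetical.

```python
def splice_frames(frames, left, right):
    """Concatenate each frame with `left` past and `right` future frames.

    Frames beyond the utterance boundary are replaced by the nearest edge frame.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        window = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), last)  # clamp to the utterance
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Toy 1-dimensional features; 3 frames of left context, 1 of right context.
frames = [[float(t)] for t in range(5)]
spliced = splice_frames(frames, left=3, right=1)
print(spliced[0], spliced[4])
```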

Multilanguage training

  1. Pure Chinese training reached 4.9%
  2. Chinese + English reduced to 7.9%
  3. The English phone set should distinguish beginning phones from ending phones
  4. Should set up a multilingual network structure that shares the low layers but separates the languages at the high layers
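
The shared-low-layer structure proposed in item 4 can be sketched roughly as below. This is a toy forward pass under stated assumptions: only the output layer is language-specific, weights are random, and the class name, layer sizes and language keys are hypothetical.

```python
import math
import random


def dense(x, weights, bias):
    """Affine transform: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid_layer(x, weights, bias):
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(x, weights, bias)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_layer(rng, n_out, n_in):
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultilingualDNN:
    """Hidden layers shared by all languages, one output layer per language."""

    def __init__(self, n_in, n_hidden, n_shared_layers, targets_per_language, seed=0):
        rng = random.Random(seed)
        self.shared = []
        size = n_in
        for _ in range(n_shared_layers):
            self.shared.append(init_layer(rng, n_hidden, size))
            size = n_hidden
        # Language-specific output layers on top of the shared stack.
        self.heads = {
            lang: init_layer(rng, n_out, size)
            for lang, n_out in targets_per_language.items()
        }

    def forward(self, x, lang):
        h = x
        for weights, bias in self.shared:
            h = sigmoid_layer(h, weights, bias)
        weights, bias = self.heads[lang]
        return softmax(dense(h, weights, bias))


net = MultilingualDNN(n_in=4, n_hidden=8, n_shared_layers=2,
                      targets_per_language={"zh": 5, "en": 3})
x = [0.1, -0.2, 0.3, 0.0]
post_zh = net.forward(x, "zh")
post_en = net.forward(x, "en")
print(len(post_zh), len(post_en))
```

In training, each utterance would update the shared layers and only the head of its own language.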

Noise training

  • Train on the WSJ database by corrupting the data with various noise types
  • Baseline system ready
  • Noise data ready: 5 real-world noise types selected
  • Liuchao's noise-adding toolkit ready
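
The corruption step can be illustrated with a generic sketch that mixes a noise signal into speech at a target signal-to-noise ratio. This is not the toolkit mentioned above; the function name and the toy signals are hypothetical.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR (dB)."""
    # Repeat or trim the noise so that it covers the whole utterance.
    reps = len(speech) // len(noise) + 1
    noise = (noise * reps)[:len(speech)]
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        raise ValueError("noise signal has zero power")
    # Choose scale so that 10*log10(p_speech / (scale^2 * p_noise)) == snr_db.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


# Toy signals standing in for a speech waveform and a noise recording.
speech = [math.sin(0.1 * t) for t in range(400)]
noise = [0.5 * (-1) ** t for t in range(37)]
mixed = mix_at_snr(speech, noise, snr_db=10.0)
```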

Engine optimization

  • Investigating LOUDS FST.


  • Tested adaptation performance with adapted utterances from 10 to 40.

Word to Vector

  • Tested a training toolkit from Stanford University that can incorporate global information into word-vector training
  • C++ implementation (instead of Python) for data pre-processing; problems encountered
  • Basic wordvector plus global sense
  • Training on 100M data (with global sense) caused a memory overflow
  • Split the data into small pieces
  • Improved wordvector with multi sense
  • Prepare scripts
  • Keyword extraction based on wordvectors
  • Using Google word vectors
  • Using k-means to cluster
  • Investigating the Senna toolkit from NEC, with the intention of implementing POS tagging based on word vectors.
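
The k-means clustering step listed above can be sketched as follows. This is a plain Lloyd-style implementation on toy 2-dimensional vectors. The words, the vector values and the choice of the first k points as initial centroids are all hypothetical, and the sketch does not cover how keywords are then picked from the clusters.

```python
def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Cluster vectors with k-means; returns (assignments, centroids)."""
    # Deterministic initialisation with the first k vectors (an assumption).
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each vector.
        assign = [min(range(k), key=lambda c: sqdist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Made-up word vectors, for illustration only.
words = ["song", "train", "singer", "bus", "album", "car"]
vectors = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8], [1.0, 0.0], [0.0, 1.0]]
assign, centroids = kmeans(vectors, k=2)
clusters = {}
for word, a in zip(words, assign):
    clusters.setdefault(a, []).append(word)
print(clusters)
```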

LM development


  • Character-based NNLM (6700 chars, 7-gram), training on 500M data done.
  • 3 hours per iteration
  • For word-based NNLM, 1 hour/iteration for 1024 words, 4 hours/iteration for 10240 words
  • Performance lower than word-based NNLM
  • Word-vector-based word and char NNLM training done
  • The NNLM initialized with Google word vectors is worse than the randomly initialized NNLM

3T Sogou LM

  • Naive training
  • all-word in lexicon
  • split into 9G text blocks
  • Merge one-by-one
  • Cutting to 110k lexicon
  • Test on QA
  • Performance reduced compared to Liurong's previous LM
  • Improved training
  • Re-segmentation with the Tencent 110k lexicon
  • Re-train with 4G text blocks
  • Sub-model training done; ready to merge based on the Tencent online1 test set.
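
One common way to merge sub-models against a held-out set is linear interpolation, with the weights estimated by EM on that set. The sketch below assumes this approach, which the notes above do not confirm. The function name and the per-token probabilities are hypothetical.

```python
import math


def estimate_mixture_weights(probs, n_iter=50):
    """EM estimate of interpolation weights.

    probs[m][i] is the probability that sub-model m assigns to held-out token i.
    """
    n_models = len(probs)
    n_tokens = len(probs[0])
    weights = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        totals = [0.0] * n_models
        for i in range(n_tokens):
            mix = sum(weights[m] * probs[m][i] for m in range(n_models))
            for m in range(n_models):
                # Posterior responsibility of model m for token i.
                totals[m] += weights[m] * probs[m][i] / mix
        weights = [t / n_tokens for t in totals]
    return weights


def log_likelihood(probs, weights):
    n_tokens = len(probs[0])
    return sum(
        math.log(sum(w * p[i] for w, p in zip(weights, probs)))
        for i in range(n_tokens)
    )


# Made-up held-out probabilities for two sub-models.
probs = [[0.20, 0.10, 0.30, 0.25],
         [0.05, 0.02, 0.10, 0.05]]
weights = estimate_mixture_weights(probs)
print(weights)
```

Each EM iteration cannot lower the held-out likelihood, so the estimated weights are at least as good as the uniform starting point.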

Embedded development

  • The CLG embedded decoder is almost done. The online compiler is in progress.
  • Zhiyong is working on layer-by-layer DNN training.

Speech QA

  • Current N-best results
  • N-best search plus pinyin correction
  • Total: 2718 QA requests
  • Default: 1844 QA correct (67.8%)
  • No-entity: 1650 QA correct (60.7%)
  • With-entity: 1884 QA correct (69.3%)
  • Analyze error patterns for Nbest match
  • 10.8% song transcription errors
  • 18.3% English errors
  • 38.7% entity (song name, singer name) recognition lost
  • 32.3% non-entity recognition errors
  • Computing complexity
  • 11,000 entities have 23,000 different pronunciations
  • Use a tree to improve efficiency
  • Entity-class LM comparison
  • re-segmentation & re-train
  • SRILM class-based LM
  • Subgraph integration from Zhiyong
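
The "use a tree" idea in the computing-complexity items above can be sketched as a prefix tree (trie) over pronunciation syllables, so that pronunciations with a common prefix share nodes and a lookup costs one step per syllable. The class names and the example entries are hypothetical and are not taken from the actual entity list.

```python
class TrieNode:
    def __init__(self):
        self.children = {}     # syllable -> TrieNode
        self.entities = set()  # entities whose pronunciation ends here


class PronunciationTrie:
    """Prefix tree mapping syllable sequences to entity names."""

    def __init__(self):
        self.root = TrieNode()
        self.n_nodes = 1

    def add(self, syllables, entity):
        node = self.root
        for syl in syllables:
            if syl not in node.children:
                node.children[syl] = TrieNode()
                self.n_nodes += 1
            node = node.children[syl]
        node.entities.add(entity)

    def lookup(self, syllables):
        """Return the entities with exactly this pronunciation."""
        node = self.root
        for syl in syllables:
            node = node.children.get(syl)
            if node is None:
                return set()
        return set(node.entities)


# Hypothetical entries; one entity may have several pronunciations.
trie = PronunciationTrie()
trie.add(["liu", "de", "hua"], "Liu Dehua")
trie.add(["lau", "tak", "wah"], "Liu Dehua")
trie.add(["liu", "huan"], "Liu Huan")
print(trie.lookup(["liu", "huan"]), trie.n_nodes)
```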