2014-06-03

From cslt Wiki

Resource Building

  • Release management has been started

Leftover questions

  • Asymmetric window: large improvement on the training set (WER 34% to 24%), but the gain disappears on the test set. Overfitting?
  • Multi-GPU training: error encountered
  • Multilingual training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
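As background for the LOUDS FST item: LOUDS (Level-Order Unary Degree Sequence) encodes a tree's shape as a bit string, one '1' per child followed by a '0' for each node in level order. A minimal sketch (the tree and function name are illustrative, not from the actual investigation):

```python
# LOUDS sketch: encode a tree's shape as a bit string. For each node in
# level order, emit one '1' per child, then a terminating '0'.
def louds_bits(children):
    """children: dict node -> list of child nodes; returns the LOUDS
    bit string, starting from a super-root that points at root 0."""
    bits = "10"                  # super-root with a single child (the root)
    queue = [0]                  # level-order traversal from root 0
    while queue:
        node = queue.pop(0)
        kids = children.get(node, [])
        bits += "1" * len(kids) + "0"
        queue.extend(kids)
    return bits

# tree: 0 -> [1, 2], 1 -> [3, 4]
print(louds_bits({0: [1, 2], 1: [3, 4]}))  # "10" + "110" + "110" + "0"*3
```

The point of the encoding is that child/parent navigation reduces to rank/select queries on the bit string, which is what makes it attractive for compact FST storage.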

AM development

Sparse DNN

  • GA-based block sparsity (+++++)
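GA-based block sparsity can be sketched as evolving binary masks over weight blocks with selection, crossover, and mutation. Everything below (the linear-map fitness proxy, block size, population size, rates) is an illustrative assumption, not the actual experimental setup:

```python
# Illustrative GA for block sparsity: evolve binary block masks for a
# weight matrix, keeping masks that least hurt a simple fitness proxy.
import numpy as np

rng = np.random.default_rng(0)

BLOCK = 4                        # block size (assumed for illustration)
W = rng.standard_normal((16, 16))
X = rng.standard_normal((64, 16))
Y = X @ W.T                      # targets produced by the dense matrix

def fitness(mask, W, X, Y):
    """Proxy fitness: negative MSE of the block-masked linear map."""
    Wm = W * np.kron(mask, np.ones((BLOCK, BLOCK)))  # expand block mask
    return -np.mean((X @ Wm.T - Y) ** 2)

nb = W.shape[0] // BLOCK         # blocks per side
pop = (rng.random((20, nb, nb)) < 0.5).astype(float)  # initial population

for gen in range(30):
    scores = np.array([fitness(m, W, X, Y) for m in pop])
    top = pop[np.argsort(scores)[-10:]]               # selection
    children = []
    for _ in range(10):
        a, b = top[rng.integers(10, size=2)]
        cut = rng.integers(nb)
        child = np.concatenate([a[:cut], b[cut:]])    # crossover on rows
        flip = rng.random(child.shape) < 0.05         # mutation
        child = np.where(flip, 1 - child, child)
        children.append(child)
    pop = np.concatenate([top, np.array(children)])

best = pop[np.argmax([fitness(m, W, X, Y) for m in pop])]
print("kept block fraction:", best.mean())
```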

Noise training

  • All experiments completed.
  • Paper writing will start this week

GFbank

  • Testing on the Tencent database is done; better performance than Fbank was observed
  • An equal-loudness pre-filter was added, giving slightly better performance
  • Running the Sinovoice 8k 1400 + 100 mixture training; 9 xEnt iterations completed.


Multilingual ASR

  • Multilingual LM decoding
  • Investigating the non-tag bug with some digit-string recordings
  • Revert to hanzi numbers


English model


(state-gauss = 10000/100000, various LMs, beam 13)

1. Shujutang 100h chi-eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  23.86  |  20.95  |  20.90  |  20.84  |  20.81  |
   cmu   |  22.22  |    -    |    -    |    -    |  18.83  |
   giga  |  21.77  |    -    |    -    |    -    |  18.61  |
  armid  |  20.45  |    -    |    -    |    -    |    -    |


2. Shujutang 100H chi-eng 8k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  26.27  |  23.63  |  23.14  |  22.93  |  23.00  |
   cmu   |  24.11  |    -    |    -    |    -    |  20.36  |
   giga  |  23.11  |    -    |    -    |    -    |  20.11  |
  armid  |    -    |    -    |    -    |    -    |    -    |


3. voxforge pure eng 16k:

  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  21.38  |  24.89  |  24.50  |  23.31  |  23.13  |
   cmu   |  24.00  |    -    |    -    |    -    |  21.33  |
   giga  |  18.75  |    -    |    -    |    -    |  22.45  |
  armid  |    -    |    -    |    -    |    -    |    -    |

4. fisher pure eng 8k:
Not finished yet.
  LM/AM  |  xEnt   |  mpe_1  |  mpe_2  |  mpe_3  |  mpe_4  |
--------- --------- --------- --------- --------- ---------
   wsj   |  40.65  |  36.16  |  35.94  |  35.88  |  35.80  |
   cmu   |  35.07  |    -    |    -    |    -    |  31.16  |
   giga  |  41.18  |    -    |    -    |    -    |  36.23  |
  armid  |    -    |    -    |    -    |    -    |    -    |
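The numbers in the tables above are WER (%). For reference, word error rate is the edit distance (substitutions + deletions + insertions) between hypothesis and reference, divided by the number of reference words; a minimal implementation:

```python
# Minimal WER: Levenshtein distance over words, as a percentage of the
# reference length.
def wer(ref, hyp):
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution / match
    return 100.0 * d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the cat sat on"))  # one insertion over 3 ref words
```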


Denoising & Farfield ASR

  • Investigating DAE model
  • Kaldi-based MSE obj training toolkit preparation
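The MSE objective behind the DAE-style denoising is plain mean squared error between denoised and clean features. A toy sketch with a single linear map standing in for the DNN (sizes, noise level, and all names are illustrative assumptions, not the toolkit's API):

```python
# Toy MSE-objective training for feature denoising: learn a map from
# noisy features to clean targets by gradient descent on mean squared
# error. A linear layer stands in for the DNN.
import numpy as np

rng = np.random.default_rng(1)
clean = rng.standard_normal((200, 40))        # clean fbank-like features
noisy = clean + 0.3 * rng.standard_normal(clean.shape)

W = np.zeros((40, 40))                        # denoising map
lr = 0.01
for _ in range(200):
    pred = noisy @ W
    grad = 2 * noisy.T @ (pred - clean) / len(noisy)   # d(MSE)/dW
    W -= lr * grad

mse = np.mean((noisy @ W - clean) ** 2)
print("MSE after training:", mse)
```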

VAD

  • DNN-based VAD (7.49) shows much better performance than energy-based VAD (45.74)
  • Need to test small scale network (+)
  • 600-800 network
  • 100 X 4 + 2
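The energy-based baseline in the comparison above can be sketched as a per-frame log-energy threshold; the threshold rule and the synthetic frames below are illustrative assumptions, not the actual system:

```python
# Sketch of an energy-based VAD: a frame is speech if its log energy
# exceeds a threshold set between the utterance's min and max log energy.
import numpy as np

def energy_vad(frames, ratio=0.5):
    """frames: (num_frames, samples_per_frame); returns a boolean
    speech/non-speech decision per frame."""
    log_e = np.log(np.sum(frames ** 2, axis=1) + 1e-10)
    thresh = log_e.min() + ratio * (log_e.max() - log_e.min())
    return log_e > thresh

rng = np.random.default_rng(2)
silence = 0.01 * rng.standard_normal((50, 160))  # low-energy frames
speech = rng.standard_normal((50, 160))          # high-energy frames
frames = np.vstack([silence, speech])

decisions = energy_vad(frames)
print("speech frames detected:", decisions.sum())
```

Such a threshold rule degrades quickly in noise where frame energy no longer separates speech from background, which is consistent with the large gap reported against the DNN-based VAD.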

Scoring

  • Bug in the stream mode fixed


Embedded decoder

  • Word-list graph test passed
  • wlist2LG toolkit checked in
  • Preparing to deliver Android compiler options (.mk)
  • Interface design should be completed in one day
  • HCLG prepared for the 20k LM; decoding in progress.


LM development

Domain specific LM

  • Retrieving data from both Baidu and microblog sources
  • PPL testing
  • Need to check into gitLab.
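PPL testing above refers to perplexity, PPL = exp(-1/N · Σ log p_i) over the test tokens. A minimal sketch (the probabilities are made-up placeholders, not real model output):

```python
# Perplexity from per-token model probabilities.
import math

def perplexity(token_probs):
    """PPL = exp(-1/N * sum(log p_i)) over the N test tokens."""
    n = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / n)

# a model assigning 0.1 to every token scores like a uniform
# 10-word vocabulary:
print(perplexity([0.1] * 5))  # ≈ 10.0
```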

NN LM

  • Character-based NNLM (6700 chars, 7-gram): training on 500M of data done.
  • Inconsistent WER patterns were found on the Tencent test sets
  • Probably need another test set for investigation
  • Investigating MS RNN LM training
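A character-based 7-gram NNLM predicts each character from the six preceding ones. A sketch of slicing such training contexts out of text (the padding symbol and function name are illustrative):

```python
# Slice (history, target) training pairs for a character n-gram NNLM:
# each target character is predicted from the order-1 preceding ones,
# with start-of-sentence padding at the front.
def ngram_contexts(chars, order=7):
    """Yield (history, target) pairs; history has order-1 characters."""
    pad = ["<s>"] * (order - 1)
    seq = pad + list(chars)
    for i in range(order - 1, len(seq)):
        yield seq[i - order + 1:i], seq[i]

for history, target in ngram_contexts("你好世界"):
    print(history, "->", target)
```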