Difference between revisions of "2013-04-26"

From cslt Wiki

Revision as of 06:43, 26 April 2013

Data sharing

  • LM count files are still being transferred.

DNN progress

400 hour DNN training

Test Set   Tencent Baseline   bMMI    fMMI    BN (with fMMI)   Hybrid
1900       8.4                7.65    7.35    6.57             7.27
2044       22.4               24.44   24.03   21.77            20.24
online1    35.6               34.66   34.33   31.44            30.53
online2    29.6               27.23   26.80   24.10            23.89
map        24.5               27.54   27.69   23.79            22.46
notepad    16                 19.81   21.75   15.81            12.74
general    36                 38.52   38.90   33.61            31.55
speedup    26.8               27.88   26.81   22.82            22.00
  • The Tencent baseline uses 700h of online data + 700h of 863 data, HLDA+MPE, and an 88k lexicon.
  • Our results use a 400-hour AM and an 88k LM, trained with ML+bMMI.
  • The CSLT structure: 300*[1200*1200*1200*40*1200]*4850.
  • The CSLT feature: MFCC + delta MFCC.
  • To be done: compare with the traditional structure 300*[1200*1200*1200*1200*1200]*4850.
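
To make the table easier to read, the relative change of the Hybrid system against the Tencent baseline can be computed per test set. The sketch below is only an illustration: the numbers are copied from the table above, the names `RESULTS`, `relative_reduction` and `summarize` are hypothetical, and the figures are assumed to be error rates (lower is better). Note that the two systems were trained on different amounts of data, so this is not a controlled comparison.

```python
# Figures copied from the table above: (Tencent baseline, Hybrid).
# Assumption: these are error rates in percent, lower is better.
RESULTS = {
    "1900": (8.4, 7.27),
    "2044": (22.4, 20.24),
    "online1": (35.6, 30.53),
    "online2": (29.6, 23.89),
    "map": (24.5, 22.46),
    "notepad": (16.0, 12.74),
    "general": (36.0, 31.55),
    "speedup": (26.8, 22.00),
}


def relative_reduction(baseline, system):
    """Relative error reduction in percent; positive means `system` is better."""
    return 100.0 * (baseline - system) / baseline


def summarize(results):
    """Map each test set to its relative reduction, rounded to one decimal."""
    return {
        name: round(relative_reduction(base, sys_), 1)
        for name, (base, sys_) in results.items()
    }


if __name__ == "__main__":
    for name, rel in summarize(RESULTS).items():
        print(f"{name:8s} {rel:5.1f}%")
```

The same helper can be pointed at any other column (for example bMMI or BN) by swapping the second value of each pair.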

Tencent test result

AM: 70h training data
LM: 88k LM
Test case: general
Feature                          GMM   GMM-bMMI   DNN    DNN-MMI
PLP(-5,+5) [Eryu]                47    38.4       26.5   23.8
PLP+LDA+MLLT(-5,+5) [Jingbo]     47    -          34
  • Note
  1. Tencent NN structure:
     300*[1200*1200*1200*1200]*1700, #param=700k
     300*[1007*1007*1007*1007]*3xxx, #param=700k
  • To be done:
  1. CSLT reproduce phone-clustered based NN
  2. CSLT investigate performance of different epochs.
  3. Tencent: feature comparison.
  4. FBank with PLP. With or without LDA.
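
The structure strings in the note can be turned into a parameter count with a few lines of arithmetic. The sketch below assumes plain fully connected layers, where `300*[1200*1200*1200*1200]*1700` means a 300-dimensional input, four hidden layers of 1200 units and 1700 outputs; the function name `count_parameters` is hypothetical. Under that assumption the first Tencent structure has about 6.7M weights, which is roughly ten times the `#param=700k` written in the note, so that figure may be worth double-checking. The second structure is not evaluated because its output size is elided in the note.

```python
def count_parameters(layer_sizes, include_bias=True):
    """Count parameters of a fully connected network.

    layer_sizes lists the input size, every hidden layer size, and the
    output size, in order.
    """
    # One weight matrix between each pair of consecutive layers.
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    # One bias per unit in every non-input layer.
    biases = sum(layer_sizes[1:]) if include_bias else 0
    return weights + biases


# First Tencent structure from the note: 300*[1200*1200*1200*1200]*1700
tencent_structure = [300, 1200, 1200, 1200, 1200, 1700]

if __name__ == "__main__":
    print(count_parameters(tencent_structure, include_bias=False))  # 6720000
    print(count_parameters(tencent_structure))                      # 6726500
```

The same function applies to the CSLT structures listed earlier, for example `[300, 1200, 1200, 1200, 40, 1200, 4850]` for the bottleneck variant.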


GPU & CPU merge

Investigated the possibility of merging the GPU and CPU code.
The CUDA code has been merged into the CPU code.

L-1 sparse initial training

L-1=1e-5, applied from the 6th iteration: training converged after another 3 iterations. Performance is generally worse than with L-1=0, except on one test suite.
L-1=1e-6: the results were unchanged, which means 1e-6 is too small to be effective.
L-1=1e-4, applied from the first iteration: training crashed. More investigation is needed.
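
The notes do not say how the L-1 penalty enters the update, so the following is only a minimal sketch of one common choice: a plain SGD step followed by soft-thresholding, switched on from a given iteration. All names (`soft_threshold`, `sgd_l1_step`, `sparsity`, `l1_start_iter`) are hypothetical and not taken from the actual training code. It also shows why a very small penalty has little effect: each step only shrinks a weight by `lr * l1`.

```python
def soft_threshold(w, shrink):
    """Move w toward zero by `shrink`, clamping at zero (proximal step for L1)."""
    if w > shrink:
        return w - shrink
    if w < -shrink:
        return w + shrink
    return 0.0


def sgd_l1_step(weights, grads, lr, l1, iteration, l1_start_iter=6):
    """One SGD step; L1 shrinkage is applied only from `l1_start_iter` onward."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    if l1 > 0.0 and iteration >= l1_start_iter:
        updated = [soft_threshold(w, lr * l1) for w in updated]
    return updated


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


if __name__ == "__main__":
    w = [0.5, -0.5, 1e-7]
    g = [0.0, 0.0, 0.0]
    # Before the start iteration nothing is shrunk.
    print(sparsity(sgd_l1_step(w, g, lr=0.1, l1=1e-5, iteration=5)))
    # From the start iteration on, weights smaller than lr * l1 become zero.
    print(sparsity(sgd_l1_step(w, g, lr=0.1, l1=1e-5, iteration=6)))
```

Setting `l1_start_iter=1` corresponds to applying the penalty from the first iteration, as in the 1e-4 run.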

Kaldi/HTK merge

  • HTK2Kaldi: hold.
  • Kaldi2HTK: implementation done. One bug was fixed: gConst was computed incorrectly. The current HDecode result is 14.9%; the Tencent model gives 11%; the Kaldi decoder gives 7%.
  • The remaining gap is possibly an SP model issue, due to the complicated structure of silence in Kaldi.
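
The notes do not say what exactly was wrong with gConst. For reference, the sketch below uses the HTK convention as understood here: for a diagonal-covariance Gaussian, GCONST is log((2π)^D · ∏ var_d), and the log density is -0.5 · (gConst + Mahalanobis distance). The function names are hypothetical. Kaldi keeps its own per-component constant internally, which is not the same quantity, so the value has to be recomputed from the variances when converting.

```python
import math


def htk_gconst(variances):
    """GCONST for a diagonal-covariance Gaussian: log((2*pi)^D * prod(var))."""
    dim = len(variances)
    return dim * math.log(2.0 * math.pi) + sum(math.log(v) for v in variances)


def log_density(x, means, variances):
    """Log N(x; mean, diag(var)) expressed through gConst."""
    mahalanobis = sum(
        (xi - m) ** 2 / v for xi, m, v in zip(x, means, variances)
    )
    return -0.5 * (htk_gconst(variances) + mahalanobis)


if __name__ == "__main__":
    # Unit-variance, 3-dimensional Gaussian evaluated at its mean.
    print(htk_gconst([1.0, 1.0, 1.0]))
    print(log_density([0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```

A quick sanity check after conversion is to compare `log_density` for a few frames against the likelihoods reported by the source model.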

Embedded progress

  • PocketSphinx migration is done, using the default PocketSphinx Chinese model. After migration to the smartphone, tests show that decoding is very slow (RT=7.0).
  • QA LM training: done.
  • Next: substitute the LM with a JSGF grammar covering 1000 words and finish the initial test.
  • A new AM needs to be trained.
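
For readers unfamiliar with the RT figure, the real-time factor is assumed here to be decoding time divided by audio duration, so RT=7.0 means roughly 7 seconds of processing per second of speech. A minimal sketch, with a hypothetical function name and illustrative durations that are not from the actual test:

```python
def real_time_factor(decode_seconds, audio_seconds):
    """Decoding time divided by audio duration; values above 1 are slower than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return decode_seconds / audio_seconds


if __name__ == "__main__":
    # Illustrative numbers only: 70 s of decoding for a 10 s utterance.
    print(real_time_factor(70.0, 10.0))
```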