2013-08-23


Data sharing

  • LM count files are still undelivered!

DNN progress

Discriminative DNN

  • Running the 1200-3620 NN; graph generation is done, but training is still running very slowly.

Sparse DNN

  • Iterative sparse sticky training is running.

Tencent exps

DNN Confidence estimation

  • Tested on a high-WER test set. The distribution curve is still strange: for both correct and incorrect words there is a high peak around zero.
  • Accumulated DNN confidence is under development.
  • Generate lattice-based confidence
  • Prepare MLP-based confidence integration
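The distributional check described above can be reproduced with a simple histogram. This is an illustrative sketch, not the actual evaluation code; all scores below are toy values chosen to mimic the observed low-confidence peak.

```python
# Illustrative sketch: bin word confidence scores for correct vs.
# incorrect words to inspect the distribution. Toy data only.

def histogram(scores, n_bins=10):
    """Count scores in [0, 1] into n_bins equal-width bins."""
    counts = [0] * n_bins
    for s in scores:
        counts[min(int(s * n_bins), n_bins - 1)] += 1
    return counts

# Toy scores: both lists pile up near zero, mirroring the odd peak
# reported on the high-WER test set.
correct = [0.02, 0.05, 0.91, 0.88, 0.03]
incorrect = [0.01, 0.04, 0.07, 0.95, 0.02]
print(histogram(correct), histogram(incorrect))
```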


GFCC DNN

  • GFCC computation is very slow: 100 hours of speech costs 16 hours of CPU time, an RT factor of about 0.16. This is intolerable.
  • 100 hour GFCC-based DNN, Tencent test results:

No noise-added:

1) MFCC 100_1200_1200_1200_1200_3580
       map: %WER 23.75 [ 3474 / 14628, 134 ins, 373 del, 2967 sub ]
       2044: %WER 21.47 [ 4991 / 23241, 304 ins, 664 del, 4023 sub ]
       notetp3: %WER 13.17 [ 244 / 1853, 10 ins, 26 del, 208 sub ]
       record1900: %WER 8.10 [ 963 / 11888, 217 ins, 299 del, 447 sub ]
       general: %WER 34.41 [ 12943 / 37619, 779 ins, 785 del, 11379 sub ]
       online1: %WER 33.02 [ 9388 / 28433, 522 ins, 1465 del, 7401 sub ]
       online2: %WER 25.99 [ 15363 / 59101, 873 ins, 2408 del, 12082 sub ]
       speedup: %WER 23.52 [ 1236 / 5255, 72 ins, 213 del, 951 sub ]
       ----
2) GFCC 100_1200_1200_1200_1200_3625
       map: %WER 22.95 [ 3357 / 14628, 109 ins, 471 del, 2777 sub ]
       2044: %WER 20.93 [ 4865 / 23241, 387 ins, 748 del, 3730 sub ]
       notetp3: %WER 15.43 [ 286 / 1853, 41 ins, 26 del, 219 sub ]
       record1900: %WER 7.32 [ 870 / 11888, 107 ins, 266 del, 497 sub ]
       general: %WER 31.57 [ 11878 / 37619, 587 ins, 861 del, 10430 sub ]
       online1: %WER 31.83 [ 9049 / 28433, 519 ins, 1506 del, 7024 sub ]
       online2: %WER 25.20 [ 14894 / 59101, 839 ins, 2434 del, 11621 sub ]
       speedup: %WER 22.97 [ 1207 / 5255, 73 ins, 221 del, 913 sub ]
       ----
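The %WER figures in these listings follow the standard definition: (insertions + deletions + substitutions) divided by the number of reference words. A quick check against the MFCC "map" line above:

```python
# Word error rate in percent, as used in the scoring output above.
def wer(ins, dels, subs, ref_words):
    return 100.0 * (ins + dels + subs) / ref_words

# MFCC "map" line: 134 ins, 373 del, 2967 sub over 14628 reference words.
print(round(wer(134, 373, 2967, 14628), 2))  # 23.75
```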

White noise added into the test data:

1, NOISE LEVEL: about 15 dB
  1) MFCC 100_1200_1200_1200_1200_3580
    map: %WER 65.24 [ 9544 / 14628, 48 ins, 2841 del, 6655 sub ]
    2044: %WER 48.93 [ 11372 / 23241, 176 ins, 2803 del, 8393 sub ]
    notetp3: %WER 55.91 [ 1036 / 1853, 9 ins, 476 del, 551 sub ]
    record1900: %WER 25.43 [ 3023 / 11888, 27 ins, 1387 del, 1609 sub ]
    general: %WER 70.05 [ 26352 / 37619, 141 ins, 5336 del, 20875 sub ]
    online1: %WER 50.40 [ 14329 / 28433, 431 ins, 3827 del, 10071 sub ]
    online2: %WER 48.45 [ 28632 / 59101, 664 ins, 7930 del, 20038 sub ]
    speedup: %WER 64.78 [ 3404 / 5255, 13 ins, 1084 del, 2307 sub ]
    ----
  2) GFCC 100_1200_1200_1200_1200_3625
    map: %WER 62.99 [ 9214 / 14628, 63 ins, 3113 del, 6038 sub ]
    2044: %WER 46.34 [ 10769 / 23241, 251 ins, 2897 del, 7621 sub ]
    notetp3: %WER 52.46 [ 972 / 1853, 18 ins, 545 del, 409 sub ]
    record1900: %WER 26.62 [ 3164 / 11888, 133 ins, 1181 del, 1850 sub ]
    general: %WER 66.04 [ 24843 / 37619, 404 ins, 5277 del, 19162 sub ]
    online1: %WER 46.61 [ 13254 / 28433, 466 ins, 3725 del, 9063 sub ]
    online2: %WER 44.49 [ 26292 / 59101, 813 ins, 7552 del, 17927 sub ]
    speedup: %WER 60.38 [ 3173 / 5255, 25 ins, 1061 del, 2087 sub ]

  • GFCC is generally better than MFCC, particularly in noise.
  • The noise impact is significant; de-noising algorithms are needed.
  • Try noise-robust training.
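The speed complaint in the GFCC bullets can be stated precisely with the real-time factor (RTF), i.e. processing time divided by audio duration:

```python
# Real-time factor: processing time / audio duration.
# RTF < 1 means faster than real time. 16 CPU hours for 100 hours of
# speech gives 0.16, roughly the "around 0.2" quoted above.
def real_time_factor(cpu_hours, speech_hours):
    return cpu_hours / speech_hours

print(real_time_factor(16, 100))
```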

Stream decoding

  • The server-side interface is done; the embedded-side interface is under development.

To do:

  • Global CMN initialization.
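A minimal sketch of what global CMN initialization could look like in a streaming front end. The class name and the MAP-style blending scheme here are assumptions for illustration, not the project's actual code: the idea is to start from a corpus-level global cepstral mean and shift toward the running utterance mean as frames arrive.

```python
# Hypothetical sketch of global CMN initialization for stream decoding.
# Early frames are normalized with a precomputed corpus-level mean;
# the estimate blends toward the utterance mean as frames accumulate.

class StreamingCMN:
    def __init__(self, global_mean, prior_frames=100):
        self.global_mean = global_mean      # per-dimension corpus mean
        self.prior_frames = prior_frames    # weight of the global prior
        self.acc = [0.0] * len(global_mean)
        self.n = 0

    def normalize(self, frame):
        self.n += 1
        for d, x in enumerate(frame):
            self.acc[d] += x
        # MAP-style blend: global mean dominates early, utterance mean later.
        mean = [
            (self.prior_frames * g + a) / (self.prior_frames + self.n)
            for g, a in zip(self.global_mean, self.acc)
        ]
        return [x - m for x, m in zip(frame, mean)]
```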


Subgraph integration

  • Compressing the subgraph HCLG is done. Integration takes around 1-2 seconds.
  • G.fst integration encounters a problem: after composing G with L, determinization hangs.


Embedded progress

  • GFCC-based engine test has just started.