
Data sharing

  • LM count files still undelivered!

DNN progress

Discriminative DNN

  • Running the 1200-3620 NN. Graph generation is done. Discriminative training (DT) should be finished in 3 days.

Sparse DNN

  • Iterative sparse sticky training is running. Higher sparsity is expected.
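
A minimal sketch of one possible reading of iterative sparse sticky training, assuming magnitude-based pruning in which a weight, once pruned, stays at zero in all later iterations ("sticky"). The function name `prune_sticky`, the 25% pruning fraction, the toy 8x8 matrix, and the placeholder retraining step are all hypothetical and are not taken from the actual recipe.

```python
import numpy as np

def prune_sticky(weights, mask, fraction):
    """Prune `fraction` of the still-active weights with the smallest magnitude.

    `mask` is a boolean array (True = active). Entries that are already
    False are never re-activated, which is the assumed "sticky" behaviour.
    """
    active = np.flatnonzero(mask)              # flat indices of active weights
    n_prune = int(len(active) * fraction)
    new_mask = mask.copy().ravel()
    if n_prune > 0:
        magnitudes = np.abs(weights.ravel()[active])
        smallest = active[np.argsort(magnitudes)[:n_prune]]
        new_mask[smallest] = False
    new_mask = new_mask.reshape(mask.shape)
    return weights * new_mask, new_mask

# Toy run: three prune/retrain rounds on a random 8x8 weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
mask = np.ones(W.shape, dtype=bool)
for _ in range(3):
    W, mask = prune_sticky(W, mask, 0.25)
    # Placeholder for retraining: a real run would update W here and then
    # re-apply the mask so that pruned weights stay at zero.
    W = W * mask

sparsity = 1.0 - mask.mean()
```

Each round removes a quarter of the remaining weights, so sparsity grows with every iteration while earlier pruning decisions are preserved.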

Tencent exps

  • online support
  • garbage model training
  • VAD optimization

DNN Confidence estimation

  • The distribution graph has been obtained. The performance looks poor.
  • One possible reason is that the decoding is LM-based while the confidence is purely acoustic. As a result, (1) errors at the linguistic layer are not necessarily errors at the acoustic layer, and (2) the search automatically chooses almost-correct phones/states.
  • The conclusion is that DNN confidence is best suited to grammar-based applications, or at least to cases where the LM information is not very strong.
  • To be done:
  1. CI phone confidence, ongoing
  2. No-tone confidence, ongoing
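
A minimal sketch of how a purely acoustic, frame-level confidence and a CI phone confidence might be computed from DNN posteriors. It assumes the confidence of a frame is the posterior of the aligned state, and that the CI phone confidence sums the posteriors of all states mapped to the same context-independent phone. The function names, the `state_to_phone` table, and the toy numbers are hypothetical illustrations, not the method actually used.

```python
import numpy as np

def state_confidence(posteriors, alignment):
    """Posterior of the aligned state at each frame.

    posteriors: (n_frames, n_states) array of DNN outputs.
    alignment:  (n_frames,) array of aligned state indices.
    """
    frames = np.arange(len(alignment))
    return posteriors[frames, alignment]

def ci_phone_confidence(posteriors, alignment, state_to_phone):
    """Sum the posteriors of all states sharing the aligned state's CI phone."""
    state_to_phone = np.asarray(state_to_phone)
    n_phones = state_to_phone.max() + 1
    phone_post = np.zeros((posteriors.shape[0], n_phones))
    for p in range(n_phones):
        phone_post[:, p] = posteriors[:, state_to_phone == p].sum(axis=1)
    aligned_phone = state_to_phone[alignment]
    return phone_post[np.arange(len(alignment)), aligned_phone]

# Toy example: 2 frames, 3 states; states 0 and 1 belong to phone 0.
post = np.array([[0.5, 0.3, 0.2],
                 [0.1, 0.2, 0.7]])
ali = np.array([0, 2])
s2p = [0, 0, 1]
state_conf = state_confidence(post, ali)
phone_conf = ci_phone_confidence(post, ali, s2p)
```

In the toy data the first frame's confidence rises from 0.5 to 0.8 when measured at the CI phone level, because the posterior mass on a sibling state of the same phone is no longer counted against it.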


  • GFCC computation is very slow. 100 hours of speech costs 16 hours of CPU time, so the RT is around 0.2. This is intolerable.
  • GFCC-based DNN training on 100 hours of speech data is done. The noise-robustness performance needs to be tested within 2 days.
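
As a quick check of the figures above, the real-time factor is processing time divided by audio duration. The helper name `real_time_factor` is only illustrative.

```python
def real_time_factor(processing_hours, audio_hours):
    """Real-time factor: processing time divided by audio duration."""
    return processing_hours / audio_hours

# Figures from the GFCC note: 16 CPU hours for 100 hours of speech.
rt = real_time_factor(16.0, 100.0)
print(f"RT = {rt:.2f}")  # prints "RT = 0.16"
```

The exact value is 0.16, which the note rounds to 0.2.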

Stream decoding

  • The code is done. Simple testing is completed.
  • Problem 1: CMN initialization is not perfect. A better initial CMN model needs to be trained.
  • Problem 2: the balance of posterior-based silence detection needs tuning.
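
A minimal sketch of why CMN initialization matters in stream decoding, assuming a running-mean CMN that starts from a prior mean with a pseudo-count. The class name `OnlineCMN`, the `prior_mean` argument (standing in for the "initial CMN model"), and the `prior_weight` value are hypothetical, not the actual implementation.

```python
import numpy as np

class OnlineCMN:
    """Running cepstral mean normalization seeded with a prior mean."""

    def __init__(self, prior_mean, prior_weight=100.0):
        # Treat the prior as `prior_weight` pseudo-frames already observed.
        self.total = np.asarray(prior_mean, dtype=float) * prior_weight
        self.count = float(prior_weight)

    def apply(self, frame):
        """Update the running mean with `frame` and return the normalized frame."""
        frame = np.asarray(frame, dtype=float)
        self.total += frame
        self.count += 1.0
        return frame - self.total / self.count

# Toy example: a zero prior with weight 1, then a single frame.
cmn = OnlineCMN(prior_mean=np.zeros(2), prior_weight=1.0)
normalized = cmn.apply([2.0, 4.0])
```

With few frames seen, the subtracted mean is dominated by the prior, so a poorly matched prior distorts the first frames of every utterance; a larger `prior_weight` makes the estimate more stable but slower to adapt.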

Subgraph integration

  • G.fst integration is done and the initial test passed. Zero probability appears to work better for the NUM class.
  • HCLG integration is done. One bug was fixed and the initial test passed.
  • Online integration currently takes 1 minute and needs to be optimized.
  • Need thorough testing with the Tencent test suite.
  • Need to tune the subgraph feeding probability.

Embedded progress

  • GFCC-based engine test has just started.
  • Obtain performance curves: RT, memory size, and package size vs. vocabulary size.
  • A new demo for 4600 song names has been released. Download here.