- LM count files still undelivered!
- Running the 1200-3620 NN; graph generation is done. DT training should be done within 3 days.
- Iterative sparse sticky training is running. Higher sparsity is expected.
- online support
- garbage model training
- VAD optimization
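Iterative sparse "sticky" training can be illustrated with magnitude pruning, under the assumption that "sticky" means a weight, once pruned, stays at zero in every later iteration, so sparsity only grows. The sketch below is not the team's recipe. The function name `sticky_prune`, the magnitude criterion and the sparsity schedule are all hypothetical, and the random update merely stands in for real retraining.

```python
import numpy as np

def sticky_prune(weights, mask, target_sparsity):
    """Prune the smallest-magnitude surviving weights up to target_sparsity.

    Entries already pruned (mask == False) are never revived, even if their
    raw values changed, so the mask is monotonically non-increasing.
    """
    flat_w = weights.ravel()
    flat_m = mask.ravel().copy()
    n_target = int(round(target_sparsity * flat_w.size))
    n_new = n_target - int((~flat_m).sum())      # extra weights to prune now
    if n_new > 0:
        alive_idx = np.flatnonzero(flat_m)
        order = np.argsort(np.abs(flat_w[alive_idx]), kind="stable")
        flat_m[alive_idx[order[:n_new]]] = False
    new_mask = flat_m.reshape(weights.shape)
    return weights * new_mask, new_mask

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
m = np.ones_like(w, dtype=bool)
for target in (0.2, 0.4, 0.6, 0.8):              # hypothetical schedule
    # Stand-in for retraining: only surviving weights are updated.
    w = w + 0.01 * rng.normal(size=w.shape) * m
    w, m = sticky_prune(w, m, target)
    print(f"target={target:.1f} sparsity={1 - m.mean():.3f}")
```

Raising the target sparsity step by step, with retraining in between, gives the network a chance to recover accuracy before more weights are removed.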
DNN Confidence estimation
- The distribution graph has been obtained. The performance appears poor.
- A possible reason is that decoding is LM-based while the confidence is purely acoustic. Hence (1) errors at the linguistic level are not necessarily errors at the acoustic level, and (2) the search automatically chooses almost-correct phones/states, so the acoustic confidence can stay high even for misrecognized words.
- The conclusion is that DNN confidence is most suitable for grammar-based applications, or at least for applications where the LM information is not very strong.
- To be done:
- CI (context-independent) phone confidence, ongoing
- No-tone confidence, ongoing
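The state, CI-phone and no-tone confidence variants differ only in how DNN state posteriors are pooled before scoring. A minimal sketch follows, assuming the confidence of a segment is the average per-frame log posterior of the aligned unit, and that posteriors of all states belonging to a unit are summed. The function `segment_confidence`, the toy inventory (`a1`, `a2`, `b`) and the mapping arrays are hypothetical, not the actual phone set.

```python
import numpy as np

def segment_confidence(post, align, unit_of_state, n_units):
    """Average per-frame log posterior of the aligned unit over a segment.

    post:          (T, S) DNN state posteriors; each row sums to 1.
    align:         (T,) state index chosen by the decoder for each frame.
    unit_of_state: (S,) maps every state to a coarser unit
                   (identity = state level, CI phone, or toneless phone).
    """
    unit_of_state = np.asarray(unit_of_state)
    # One-hot pooling matrix: P(u | x_t) = sum of P(s | x_t) over states s in u.
    pool = np.zeros((len(unit_of_state), n_units))
    pool[np.arange(len(unit_of_state)), unit_of_state] = 1.0
    unit_post = post @ pool                                  # (T, n_units)
    aligned = unit_of_state[np.asarray(align)]               # unit per frame
    frame_post = unit_post[np.arange(len(aligned)), aligned]
    return float(np.mean(np.log(np.maximum(frame_post, 1e-10))))

# Hypothetical toy inventory: states 0,1 -> phone "a1"; state 2 -> "a2"; state 3 -> "b".
phone_of_state = np.array([0, 0, 1, 2])   # CI phones: a1, a2, b
base_of_phone = np.array([0, 0, 1])       # toneless: a1, a2 -> a; b -> b
base_of_state = base_of_phone[phone_of_state]

post = np.array([[0.4, 0.4, 0.1, 0.1],
                 [0.4, 0.4, 0.1, 0.1]])
align = np.array([0, 0])

print(segment_confidence(post, align, np.arange(4), 4))      # state level: log 0.4
print(segment_confidence(post, align, phone_of_state, 3))    # CI phone:    log 0.8
print(segment_confidence(post, align, base_of_state, 2))     # no tone:     log 0.9
```

Coarser units absorb confusions between states of the same phone (or tonal variants of the same base phone), so the score moves closer to 0; whether this separates correct from incorrect words better is what the ongoing experiments should show.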
- GFCC computation is very slow: 100 hours of speech takes 16 hours of CPU time (RT factor around 0.2). This is intolerable.
- GFCC-based DNN training on 100 hours of speech data is done. The noise-robust performance needs to be tested within 2 days.
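The RT factor is CPU time divided by audio duration, so 16 h / 100 h = 0.16, consistent with the "around 0.2" above. One possible way to reduce the cost, offered only as an assumption to evaluate, is to approximate the gammatone filterbank in the frequency domain: weight each frame's power spectrum with the approximate squared magnitude response of 4th-order gammatone filters, then apply cubic-root compression and a DCT. This swaps in an FFT-domain approximation for time-domain gammatone filtering, so the features will not be identical to the current GFCCs. Frame sizes, filter count, `fmin` and cepstral order below are hypothetical defaults, and no speed-up has been measured.

```python
import numpy as np

def erb_rate(f):
    """Frequency (Hz) -> ERB-rate scale."""
    return 21.4 * np.log10(1.0 + 0.00437 * f)

def erb_rate_inv(e):
    """ERB-rate scale -> frequency (Hz)."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def gammatone_weights(n_fft, sr, n_filters=64, fmin=50.0, fmax=None, order=4):
    """Approximate squared gammatone magnitude responses on rfft bins."""
    fmax = fmax or sr / 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)                     # (n_fft//2+1,)
    fc = erb_rate_inv(np.linspace(erb_rate(fmin), erb_rate(fmax), n_filters))[:, None]
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)                # bandwidth from ERB(fc)
    # |H(f)|^2 ~ (1 + ((f - fc) / b)^2)^(-order); squared because it weights power.
    return (1.0 + ((freqs[None, :] - fc) / b) ** 2) ** (-float(order))

def gfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512, n_filters=64, n_ceps=23):
    frames = np.lib.stride_tricks.sliding_window_view(signal, frame_len)[::hop]
    frames = frames * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2            # (T, n_fft//2+1)
    energies = power @ gammatone_weights(n_fft, sr, n_filters).T  # (T, n_filters)
    compressed = np.cbrt(energies)                               # cubic-root compression
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_filters)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_filters))      # DCT-II basis
    return compressed @ dct.T                                    # (T, n_ceps)

sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)    # 1 s test tone
feats = gfcc(x, sr)
print(feats.shape)
```

The whole filterbank becomes one matrix product per utterance, which is why this approximation is often cheaper than filtering every sample through each channel; its effect on noise robustness would need to be checked against the current features.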
- The code is done. Simple testing is complete.
- Problem 1: CMN initialization is not good enough. A better initial CMN model needs to be trained.
- Problem 2: the speech/silence balance of posterior-based silence detection needs adjustment.
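Both problems can be made concrete with a small sketch, assuming (1) online CMN uses a running mean seeded by a prior mean from an initial CMN model, and (2) silence detection thresholds a per-frame silence posterior and discards very short silence runs. The names `online_cmn` and `detect_silence` and the knobs `prior_weight`, `threshold` and `min_run` are hypothetical; `threshold` and `min_run` are one possible reading of the "balance" to be tuned.

```python
import numpy as np

def online_cmn(feats, prior_mean, prior_weight=100.0):
    """Causal CMN: running mean initialised from a prior mean.

    The prior counts as `prior_weight` pseudo-frames, so the first frames are
    not normalised by a mean estimated from only a handful of frames.
    """
    feats = np.asarray(feats, dtype=float)
    out = np.empty_like(feats)
    acc = prior_weight * np.asarray(prior_mean, dtype=float)
    count = float(prior_weight)
    for t, x in enumerate(feats):
        acc = acc + x
        count += 1.0
        out[t] = x - acc / count
    return out

def detect_silence(sil_post, threshold=0.5, min_run=10):
    """Mark frames as silence when P(sil | x_t) > threshold.

    Silence runs shorter than `min_run` frames are relabelled as speech, so
    short pauses do not split an utterance. Raising `threshold` or `min_run`
    trades missed silence for fewer speech frames cut as silence.
    """
    is_sil = np.asarray(sil_post) > threshold
    labels = is_sil.copy()
    t, T = 0, len(is_sil)
    while t < T:
        if is_sil[t]:
            start = t
            while t < T and is_sil[t]:
                t += 1
            if t - start < min_run:
                labels[start:t] = False
        else:
            t += 1
    return labels

sil = np.array([0.9] * 3 + [0.1] * 5 + [0.9] * 12)
print(detect_silence(sil).astype(int))
```

A larger `prior_weight` makes early normalisation closer to the initial CMN model, which is exactly why the quality of that model matters.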
- G.fst integration is done and passed the initial test. The zero probability appears to work better for the NUM class.
- HCLG integration is done. A bug was fixed, and it passed the initial test.
- Online integration takes 1 minute. It needs optimization.
- Need thorough testing with the Tencent test suite.
- Need to tune the subgraph feeding probability.
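To see what the subgraph feeding probability controls, here is a toy class-based scoring sketch, assuming G.fst contains a class token `<NUM>` whose members come from a separate subgraph, and the feeding probability is an extra log-weight applied when the path enters that subgraph (in an FST this would be a cost of `-feed_logprob` on the entry arc). The bigram table, the NUM members and the parameter name `feed_logprob` are all hypothetical illustration values.

```python
import math

# Hypothetical toy bigram LM over words and the class token "<NUM>".
MAIN_LM = {
    ("<s>", "call"): math.log(0.5),
    ("call", "<NUM>"): math.log(0.4),
    ("<NUM>", "</s>"): math.log(0.9),
}
# Hypothetical in-class distribution, i.e. the NUM subgraph.
NUM_SUBGRAPH = {"110": math.log(0.5), "120": math.log(0.3), "119": math.log(0.2)}

def sentence_logprob(words, feed_logprob=0.0, unk=math.log(1e-6)):
    """Log probability of `words`; a class member w scores
    log P(<NUM> | h) + feed_logprob + log P(w | <NUM>)."""
    history, total = "<s>", 0.0
    for w in list(words) + ["</s>"]:
        if w in NUM_SUBGRAPH:
            total += MAIN_LM.get((history, "<NUM>"), unk) + feed_logprob + NUM_SUBGRAPH[w]
            history = "<NUM>"
        else:
            total += MAIN_LM.get((history, w), unk)
            history = w
    return total

for feed in (0.0, -1.0, -2.0):    # candidate values to sweep on a dev set
    print(feed, sentence_logprob(["call", "110"], feed_logprob=feed))
```

Sweeping the feeding weight on a held-out set (for example the Tencent test suite) and picking the value with the lowest error rate is one straightforward way to tune it.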