- Optimal Brain Damage (OBD).
- Online OBD is on hold.
- Investigation of OBD + L1 norm has started.
- Efficient computing
- Rearranging the matrix structure so that zeros are grouped into blocks, which should lead to faster computation.
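
As a reminder of what the OBD items above rely on, here is a minimal sketch of the standard OBD saliency, 0.5 * h_kk * w_k^2, followed by pruning of the least salient weights. The function names and the toy values are hypothetical and are not taken from the experiments in this report. The diagonal Hessian is assumed to be given.

```python
def obd_saliencies(weights, hessian_diag):
    """Return the OBD saliency 0.5 * h_kk * w_k**2 for every weight."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, num_to_prune):
    """Return a copy of weights with the least salient entries set to zero."""
    saliencies = obd_saliencies(weights, hessian_diag)
    # Indices sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda k: saliencies[k])
    pruned = list(weights)
    for k in order[:num_to_prune]:
        pruned[k] = 0.0
    return pruned


# Toy values for illustration only (assumption, not experimental data).
weights = [0.5, -2.0, 0.1, 1.0]
hessian_diag = [4.0, 0.1, 1.0, 2.0]
pruned = prune_lowest(weights, hessian_diag, num_to_prune=2)
print(pruned)
```

In this toy case the weight -2.0 is pruned despite its large magnitude, because its curvature term is small. This is the difference between OBD and plain magnitude-based pruning.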
Efficient DNN training
- Asymmetric window: Large improvement on the training set (WER 34% to 24%), but the improvement is lost on the test set. Overfitting?
- Fbank features used to train GMM+DNN lead to very high training accuracy, but reduce accuracy on the test set.
- Ch/En training with concatenated phone set is completed.
- Initial test seems reasonable on Chinese. A bit worse than the original test.
- Need to compare the two systems both on Fbank
- Need to extend the state number
- Investigating LOUDS FST. In progress.
- Training character-based NN LM, 12134 Chinese chars
- Preparing data for word2vec training on Gigawords CHS 4.0
- CLG embedded decoder is almost done. Graph compilation is very fast.
- Working on layer-by-layer DNN training; the initial model is incorrect.
- Use N-best to expand matching in QA. Better performance was obtained.
- 1-best matches 96/121
- 10-best matches 102/121
- Use N-best to recover errors in entity check.
- Design a non-entity pattern to discover the possible position of an entity
- Within this position range, search for entities in the N-best results
- Use Pinyin to recover errors in entity check. Future work.
- Design a non-entity pattern to discover the possible place of an entity (as above)
- Convert all the entities to Pinyin strings, and then match the Pinyin string in the candidate position against the entity Pinyin
- Keep the best-matching entity by Pinyin, subject to a threshold
- A bit worse than the original test.
- A possible problem is that the LM is too strong, which leads to Pinyin strings that do not match in acoustic space
- Liu Rong will provide a weaker LM to support the research.
- Investigate some errors in entity-based LM.
- Some errors still exist
- Running entity-based LM with a small entity list
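
The Pinyin-based entity recovery listed earlier (match against entity Pinyin, keep the best match above a threshold) could be sketched roughly as follows. This is only an illustration under several assumptions: Pinyin is represented as a list of syllables, similarity is one minus the normalized edit distance, and the threshold value, entity list and function names are all hypothetical. The actual matching criterion used in the experiments is not specified in this report.

```python
def edit_distance(a, b):
    """Levenshtein distance between two syllable sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: one minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def recover_entity(hyp_pinyin, entity_pinyin, threshold=0.6):
    """Return (entity, score) for the best match, or (None, score) if below threshold."""
    best_name, best_score = None, 0.0
    for name, syllables in entity_pinyin.items():
        score = similarity(hyp_pinyin, syllables)
        if score > best_score:
            best_name, best_score = name, score
    if best_score >= threshold:
        return best_name, best_score
    return None, best_score


# Hypothetical entity list and recognized Pinyin, for illustration only.
entities = {
    "Qinghua Daxue": ["qing", "hua", "da", "xue"],
    "Beijing Daxue": ["bei", "jing", "da", "xue"],
}
print(recover_entity(["qin", "hua", "da", "xue"], entities))
```

The same function could be applied to the candidate span of every N-best hypothesis, keeping the highest-scoring entity overall.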