ASR Kernel development
- Full-lab training is ready: trained the first full-lab system with 16k/pseudo-48k data.
- Re-recorded 48k data using F00 (500 sentences) and retrained the model. The signal quality sounds better, but the pitch sounds somewhat abnormal. The parameter settings need further investigation.
- Check the signal parameters and fix the pitch problem.
- Prepare large-data training with all of the all-F 863 data.
- Prepare large-data training with online novel data.
- The search system was migrated to the customs domain, with a significant performance drop. Results with TF and TF-IDF weighting:
Customs:
n    TF      TF-IDF
1    0.496   0.485
2    0.619   0.615
3    0.676   0.673
4    0.713   0.715
5    0.740   0.738

Agriculture:
n    TF      TF-IDF
1    0.75    0.8
2    0.85    0.883
3    0.867   0.917
4    0.867   0.95
5    0.95    0.967
- Two problems:
- lack of semantic clustering;
- limited training data for estimating IDF.
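The TF vs TF-IDF comparison and the IDF data problem can be illustrated with a minimal sketch. It assumes a bag-of-words match between a tokenized query and tokenized documents, and a smoothed IDF of the form log((1 + N) / (1 + df)) + 1; the smoothing and the names `idf_table`, `score` and `rank` are hypothetical choices for illustration, not the search system's actual code.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed IDF: log((1 + N) / (1 + df)) + 1 (hypothetical smoothing choice)."""
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}
    return idf, n_docs


def score(query, doc, idf=None, n_docs=0):
    """Sum of term frequencies, optionally weighted by IDF."""
    tf = Counter(doc)
    total = 0.0
    for term in query:
        weight = 1.0
        if idf is not None:
            # A term never seen in the IDF training data gets the df = 0 weight.
            weight = idf.get(term, math.log(1 + n_docs) + 1.0)
        total += tf[term] * weight
    return total


def rank(query, docs, use_idf=True):
    """Return document indices sorted by descending score (ties keep input order)."""
    idf, n_docs = idf_table(docs) if use_idf else (None, len(docs))
    scores = [score(query, doc, idf, n_docs) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])
```

With few documents, df counts (and hence IDF weights) are coarse, which is one way limited IDF training data can keep TF-IDF from clearly beating plain TF.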
- Next week:
- Analyse the QA database to extract useful domain-dependent data.
- Analyse the data to expand the keywords and key phrases.
- Analyse the data to obtain better IDF estimates.
- Get familiar with the Dragon system: comb through it and extract the summary-only code.
- Sentence-based summarization is done, but it still needs to be migrated to Chinese.
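Sentence-based summarization in the LexRank style scores each sentence by its centrality in a sentence-similarity graph. The sketch below is a minimal illustration, not the team's code: it assumes sentences are already tokenized (for Chinese, a word segmenter would have to run first), uses plain term-count cosine rather than the idf-modified cosine of the original LexRank paper, and the names `lexrank`, `summarize` and the default threshold are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two token lists using raw term counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def lexrank(sentences, threshold=0.1, damping=0.85, max_iter=200, tol=1e-10):
    """Score tokenized sentences by centrality in a thresholded similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # Unweighted edge between two distinct sentences whose similarity clears the threshold.
    adj = [[1.0 if i != j and cosine(sentences[i], sentences[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalize into a transition matrix; a sentence with no edges jumps uniformly.
    trans = []
    for row in adj:
        deg = sum(row)
        trans.append([v / deg for v in row] if deg else [1.0 / n] * n)
    # Damped power iteration (PageRank-style) starting from a uniform distribution.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1 - damping) / n + damping * sum(scores[j] * trans[j][i] for j in range(n))
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def summarize(sentences, k):
    """Return indices of the k most central sentences, in original document order."""
    scores = lexrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return sorted(top)
```

An isolated sentence only receives teleport mass, so sentences that share content with many others rise to the top of the summary.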
- Started building TextRank-based keyword extraction, rewriting the LexRank code to handle word-level similarity matrices.
- Test data set: 100 articles
- TextRank done.
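TextRank keyword extraction replaces the sentence graph with a word-level similarity matrix. The minimal sketch below assumes pre-tokenized text, builds the matrix from co-occurrence counts inside a sliding window, and scores words with the weighted TextRank recurrence WS(i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(j). The function name, window size and stopword handling (stopwords are dropped before windowing, a simplification) are hypothetical choices, not the team's implementation.

```python
def textrank_keywords(tokens, window=2, damping=0.85, max_iter=200, tol=1e-10,
                      top_k=5, stopwords=frozenset()):
    """Rank words by weighted TextRank over a co-occurrence similarity matrix."""
    # Stopwords are removed before windowing (a simplification).
    words = [t for t in tokens if t not in stopwords]
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    if n == 0:
        return []
    # Word-level similarity matrix: symmetric co-occurrence counts within the window.
    sim = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            u, v = index[w], index[words[j]]
            if u != v:
                sim[u][v] += 1.0
                sim[v][u] += 1.0
    out_weight = [sum(row) for row in sim]
    scores = [1.0] * n
    for _ in range(max_iter):
        new = [(1 - damping) + damping * sum(sim[j][i] / out_weight[j] * scores[j]
                                             for j in range(n) if out_weight[j])
               for i in range(n)]
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(vocab, key=lambda w: (-scores[index[w]], w))
    return ranked[:top_k]
```

For example, `textrank_keywords(["tariff", "rate", "tariff", "policy", "tariff", "rate"], top_k=3)` ranks "tariff" first because it co-occurs with every other word.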
- Started working on the self coding, although some requirements have not yet been considered.
- Decide by next Tuesday whether to use the standard FSM toolkit.