2014-04-11

From cslt Wiki

Resource Building

  • The current text resources have been rearranged and listed

Leftover questions

  • Asymmetric window: large improvement on the training set (WER 34% to 24%), but the gain disappears on the test set. Possibly overfitting?
  • Multi GPU training: Error encountered
  • Multilanguage training
  • Investigating LOUDS FST.
  • CLG embedded decoder plus online compiler.
  • DNN-GMM co-training
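
For the asymmetric window item above, a minimal sketch of frame splicing with unequal left and right context. It assumes that "asymmetric window" means the DNN input stacks more past frames than future frames; the function name splice_frames and the 3-left/1-right setting are hypothetical and not taken from the actual setup.

```python
def splice_frames(frames, left, right):
    """Stack each frame with `left` past and `right` future frames.

    Frames near the utterance edges are padded by repeating the
    first or last frame.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        context = []
        for offset in range(-left, right + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp at utterance edges
            context.extend(frames[idx])
        spliced.append(context)
    return spliced


# Toy 1-dimensional features for a 5-frame utterance.
frames = [[0.0], [1.0], [2.0], [3.0], [4.0]]

# Hypothetical asymmetric setting: 3 frames of history, 1 frame of look-ahead.
asym = splice_frames(frames, left=3, right=1)
```

A smaller right context also reduces decoding latency, since fewer future frames have to be buffered before a frame can be scored.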

AM development

Sparse DNN

  • GA-based block sparsity
  • Found a paper from 2000 with similar ideas.
  • Try to get a student working on high performance computing to do the optimization


Noise training

  • More experiments with no-noise
  • More experiments with additional noise types


AMR compression re-training

  • 1700h MPE adaptation done
  • 1700h stream mode adaptation runs into MPE1

GFBank

  • Significant improvement found with GFBank
  • Significant improvement found with FBank + GFBank

Denoising & Farfield ASR

  • Recording done
  • Preparing to construct the baseline


VAD

  • Code is ready; still need to work out the speech/non-speech smoothing
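
One possible approach to the speech/non-speech smoothing, shown as a minimal sketch: a sliding majority vote over per-frame 0/1 decisions. This is an assumption about what the smoothing could look like, not the method in the VAD code; the function name smooth_vad and the window size of 5 frames are hypothetical.

```python
def smooth_vad(decisions, window=5):
    """Majority-vote smoothing of per-frame 0/1 speech decisions.

    Each frame is relabelled as speech (1) only if more than half of
    the frames in the surrounding window are speech. The window is
    truncated at the start and end of the utterance.
    """
    half = window // 2
    n = len(decisions)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half)
        hi = min(n, t + half + 1)
        votes = decisions[lo:hi]
        smoothed.append(1 if 2 * sum(votes) > len(votes) else 0)
    return smoothed


# Raw decisions with one isolated false alarm (frame 2)
# and one short dropout inside a speech segment (frame 9).
raw = [0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0]
smoothed = smooth_vad(raw, window=5)
```

In this toy input the isolated spike is removed and the dropout is filled, so the output contains a single contiguous speech segment. A larger window smooths more aggressively but blurs segment boundaries.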

Farfield recognition

Scoring

  • g-score based on MLP is done
  • t-score based on linear regression improves the performance


Word to Vector

  • LDA baseline (sogou 1700*9 training set) done
  • Word-vector classification clearly outperforms the LDA system (test-set total accuracy 0.71 vs. 0.59)
word vector:
           general: dict - 150k words;  train_data - ren_ming_ri_bao (People's Daily, 5 GB);  window - 5
           1. size - 50  time = 30 min, 12 threads
           2. size - 10  time = 10 min, 12 threads

data: class_num = 9   document_num = 9*2000
      train_num = 9*1600
      test_num  = 9*200
      dev_num   = 9*200

train_set:
                      C000008     C000010     C000013     C000014     C000016     C000020     C000022     C000023     C000024     total
                      Finance     IT          Health      Sports      Travel      Education   Recruitment Culture     Military
lda_inf               0.845       0.2756      0.698       0.9502      0.63499     0.32        0.8080      0.3505      0.864       0.6385
lda_inf_10            0.8149      0.0887      0.628       0.9641      0.5739      0.105       0.707363    0.2334      0.8628      0.553167
w2v_filter_filter     0.7463      0.713       0.657       0.9106      0.68659     0.54        0.74638     0.692       0.84518     0.72638
w2v_filter_filter_10  0.7608      0.4323      0.57394     0.865       0.549       0.335       0.577       0.6129      0.78099     0.609769

test_set:
                      C000008     C000010     C000013     C000014     C000016     C000020     C000022     C000023     C000024     total
                      Finance     IT          Health      Sports      Travel      Education   Recruitment Culture     Military
w2v_filter_filter     0.6865      0.7263      0.6716      0.84577     0.7462      0.46268     0.6567      0.7114      0.8905      0.71088
w2v_filter_filter_10  0.791       0.4079      0.56218     0.74129     0.62189     0.22885     0.562       0.6766      0.84079     0.603648
lda_inf               0.8706      0.26368     0.6965      0.8009      0.582       0.2537      0.72139     0.3184      0.82587     0.59259
lda_inf_10            0.776       0.1044      0.6467      0.9054      0.62189     0.1144      0.56218     0.24378     0.796       0.530127

note: w2v_filter -- stop words are removed when training the word vectors
note: w2v_filter_filter -- stop words are removed when training the word vectors, and stop words are also removed from the documents
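
The "total" column in the tables above appears to be the unweighted mean of the nine per-class accuracies, which equals overall accuracy here because every class has the same number of documents. A small sketch that recomputes it for two test-set rows; the names test_accuracy and macro_average are hypothetical, and the per-class values are copied from the test_set table.

```python
# Per-class test-set accuracies, in the column order
# C000008, C000010, C000013, C000014, C000016,
# C000020, C000022, C000023, C000024.
test_accuracy = {
    "w2v_filter_filter": [0.6865, 0.7263, 0.6716, 0.84577, 0.7462,
                          0.46268, 0.6567, 0.7114, 0.8905],
    "lda_inf": [0.8706, 0.26368, 0.6965, 0.8009, 0.582,
                0.2537, 0.72139, 0.3184, 0.82587],
}


def macro_average(per_class):
    """Unweighted mean of the per-class accuracies."""
    return sum(per_class) / len(per_class)


totals = {name: macro_average(acc) for name, acc in test_accuracy.items()}

for name, total in totals.items():
    print(f"{name}: {total:.4f}")
```

The recomputed values agree with the reported totals (0.71088 and 0.59259) up to rounding of the per-class figures.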

LM development

NN LM

  • Character-based NNLM (6700 chars, 7gram), 500M data training done.
  • Non-boundary char LM is better than boundary char LM
  • Investigate MS RNN LM training


QA

FST-based matching

  • Word-based FST matching takes 1-2 seconds with 1600 patterns, while Huilan's implementation takes <1 second. The reason for the gap is still unclear.
  • Char-FST implementation is done.


Speech QA

  • Investigate determinization of G embedding