2013-10-11


Data sharing

  • LM count files still undelivered!

DNN progress

Sparse DNN

  • Optimal Brain Damage (OBD): code is ready; looking for test tasks.

Tencent exps

N/A


Noisy training

  • Random corruption with Dirichlet-sampled noise is done. Performance shows significant improvement on noisy test sets.
  • The impact on clean speech varies; some test cases (e.g., online1 and rec1900) even obtained better performance than normal training.
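The report does not detail how the Dirichlet-sampled corruption is implemented; a minimal sketch of one plausible reading is given below (the function name and parameters are hypothetical): per-noise mixing weights are drawn from a Dirichlet distribution, and the mixed noise is scaled to a target SNR before being added to the clean waveform.

```python
import numpy as np

def corrupt_with_dirichlet_noise(clean, noise_bank, alpha=0.5, snr_db=10.0, rng=None):
    """Hypothetical sketch: mix several noise signals into a clean waveform.

    Per-noise mixing weights are drawn from a Dirichlet distribution,
    then the combined noise is scaled to the requested SNR.
    """
    rng = np.random.default_rng(rng)
    weights = rng.dirichlet([alpha] * len(noise_bank))
    noise = sum(w * n[: len(clean)] for w, n in zip(weights, noise_bank))
    # scale the noise so that 10*log10(P_clean / P_noise) == snr_db
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10.0)))
    return clean + scale * noise
```

A small alpha concentrates the weight on one noise type per utterance; a large alpha mixes all types more evenly.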


Continuous LM

1. SogouT 3T data cleanup is still running. Initial results with 7G of training text, in terms of PPL:

  • SogouQ test: 292
  • Tencent online1: 578
  • Tencent online2: 475


This indicates that the SogouQ text differs significantly from the Tencent online1 and online2 sets, due to the domain mismatch.
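For reference, the PPL figures above follow the standard definition: perplexity is the exponential of the average negative log-probability the model assigns to the test words. A minimal sketch:

```python
import math

def perplexity(word_logprobs, n_words):
    """PPL = exp(-(1/N) * sum_i log P(w_i | h_i)).

    word_logprobs: natural-log probabilities the LM assigns to each test word.
    """
    return math.exp(-sum(word_logprobs) / n_words)
```

A model that assigned uniform probability 1/V to every word would score PPL = V, which is why a lower PPL indicates a better domain match.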

2. NN LM

Split the most frequent 10k words into 10 x 1024-word subsets and model each subset with its own network.
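The split itself is straightforward; a sketch of one way to do it (the function name is hypothetical) takes the vocabulary sorted by descending frequency and cuts the head into fixed-size sub-lists, one per network:

```python
def split_shortlist(vocab_by_freq, n_subsets=10, subset_size=1024):
    """Cut the most frequent n_subsets * subset_size words into
    fixed-size sub-lists, one per CSLM network.

    vocab_by_freq: words sorted by descending training-data frequency.
    """
    shortlist = vocab_by_freq[: n_subsets * subset_size]
    return [shortlist[i * subset_size:(i + 1) * subset_size]
            for i in range(n_subsets)]
```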

  • Training data: QA 500M text
  • Test data: Tencent online2
  • Dev data: Tencent online1
  short_list    cslm_ppl  cslm_sum  n-gram_sum  all_ppl  coverage
  0-1023        12.12     39.70%    60.30%      122.54   58.86%
  1024-2047      1.75      6.56%    93.44%      118.92   11.35%
  2048-3071      1.41      3.75%    96.25%      117.16    6.41%
  3072-4095      1.23      2.17%    97.83%      116.24    4.27%
  4096-5119      1.26      2.24%    97.76%      116.13    3.10%
  5120-6143      1.18      1.69%    98.31%      116.82    2.38%
  6144-7167      1.15      1.22%    98.78%      117.19    1.85%
  7168-8191      1.13      1.13%    98.87%      117.34    1.50%
  8192-9217      1.07      0.58%    99.42%      116.06    1.23%
  9218-10241     1.06      0.44%    99.56%      115.86    1.03%
  n-gram baseline:                  100%        402
note: coverage   -- the proportion of short-list word frequency in the training data
      cslm_sum   -- the percentage of words predicted by the CSLM
      n-gram_sum -- the percentage of words predicted by the n-gram
      cslm_ppl   -- the short-list PPL computed by the CSLM
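The all_ppl column combines the two models: a word covered by the short list is scored by the CSLM, and any other word falls back to the n-gram. A minimal dictionary-based sketch (names are hypothetical, not from the actual experiment code):

```python
import math

def combined_ppl(test_words, shortlist, cslm_logp, ngram_logp):
    """Score each word with the CSLM if it is in the short list,
    otherwise with the n-gram; return the overall perplexity.

    cslm_logp / ngram_logp: word -> natural-log probability,
    here context-free for simplicity.
    """
    total = 0.0
    for w in test_words:
        total += cslm_logp[w] if w in shortlist else ngram_logp[w]
    return math.exp(-total / len(test_words))
```

This matches the pattern in the table: each extra 1024-word network shifts a little more probability mass from the n-gram to the CSLM, which is why all_ppl drops from 122.54 toward 115.86 against the 402 n-gram baseline.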


3. CSLM-to-n-gram conversion failed (with threshold=1e-5) due to the large number of n-grams expanded from the network, so the expansion approach is not suitable. This is reasonable, since the network is highly compact.

4. Continue lattice rescoring with multiple CSLM networks.
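The report does not say how the multiple CSLM scores are combined during rescoring; one common choice is a linear interpolation of the networks' word probabilities, applied per lattice arc. A hypothetical sketch under that assumption (function name, parameters, and the lm_scale default are illustrative only):

```python
import math

def rescore_arc(cslm_logps, interp_weights, am_logp, lm_scale=12.0):
    """Replace an arc's LM score with a weighted combination of
    several CSLM scores (interpolated in the probability domain),
    then recombine with the acoustic score.

    cslm_logps: natural-log probabilities from each CSLM network.
    interp_weights: interpolation weights, summing to 1.
    """
    new_lm_p = sum(w * math.exp(lp)
                   for w, lp in zip(interp_weights, cslm_logps))
    return am_logp + lm_scale * math.log(new_lm_p)
```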