Last week:

Improved corpus preprocessing tools (HTTP stripper, num2hanzi) and reprocessed the Weibo corpora

Learned the cross-entropy-difference-based method for extracting domain-specific corpora

Recorded spoken numbers for testing
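
For reference, the two preprocessing steps named above (stripping HTTP links and rewriting digits as hanzi) could look roughly like the sketch below. This is not the actual tool code: the function names `strip_urls`, `int_to_hanzi`, and `normalize` are hypothetical, the URL pattern is a simple ASCII-only approximation, and numbers above 9999 are assumed to be read digit by digit.

```python
import re

# ASCII-only URL pattern, so a link followed directly by hanzi stops at the hanzi.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%#:~+\-]+")
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]


def strip_urls(text):
    """Remove http/https links from a line of text."""
    return URL_RE.sub("", text)


def int_to_hanzi(n):
    """Read an integer in 0..9999 as a Chinese numeral (e.g. 105 -> 一百零五)."""
    if n == 0:
        return DIGITS[0]
    s = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(s):
        d = int(ch)
        pos = len(s) - 1 - i
        if d == 0:
            pending_zero = True  # emit 零 only if a non-zero digit follows
            continue
        if pending_zero and parts:
            parts.append(DIGITS[0])
        pending_zero = False
        parts.append(DIGITS[d] + UNITS[pos])
    result = "".join(parts)
    # 10..19 are read 十, 十一, ... rather than 一十, 一十一, ...
    return result[1:] if result.startswith("一十") else result


def _number_to_hanzi(match):
    token = match.group(0)
    if len(token) <= 4 and not (len(token) > 1 and token.startswith("0")):
        return int_to_hanzi(int(token))
    # Assumption: long numbers and zero-padded strings are read digit by digit.
    return "".join(DIGITS[int(c)] for c in token)


def normalize(text):
    """Strip URLs first (so digits inside links disappear), then convert numbers."""
    return re.sub(r"[0-9]+", _number_to_hanzi, strip_urls(text)).strip()
```

Example: `normalize("今天走了105步 http://t.cn/abc123")` gives `今天走了一百零五步`.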

This week:

Train a new LM on the new (Weibo) corpora

Compare the new in-domain corpus selection method with the old topic-spotting-based method
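
As a reminder of how the cross-entropy difference selection works: each candidate sentence is scored by its per-word cross-entropy under an in-domain LM minus its per-word cross-entropy under a general LM, and sentences with the lowest scores are kept. The sketch below is only illustrative. It assumes add-one-smoothed unigram models in place of real n-gram LMs, pre-tokenized input, and a threshold of 0; the names `train_unigram`, `cross_entropy`, and `select_in_domain` are hypothetical.

```python
import math
from collections import Counter

UNK = "<unk>"


def train_unigram(sentences, vocab):
    """Add-one-smoothed unigram LM over a fixed vocabulary (a stand-in for a real n-gram LM)."""
    counts = Counter(tok for sent in sentences for tok in sent)
    total = sum(counts.values()) + len(vocab)
    return {w: (counts[w] + 1) / total for w in vocab}


def cross_entropy(sentence, lm):
    """Per-word cross-entropy in bits; unseen words fall back to <unk>."""
    log_prob = sum(math.log2(lm.get(tok, lm[UNK])) for tok in sentence)
    return -log_prob / len(sentence)


def select_in_domain(in_domain, general, candidates, threshold=0.0):
    """Keep candidates whose score H_in(s) - H_general(s) is below the threshold."""
    # Both LMs share one vocabulary so their probabilities are comparable.
    vocab = {tok for sent in in_domain + general for tok in sent} | {UNK}
    lm_in = train_unigram(in_domain, vocab)
    lm_gen = train_unigram(general, vocab)
    scored = [
        (cross_entropy(s, lm_in) - cross_entropy(s, lm_gen), s)
        for s in candidates
        if s  # skip empty sentences
    ]
    scored.sort(key=lambda pair: pair[0])  # most in-domain first
    return [s for score, s in scored if score < threshold]
```

In practice the threshold would be tuned, for example by perplexity on held-out in-domain data, rather than fixed at 0.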