140428-Xiaoxi Wang

This week:

Preprocessed the Baidu Zhidao data and part of the Weibo data.

Wrote a Hanzi2Num tool.

Sampled corpora from Weibo and Baidu Zhidao (4.4G) and extracted keywords from them.

Classified the corpora according to the keywords.
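
For reference, a minimal sketch of what a Hanzi2Num-style conversion might look like, assuming the tool maps Chinese numerals to Arabic integers for text normalization. The function name hanzi_to_num, the character tables, and the supported range (values up to the 亿 scale, in common written forms) are all illustrative assumptions, not the actual tool.

```python
# Hypothetical sketch: convert a Chinese numeral string to an int.
# Names and tables are assumptions for illustration only.
DIGITS = {"零": 0, "〇": 0, "一": 1, "二": 2, "两": 2, "三": 3,
          "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
WAN = 10 ** 4   # 万
YI = 10 ** 8    # 亿


def hanzi_to_num(text):
    """Return the integer value of a Chinese numeral such as 一百二十三."""
    if not text:
        raise ValueError("empty numeral")

    # Pure digit strings (e.g. years like 二零一四) are read positionally.
    if len(text) > 1 and all(ch in DIGITS for ch in text):
        return int("".join(str(DIGITS[ch]) for ch in text))

    total = 0    # value of the parts already closed by 万 or 亿
    section = 0  # value accumulated inside the current sub-10000 section
    digit = 0    # most recent digit, waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            # A bare 十 (as in 十五) counts as one ten.
            section += (digit or 1) * SMALL_UNITS[ch]
            digit = 0
        elif ch == "万":
            total += (section + digit) * WAN
            section, digit = 0, 0
        elif ch == "亿":
            total = (total + section + digit) * YI
            section, digit = 0, 0
        else:
            raise ValueError("not a Chinese numeral character: %r" % ch)
    return total + section + digit


if __name__ == "__main__":
    for s in ["十五", "一百二十三", "一千零五", "三万五千", "二零一四"]:
        print(s, hanzi_to_num(s))
```

Applying such a function inside running text would also need a rule for deciding which characters are really numerals (for example, 一 inside ordinary words), which this sketch does not attempt.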

Next week:

Train and evaluate LMs on the classified corpora.

Improve the algorithms.