14-10-19 Bin Yuan

From cslt Wiki

Accomplished this week

  • Built an HCLG decoding graph from the WSJ corpus for Liu Rong
  • Learned HIT's LTP toolkit for word segmentation, POS tagging, and NER
  • Used LTP to process the BaiduHi and BaiduZhidao corpora (365 GB in total); the job is still running as 20 tasks on the JieTong grid and should take about 3 days in total
  • Prepared a report on the word2vec code
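The report does not say how the 365 GB corpus was divided among the 20 grid tasks. As an illustration only, assuming the corpus is stored as many files and the goal is to give each task a similar number of bytes, a greedy balancing sketch (the longest-processing-time heuristic) might look like this. The function `split_into_tasks` is hypothetical and not part of LTP or the grid tooling:

```python
import heapq
from typing import Dict, List, Tuple


def split_into_tasks(file_sizes: Dict[str, int], num_tasks: int = 20) -> List[List[str]]:
    """Hypothetical helper: assign files to tasks so bytes per task are roughly balanced.

    Files are visited from largest to smallest, and each one goes to the task
    that currently has the fewest bytes assigned.
    """
    # Min-heap of (bytes assigned so far, task index).
    heap: List[Tuple[int, int]] = [(0, i) for i in range(num_tasks)]
    tasks: List[List[str]] = [[] for _ in range(num_tasks)]
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, idx = heapq.heappop(heap)  # lightest task so far
        tasks[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return tasks
```

For example, `split_into_tasks({"a": 5, "b": 4, "c": 3, "d": 3, "e": 1}, num_tasks=2)` returns `[["a", "d"], ["b", "c", "e"]]`, giving each task 8 units of data.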

Planned for next week

  • The address-tag list is very large; find an appropriate way to reduce its size
  • Generate a high-frequency address-tag list
  • Generate the tagged corpus
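The plan does not specify how the address-tag list will be reduced. One simple option, assuming the tagged corpus can be read as a stream of (address, tag) pairs, is to count each pair and keep only those above a frequency threshold, which also yields the high-frequency list. The function name `high_frequency_tags` and the default thresholds below are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def high_frequency_tags(
    pairs: Iterable[Tuple[str, str]], min_count: int = 5, top_k: int = 10000
) -> List[Tuple[Tuple[str, str], int]]:
    """Hypothetical helper: count (address, tag) pairs and keep the frequent ones.

    Pairs seen fewer than `min_count` times are dropped; at most `top_k` of the
    remaining pairs are returned, most frequent first.
    """
    counts = Counter(pairs)
    frequent = [(pair, c) for pair, c in counts.most_common() if c >= min_count]
    return frequent[:top_k]
```

Raising `min_count` or lowering `top_k` shrinks the list further; the right values would have to be chosen by checking coverage on the tagged corpus.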