140512-Xi Ma

From cslt Wiki

Last Week:

1. Extract the corpus for the related areas from the original corpus by keyword.

2. Annotate the keyword list with pinyin.
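
The keyword-based extraction in item 1 can be sketched roughly as follows. This is a minimal illustration, not the script actually used: the function name `extract_by_keywords`, the sample sentences, and the keyword list are all hypothetical, and it assumes plain substring matching on each sentence.

```python
def extract_by_keywords(sentences, keywords):
    """Return the sentences that contain at least one keyword (substring match)."""
    return [s for s in sentences if any(k in s for k in keywords)]


# Hypothetical sample data, for illustration only.
corpus = ["今天 股票 上涨", "我 喜欢 足球", "银行 利率 下调"]
keywords = ["股票", "银行"]

domain_corpus = extract_by_keywords(corpus, keywords)
print(domain_corpus)
```

Substring matching is chosen here because it works on both segmented and unsegmented Chinese text; matching on whole tokens would be stricter if the corpus is already word-segmented.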

This Week:

1. Compute the perplexity (ppl) of each sentence in the original corpus, and extract the sentences whose ppl is below a specific threshold to form a new training set.

2. Train a language model on the new training set and test its ppl.
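
The perplexity-based selection planned above could look roughly like the sketch below. Several things here are assumptions rather than details from this report: the scoring model is a toy add-one-smoothed unigram model trained on a small in-domain sample (a real setup would more likely use an n-gram toolkit), sentences are assumed to be whitespace-segmented, and the names `train_unigram`, `sentence_ppl`, `filter_by_ppl` and the threshold value are hypothetical.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Build an add-one-smoothed unigram model; returns a log-probability function."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserved for unseen words

    def logprob(word):
        return math.log((counts[word] + 1) / (total + vocab))

    return logprob


def sentence_ppl(logprob, sentence):
    """Per-word perplexity of one whitespace-segmented sentence."""
    words = sentence.split()
    if not words:
        return float("inf")  # empty sentences are never selected
    return math.exp(-sum(logprob(w) for w in words) / len(words))


def filter_by_ppl(logprob, sentences, threshold):
    """Keep only the sentences whose perplexity is below the threshold."""
    return [s for s in sentences if sentence_ppl(logprob, s) < threshold]


# Hypothetical toy data: an in-domain sample and candidates from the original corpus.
model = train_unigram(["a b a b", "a b b"])
candidates = ["a b", "x y z"]
selected = filter_by_ppl(model, candidates, threshold=5.0)
print(selected)
```

The selected sentences would then form the new training set for item 2; the threshold trades off the size of that set against how closely it matches the target domain.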