1. Extract a domain-specific corpus from the original corpus by keyword matching.
2. Annotate each entry in the keyword list with its pinyin.
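
A minimal sketch of these two steps. It assumes plain substring matching for the keyword filter. The pinyin table is a hypothetical stand-in that covers only the example characters, and in practice a real converter (for instance the third-party `pypinyin` package) would replace it. The names `extract_domain_corpus`, `mark_pinyin` and `PINYIN_TABLE` are illustrative, not taken from the original.

```python
def extract_domain_corpus(sentences, keywords):
    """Keep sentences that contain at least one keyword (plain substring match)."""
    return [s for s in sentences if any(k in s for k in keywords)]


# Hypothetical stand-in for a real pinyin converter; covers only the example characters.
PINYIN_TABLE = {"天": "tiān", "气": "qì", "下": "xià", "雨": "yǔ"}


def mark_pinyin(keywords, table=PINYIN_TABLE):
    """Map each keyword to space-separated pinyin; characters missing from the table are kept as-is."""
    return {k: " ".join(table.get(ch, ch) for ch in k) for k in keywords}


# Toy data for illustration only.
keywords = ["天气", "下雨"]
domain_corpus = extract_domain_corpus(["今天天气很好", "股票市场大涨", "明天下雨"], keywords)
keyword_pinyin = mark_pinyin(keywords)
```

Here `domain_corpus` keeps the two weather-related sentences, and `keyword_pinyin` maps `"天气"` to `"tiān qì"` and `"下雨"` to `"xià yǔ"`.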
1. Compute the perplexity (ppl) of each sentence in the original corpus, and keep the sentences whose ppl is below a chosen threshold as a new training set.
2. Train a language model on the new training set and evaluate its ppl.
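
A minimal sketch of the filtering and retraining steps. The text does not say which model scores the sentences, so this assumes a character-level bigram language model with add-one (Laplace) smoothing, trained on the domain corpus from the keyword-extraction step. `CharBigramLM`, `filter_by_perplexity`, the threshold of 10.0 and the toy sentences are all hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


class CharBigramLM:
    """Character-level bigram LM with add-one (Laplace) smoothing."""

    def __init__(self, sentences):
        self.context_counts = Counter()
        self.bigram_counts = Counter()
        vocab = {EOS}
        for s in sentences:
            vocab.update(s)
            tokens = [BOS] + list(s) + [EOS]
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1
        # Predictable tokens: seen characters, EOS, and one extra slot for unseen characters.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, sentence):
        """Return (total log-probability, number of predicted tokens) for one sentence."""
        tokens = [BOS] + list(sentence) + [EOS]
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            count = self.bigram_counts[(prev, cur)]
            p = (count + 1) / (self.context_counts[prev] + self.vocab_size)
            total += math.log(p)
        return total, len(tokens) - 1

    def perplexity(self, sentences):
        """Perplexity: exp of the mean negative log-probability per predicted token."""
        total, n = 0.0, 0
        for s in sentences:
            lp, k = self.log_prob(s)
            total += lp
            n += k
        return math.exp(-total / n)


def filter_by_perplexity(lm, sentences, threshold):
    """Step 1: keep sentences whose own perplexity is below the threshold."""
    return [s for s in sentences if lm.perplexity([s]) < threshold]


# Toy data for illustration only.
domain_corpus = ["今天天气很好", "天气预报说明天下雨"]
original_corpus = ["今天天气很好", "股票市场大涨"]
held_out = ["明天天气很好"]

scorer = CharBigramLM(domain_corpus)  # assumed scoring model
new_training_set = filter_by_perplexity(scorer, original_corpus, threshold=10.0)
new_lm = CharBigramLM(new_training_set)  # Step 2: retrain on the filtered set
test_ppl = new_lm.perplexity(held_out)  # Step 2: evaluate on held-out text
```

The same `perplexity` method scores a single sentence (step 1) and a whole test set (step 2). This keeps the filtering criterion and the evaluation metric consistent.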