(Luo Ling 2015-08-17)
Work this week:
1. Finished training word embeddings on two datasets:
EnWiki dataset (953M)
text8 dataset (95.3M)
Models trained: CBOW, Skip-Gram (SG), C&W, GloVe, LBL and Order (count-based)
2. Used the following tasks to measure the quality of word vectors of various dimensions:
word similarity (ws)
the TOEFL set
analogy task
text classification
named entity recognition (ner)
sentence-level sentiment classification based on convolutional neural networks (referred to as 'cnn')
part-of-speech tagging (pos)
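
For reference, the two intrinsic tasks above (word similarity and analogy) are commonly scored as sketched below. This is a minimal illustration, not the evaluation code used for this report: it assumes word similarity is scored as the Spearman correlation between cosine similarities and human ratings, and that analogies are solved by vector offset (b - a + c). The toy vectors, the gold ratings, and the helper names (`word_similarity_score`, `solve_analogy`) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def word_similarity_score(vectors, gold_pairs):
    """Correlate model similarities with human ratings.

    gold_pairs holds (word1, word2, human_score) tuples; pairs that
    contain an out-of-vocabulary word are skipped.
    """
    model, human = [], []
    for w1, w2, score in gold_pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human)


def solve_analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by the nearest word to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Hypothetical toy embeddings and ratings, for illustration only.
toy_vectors = {
    "man": [0.0, 1.0, 1.0],
    "woman": [0.0, -1.0, 1.0],
    "king": [1.0, 1.0, 1.0],
    "queen": [1.0, -1.0, 1.0],
    "apple": [0.1, 0.0, -1.0],
}
toy_gold = [("king", "queen", 8.0), ("man", "woman", 5.0), ("king", "apple", 1.0)]

ws_score = word_similarity_score(toy_vectors, toy_gold)
analogy_answer = solve_analogy(toy_vectors, "man", "woman", "king")
```

With real embeddings, `toy_vectors` would be replaced by the trained word vectors and `toy_gold` by a human-rated word-pair set; the analogy accuracy is then the fraction of questions for which `solve_analogy` returns the expected word.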

