name size dir description
SogouQ.full.train.3gram.gz 132M /work/lxs/nlphome/lm/SogouQ-500M trainData=SogouQ(800M);dict=11w-tencent
SogouT-11w-merge2-1.3gram.gz 4.1G /work/lxs/nlphome/lm/SogouT-140G trainData=SogouT(140G);dict=11w-tencent
SogouT-11w-merge2-2.3gram.gz 3.9G /work/lxs/nlphome/lm/SogouT-140G
8w8.3gram.tencent.gz 452M /work/lxs/nlphome/lm/Tencent
musicQuery-ltc.3gram.gz 28M /work/lxs/nlphome/lm/TencentQ/musicQuery use qa15w-singer-songs.wordlist
TencentQ.3gram.gz 1.4G /work/lxs/nlphome/lm/TencentQ/qa15w use qa15w.lexicion
mix-corp1-corp2.3gram.gz 1.3G /work/lxs/nlphome/lm/TencentQ/qa15w-nosinger-song use qa15w-nosinger-song.wordlist
mix-corp1_0.5-corp2_0.5.3gram.gz 1.4G /work/lxs/nlphome/lm/TencentQ/qa15w-singer-song use qa15w-singer-song.wordlist
11w_merge6_kn.3gram.gz 4.3G /work/lxs/nlphome/lm/TencentQA-100G trainData=qa(100G);dict=11w-tencent
8w8_new_merge6_kn.3gram0.gz 4.5G /work/lxs/nlphome/lm/TencentQA-100G trainData=qa(100G);dict=8w8-tencent
Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-5.3gram.gz 1.4M /work/lxs/nlphome/lm/jietong
Hunhe_zhongzi_and_add_and_PPL_5yuan_3e9.lm.utf8.1e-9.5gram.gz 389M /work/lxs/nlphome/lm/jietong
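
The listing above does not state the file format of the models. Assuming the .3gram.gz and .5gram.gz files are gzip-compressed ARPA-format language models (an assumption, not something the listing confirms), the n-gram counts in the header can be checked without unpacking the whole file. A minimal sketch; read_arpa_counts and max_lines are hypothetical names chosen for illustration:

```python
import gzip
import re


def read_arpa_counts(path, max_lines=1000):
    """Return {order: count} from the \\data\\ header of a gzipped ARPA LM.

    Assumes an ARPA-style header of the form:
        \\data\\
        ngram 1=...
        ngram 2=...
    Only the first max_lines lines are scanned, since the header sits at the top.
    """
    counts = {}
    in_header = False
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for i, line in enumerate(fh):
            line = line.strip()
            if line == "\\data\\":
                in_header = True
                continue
            if in_header:
                m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
                if m:
                    counts[int(m.group(1))] = int(m.group(2))
                elif line.startswith("\\"):
                    break  # the first "\1-grams:" section ends the header
            if i >= max_lines:
                break
    return counts


# Example call (path taken from the listing above):
# read_arpa_counts("/work/lxs/nlphome/lm/Tencent/8w8.3gram.tencent.gz")
```

If the returned dictionary is empty, the file is probably not in ARPA format and the assumption above does not hold.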

lexicon wordlist

name size dir description
singer.lexicion 74k /work/lxs/nlphome/dict/lex-wordlist/music/lr
singer.low.lexicion 74k /work/lxs/nlphome/dict/lex-wordlist/music/lr
singer.pinyin 44k /work/lxs/nlphome/dict/lex-wordlist/music/lr
song.lexicion 255k /work/lxs/nlphome/dict/lex-wordlist/music/lr
song.low.lexicion 255k /work/lxs/nlphome/dict/lex-wordlist/music/lr
song.pinyin 167k /work/lxs/nlphome/dict/lex-wordlist/music/lr
qa15w-ch-sinovoice.lexicion 2.9M /work/lxs/nlphome/dict/lex-wordlist/qa-check
qa15w-ch.pinyin 1.7M /work/lxs/nlphome/dict/lex-wordlist/qa-check
qa15w.lexicion 4.9M /work/lxs/nlphome/dict/lex-wordlist/qa-check
11w.lexicion 3.8M /work/lxs/nlphome/dict/lex-wordlist/tencent
8w8.lexicion 2.5M /work/lxs/nlphome/dict/lex-wordlist/tencent

no-lexicon wordlist

name size dir description
singer.wordlist 19k /work/lxs/nlphome/dict/nolex-wordlist/music/lr
song.wordlist 68k /work/lxs/nlphome/dict/nolex-wordlist/music/lr
album.txt 227k /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
area.txt 32B /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
chart.txt 336B /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
drama.txt 7.2k /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
language.txt 343B /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
singer.txt 42k /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
stopwords.txt 6.1k /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
song.txt 408k /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
style.txt 6.6k /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
type.txt 18B /work/lxs/nlphome/dict/nolex-wordlist/music/ltc
entity.txt 590k /work/lxs/nlphome/dict/nolex-wordlist/music/ltc merge album area chart drama language singer song stopwords style type
qa15w.wordlist 1.2M /work/lxs/nlphome/dict/nolex-wordlist/qa-check
11w.wordlist 888k /work/lxs/nlphome/dict/nolex-wordlist/tencent
8w8.wordlist 666k /work/lxs/nlphome/dict/nolex-wordlist/tencent
scws20w-utf8.wordlist 6.5M /work/lxs/nlphome/dict/nolex-wordlist
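
The entity.txt row in the table above is described as a merge of the album, area, chart, drama, language, singer, song, stopwords, style and type lists. How that merge was actually produced is not documented; the following is only a sketch of one way to do it, assuming one entry per line, UTF-8 encoding, and that duplicates should be dropped while keeping first-seen order. merge_wordlists and PARTS are hypothetical names:

```python
from pathlib import Path

# Source lists named in the entity.txt description (each stored as <name>.txt).
PARTS = ["album", "area", "chart", "drama", "language",
         "singer", "song", "stopwords", "style", "type"]


def merge_wordlists(src_dir, out_path, parts=PARTS):
    """Concatenate <part>.txt files into one list, dropping blanks and duplicates.

    Returns the number of entries written. Assumes one entry per line in UTF-8.
    """
    seen = set()
    merged = []
    for part in parts:
        path = Path(src_dir) / f"{part}.txt"
        with path.open(encoding="utf-8") as fh:
            for line in fh:
                word = line.strip()
                if word and word not in seen:
                    seen.add(word)
                    merged.append(word)
    Path(out_path).write_text("".join(w + "\n" for w in merged), encoding="utf-8")
    return len(merged)


# Example call (directory taken from the table above):
# merge_wordlists("/work/lxs/nlphome/dict/nolex-wordlist/music/ltc", "entity.txt")
```

Keeping the first occurrence preserves the order of the source lists, which makes the merged file easier to compare against its parts.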



Description: the data is stored in /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus

(This directory contains 4 subdirectories: ChinaDivision, dict, dict4VOD, document Resource.)
  1. Directory: /nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict
    1. Contains the directory sogou-dict, with these subdirectories:
  • 城市信息 (city information): city and place names for many provinces, some local dialect words, and bus station and street names for some cities.
  • 电子游戏 (video games)
  • 单机游戏 (single-player games): game titles from 2001 to 2011 and wordlists for some individual games.
  • 网游 (online games): online game titles from 2008 to 2011 and wordlists for some individual games.
  • 工程与应用科学 (engineering and applied science): specialized vocabulary wordlists for engineering fields.
  • 计算机 (computing): specialized vocabulary wordlists for the computer field, plus Alibaba product vocabulary in many fields.
  • 农林鱼畜 (agriculture, forestry, fishery and livestock): wordlists about livestock and agriculture.
  • 人文科学 (humanities)
  • 文学 (literature): wordlists about ancient Chinese literature and classic works, plus wordlists for some novels.
  • 语言 (language): wordlists about idioms, folklore and Internet buzzwords.
  • 哲学 (philosophy): wordlists about philosophy, for instance Hegel and Marxism.
  • 宗教 (religion): wordlists about Taoism, Buddhism and Islam.
  • 历史 (history): wordlists about Chinese history, Japan's Warring States period, and diplomacy.
  • 其他 (other): a wordlist about ancient Chinese numerology.
  • 社会科学 (social sciences)
  • 法律 (law): wordlists about law.
  • 教育 (education): wordlists about the buildings of some universities, textbook vocabulary, and lists of Chinese universities and famous American universities.
  • 金融 (finance): wordlists about finance.
  • 军事 (military): wordlists about military topics.
  • 政治 (politics): wordlists about Party and government offices, politics, and the official institutions of ancient China.
  • 其他 (other): wordlists about public relations, ethics and anthropology.
  • 生活 (daily life): wordlists covering many areas of daily life.
  • 医学 (medicine): wordlists about medical science.
  • 艺术 (arts)
  • 书法篆刻 (calligraphy and seal carving): wordlists about sculpture and calligraphy.
  • 舞蹈 (dance): wordlists about dance and rhythmic gymnastics.
  • 戏剧 (drama): wordlists about drama.
  • 音乐 (music): wordlists about Chinese and Western music.
  • 其他 (other): wordlists about tea, sculpture, er ren zhuan, world heritage and artists.
  • 娱乐 (entertainment)
  • 电影电视 (film and television): wordlists about science fiction films.
  • 动漫 (animation and comics): wordlists for some cartoons.
  • 流行音乐 (pop music): wordlists for the novel series A Song of Ice and Fire and for fashionable words and phrases.
  • 明星 (celebrities): wordlists about some famous people.
  • 汽车 (automobiles): wordlists about the automotive field.
  • 收藏 (collecting): wordlists about advertisements.
  • 时尚品牌 (fashion brands): the directory is empty.
  • 运动休闲 (sports and leisure)
  • F1赛车 (Formula 1 racing): the directory is empty.
  • 奥运 (Olympics): wordlists about the Olympic Games.
  • 垂钓 (fishing): wordlists about fishing.
  • 轮滑 (roller skating): one wordlist about roller skating.
  • 棋牌 (board and card games): wordlists about mahjong, Go, Chinese chess and San Guo Sha.
  • 气功 (qigong): wordlists about qigong.
  • 球类 (ball games): wordlists about football, basketball, table tennis, golf and badminton.
  • 杀人游戏 (the party game Mafia): the directory is empty.
  • 跆拳道 (taekwondo): wordlists about taekwondo.
  • 太极拳 (tai chi): wordlists about ba gua and tai ji quan.
  • 武术 (martial arts): wordlists about wu shu.
  • 自行车 (cycling): the directory is empty.
  • 其他 (other): wordlists about fencing, judo, wrestling and yoga.
  • 自然科学 (natural sciences)
  • 化学 (chemistry): wordlists about chemistry.
  • 生物 (biology): wordlists about biology.
  • 数学 (mathematics): wordlists about mathematics.
  • 天文学 (astronomy): wordlists about astronomy.
  • 物理 (physics): wordlists about physics.
  • 其他 (other): wordlists about stones.
    2. Contains the directory movie (many wordlists about the film domain).
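
Several of the subdirectories listed above are reported as empty. A quick way to re-check this on a local copy is to count the files under each subdirectory. This is a minimal sketch, assuming the tree is readable from where the script runs; summarize_subdirs and empty_subdirs are hypothetical helper names:

```python
from pathlib import Path


def summarize_subdirs(root):
    """Map each immediate subdirectory name to the number of files beneath it."""
    summary = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        # rglob("*") walks the whole subtree; only regular files are counted.
        summary[sub.name] = sum(1 for f in sub.rglob("*") if f.is_file())
    return summary


def empty_subdirs(root):
    """Return the names of immediate subdirectories that contain no files."""
    return [name for name, n in summarize_subdirs(root).items() if n == 0]


# Example call (path built from the description above; adjust to the local layout):
# empty_subdirs("/nfs/corpus/data/corpora/lenvxx/data/text/nlpcorpus/nlp_corpus/dict/sogou-dict")
```

Note that this only inspects one level of subdirectories at a time, so it has to be run again on a parent category such as 运动休闲 to check its children.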