From cslt Wiki
Revision as of 05:44, 19 February 2014 by Lxs (Talk | contribs)

name | type | size | dir | description
863 | speech | 51h, 76spk | corpora/863 | 863 reading speech database; 16 kHz, 16-bit
emotion | speech | 22h | corpora/emotion | emotional speech for SID, recorded at CSLT; 16 kHz, 16-bit
callhome | text | 9.02Mb | corpora/callhome | transcriptions of the CALLHOME Chinese speech database
tcmsd | speech | 34h, 60spk | corpora/tcmsd | speech database recorded at Tsinghua, 2002; 16 kHz, 16-bit
timit | speech | 5.4h | corpora/timit | English TIMIT database
gigaword | text | 668MW | corpora/chinese_gigaword | Chinese Gigaword text
ulgur | speech&text | xju: 141h (tr. 136h); xjnu: 8.54h | corpora/ulgur | Uyghur speech and text data
tvboard | speech | - | corpora/tvboard | untranscribed TV and broadcast archive
weibo | text | 10Gb | corpora/weibo | English Weibo text data
qa | text | 124Gb | corpora/qa | QA text data
pvad | speech | 5.4h | corpora/puqiang/VAD | speech data for VAD, from Pachira
ppoi | speech | 208h | corpora/puqiang/poi | 8 kHz telephone POI speech, from Pachira
T400 | speech | 400h | corpora/tencent | speech data from Tencent
dt700 | speech | 700h | corpora/tencent/dt700 | 700 hours of reading speech data
legend-vod | speech | - | corpora/legend-vod | some test speech and VOD
mobil-eng | speech | 26h | corpora/lenvxx/data/wav/mobil-eng | English speech from Chinese speakers
legend-online | speech | 54h | corpora/lenvxx/data/wav/real-online | online speech data
legend-wakeup | speech | 1h | corpora/lenvxx/data/wav/wake-up | wake-up test speech
legend-reading | speech | 21h | corpora/lenvxx/data/wav/haitian | reading speech
legend-sel-for-test | speech | 21h | corpora/lenvxx/data/wav/sel_for_test | reading speech
POI-lexicon | lexicon | - | corpora/lenvxx/data/lexicon | lexicon for POI applications
NLPR | lexicon,categories | - | corpora/lenvxx/data/text/nlpcorpus | resources for NLP tasks
serviceT | text | - | corpora/lenvxx/data/text/service_text | text recorded from online service
sougouText | text | - | corpora/sogou | SogouQ and SogouT
wsj | speech | 100h | corpora/wsj | Wall Street Journal speech database
hownet | lexicon | - | corpora/hownet | HowNet relation database
casia | speech | 4000 u | corpora/tts/casia | male TTS speech
huilan-tts | speech | 2000 u | corpora/tts/huilan | male/female TTS speech from Huilan
tts-novel | speech | 20h | corpora/tts/novel | speech data downloaded from the internet for TTS
Sinovoice-tel | speech | 470h+300h | corpora/sinovoice/tel | telephone speech data from Sinovoice
Sinovoice-16k | speech | 6000h | corpora/sinovoice/16k | 16 kHz mobile speech data from Sinovoice