

Modern society shows clear mutual influence among languages, e.g., Mandarin on minority languages in China, and English on other languages around the world. This leads to a mixlingual phenomenon, in which some words of a foreign (or target, embedded) language are embedded in a host (or source, matrix) language. This phenomenon poses a serious problem for automatic speech recognition (ASR).

Building on the success of the first challenge, MixASR-CHEN 2016, the MixASR-CHEN 2017 challenge follows the same theme of mixlingual ASR. This year's task is more challenging in several ways:

  • There are more utterances that involve multiple English words in the test data
  • There are more English words that are not in the CMU dictionary
  • There are more English phrases that involve multiple English words

Challenge details

  • The database information is here.
  • The challenge plan is here.
  • The tools that can be used are here.
  • The Kaldi baseline will soon be available here.
  • Registration and result submission are here.
  • Participants from both academia and industry are welcome.
  • As with the first challenge, we will base the challenge on the O-COCOSDA forum and will release the results at O-COCOSDA 2017. Challenge participants are strongly encouraged to submit a paper describing their system to the conference.

Important dates

  • May 15: training/dev dataset release
  • July 15: OC16-CE80 test data release
  • July 18: OC16-CE80 result submission
  • July 29: Paper submission deadline
  • OC2017: challenge result release


  • Dong Wang (Tsinghua University)
  • Zhiyuan Tang (Tsinghua University)
  • Qing Chen (Speech Ocean)