A progress report of the Taiwan Mandarin radio speech corpus project

Yuan Fu Liao*, Yung Hsiang Shawn Chang, Sing Yue Wang, Jhih Wei Chen, Sheng Ming Wang, Jenq Haur Wang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The Taiwan Mandarin Radio Speech Corpus contains 300 (and growing) hours of high-quality recordings selected from Taiwan's National Education Radio (NER) archive. The corpus features speech in various speaking styles, produced by hundreds of speakers, together with the corresponding transcriptions (automatically transcribed and manually corrected) and annotations, making it suitable for speech and language research. In this paper, we report on the progress of the corpus development and, in particular, present experimental results of audio event detection/segmentation and semi-supervised acoustic model training on this corpus.

Original language: English
Title of host publication: 2017 20th Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781538633335
State: Published - 13 Jun 2018
Event: 20th Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2017 - Seoul, Korea, Republic of
Duration: 1 Nov 2017 to 3 Nov 2017

Publication series

Name: 2017 20th Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2017

Conference

Conference: 20th Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2017
Country/Territory: Korea, Republic of
City: Seoul
Period: 1/11/17 to 3/11/17

Keywords

  • audio event detection
  • Mandarin speech corpus
  • semi-supervised training
