Talk Title: Linguistic Representation for DNN-based Speech Synthesis


In speech synthesis, input text must be converted into a format that acoustic models can use easily. In traditional HMM-based speech synthesis, linguistic features are normally represented with full-context labels, which are discrete values. DNN-based acoustic models, by contrast, readily accept continuous input, which gives more flexibility in how linguistic features are represented. One example is integrating word embeddings into the linguistic input. In word embedding, words are represented as low-dimensional continuous vectors, which can be learned from a text corpus by unsupervised learning. This is especially useful for low-resourced languages, for which linguistic analysis tools such as part-of-speech taggers and parsers are not available. In this talk, I will review some of the research community's efforts on linguistic feature representation and present our solution for linguistic processing in low-resourced languages.
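As a rough illustration of the idea in the abstract, the sketch below concatenates a discrete (one-hot) phone feature with a continuous word-embedding vector to form a single input vector for a DNN acoustic model. It is not the speaker's method: the phone inventory, the embedding table, its dimension, and the function names are all hypothetical, and the embeddings here are random stand-ins for vectors that would normally be learned from a text corpus.

```python
import numpy as np

# Hypothetical toy phone inventory (assumption, not from the talk).
PHONES = ["sil", "hh", "ah", "l", "ow"]

# Hypothetical embedding table: random vectors stand in for embeddings
# that would normally be learned from unlabeled text.
EMBED_DIM = 4
rng = np.random.default_rng(0)
WORD_EMBEDDINGS = {w: rng.standard_normal(EMBED_DIM) for w in ["hello", "world"]}
UNKNOWN_WORD = np.zeros(EMBED_DIM)  # fallback for out-of-vocabulary words


def one_hot(index, size):
    """Return a one-hot vector of length `size` with a 1.0 at `index`."""
    vec = np.zeros(size)
    vec[index] = 1.0
    return vec


def linguistic_input(phone, word):
    """Concatenate a discrete phone feature with a continuous word embedding."""
    phone_vec = one_hot(PHONES.index(phone), len(PHONES))
    word_vec = WORD_EMBEDDINGS.get(word, UNKNOWN_WORD)
    return np.concatenate([phone_vec, word_vec])


features = linguistic_input("hh", "hello")
print(features.shape)  # (9,): 5 one-hot phone dims + 4 embedding dims
```

In this sketch the embedding simply takes the place of hand-crafted word-level features such as part-of-speech tags, which is one way such a representation could help when no tagger or parser exists for the language.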



Dr. Minghui Dong is currently a research scientist and the head of the Voice Analysis and Synthesis Lab at the Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A-Star), Singapore. He serves as the vice-president of the Chinese and Oriental Languages Information Processing Society (COLIPS), the editor-in-chief of the International Journal of Asian Language Processing (IJALP), and a Member-at-Large of the Asian Federation of Natural Language Processing (AFNLP). He received his bachelor's degree from the University of Science and Technology of China (USTC), his master's degree from Peking University (PKU), and his PhD from the National University of Singapore (NUS). He joined I2R in Dec 2004. Prior to that, he worked as a research engineer at Peking University for 3 years and as a researcher at InfoTalk Technology (Singapore) for 3 years.


His research interests include spoken language processing, natural language processing, language resource building, and machine learning methods for language processing. He has co-authored more than 80 research papers in leading conferences and journals. He has actively contributed to the Asian and international research communities by serving in different roles in various conferences and organizations. He has been overseeing the IALP conference series and the journal IJALP, which promote interaction among researchers working on the processing of low-resourced languages.


He has been working on text-to-speech (TTS) systems for many years. He led TTS R&D work and built TTS systems for various local languages (English, Chinese, Malay, etc.) on various platforms (cloud, PC, smartphone, etc.). Recently, he has been leading research on natural language understanding for speech synthesis, deep learning technologies for speech and language processing, personalized and expressive speech synthesis, and speech synthesis for low-resourced languages.


