Text analysis is the preprocessing unit of a Chinese text-to-speech (TTS) system. It provides linguistic features such as words, parts of speech (POS), base phrases (BP), and pinyin (pronunciation) to the subsequent processes of prosody generation (PG) and speech synthesis (SS). This thesis focuses on the construction of a grapheme-to-phone system that converts written-form words into spoken-form words (SFWs) together with their associated pinyin and tones. To this end, a set of grapheme-to-phone techniques is reported, including text normalization (TN), polyphone disambiguation (PD), and tone sandhi labeling (TSL).
For text normalization, this thesis proposes two methods: 1) the Keywords Search method and 2) the Words Class Analysis method. The Keywords Search method rests on two observations: the spoken form of a non-standard word (NSW) can be inferred from the semantics of the sentence that contains it, and the semantics of a sentence can in turn be inferred from certain keywords. Keywords can therefore be used to convert NSWs into SFWs, and rules are created that map pairs of a keyword set and an NSW to an SFW. However, the Keywords Search method requires linguistic expertise to handcraft numerous rules, and exhaustively listing the rule-related keywords is very time consuming. To address these shortcomings, the Words Class Analysis method is proposed to improve both the efficiency of system construction and the accuracy of the system.
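As a rough illustration of the Keywords Search idea, the following Python sketch maps a pair of keyword set and NSW to an SFW. The rule table, the keywords, the `normalize` function, and the restriction to single-digit `a/b` patterns are all assumptions made for this example; none of them are taken from the rules built in this thesis.

```python
import re

# Spoken forms of the digits 0-9.
DIGITS = "零一二三四五六七八九"

# Hypothetical rules: each pairs a set of context keywords with a function
# that turns the NSW "a/b" into a spoken-form word. Illustrative only.
RULES = [
    ({"比例", "機率"}, lambda a, b: f"{DIGITS[b]}分之{DIGITS[a]}"),  # fraction reading
    ({"日期", "生日"}, lambda a, b: f"{DIGITS[a]}月{DIGITS[b]}日"),  # date reading
]


def normalize(sentence: str, nsw: str) -> str:
    """Return the spoken form of `nsw`, chosen by keywords found in `sentence`."""
    m = re.fullmatch(r"([0-9])/([0-9])", nsw)
    if m is None:
        return nsw  # pattern not covered by this sketch
    a, b = int(m.group(1)), int(m.group(2))
    for keywords, to_spoken in RULES:
        if any(k in sentence for k in keywords):
            return to_spoken(a, b)
    return nsw  # no keyword matched, so the NSW is left unchanged


print(normalize("他的生日是3/4", "3/4"))    # 三月四日
print(normalize("成功的機率是3/4", "3/4"))  # 四分之三
```

The last branch shows the weakness noted above: a sentence whose keywords are not listed in any rule leaves the NSW unconverted, so coverage depends on how many keywords are written by hand.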
For polyphonic character disambiguation, we constructed a database and propose a hybrid system that combines a dictionary with a data-driven model. If a word contains more than one character and is found in the dictionary, its pinyin is taken from the dictionary. For words that cannot be found in the dictionary, a conditional random field (CRF) model is used to predict the correct pronunciation.
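The control flow of such a dictionary-plus-model hybrid can be sketched as follows. This is a minimal sketch under stated assumptions: `WORD_DICT`, `fallback_model`, and `word_to_pinyin` are hypothetical names, the dictionary entries are illustrative, and the fallback is a simple default-reading table standing in for a trained CRF model.

```python
from typing import Callable, Dict, List

# Hypothetical pronunciation dictionary for multi-character words.
WORD_DICT: Dict[str, List[str]] = {
    "銀行": ["yin2", "hang2"],
    "行人": ["xing2", "ren2"],
}

# Hypothetical default readings used by the stand-in model below.
DEFAULT_READING: Dict[str, str] = {"行": "xing2", "人": "ren2"}


def fallback_model(word: str, context: List[str]) -> List[str]:
    """Stand-in for a trained CRF: ignores context, returns default readings."""
    return [DEFAULT_READING.get(ch, "?") for ch in word]


def word_to_pinyin(
    word: str,
    context: List[str],
    model: Callable[[str, List[str]], List[str]] = fallback_model,
) -> List[str]:
    """Use the dictionary for known multi-character words, else ask the model."""
    if len(word) > 1 and word in WORD_DICT:
        return WORD_DICT[word]
    return model(word, context)


print(word_to_pinyin("銀行", ["去", "銀行"]))  # ['yin2', 'hang2']
print(word_to_pinyin("行", ["不", "行"]))      # ['xing2']
```

Passing the model as a parameter keeps the dictionary lookup independent of whichever sequence labeler is eventually plugged in.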
For tone sandhi, we not only process consecutive tone-three sandhi and the sandhi of 一 and 不, but also construct a tone sandhi database, and we use words, POS, and tones as features to build a CRF model that predicts the tone. This thesis proposes that human labelers annotate tone changes without a speech corpus, by reading the texts silently at a slow speaking rate. When labeling is done at a fast speaking rate, fewer prosodic phrase boundaries are marked and more consecutive tone-three sandhi are produced. The experimental results show that labeling at a slow speaking rate is better than labeling at a fast speaking rate, and that it makes the resulting pronunciation more natural.
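The effect of prosodic phrase boundaries on tone-three sandhi can be illustrated with the textbook rules of Mandarin, sketched below in Python. This is a simplified rule-based baseline, not the CRF model built in this thesis. It assumes that sandhi does not cross a phrase boundary and that every tone three followed by another tone three within a phrase becomes tone two. The function names are hypothetical.

```python
from typing import List, Optional


def third_tone_sandhi(phrase: List[int]) -> List[int]:
    """Tones (1-5) of one prosodic phrase; a 3 directly before a 3 becomes 2."""
    out = list(phrase)
    for i in range(len(phrase) - 1):
        if phrase[i] == 3 and phrase[i + 1] == 3:
            out[i] = 2
    return out


def label_phrases(phrases: List[List[int]]) -> List[List[int]]:
    """Apply the rule inside each phrase; sandhi never crosses a boundary here."""
    return [third_tone_sandhi(p) for p in phrases]


def yi_bu_tone(char: str, next_tone: Optional[int]) -> int:
    """Textbook sandhi for 一 and 不, given the tone of the next syllable."""
    if char == "一":
        if next_tone == 4:
            return 2          # 一 before tone 4 is read with tone 2
        if next_tone in (1, 2, 3):
            return 4          # 一 before tones 1-3 is read with tone 4
        return 1              # otherwise keep the citation tone
    if char == "不":
        return 2 if next_tone == 4 else 4
    raise ValueError("only 一 and 不 are handled")


# Slow reading: a boundary after the second syllable limits the sandhi.
print(label_phrases([[3, 3], [3]]))  # [[2, 3], [3]]
# Fast reading: no boundary, so the sandhi spreads over the whole run.
print(label_phrases([[3, 3, 3]]))    # [[2, 2, 3]]
```

Under these assumptions the same three syllables receive one tone change when read slowly and two when read fast, which is the kind of difference the labeling rate introduces.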
For the experiments, the training and testing data were drawn from the Academia Sinica Balanced Corpus (ASBC) and from Internet news articles. Polyphonic character disambiguation achieved an overall accuracy of 99.37%, consecutive tone-three sandhi achieved 95.34%, and the sandhi of “一” and “不” achieved 98.25% and 98.88%, respectively. For text normalization, the Keywords Search method achieved an overall accuracy of 95.10%; the Keywords Search method achieved 65.61% on the Words Class Analysis testing set, and the Words Class Analysis method achieved 95.10%. In addition to evaluating the experiments, this thesis analyzes the system errors to guide future improvement.
Table of Contents
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Literature Review
1.4 Research Direction
1.5 Thesis Organization
Chapter 2 Construction of Text Normalization
2.1 Analysis on Non-Standard Words
2.2 The Proposed Text Normalization System
2.2.1 The BPU Spotting Module
2.2.1.1 The BPU Scope Tagger
2.2.1.2 The BPU Restructurer
2.2.2 The Rule-based System
2.2.3 Keywords Text Normalization Module
2.3 Words Classification
2.3.1 Words Selection
2.3.2 Words Filtration
2.3.3 Construction of Word Classification Matrix
2.3.4 Matrix Design
2.3.5 CRF Format Formulation
2.4 Experimental Results
2.5 Comparison between the Words Classification Method and the Keywords Method
Chapter 3 Polyphonic Character Disambiguation
3.1 Analysis on Polyphonic Character Processing
3.2 Construction of Polyphonic Character Disambiguation
3.2.1 Preprocessing
3.2.2 Manual Labeling
3.2.3 CRF Model Training
3.2.4 The Baseline System
3.2.5 The Dictionary-CRF Hybrid System
3.3 Experimental Results
Chapter 4 Tone Sandhi
4.1 Tone Sandhi Processing Analysis
4.2 Construction of Tone Sandhi
4.2.1 Preprocessing
4.2.2 Manual Labeling
4.2.3 CRF Model Training
4.3 Experimental Results
Chapter 5 Conclusions and Future Research
5.1 Conclusions
5.2 Future Research
References
Appendix 1
Appendix 2