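Author: 劉冠廷
Author (English): LIOU, Guan-Ting
Thesis title: 中文文字轉拼音系統之建立
Thesis title (English): A Study on Construction of Chinese Grapheme to Phoneme Conversion System
Advisor: 江振宇
Advisor (English): CHIANG, Chen-Yu
Committee members: 王逸如 陳信宏
Committee members (English): WANG, Yih-Ru CHEN, Sin-Horn
Oral defense date: 2015-07-24
Degree: Master's
Institution: National Taipei University (國立臺北大學)
Department: Graduate Institute of Communication Engineering (通訊工程研究所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2015
Graduation academic year: 103 (ROC calendar)
Language: English
Number of pages: 92
Keywords (Chinese): polyphonic character (破音字), text normalization (文字正規化), keyword (關鍵詞), word class (詞群), tone sandhi (轉調)
Keywords (English): G2P, keyword, text normalization, polyphonic disambiguation, tone sandhi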
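The text analysis module is the front-end processing unit of a Chinese text-to-speech (TTS) system. It supplies the linguistic parameters required by the downstream prosody generation and speech synthesis modules, including words, part-of-speech (POS) tags, base phrases, and pronunciation labels. To give the TTS system correct and more natural pronunciation, this thesis proposes and evaluates improvements for polyphonic characters, tone sandhi, and text normalization.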
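For text normalization, this thesis proposes two methods: the keyword search method and the word class analysis method. The keyword search method rests on two observations: the meaning of a short span within a sentence can be inferred from important keywords, and text normalization depends on the meaning of the sentence. By building rules that relate keywords to pronunciation handling, the span that needs normalization can therefore be converted to its correct reading. Solving text normalization this way raises the problem of how to avoid manually enumerating large numbers of keywords and manually writing large numbers of rules, so this thesis also proposes the word class analysis method. The word class analysis method automatically extracts, from a large corpus, keywords and keyword groups that help text normalization, and uses a data-driven CRF model to predict the correct pronunciation handling from words, POS tags, and keyword classes. Most keywords can be extracted through specific POS tags, but some are hidden among ambiguous POS tags. Using the keywords themselves as predictors would cause excessive complexity or parameter sparsity, while POS tags alone lack predictive power. We therefore take the words that lie close, in text distance, to a text normalization item and use the entropy relationship among POS tags, words, and the text normalization item to build word classes specific to that item. The resulting word classes serve as features, giving more robust pronunciation handling for text normalization.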
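The entropy idea mentioned above can be illustrated with a small sketch. It is not the thesis's actual procedure: it only shows one plausible way to score a context word by the entropy of the readings it co-occurs with, so that low-entropy words look like useful keywords. The toy samples, the reading labels `digit_by_digit` and `cardinal`, and the function name `reading_entropy` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def reading_entropy(samples):
    """Return {context_word: entropy in bits of the readings seen with it}."""
    counts = defaultdict(Counter)
    for word, reading in samples:
        counts[word][reading] += 1
    entropy = {}
    for word, readings in counts.items():
        total = sum(readings.values())
        entropy[word] = -sum(
            (n / total) * math.log2(n / total) for n in readings.values()
        )
    return entropy

# Hypothetical (context word, reading of the nearby number) pairs.
samples = [
    ("電話", "digit_by_digit"), ("電話", "digit_by_digit"),
    ("元", "cardinal"), ("元", "cardinal"),
    ("的", "digit_by_digit"), ("的", "cardinal"),
]
scores = reading_entropy(samples)
# Words whose entropy is near zero always co-occur with the same reading.
informative = sorted(w for w, h in scores.items() if h < 0.5)
```

In this toy data, 電話 and 元 each predict a single reading, while 的 is uninformative. A real system would need far more data and some smoothing before grouping words into classes.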
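For polyphonic characters, in addition to building a polyphonic character database, this thesis proposes a hybrid method that combines a pronunciation dictionary with a data-driven statistical model. If a word in the segmentation result contains a polyphonic character and is listed in the pronunciation dictionary, its pronunciation is taken directly from the dictionary. For words not in the dictionary, a CRF model predicts the pronunciation of the polyphonic character from information such as word segmentation and POS tags.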
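For tone sandhi, this thesis handles consecutive third tones and the sandhi of “一” and “不”, builds a tone sandhi database, and uses a CRF model with word segmentation, POS tags, and tones to predict the changed tone. Because it is extremely difficult to obtain large amounts of third-tone sandhi training labels from speech data, this thesis experiments with having annotators label third-tone sandhi in bulk by silently reading the text at a slow and at a fast speaking rate. In the process we found that labels produced by slow silent reading contain more prosodic breaks. Third-tone sandhi does not extend across a prosodic break, and a third tone immediately before a break keeps its original tone. A long run of consecutive third tones is therefore split into more sandhi groups, and fewer third tones are labeled as changed. Conversely, fast-rate labeling has fewer prosodic breaks and yields more third-tone changes. Experiments show that the third-tone sandhi model trained on the database labeled by slow silent reading produces more natural synthesized speech than the one trained on fast-rate labels.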
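To quantify the performance of the proposed system, we used the Academia Sinica Balanced Corpus and online news articles as training and test data. The overall labeling accuracy for polyphonic characters reaches 99.37%, and the accuracy of third-tone sandhi is 95.34%. The sandhi accuracies for “一” and “不” are 98.25% and 98.88%, respectively. For text normalization, the keyword search system achieves 95.10% accuracy; on the test set of the word class analysis method, the keyword system achieves 65.61% and the word class analysis method achieves 95.10%. Beyond the quantitative analysis, this thesis also analyzes the system's errors qualitatively as a basis for discussing future improvement and development.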
Text analysis is the preprocessing unit of a Chinese text-to-speech (TTS) system. It provides linguistic features such as words, part-of-speech (POS) tags, base phrases (BPs), and pinyin (pronunciation) for the subsequent prosody generation (PG) and speech synthesis (SS) stages. This thesis focuses on the construction of a grapheme-to-phoneme (G2P) system that converts written-form words to spoken-form words (SFWs) with their associated pinyin and tones. To this end, a set of G2P techniques is reported, covering text normalization (TN), polyphone disambiguation (PD), and tone sandhi labeling (TSL).
For text normalization, this thesis proposes two methods: 1) the Keyword Search method and 2) the Word Class Analysis method. In the Keyword Search method, the spoken forms of non-standard words (NSWs) can be inferred from the semantics of the current sentence, and the semantics of a sentence can in turn be inferred from certain keywords, so keywords can be used to convert NSWs to SFWs. Rules are therefore created that map pairs of keyword sets and NSWs to SFWs. However, the Keyword Search method requires linguistic expertise to handcraft numerous rules, and exhaustively listing the rule-related keywords is very time consuming. To overcome these shortcomings, the Word Class Analysis method is proposed to improve both the efficiency of system construction and system accuracy.
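A minimal sketch of the keyword-to-rule mapping described above, assuming the NSW is a plain digit string and the keywords are looked up in a small window of neighboring words. The rule table, the window size, and the names `RULES`, `normalize`, `read_digits`, and `read_cardinal` are hypothetical and are not taken from the thesis.

```python
DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]

# Hypothetical rules: a keyword set near the number selects a reading style.
RULES = [
    ({"電話", "手機", "號碼"}, "digits"),   # phone-like numbers
    ({"元", "公斤", "人"}, "cardinal"),     # quantities
]

def read_digits(number):
    """Read a digit string one digit at a time, e.g. for phone numbers."""
    return "".join(DIGITS[int(d)] for d in number)

def read_cardinal(number):
    """Read an integer below 10000 as a cardinal (simplified; keeps 一十)."""
    n = int(number)
    if n >= 10000:
        raise ValueError("sketch only covers numbers below 10000")
    if n == 0:
        return DIGITS[0]
    text, pending_zero = "", False
    digits = str(n)
    for i, ch in enumerate(digits):
        d = int(ch)
        if d == 0:
            pending_zero = True
            continue
        if pending_zero:
            text += DIGITS[0]  # a single 零 stands in for the skipped places
            pending_zero = False
        text += DIGITS[d] + UNITS[len(digits) - 1 - i]
    return text

READERS = {"digits": read_digits, "cardinal": read_cardinal}

def normalize(tokens, window=2):
    """Replace each digit-string token with a spoken form chosen by keywords."""
    out = []
    for i, tok in enumerate(tokens):
        if not (tok.isascii() and tok.isdigit()):
            out.append(tok)
            continue
        context = set(tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window])
        # First rule with a keyword in the window wins; default to cardinal.
        style = next((s for kws, s in RULES if kws & context), "cardinal")
        out.append(READERS[style](tok))
    return out

phone = normalize(["電話", "是", "2345"])
price = normalize(["總共", "250", "元"])
```

Every new NSW type needs another handwritten keyword set and reader, which is the maintenance cost that motivates the Word Class Analysis method.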
For polyphonic character disambiguation, we constructed a polyphonic character database and propose a hybrid method that combines a pronunciation dictionary with a data-driven model. If a word that contains more than one character is found in the dictionary, its pinyin is taken directly from the dictionary. For words that cannot be found in the dictionary, a CRF model is used to predict the pronunciation.
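The dictionary-first control flow can be sketched as follows. The dictionary entries, the POS tags, and the fallback function are placeholders: in particular, `fallback_model` only returns an assumed default reading per character and stands in for the CRF, which the thesis trains on segmentation and POS features.

```python
# Hypothetical pronunciation dictionary: word -> pinyin for each character.
PRON_DICT = {
    "銀行": ["yin2", "hang2"],
    "行走": ["xing2", "zou3"],
}

# Assumed default readings, used only by the stand-in model below.
DEFAULT_PINYIN = {"行": "xing2", "銀": "yin2", "走": "zou3", "不": "bu4"}

def fallback_model(word, pos_tag):
    """Stand-in for the CRF: ignores pos_tag and returns default readings."""
    return [DEFAULT_PINYIN.get(ch, "<unk>") for ch in word]

def to_pinyin(segmented, model=fallback_model):
    """Label each (word, POS) pair, preferring the dictionary over the model."""
    result = []
    for word, pos_tag in segmented:
        if word in PRON_DICT:
            result.append((word, PRON_DICT[word], "dict"))
        else:
            result.append((word, model(word, pos_tag), "model"))
    return result

# The POS tags here are illustrative labels only.
labels = to_pinyin([("銀行", "Na"), ("不", "D"), ("行", "VH")])
```

Here 行 is read hang2 inside the dictionary word 銀行, while the standalone 行 falls through to the model. Recording the source of each label (`"dict"` or `"model"`) makes it easy to evaluate the two paths separately.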
For tone sandhi, we process consecutive third-tone sandhi and the sandhi of 一 and 不, and we construct a tone sandhi database. Words, POS tags, and tones are used to build a CRF model that predicts the resulting tone. This thesis proposes that human labelers annotate tone changes without a speech corpus, by reading the text silently at a slow speaking rate. When a fast speaking rate is used for labeling, there are fewer prosodic phrase boundaries and more consecutive third-tone sandhi. The experimental results show that labeling at a slow speaking rate is better than labeling at a fast speaking rate, because it yields more natural-sounding pronunciation.
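To see why the number of prosodic boundaries changes how many third tones are altered, consider the simplified rule-based sketch below. It is not the CRF model used in the thesis: it assumes the common textbook rule that a third tone directly followed by another third tone becomes a second tone, applied only inside a prosodic group. The function name and the list-of-groups input format are assumptions.

```python
def third_tone_sandhi(groups):
    """Apply 3-3 -> 2-3 inside each prosodic group, never across a boundary.

    `groups` is a list of prosodic groups, each a list of tone numbers (1-5).
    The last third tone of a group keeps its original tone.
    """
    out = []
    for tones in groups:
        changed = [
            2 if t == 3 and i + 1 < len(tones) and tones[i + 1] == 3 else t
            for i, t in enumerate(tones)
        ]
        out.append(changed)
    return out

def count_changes(before, after):
    """Count how many tones differ between two group lists of equal shape."""
    return sum(
        b != a
        for g_before, g_after in zip(before, after)
        for b, a in zip(g_before, g_after)
    )

# The same four third-tone syllables, grouped as in fast vs. slow reading.
fast_groups = [[3, 3, 3, 3]]      # no internal prosodic boundary
slow_groups = [[3, 3], [3, 3]]    # one boundary in the middle

fast = third_tone_sandhi(fast_groups)
slow = third_tone_sandhi(slow_groups)
fast_changes = count_changes(fast_groups, fast)
slow_changes = count_changes(slow_groups, slow)
```

With one long group, three of the four syllables change; with a boundary in the middle, only two do. Real third-tone sandhi also depends on syntactic and prosodic structure, which is one reason a trained model is used instead of a fixed rule.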
For the experiments, the training and test data were drawn from the Academia Sinica Balanced Corpus (ASBC) and online news articles. The overall accuracy was 99.37% for polyphonic character disambiguation, 95.34% for consecutive third-tone sandhi, and 98.25% and 98.88% for “一” and “不”, respectively. In addition, the Keyword Search method achieved an overall accuracy of 95.10%; on the Word Class Analysis test set, the Keyword Search method achieved 65.61% and the Word Class Analysis method achieved 95.10%. Beyond evaluating the experiments, this thesis also analyzes the system errors to guide future improvement.
Table of Contents
Abstract
Table of Contents
Table of Tables
Table of Figures
Chapter 1 Introduction
1.1 Background
1.2 Motivation
1.3 Literature Review
1.4 Research Direction
1.5 Thesis Organization
Chapter 2 Construction of Text Normalization
2.1 Analysis on Non-Standard Words
2.2 The Proposed Text Normalization System
2.2.1 The BPU Spotting Module
2.2.1.1 The BPU Scope Tagger
2.2.1.2 The BPU Restructurer
2.2.2 The Rule-based System
2.2.3 Keywords Text Normalization Module
2.3 Words Classification
2.3.1 Words Selection
2.3.2 Words Filtration
2.3.3 Construction of Word Classification Matrix
2.3.4 Matrix Design
2.3.5 CRF Format Formulate
2.4 Experiment Results
2.5 Comparison between the Words Classification Method and the Keywords Method
Chapter 3 Polyphonic Character Disambiguation
3.1 Analysis on Polyphonic Character Processing
3.2 Construction of Polyphonic Character Disambiguation
3.2.1 Preprocessing
3.2.2 Manual Labeling
3.2.3 CRF Model Training
3.2.4 The Baseline System
3.2.5 The Dictionary-CRF Hybrid System
3.3 Experiment Results
Chapter 4 Tone Sandhi
4.1 Tone Sandhi Processing Analysis
4.2 Construction of Tone Sandhi
4.2.1 Preprocessing
4.2.2 Manual Labeling
4.2.3 CRF Model Training
4.3 Experiment Results
Chapter 5 Conclusions and Future Research
5.1 Conclusions
5.2 Future Research
References
Appendix 1
Appendix 2