Title: 即時語音辨識系統
Alternative title: Real-time Speech Recognition System
Author: 文偉君
Keywords: ATK, MATLAB, Real-time, Speech recognition, MFCC, Databases
Department/Unit: Department of Electronic Engineering, College of Information and Electrical Engineering
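Abstract: With the rapid growth of the technology industry, speech recognition has long drawn wide attention, and its applications today are broad: Apple built it into Siri, Google developed it into translation software for many languages, and many other technology companies use speech recognition to execute commands.
In this system, a tester records a spoken digit sequence in real time and the system recognizes it. After recording, MATLAB uses the zero-crossing rate and the volume to detect the endpoints of the syllables in the utterance and cuts the utterance into syllables. The syllables are then passed to HTK (Hidden Markov Model Toolkit), which converts them into phones and extracts feature values. HTK uses Mel-frequency cepstral coefficients (MFCC) to extract a 39-dimensional feature vector (including delta and delta-delta coefficients) for each phone. Once HTK has extracted the features from the tester's audio, they are matched against hidden Markov model templates trained in advance with HTK to recognize the phones. After phone recognition, a maximum-likelihood decision rule selects the closest syllable from the phone groups as the classification, and when recognition finishes the system displays the full digit sequence spoken by the tester.
The results show that the system is more accurate for speaker-dependent recognition, while the recognition rate for speaker-independent recognition still needs improvement. This study makes two suggestions for raising the speaker-independent recognition rate. The first is "speaker grouping": divide speakers into groups (male, female; older, younger, and so on) and recognize each group with its own templates. The second is a "feedback database": let the tester correct misrecognized templates through the interface and send them back to build up the database. Future work will focus on improving the recognition rate.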
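The endpoint-detection step described above can be sketched roughly as follows. This is a minimal Python illustration, not the project's MATLAB code. The function names, frame sizes, and thresholds are all assumptions chosen for the example. The rule for combining the two cues (loud frames count as speech, quieter frames count only when their zero-crossing rate is high) is one common heuristic and may differ from the rule the author used.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def volume(frame):
    """Mean absolute amplitude of one frame."""
    return sum(abs(x) for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def detect_segments(samples, frame_len=160, hop=80,
                    high_vol=0.10, low_vol=0.02, zcr_thresh=0.25):
    """Return (start_sample, end_sample) pairs for detected speech segments.

    All thresholds are illustrative assumptions. A frame is marked as
    speech if it is loud, or if it is moderately loud and also has a
    high zero-crossing rate (a rough cue for unvoiced consonants).
    """
    flags = []
    for frame in frame_signal(samples, frame_len, hop):
        v = volume(frame)
        is_speech = v >= high_vol or (
            v >= low_vol and zero_crossing_rate(frame) >= zcr_thresh)
        flags.append(is_speech)

    # Merge runs of consecutive speech frames into sample ranges.
    segments = []
    start = None
    for idx, flag in enumerate(flags):
        if flag and start is None:
            start = idx
        elif not flag and start is not None:
            segments.append((start * hop, (idx - 1) * hop + frame_len))
            start = None
    if start is not None:
        segments.append((start * hop, (len(flags) - 1) * hop + frame_len))
    return segments
```

Calling `detect_segments` on a recording normalized to the range -1 to 1 returns one sample range per detected burst of speech, and each range could then be cut out and passed on for feature extraction. Real recordings would need thresholds tuned to the microphone and background noise.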
With the technology industry booming, speech recognition has become a subject of wide attention, and it now covers a broad range of applications. For example, Apple built it into Siri, Google developed it into translation software for many languages, and various technology companies apply speech recognition to execute commands. In the 4G era, the "Internet of Things" has become well known. The Internet of Things can reduce the demand on human resources and bring great convenience to daily life, and networking and speech recognition are closely related. This study aims to learn the principles of speech recognition and gain a deeper understanding of the technology. I then tried different algorithms to find out which speech recognition method works best, so that users can input spoken-digit audio in real time and see the results printed after recognition. Ideally, the system could be used together with the Internet of Things by converting the recognition result into operating instructions. The challenges include learning MATLAB programming and its instructions, and trying different audio capture techniques and recognition methods. The Hidden Markov Model Toolkit (HTK) is a portable toolkit for building and manipulating hidden Markov models. It consists of a set of library modules and tools available in C source form. The tools provide facilities for speech analysis, HMM training, testing, and results analysis. The software supports HMMs using both continuous-density Gaussian mixtures and discrete distributions, and it can be used to build complex HMM systems. The HTK release contains extensive documentation and examples. HTK is mainly used for speech recognition, but it has also been used for many other applications, including research into speech synthesis, character recognition, and DNA sequencing. HTK is in use worldwide.
The system lets users record a clip of spoken digits in real time and shows the recognition result. After recording, MATLAB uses both the zero-crossing rate and the volume to detect the endpoints of the syllables in the utterance and cuts them apart. HTK then extracts feature values from these syllables. With these features, phoneme HMMs are trained for recognition. I use a maximum-likelihood decision method to recognize the phonemes, selecting the closest syllable from the phoneme group. Finally, the system prints the digits the user has just spoken. The study shows that the system is more accurate for speaker-dependent recognition, but the recognition rate for speaker-independent recognition needs to improve. This study therefore presents two proposals to raise the speaker-independent recognition rate. The first is "grouping speakers": speakers are grouped (male, female, older, younger, and so on) and each group is recognized with its own model. The second is a "feedback library": users correct misrecognized models through the interface, and the corrections are sent back to update the database. I expect these measures to improve recognition performance.
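The maximum-likelihood decision mentioned above can be illustrated with a toy sketch: score an observation sequence against each candidate HMM and pick the model with the highest likelihood. The code below is an assumption-laden simplification, not the HTK procedure. It uses small discrete-emission HMMs and the forward algorithm, whereas the abstract does not say which HTK model type or decoder the project used. The names `log_likelihood`, `classify`, and `TOY_MODELS`, and every probability in the toy models, are hypothetical.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_likelihood(obs, start, trans, emit):
    """Forward algorithm: log P(obs | model) for a discrete HMM.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]])
             for s in range(n)]
    for o in obs[1:]:
        alpha = [log_sum_exp([alpha[p] + safe_log(trans[p][s])
                              for p in range(n)]) + safe_log(emit[s][o])
                 for s in range(n)]
    return log_sum_exp(alpha)

def classify(obs, models):
    """Return the label of the model that gives obs the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))

# Hypothetical two-state left-to-right models over symbols 0, 1, 2.
# Each entry is (start, trans, emit); the numbers are made up.
TOY_MODELS = {
    "one": ([1.0, 0.0],
            [[0.6, 0.4], [0.0, 1.0]],
            [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "two": ([1.0, 0.0],
            [[0.6, 0.4], [0.0, 1.0]],
            [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]),
}
```

Here `classify([0, 0, 1, 1], TOY_MODELS)` scores the sequence under both toy models and returns the label with the larger log-likelihood. In a real system the integer symbols would be replaced by acoustic feature vectors and the emission tables by trained distributions.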
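Date: 2016-04-19T02:05:03Z
Academic year: 104, first semester
Instructor: 陳冠宏
Course: Project Research
Department: Department of Electronic Engineering, College of Information and Electrical Engineering
Collection: College of Information and Electrical Engineering, academic year 104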

Files in this item:
D0182686104101.pdf (1.76 MB, Adobe PDF)

