Title: Chinese Abbreviations and Expansion
Alternative title: 中文縮寫與還原
Authors: Huang, Chuen-Min
Yang, Chuan-Pu
Keywords: Chinese abbreviation
Longest Common Subsequence
Feature Selection
Maximum Entropy
Journal/Conference name: 2005 NCS Conference
Abstract: Abbreviations are common in Chinese text. For instance, “台灣鐵路局” is often shortened to “台鐵局”. Such shortening saves time and is convenient, but it also creates challenges for Chinese text processing. In a keyword-based information retrieval system, searching with the abbreviated form and with the original form usually returns different results, even though both carry the same meaning. Abbreviations also clearly affect Chinese word segmentation, automatic document clustering, and term weighting. To resolve this semantic ambiguity in Chinese text processing, we propose an approach that bridges the two forms and constructs an abbreviation list automatically, without consulting any dictionary. Our experiments cover two major tasks: finding the best candidate original form or abbreviation, and recognizing potential abbreviations and original forms in documents. There are also two procedures: detecting the abbreviation of an original form, and expanding an abbreviation back to its original form. For feature selection, we suggest combining nouns and part-of-speech (POS) information rather than using either alone. Our study shows that precision is best on documents from a single news category, especially finance. Our method may therefore be suitable for corpora whose vocabulary is relatively static. Whether performance improves when noise is removed from the contextual information is worth further exploration.
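The abstract's own example pair and its Longest Common Subsequence keyword can be illustrated with a minimal sketch. This is not the authors' method: the record does not say how LCS is used, so the filter below (`is_candidate_pair`, a hypothetical name) simply assumes that a candidate abbreviation must be shorter than the original form and have all its characters appear, in order, in that original form.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_candidate_pair(abbreviation: str, original: str) -> bool:
    """Hypothetical filter (an assumption, not taken from the paper):
    the abbreviation is shorter than the original and is fully covered
    by the LCS, i.e. it is a subsequence of the original form."""
    return (
        len(abbreviation) < len(original)
        and lcs_length(abbreviation, original) == len(abbreviation)
    )


# Example pair taken from the abstract.
print(is_candidate_pair("台鐵局", "台灣鐵路局"))
```

A filter like this would only narrow the candidate set; choosing the best candidate from context (the feature selection and Maximum Entropy keywords) is outside what this sketch covers.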
Date: 2006-10-18T11:01:19Z
Collection: 2005 NCS (National Computer Symposium)

Files in this item:
File: ce07ncs002006000012.pdf
Size: 478.19 kB
Format: Adobe PDF

