Chinese to Chinese Conversion
Over 700,000 items
Supports SC to TC conversions and vice versa
Supports orthographic and lexemic conversion
Overview
In 1996, CJKI launched a project to develop a Chinese to Chinese conversion (C2C) system that supports Simplified to Traditional Chinese (SC>TC) and Traditional to Simplified Chinese (TC>SC) conversion with near-perfect results. Orthographic conversion maps simplified forms to traditional forms at the character and word levels, such as SC 国家 to TC 國家 (‘country’), and vice versa. Lexemic conversion maps vocabulary items at the semantic level, such as SC 出租车 to TC 計程車 (‘taxi’), and vice versa.
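The difference between the two conversion types can be illustrated with a minimal sketch. The table names `ORTHOGRAPHIC` and `LEXEMIC` and the function `sc_to_tc` are hypothetical and hold only a few of the examples shown on this page; this is not CJKI's implementation.

```python
# Hypothetical miniature tables; a real system would hold far more entries.
ORTHOGRAPHIC = {"国家": "國家", "出租车": "出租車", "计程车": "計程車"}
LEXEMIC = {"出租车": "計程車"}  # a different word is used for the same concept


def sc_to_tc(word: str, lexemic: bool = True) -> str:
    """Convert one SC word to TC, preferring a lexemic mapping when requested."""
    if lexemic and word in LEXEMIC:
        return LEXEMIC[word]
    # Fall back to the orthographic form; unknown words pass through unchanged.
    return ORTHOGRAPHIC.get(word, word)


print(sc_to_tc("国家"))                   # 國家
print(sc_to_tc("出租车"))                 # 計程車
print(sc_to_tc("出租车", lexemic=False))  # 出租車
```

With lexemic conversion switched off, 出租车 keeps its wording and only the character forms change.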
C2C has been a major undertaking that required a considerable investment of funds and human resources. To this end, we have engaged in the following research and development activities:
In-depth investigation of technical and linguistic issues related to C2C
Research on Chinese word segmentation technology
Construction of comprehensive SC-TC mapping tables
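Word segmentation matters because word-level mappings can only be applied once word boundaries are known. As a rough illustration, the sketch below uses forward maximum matching, a simple textbook baseline; it is not a description of the segmentation technology used in C2C. The table `SC_TO_TC` and the function names are hypothetical.

```python
# Hypothetical word-level SC-to-TC table, using entries from this page's sample data.
SC_TO_TC = {"国家": "國家", "企业": "企業", "计算机": "電腦", "危险": "危險"}
MAX_LEN = max(len(word) for word in SC_TO_TC)


def segment(text: str) -> list[str]:
    """Forward maximum matching: at each position take the longest known word."""
    words, i = [], 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + n]
            # A single character is always accepted so the loop makes progress.
            if n == 1 or piece in SC_TO_TC:
                words.append(piece)
                i += n
                break
    return words


def convert(text: str) -> str:
    """Segment the text, then map each word; unknown words pass through."""
    return "".join(SC_TO_TC.get(word, word) for word in segment(text))


print(segment("国家企业计算机"))  # ['国家', '企业', '计算机']
print(convert("国家企业计算机"))  # 國家企業電腦
```

Greedy matching of this kind can pick the wrong boundary in ambiguous text, which is one reason segmentation is a research topic in its own right.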
Attributes
Code-level mappings
In the SC-to-TC and TC-to-SC directions
Semantic classification codes
Such as orthographic or lexemic codes
Lexemic mappings
In the SC-to-TC and TC-to-SC directions
Orthographic mappings
In the SC-to-TC and TC-to-SC directions
Phonological information
Such as pinyin and zhuyin
Grammatical information
Such as part-of-speech codes
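One way to picture an entry that carries several of these attributes is a small record type. The class `C2CEntry` and its field names are hypothetical and chosen for illustration only; the actual database layout is not described on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class C2CEntry:
    """Hypothetical record for one SC-TC pair."""
    sc: str                    # Simplified Chinese form
    tc: str                    # Traditional Chinese form
    mapping_type: str          # type code, e.g. "O", "OP" or "L"
    sc_pinyin: str             # reading on the SC side
    tc_pinyin: str             # reading on the TC side
    pos: Optional[str] = None  # part-of-speech code; values are not shown on this page

    def reading_differs(self) -> bool:
        """True when the SC and TC readings are not identical."""
        return self.sc_pinyin != self.tc_pinyin


entry = C2CEntry("暴露", "暴露", "OP", "bào lù", "pù lù")
print(entry.reading_differs())  # True
```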
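SC and TC Conversion
In the sample below, rows marked O are orthographic mappings and rows marked L are lexemic mappings; rows marked OP are orthographic mappings whose SC and TC pinyin differ.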
Type | SC | SC Pinyin | TC | TC Pinyin |
---|---|---|---|---|
O | 鲍林 | bào lín | 鮑林 | bào lín |
O | 抱拢 | bào lǒng | 抱攏 | bào lǒng |
O | 报录 | bào lù | 報錄 | bào lù |
OP | 暴露 | bào lù | 暴露 | pù lù |
O | 暴乱 | bào luàn | 暴亂 | bào luàn |
O | 鲍伦 | bào lún | 鮑倫 | bào lún |
O | 鲍螺 | bào luó | 鮑螺 | bào luó |
O | 抱锣 | bào luó | 抱鑼 | bào luó |
OP | 显微镜 | xiǎn wēi jìng | 顯微鏡 | xiǎn wéi jìng |
O | 国家 | guó jiā | 國家 | guó jiā |
OP | 企业 | qǐ yè | 企業 | qì yè |
OP | 危险 | wēi xiǎn | 危險 | wéi xiǎn |
O | 计算机 | jì suàn jī | 計算機 | jì suàn jī |
L | 计算机 | jì suàn jī | 電腦 | diàn nǎo |
O | 电脑 | diàn nǎo | 電腦 | diàn nǎo |
L | 出租车 | chū zū chē | 計程車 | jì chéng chē |
O | 计程车 | jì chéng chē | 計程車 | jì chéng chē |
O | 出租车 | chū zū chē | 出租車 | chū zū chē |
L | 文件 | wén jiàn | 檔案 | dǎng àn |
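The sample rows show that the mapping is not one-to-one in either direction: SC 计算机 has both an orthographic and a lexemic TC counterpart, and TC 計程車 can come from two different SC words. The sketch below indexes a few of these rows in both directions. The names `ROWS` and `build_index` are hypothetical, and picking the right candidate in context is outside the scope of this example.

```python
from collections import defaultdict

# (type, SC, TC) triples copied from the sample table.
ROWS = [
    ("O", "计算机", "計算機"),
    ("L", "计算机", "電腦"),
    ("O", "电脑", "電腦"),
    ("L", "出租车", "計程車"),
    ("O", "计程车", "計程車"),
    ("O", "出租车", "出租車"),
]


def build_index(rows, reverse=False):
    """Group candidate targets by source word; reverse=True indexes TC to SC."""
    index = defaultdict(list)
    for kind, sc, tc in rows:
        source, target = (tc, sc) if reverse else (sc, tc)
        index[source].append((kind, target))
    return dict(index)


sc_index = build_index(ROWS)
tc_index = build_index(ROWS, reverse=True)

print(sc_index["计算机"])  # [('O', '計算機'), ('L', '電腦')]
print(tc_index["計程車"])  # [('L', '出租车'), ('O', '计程车')]
```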
Practical Applications
C2C is ideal for applications such as:
Machine translation into TC
For example, converting SC to TC instead of translating English to TC
SC-to-TC and TC-to-SC conversion
Machine translation into SC
For example, converting TC to SC instead of translating English to SC
Related Resources

Chinese Hanyu Pinyin Database
Accurate hanyu pinyin data including technical terms and proper nouns

Chinese-English Personal Names
Chinese-English database of CJK and Western personal names

Chinese-English Place Names
Chinese-English database of CJK and Western place names