Chinese NLP Lexicons

The complexity of Chinese poses special challenges to developers of natural language processing (NLP) applications in areas such as word segmentation, information retrieval, speech technology, named-entity extraction, and machine translation. These challenges are exacerbated by the lack of truly comprehensive lexical resources, especially for proper nouns.

CJKI has developed a wide variety of resources, some very comprehensive, to enhance the accuracy and reliability of Chinese NLP applications. The links below lead to the individual product pages for CJKI’s Chinese NLP lexicons. Each page describes the resource, explains how it is used, and includes data samples.

Related

CLD (Chinese Lexical Database): Monolingual general vocabulary for NLP applications

YPD (Yue Phonetic Database): Phonemic transcriptions of Cantonese (Yue) vocabulary in jyutping

C2C (Chinese to Chinese Conversion): SC/TC mapping tables supporting orthographic and lexemic conversion

CHD (Chinese Hanyu Pinyin Database): Accurate hanyu pinyin data including technical terms and proper nouns

CPD (Chinese Phonetic Database): Phonemic transcriptions showing differences between PRC and Taiwan