CJKI Data
Lexical Resources
CJKI’s large-scale databases currently have over 50 million entries and are continuously being expanded. They cover general vocabulary, proper nouns, and technical terms, and include a rich set of grammatical, phonological, syntactic, and semantic attributes.
CJKI is one of the world’s leading sources of CJK and Arabic dictionaries and lexical resources. We contribute to AI and natural language processing (NLP) technology, including machine translation, speech technology, and named entity recognition, by providing high-quality lexical resources to many of the world’s leading IT companies, including Amazon and Google.
- Our data resources can be located quickly by language below, through the resource guide, or from the product list.
- More information on CJKI can be found here.
- Details on licensing data and our business model can be found here.
Language Overview
Chinese Resources
Resources for NLP lexicons, proper nouns, technical terms, and general vocabulary.
Japanese Resources
Resources for NLP lexicons, proper nouns, technical terms, and general vocabulary.
Korean Resources
Resources for NLP lexicons, proper nouns, and technical terms.
Arabic Resources
Resources for NLP lexicons and proper nouns.
Other Resources
Resources for Vietnamese, Persian, and Spanish.