Arabic NLP Lexicons
The complexity of Arabic poses special challenges to developers of natural language processing (NLP) applications, especially in the areas of word segmentation (WS), information retrieval (IR), named entity recognition (NER), and machine translation (MT).
Major linguistic issues in the development of NLP applications such as MT, NER, and text-to-speech (TTS) are exacerbated by the lack of truly comprehensive lexical resources, especially for proper nouns, and by the lack of a standardized orthography. Our Institute has developed various comprehensive lexical resources to enhance the accuracy and reliability of NLP applications.
Below are links to the individual product pages for CJKI’s Arabic NLP lexicons. Each page describes the resource, explains how it is used, and includes data samples.
Resources
Arabic Phonetic Database
Phonemic transcriptions for core Arabic vocabulary
Arabic Full-Form Lexicon
Includes all inflected, declined, and conjugated forms
Arabic Plurals
Extensive coverage of regular and irregular (‘broken’) plurals in Arabic
Arabic Dialects Full-Form Lexicon
Full-form lexicon for all major Arabic dialects
Arabic Wordlist
General vocabulary, proper nouns, and technical terms