Background

Since 1991 we have worked to develop a highly accurate and robust Korean morphological analyzer for practical use. After the first version, called KOMA (Korean Morphological Analyzer), proved successful, POSCO (Pohang Iron & Steel Co., Ltd.) funded an extension project from 1995 through 1997 so that the extended KOMA system could serve as the core of its automatic indexer. In addition, tuning and extending KOMA was selected as part of the national project "STEP2000" (Software Technology Enhancement Program 2000), and the work was therefore also sponsored by the Government from November 1994 through September 1997. KOMA is now well known in Korea as a representative morphological analyzer whose accuracy and robustness are sufficient for practical use. Its main characteristics are as follows:

Major Features

  Tabular-Parsing for Morpheme Segmentation
    - Modified CYK algorithm
    - Parallel processing to produce all possible analyses
  Finite-State Morphotactics
    - Connectivity table to represent the word structure
    - Sophisticated morpho-syntactic categories (more than 500)
  Root-Form Recognition
    - Hybrid of rule-based and dictionary-based processing
  For Robustness
    - Handling symbols, numbers, and non-Korean scripts (Han script, Roman alphabet)
    - Handling unknown words and word-boundary errors
  Morphological Dictionary
    - General dictionary (110,000 entries)
    - Technical terminology dictionary (120,000 entries in science and technology)
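
To give a feel for how tabular segmentation and a connectivity table can work together, here is a minimal sketch in Python. It is not KOMA's implementation: it uses a simple left-to-right chart rather than the modified CYK algorithm, and the romanized lexicon, the four category names (N, JOSA, V, plus the BOS/EOS boundary markers), and the function name analyze are all assumptions made up for illustration. KOMA itself uses more than 500 categories and much larger dictionaries.

```python
# Hypothetical toy lexicon: romanized surface form -> possible categories.
LEXICON = {
    "hakgyo": {"N"},    # "school"
    "hak": {"N"},
    "gyo": {"N"},
    "e": {"JOSA"},      # particle
    "eseo": {"JOSA"},   # particle "at/from"
    "seo": {"V"},
}

# Hypothetical connectivity table: which category may follow which.
# BOS and EOS mark the beginning and end of the word.
CONNECT = {
    ("BOS", "N"), ("BOS", "V"),
    ("N", "N"), ("N", "JOSA"),
    ("N", "EOS"), ("JOSA", "EOS"), ("V", "EOS"),
}


def analyze(word):
    """Return every segmentation of `word` that the connectivity table allows."""
    n = len(word)
    # chart[i] holds every connectable morpheme sequence covering word[:i].
    chart = [[] for _ in range(n + 1)]
    chart[0] = [[("", "BOS")]]
    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            for cat in sorted(LEXICON.get(piece, ())):
                for path in chart[start]:
                    # Extend a partial analysis only if the adjacent
                    # categories are allowed to connect.
                    if (path[-1][1], cat) in CONNECT:
                        chart[end].append(path + [(piece, cat)])
    # Keep analyses whose last morpheme may end the word; drop the BOS marker.
    return [p[1:] for p in chart[n] if (p[-1][1], "EOS") in CONNECT]


if __name__ == "__main__":
    for analysis in analyze("hakgyoeseo"):
        print(" + ".join(f"{m}/{c}" for m, c in analysis))
```

With this toy data, "hakgyoeseo" yields two analyses (hakgyo/N + eseo/JOSA and hak/N + gyo/N + eseo/JOSA), while the split ending in e/JOSA + seo/V is rejected because the table has no (JOSA, V) entry. Keeping all surviving paths in the chart mirrors the idea of producing all possible analyses in parallel rather than committing to one.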

Demo

  If you want to see KOMA's demo, click here.