DESIGN AND IMPLEMENTATION OF AMHARIC SEARCH ENGINE; by Tessema Mindaye, MSc Thesis, Department of Computer Science, Addis Ababa University, July 2007 (Advisor: Solomon Atnafu, PhD)


The Web is a huge repository of information in the form of text, image, audio, and video. Search engines such as Google and Yahoo are the first ports of call for discovering resources in this repository. These general-purpose search engines are designed and optimized for the English language. They fall short when used to locate resources in other languages on the Web, mainly because they do not take the specific features of those languages into account. Amharic, a member of the Semitic language family, is the official language of the federal government of Ethiopia. There is currently a significant number of Amharic documents on the Web. These documents need a search tool that takes the typical characteristics of the language into account.
This study designs and implements a search engine for Amharic-language web documents. The result is a complete language-specific (Amharic) search engine with a crawler, an indexer, and a query engine component, each optimized for the Amharic language. The crawler (Language Specific Crawler) crawls the Web, collects Amharic web documents with Unicode encoding, and stores them in a repository. The next component, the Indexer, processes the documents and stores them in a structure that is efficient and appropriate for searching. The Query Engine component provides an interface through which users can enter their information need in Amharic using Ethiopic script. It then parses the query, searches the index, selects the documents relevant to the query, and returns them in rank order. Two crawling runs were carried out to test the crawler. To measure the effectiveness of the system, retrieval experiments were performed for some queries and a precision-recall test was conducted. The system was also tested with selected queries that reflect specific features of the language. The developed system considers the typical features of the language and meets its design requirements.
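The three-component pipeline described above (crawler filter, indexer, query engine) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names, the 0.5 script-ratio threshold, and the ranking by number of matching query terms are all assumptions made for illustration. Note also that Ethiopic script is shared with other languages such as Tigrinya, so a script check alone only identifies candidate Amharic documents.

```python
import re
from collections import defaultdict

# Ethiopic syllables occupy U+1200..U+135A (assumption: the basic block is
# sufficient for this sketch; supplement and extended blocks are ignored).
ETHIOPIC_WORD = re.compile(r"[\u1200-\u135A]+")


def ethiopic_ratio(text):
    """Return the fraction of alphabetic characters that are Ethiopic syllables."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    ethiopic = sum(1 for ch in letters if "\u1200" <= ch <= "\u135A")
    return ethiopic / len(letters)


def is_amharic_candidate(text, threshold=0.5):
    """Hypothetical crawler filter: keep pages that are mostly Ethiopic script."""
    return ethiopic_ratio(text) >= threshold


def tokenize(text):
    """Split text into runs of Ethiopic syllables (no stemming or normalization)."""
    return ETHIOPIC_WORD.findall(text)


def build_index(docs):
    """Build an inverted index mapping each term to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return document ids ordered by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

In use, crawled pages would first pass through is_amharic_candidate, the surviving pages would be given to build_index, and each user query in Ethiopic script would be answered by search. A real system would add the language-specific processing the abstract alludes to, which this sketch deliberately leaves out.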

Key words: Information Retrieval, Amharic Search Engine, Amharic language on the Web.
