A Survey of Localisation in African Languages, and its Prospects: A Background Document

2. Background

This section introduces the importance of African language use in ICT, defines localisation as used in this paper, discusses the regional context of the research, and outlines how the state of localisation in Africa will be discussed.

2.1 Importance of African languages and ICT

As the information revolution becomes increasingly multilingual worldwide, and as the new ICTs spread in Africa to areas well beyond the capital cities, there is an ever greater need to accommodate the use of diverse African languages. There is also ever greater potential to tap the linguistic wealth of the continent for development and education.

It is generally agreed that availability of software and content in the language(s) most familiar to users is an essential element in their adoption and optimal use of computers and the Internet. One might add that in a context where people speak several languages – as one often finds in Africa – the option to use different languages is also empowering.

Accommodating people’s most familiar languages is, moreover, a consideration of primary importance for any effort to use ICT for development. This should come as no surprise, since education and communication are generally easier for people in their first languages than in languages they acquire later. At the community or societal level, first languages are also considered an indispensable and central aspect of social and cultural systems.2

However, ICT has been introduced to Africa and the Arabic-speaking regions in English and French, and in some countries south of the Sahara in Portuguese and Spanish. These are the same languages of European origin that were used in the colonisation of these regions. They have served as official languages since independence (especially south of the Sahara), and they also serve as what will be referred to here as "European (or Europhone) languages of wider communication" (ELWCs).3 One problem with reliance on ELWCs is that a large majority of people on the continent either do not speak these languages or do not speak them well.4

At the same time, the sheer number and diversity of languages on the continent poses a challenge for localisation efforts, and indeed for the educational programmes that would support them. According to Ethnologue (Gordon, ed., 2005), there are over 2000 languages on the continent, about a third of all living languages in the world. Many of what are counted as separate languages also fall into clusters of very closely related and mutually intelligible tongues, which shows that Africa’s linguistic complexity has many dimensions.

Initiatives aiming to expand the use of ICT in Africa for development, education, or other purposes are beginning to recognise the need to respond to these sociolinguistic realities. Such efforts are also benefiting from advances in the internationalisation of the technology. These range from the greater use of Unicode (ISO-10646)5 for handling diverse scripts and extended characters to the availability of utilities for creating keyboard layouts.

However, a number of hurdles remain. Some are technical, such as the use of extended character sets and Unicode on older computer systems and European keyboards. Others relate to economic factors, such as the cost of translating content. Still others are social and relate to education levels. There are also sometimes negative attitudes towards African languages among foreign development and education experts, and even among native speakers.6 Finally, in some countries, government language and education policies disfavour African languages, which in turn affects ICT usage.

In any event, the use of ICT in Africa's indigenous languages should not be seen merely as compensation for people's lack of knowledge of ELWCs. Still less should it be seen as a second-best or interim solution until more people know ELWCs.7 It is also a question of fairness in access. It is a long-term practical issue as well, since it is hard to imagine that Africans, any more than the populations of any other region, will universally be as comfortable or efficient using ELWCs in ICT to the exclusion of their first languages. Finally, it opens up new possibilities for more effective use of the technology by even the most highly educated, complementing and expanding on the potential offered by applications in ELWCs.

2.2 What is localisation?

The term "localisation" is used in various contexts relating to ICT, but its definitions revolve around adapting user interfaces and digital information to local modes of communication, culture and standards. Daniel Yacob (2004) offers a broad interpretation that defines the object of localisation: "the transfer of cultural consciousness into a computer system, making the computer a natural extension of the society it serves."

"Localisation" is arguably a concern that has been inherent, or latent, in computer technology from its very beginning. In other words, it was inevitable that computing would eventually enable the handling of human language. Questions about the choice of languages would then arise, and users from diverse linguistic backgrounds would call for additional ones. Then, as computers became able to convey images, sounds and styles of presentation more readily, issues of cultural appropriateness would naturally follow.

In practice, localisation has two sides. It is a technical set of approaches and techniques for adapting software and content to particular languages and cultures. More broadly, it is also an enterprise that combines those technical dimensions with the planning, linguistic information and organisation needed to make it happen. Altogether, localisation aims to facilitate the use of target languages in ICT, and it can further be understood as an active component of wider efforts to adapt science and technology to diverse societies and cultures.

Localisation as a technical concern

Computer systems, and ICT in general, involve two levels of consideration: hardware, and bits (binary encoding). Together these define the technical possibilities for localisation.

At its simplest, the hardware side of ICT can be understood as involving devices and connections. The devices – computers but also increasingly powerful handheld devices – can operate independently for certain purposes, including storing and manipulating data such as text, spreadsheets and other files. They can also connect to a network that links to other devices – the internet – to retrieve and exchange information (email, webpages, streaming media). Localisation relates to both of these aspects (independent and networked).

At the most basic level, information is encoded and manipulated as bits. For the user to make use of the technology, those bits, and the software tools written to work with them, are organised in forms that allow interaction with the hardware and network and allow information to be stored and transmitted. In other words, there are two aspects: interface (accessing and using the technology) and information content (documents, data, etc.).
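The following minimal Python sketch makes the "bits" level concrete. It is an illustration added here, not a method from the survey. It shows how a few characters map to Unicode code points, and then to the UTF-8 bytes and bits that are actually stored or transmitted. The characters ɛ, ɔ and ŋ, which occur in several African orthographies, were chosen only as examples.

```python
def encode_report(text: str) -> list[tuple[str, str, str, str]]:
    """Return (character, code point, UTF-8 bytes in hex, bits) for each character."""
    rows = []
    for ch in text:
        data = ch.encode("utf-8")                    # bytes as stored or transmitted
        hex_bytes = data.hex(" ").upper()            # e.g. "C9 9B"
        bits = " ".join(f"{b:08b}" for b in data)    # each byte as 8 binary digits
        rows.append((ch, f"U+{ord(ch):04X}", hex_bytes, bits))
    return rows

# Example characters only: one ASCII letter and three extended Latin letters.
for row in encode_report("aɛɔŋ"):
    print(*row)
```

In UTF-8, plain ASCII letters take one byte each, whereas these extended Latin letters take two bytes each.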

Table 1 cross-indexes these two levels: the two fundamental categories of hardware and the two fundamental kinds of use to which the technology is put. Arranging the two in a matrix makes it easier to see which aspects of ICT localisation is concerned with.

| | Interface/Access (how we interact with the technology) | Information/Storage (what we use the technology for) |
|---|---|---|
| Computer (individual piece of hardware) | Operating system, software for various purposes, keyboard, display | Documents and files of various sorts, created by user(s) |
| Internet (the network of connections) | (The above plus...) specialised software resident on servers, such as search engines and databases | Web content, remote storage |
Table 1: Dimensions for localisation

From this analysis we can identify three separate but overlapping concerns. These are listed and then discussed below:

  • Equipping systems deployed in various localities – or actualising their existing capacities – to handle local language needs. This facilitates production of documents and also display of multilingual web content.
  • Production of web content for diverse audiences in languages and formats that they can understand.
  • Localisation of user interfaces on individual devices and the internet.

All three of these concerns are a focus of the PAL project, but the localisation of interfaces (particularly software) is pivotal. It is the logical extension of efforts to equip systems to handle local language needs, and it can also facilitate the production of localised content.

Equipping systems: This is mainly a matter of actualising the potential of computer systems to handle local languages in various ways, notably non-ASCII text.8 The main issues are fonts, input and display.
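A rough first step in assessing what a system must support for a given language is to list the characters in a sample text that fall outside ASCII, together with their official Unicode names. Each such character needs font coverage and a way to type it. The sketch below is illustrative only. It uses Python's standard `unicodedata` module, and the sample string is arbitrary rather than taken from any particular language.

```python
import unicodedata

def non_ascii_inventory(text: str) -> list[tuple[str, str, str]]:
    """List distinct non-ASCII characters, in order of first appearance, with their Unicode names."""
    seen = []
    for ch in text:
        if not ch.isascii() and ch not in seen:
            seen.append(ch)
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in seen]

# Arbitrary sample string, used only to exercise the function.
for entry in non_ascii_inventory("aɛ bɔ ŋa ɛ"):
    print(*entry)
```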

Many African languages are written with an extended Latin script, and a number of others, like Arabic, use non-Latin scripts. Another group of languages, including many in Southern and Eastern Africa, use basically the same character set as the languages of Western Europe. For the first two groups, unlike the third, the advent of Unicode represents a new era of possibilities. However, they still need basics such as an adequate choice of complete fonts and standardised, user-friendly input methods (mainly keyboard layouts, but eventually also speech recognition software9). For these languages, the first step of localisation is in effect this "last mile" of internationalisation. Internationalisation, in turn, refers to the process of improving computers and systems so that they can accommodate the diverse language needs of the world.

Fonts are really the first issue. Without fonts that include the necessary extended characters or non-Latin scripts, software applications will not display many languages fully or correctly. This means Unicode fonts, in which all characters are encoded according to the Unicode standard. Legacy 8-bit fonts may be able to display the characters and diacritics of the language(s) they were designed for, but text created with them is not readable on systems that lack those fonts. Basically, 8-bit fonts are not mutually compatible. Each one assigns its limited set of 256 code points to characters in its own way, whereas Unicode in principle provides a single code point for each character in every writing system (this is discussed further below, 6.2).
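The incompatibility of legacy 8-bit encodings can be seen directly: the same byte value means a different character under each code page. The sketch below uses three standard Python codecs purely as well-known examples of 8-bit encodings: cp1252 (Western European), cp1251 (Cyrillic) and ISO 8859-7 (Greek). It also shows that a Western European 8-bit encoding simply has no slot for a character such as ɛ, while UTF-8 can encode it.

```python
def decode_everywhere(byte_value: int, codecs: tuple[str, ...]) -> dict[str, str]:
    """Interpret a single byte under several 8-bit code pages."""
    raw = bytes([byte_value])
    return {codec: raw.decode(codec) for codec in codecs}

def fits_in(codec: str, text: str) -> bool:
    """True if every character of text can be encoded in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# One byte, three meanings: é, й and ι respectively.
print(decode_everywhere(0xE9, ("cp1252", "cp1251", "iso8859_7")))
print(fits_in("cp1252", "ɛ"), fits_in("utf-8", "ɛ"))
```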

For input of text in languages that use non-ASCII characters, specialised keyboard layouts are also necessary, and these may be created for languages or groups of languages for which there does not yet exist localised software. Beyond capacities to handle text, the capacity of systems to permit users to create and use multimedia that does not rely solely on text is another important, though sometimes overlooked, consideration.
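Where no dedicated keyboard layout is installed, a simple substitution scheme can stand in for one. The sketch below is a hypothetical input method: the key sequences such as `;e` were invented for this illustration and do not follow any standard layout. It converts ASCII key sequences into extended characters. It then applies Unicode NFC normalisation, so that a base letter typed with a combining mark is stored in precomposed form where one exists. For example, e plus a combining dot below becomes ẹ (U+1EB9).

```python
import unicodedata

# Hypothetical key sequences invented for this sketch; not a standard layout.
SEQUENCES = {
    ";e": "\u025B",  # ɛ LATIN SMALL LETTER OPEN E
    ";o": "\u0254",  # ɔ LATIN SMALL LETTER OPEN O
    ";n": "\u014B",  # ŋ LATIN SMALL LETTER ENG
    ";.": "\u0323",  # COMBINING DOT BELOW, typed after its base letter
}

def transliterate(keys: str) -> str:
    """Replace key sequences (longest match first) and normalise the result to NFC."""
    ordered = sorted(SEQUENCES, key=len, reverse=True)
    out = []
    i = 0
    while i < len(keys):
        for seq in ordered:
            if keys.startswith(seq, i):
                out.append(SEQUENCES[seq])
                i += len(seq)
                break
        else:
            out.append(keys[i])  # ordinary key: pass through unchanged
            i += 1
    return unicodedata.normalize("NFC", "".join(out))

print(transliterate(";e;o;n"), transliterate("e;."))
```

Normalising at input time matters because the two forms of ẹ look identical on screen, yet they compare as different strings unless they are normalised.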

At the same time, it is recognised that many older computers in Africa, often the result of donations of used equipment, run systems that cannot handle Unicode and may be limited in other respects (see below, 5.1).

Content: "Content" is usually taken to mean "web content" – the information conveyed on the pages of sites on the World Wide Web (WWW). More broadly, we may take it to include information stored as documents or data on computers or conveyed over the internet by other means, such as e-mail. The latter is of interest in measuring the use of and demand for ability to use diverse languages.

The considerations under the previous point facilitate the production and display of relevant, accessible information via the web. Choice of language is obviously key, since ICT can convey information for development or other purposes only through comprehensible idioms. Additional considerations are also important aspects of localised content. These include the cultural appropriateness of themes and images, and the approach to communication within the language (dialects, contemporary vs. formal styles, etc.).

One can divide localised web content into two parts based on origin: content produced locally, and content produced elsewhere but targeted at the local audience. Both are important, but our main concern is the former. Ballantyne (2002) discusses content in terms of this division and also of its target. It may be useful to develop collaboration among local and international content developers, wherever they are, on how best to address their common intended audience.

At present there is little web content developed in African languages either in Africa or elsewhere, and relatively little content of any sort coming from Africa. The issue of localised content is discussed in more detail below (see 7.1).

Localisation of user interfaces: This of course includes translating basic computer software, such as browsers and word processors, into different languages, including commands, dictionaries, help files, etc. (The capacity of software to handle diverse language needs is considered under the first point above.) Beyond translation, other issues are also important, such as conventions for displaying certain information and the cultural appropriateness of themes used in the software. In many cases, localisation may involve some but not all of the above. Commercial software companies, notably Microsoft, are undertaking localisation in Arabic and, to a very limited degree, in some of the more widely spoken African languages. While this helps the overall picture of localisation, it concerns only major languages and large markets. It may also entail higher costs than users can bear, especially in ICT4D contexts. Therefore, in the interests of including more languages and local expression, and of lower-cost solutions, this project focuses on free/open-source software (FOSS).

Localisation as project

Localisation in its broader sense of a process and enterprise takes into consideration several other matters such as:10

  • factors necessary to localise
    • a standardised orthography
    • locale data
    • organisation and resources to accomplish localisation in the stricter technical sense
  • aspects of sustainability in the long term
    • follow-through and marketing of localised software
    • follow-up with the user community and for updates
  • attention to issues of user skills (from basic literacy to computer literacy)
  • impacts of localisation of ICT on other aspects of society, economy and culture.

All of these are important to take into consideration. A framework to facilitate that is proposed in the following section (3).
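The "locale data" listed above among the factors necessary to localise includes conventions such as date and number formats. For extended alphabets, it also includes sort order. Sorting by raw code point puts ɛ (U+025B) and ɔ (U+0254) after z, which does not match how such alphabets are usually ordered. The sketch below assumes a hypothetical alphabet in which ɛ follows e, ŋ follows n and ɔ follows o. Actual orders differ by language and should come from agreed locale data, and this sketch ignores multi-letter units such as digraphs.

```python
# Hypothetical alphabet order for illustration; real locale data defines this per language.
ALPHABET = "abcdeɛfghijklmnŋoɔpqrstuvwxyz"
RANK = {letter: index for index, letter in enumerate(ALPHABET)}

def collation_key(word: str) -> list[int]:
    """Map each letter to its alphabet position; characters not in ALPHABET sort last."""
    return [RANK.get(ch, len(ALPHABET)) for ch in word.lower()]

# Arbitrary sample strings, not real vocabulary.
words = ["ɔba", "oda", "zu", "ɛna", "eto"]
print(sorted(words))                     # code-point order: ɔ and ɛ land after z
print(sorted(words, key=collation_key))  # order given by the hypothetical alphabet
```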

2.3 Overlapping regional contexts: Localisation where?


Africa is a multilingual continent, but there is no software, or even internet content, in the vast majority of its many languages. This includes most of the major and more widely spoken ones. Every country on the continent has some linguistic diversity, resulting from the history of population movements and the overlay of colonial languages. Most countries, especially south of the Sahara, have no single majority language, and a few have scores or even hundreds of languages. The ELWCs introduced during colonisation (English, French and Portuguese) serve as official languages and facilitate communication to one degree or another across wide areas. However, they are primarily second languages of the more educated and urbanised segments of society, and they do not have the same connection with African cultures as indigenous African languages.11 Most of the people with less facility in the ELWCs live in rural areas, and women are disproportionately represented among them.

Software and content in ELWCs therefore cannot satisfy the needs of the majority of the African population. Even in the limited (mainly urban) settings where they might, many people would still effectively be excluded on linguistic grounds. Everyone's language options would also be restricted by the lack of content and software in African languages. In addition, any effort to use ICT for development would be hindered to the extent that working languages are limited to ELWCs. Put another way, the need for localisation in Africa corresponds to the hope that ICT can play a full and effective role in development on the continent.

As the number of computers and the quality of internet connections on the continent increase, so too does interest in localising software and content in African languages. Naturally, development is not the only reason; the same motivations that drive such interest elsewhere also apply. The amount of material in African languages on the internet is increasing slowly, and there are active efforts to localise software, particularly in South Africa, Eastern Africa and Nigeria. However, these efforts have few links with one another, and few people beyond a handful of specialists know about them.

This comes at a time of increased interest in localisation around the world, both among commercial software companies and in the FOSS movement. Given the need, the budding local interest and the international resources potentially available, the time is opportune to facilitate localisation in Africa.

Arabic-speaking world

The need for localisation in Arabic is very real but of a different nature. This is so even though many countries in this region were also colonised, experienced a similar overlay of English and French, and were first introduced to ICT in those languages. Unlike sub-Saharan Africa and its languages, however, Arabic already has a significant amount of localised software and content, as one would expect for a major international language. The Arabic-speaking world in general, including the countries of North Africa, also has better infrastructure and ICT indicators than Africa south of the Sahara. The challenges in this region are therefore less daunting than in sub-Saharan Africa.

Nevertheless, the range of localised software is still arguably limited, and there is no corresponding level of localisation dealing with local themes and idioms. One goal, therefore, is to build the capacity of developers to localise Arabic software and content for their diverse user communities, particularly those outside the major cities. This will be the focus of the project’s work on this language. Arabic electronic dictionaries are also needed for FOSS applications such as OpenOffice. These could be localised to individual countries, much as English or French dictionaries differ among locales.

Within the Arabic-speaking world, this report focuses on the countries of North Africa, while acknowledging their important cultural and historical connections with the rest of the Middle East.

2.4 Who localises?

The question of who benefits, or potentially might benefit, from localisation has already been touched upon above in discussing why localisation is important (2.1). It is also useful to briefly consider who does localisation and would thus more immediately benefit (in terms of information, networking, tools) from this project.

The question has as many dimensions as there are types of localisation. Yet a simple answer might be that anyone who meets three conditions is a localiser: they are motivated to connect African languages with the content and interactive language of ICT, they have the means to do it, and they actually initiate or take part in some aspect of localisation. The typical localiser would also have higher than average education, a working knowledge of ICT, and knowledge of at least two languages: a language dominant in ICT and the language into which to localise. This is a very select group anywhere, and all the more so in Africa.

In terms of the origin and location of localisers, one might identify three broad categories. The first is Africans in Africa; the second, Africans residing in other parts of the world. The third is non-Africans with strong relevant knowledge (including of the language) and an interest in African localisation. In parts of the continent that are better off economically, educationally and in terms of technical infrastructure, such as the North or South Africa, the first category is stronger. However, the latter two categories can in some ways reinforce the first. On the other hand, in some contexts localisers from outside Africa may initiate or drive localisation efforts. Examples include African expatriates developing projects in their home languages, commercial interests, and international development organisations.

Another valid characterisation would be to say that content localisers require language skills but less depth in technical skills, while software localisation requires both. The fundamental concern of equipping systems – whether on the level of designing an ICT4D/E project, for instance, or of managing a cybercafé – requires mainly an awareness of internationalisation issues and familiarity with the local language needs.

People who localise also vary in their skill sets, so groups of individuals with complementary skills in language and technology make a logical team. Such teams need some level of organisational skill to coordinate efforts and plan actions. Since localisation implies products destined for a market (whether or not those products are free), marketing is another concern. "Who localises" may therefore also include people who bring primarily organisational or marketing skills to a collaborative effort. Motivation to work on localisation can thus be considered the first defining characteristic of people who localise.

2.5 What is the current state of localisation across this region?

One of the purposes of this report is to give a better idea of what is happening with localisation in Africa. This serves as information for localisers, policymakers and ICT-for-development experts, and as a benchmark of sorts for evaluating the effectiveness of future localisation efforts.

In general, the potential for localisation is great, but despite growing interest in Africa, the current level of activity is generally small and uneven. Regions differ somewhat in the degree and character of their localisation initiatives and of related local or multilingual ICT efforts.

Section 7 of this paper will discuss recent and current localisation efforts. The intervening sections will, in addition to exploring the need and potential for localisation, set the context for understanding the current localisation situation.
