A Survey of Localisation in African Languages, and its Prospects: A Background Document

4. Linguistic Context

For purposes of this study, the linguistic context to be considered concerns the distribution of speakers of the various African languages, current trends, attitudes of social groups toward use of different languages, the effect of language and education policies on use of and change in African languages, and development of terminologies. In other words, this includes sociolinguistics, policies, geography of language, and contemporary culture. On the level of individual languages, issues of dialect variation, degree of interintelligibility with related languages, and development of terminology also are key considerations.

There are other scholarly studies of the languages of Africa from perspectives of the field of linguistics that treat many of these issues in more depth than will be attempted here. This section will attempt to broadly characterise the main elements of the linguistic environment in which African language localisation efforts are taking place, with attention to how those affect localisation.

4.1 Languages, dialects, and linguistic geography

Four indigenous language families occupy the continent: Afroasiatic (formerly called Hamito-Semitic); Niger-Kordofanian (including Niger-Congo, which is by far the largest group in this family); Nilo-Saharan; and Khoisan (or Click). Within these there are interspersed subgroups, overlapping territories of speakers of different languages, and more or less gradual shifts in dialect within many languages. In addition there are indigenised languages from the Malayo-Polynesian (Malagasy) and Indo-European (Afrikaans) families.

This creates in many areas a complex kaleidoscope of language distributions, which tends to be more intricate in certain parts of the continent such as the forested regions of coastal West Africa and Central Africa (Nigeria for instance counts more than 400 languages, Cameroon over 280, and the Democratic Republic of the Congo over 200, according to Ethnologue), and less so in savannah regions like the Sahel. Also, over the vast distances across which languages like Arabic, Berber languages (Tamazight, Tamasheq, etc.), Fula, and Swahili are spoken, dialect/vernacular differentiation has occurred. In the past, communication among diverse groups was facilitated either by the interintelligibility of similar dialects in certain languages or by the use of vehicular languages.

European colonisation, which imposed arbitrary borders that split many language communities, and overlaid English, French, Portuguese, and Spanish (ELWCs) as administrative languages, added to Africa’s linguistic complexity. ELWCs assumed the role of linguae francae for official use and contact/exchange across the continent and with the rest of the world. They have also tended to divide elites from the mass of society, or at least to be used as markers of elite status (see Mazrui and Mazrui 1998). To a certain extent these languages have over the years also become vehicular in some regions and urban areas, and in some cases adopted as the home language among elites. In addition, creoles based on ELWCs have become established either as vehicular languages, such as Krio in Sierra Leone, or as the first language, such as the creoles of Cape Verde to the west and several Indian Ocean island states to the east.

Languages, dialects and "macrolanguages"

The complex linguistic situation in Africa also raises questions about the "borders" of languages and the very definitions of language and dialect. This is a contested terrain in the study of linguistics, with those tending to emphasise distinctions among languages sometimes characterised as "splitters" and those emphasising the commonalities called "joiners" or "lumpers."

The researchers of SIL International who have compiled the well-known Ethnologue listing of languages (Gordon, ed. 2005) may fairly be considered splitters. In the language profiles section of this document, a single entry used for discussing localisation very often draws on several Ethnologue listings for variants of a given tongue. By their count, Africa has 2092 languages (ibid.).

On the other end of the spectrum one might place the Centre for Advanced Study of African Languages (CASAS), which has been researching groups of languages with the idea that, functionally, Africa has far fewer separate languages than is often claimed. Its director, Kwesi Kwaa Prah (2002; 2003), suggests that 75-85 percent of Africans speak, as a first or additional language, one of 12-15 "core languages," which in fact are clusters of more or less closely related languages.

Even splitters acknowledge that there may be degrees of interintelligibility among different tongues, and beyond that the speakers of related tongues may sometimes hold ideas (one might call them ideologies) of their fundamental unity. For such situations, SIL has adopted a category of "macrolanguages."19

The terminology, new and old – including terms such as "language cluster" (closely related but not highly interintelligible), "dialect continuum" (language variation over territory such that communication gets progressively more difficult over distance), "dialect levelling" (reduction of differences within a language due to contact), etc. – reflects the complexity of the situation.

What all this means for localisers is that in some cases, perhaps many, they will have to negotiate different sets of categories in deciding what to localise for. For instance in a tongue like Fula (Fulfulde/Pulaar) that is spoken across much of West Africa, though by a minority in each country where it is present, there are clear differences among variants of the language, but also enough similarity to permit communication by speakers of most of its different variants. What should software be localised for – one language, the nine that Ethnologue divides it into, or some set of groupings of close dialects? And what should be the approach for localised content in Fula – the same as for software or a different set of criteria?

The case of Arabic also merits mention, as Ethnologue has sixteen separate listings for northern and eastern Africa alone (the total number of listings is 40). The difference in this case is that there is an established common standard form – Modern Standard Arabic – unlike the case with many other African languages.

In other cases, languages that are linguistically closely related are spoken by groups that actually emphasise their mutual differences (as is apparently the case with Teso and Turkana).

This topic is discussed further below in the context of a system of codes used in ICT to designate specific languages or groups of languages (section 6.3).

Linguistic geography and localisation

In addition to the complex patterns of location of speakers of certain languages, and the overlapping of territory where various languages are spoken, the borders inherited from the colonial partition of Africa have also divided linguistic communities. The latter has in many cases led to additional changes and divergence within languages.

There has been discussion over the years of developing a linguistic atlas of Africa and of specific countries, and in some cases maps with language distributions have been produced. Such an atlas project could be incorporated into larger localisation efforts, first to understand the ranges of potential use of localised software and keyboard standards, and second to compare the language distributions with computer and internet access.

4.2 Sociolinguistics and language change

African societies are by and large multilingual, and very often individuals in them master several languages to varying degrees and use them in different contexts or together in what linguists call codeswitching. These societies are also seeing a significant degree of change in how people use language, including changes in urban areas, dialect levelling,20 and, in the case of less widely spoken languages, impoverishment or contraction,21 endangerment, and extinction. It is not an exaggeration to say that the African linguistic terrain is experiencing many changes, with some of those working in countervailing directions.22

Of major importance is the role of attitudes and perceptions, among Africans as well as among foreigners working for development and education in Africa. It is not uncommon to encounter more or less negative attitudes concerning African languages, from their utility as mere "local languages" vis-à-vis ELWCs, to doubts about their capacity for expression of complex thought and scientific concepts. On the other hand there is commitment among many to the use and development of their maternal languages. (See further related discussion under Terminology below, section 4.5.)

Such divergence of opinions is of course to be expected in any complex society regarding any number of topics, and to expect otherwise in the case of language is not realistic.23 Although to our knowledge it has never been surveyed, there appears to be considerable interest – perhaps latent – in localised software and content among a significant number of people.

Negative attitudes, however, represent a potential discouragement to people seeking to use the languages in various ways, including localisers. For localisers, the issue is not so much a need to convince potential users who have no interest in localised software and content as the risk that negative attitudes will discourage projects.24 In addition, negative attitudes may retard progress that would benefit potential users who are interested.

Beyond that there are other attitudes that affect localisation potential, such as the notion that use of one language precludes use of another, or that if software or web content is provided in one of the languages that many people speak, there is little or no need to think about other languages.25

Altogether, such attitudes point to the importance of education in localisation efforts. Another area of concern is the status and future of many of Africa's less widely spoken languages, a number of which are considered endangered.26 Many other less widely spoken languages, however, are also experiencing declines – or degrees of contraction – without being close to extinction.27

4.3 Language and language in education policies

Policies within African countries and indeed between and among them affect the possibilities for localisation and as such represent a critical factor in localisation ecology. ICT policies are treated in the next section (5.2), but here policies related to language are considered. In particular, two overlapping policy areas are of concern:28

  • Language policy and related concerns of language planning and management; of particular practical interest for localisation efforts are decisions concerning orthographies
  • Education policy, with specific attention to issues of African languages as media of instruction and as subject matter.

Language policy in Africa

Halaoui (2001) distinguishes between language policy and language management, and in fact, a thorough consideration of this subject would merit such a nuanced analysis. However, for purposes of understanding the localisation ecology, it is sufficient to treat the subject as a single problem while acknowledging an underlying complexity.

Language policy therefore is taken broadly to mean the set of legal and administrative mandates and guidelines concerning language use in public life, including such matters as: denoting particular tongues as "official" or "national" languages; the use of particular languages in government, legal systems, development, and education (the latter is discussed below under language in education policy); and standards, such as official orthographies (this is also treated separately, below).

Since independence from European colonisation, one might characterise African language policy concerns as being marked by two features: First, reliance on the former colonial languages – ELWCs – for government administration and education, whether that reliance was codified in law or constitution or not at all;29 and second, much discussion on the role of indigenous languages and how to use them.

A factor favouring use of the ELWCs was the concern in many cases of African governments with having a single common language for "nation building." Bamgbose (1996) characterises this as involving "two complementary myths: the first being that having several languages in a country (multilingualism) always divides; the other being that having only one language (monolingualism) always unites..." (see also Bamgbose 1991:14).

At the same time, the discussions at national and regional levels on the use of African languages have tended to result in proposals but little follow-through.30 This perhaps reflects a tension with the focus on one language mentioned above, as well as ambivalent attitudes about the value of African languages vis-à-vis ELWCs.

In any event, there has been a low level of attention given to language policies and planning in Africa, such that Okombo (2001), for instance, calls it a "forgotten dimension in governance and development" (see also Gadelli 1999). Bamgbose (1991: 6) finds that this is due to "a general feeling that language problems are not urgent and hence solutions to them can wait." He goes on to characterise the situation in these terms:

Language policies in African countries are characterised by one or more of the following problems: avoidance, vagueness, arbitrariness, fluctuations and declaration without implementation. (Bamgbose 1991:111)

In recent years there has been somewhat more attention to this area, such as in education (see below), the formation on a regional level of the African Academy of Languages (or Académie Africaine des Langues, ACALAN), and the vigorous exploration of some of these issues in post-apartheid South Africa.

In the area of ICT and the potential for localisation, the absence of language policies that actively support African language computing means that localisation will likely depend on initiatives from individuals, organisations and companies.

Language institutions and agencies in Africa

Agencies and organisations for research and applied linguistics exist in one form or another in each country, often as a part of government or a university (information on these, where available, is given in the country profiles, Appendix III, section 12.3). There are also some continental and regional institutions (see Appendix IV, section 12.4).

On the continental level, ACALAN, based in Bamako, operates under an African Union mandate to facilitate work with African languages. The most notable regional institutions are several set up to deal mainly with oral histories. These are listed below with more information in Appendix IV (section 12.4):

  • CELHTO - Centre d'études linguistiques et historiques par tradition orale (Center for linguistic & historical study of oral tradition)
  • CERDOTOLA - Centre régional de recherche et de documentation sur les traditions orales et pour le développement des langues africaines (Regional center for research and documentation of oral traditions and for the development of African languages)
  • CIDLO - Centre d'investigation et de documentation sur l'oralité (Center for investigation and documentation of orality)
  • EACROTANAL - Eastern African Centre for Research on Oral Traditions and African National Languages (Centre est africain pour la recherche sur les traditions orales et les langues nationales africaines) (currently closed)

There are non-governmental organisations concerned with language and culture in many countries. On a continentwide scale SIL International is the most prominent, with offices in many countries.


Orthographies

One area of language policy that has received attention in many African states is that of orthographies for at least the major languages spoken by their populations. In these countries this has usually meant setting rules for transcription in a Latin-based alphabet.

The process of arriving at such orthographies has generally involved various actors beginning in the colonial period – including missionaries and colonial administrators – with the post-independence states building on that history. To accommodate the phonetics of these languages, which often had sounds unfamiliar to Europeans, diacritics or modified letters (the latter often corresponding to characters of the International Phonetic Alphabet) were introduced to represent sounds not distinguished or present in European tongues. An early attempt to standardise usage on a continental level was the "Africa Alphabet" of the International Institute for African Languages and Cultures (1928, 1930). This history has been explored in some cases but neither extensively nor for all African languages.31

Several factors are important to note in considering the topic of orthographies for African languages today:

  1. While orthographies are relatively set for some major languages, they are still apparently in flux for others.
  2. One of the reasons for changes in orthographies for some languages in recent years has been a mismatch between what fonts on early computer systems (and even current ones) offer and the characters or diacritics adopted for use in print and writing.32
  3. There are in some cases separate systems of writing for the same language, sometimes with a degree of competition or even conflict among their respective advocates. This is the case for some languages with old written traditions (see below) and some that have been written only more recently.
  4. There are still many languages, mostly less widely spoken ones, that do not have formal writing systems.
  5. Conventions and policies concerning orthographies are generally set at the country level by governments or researchers, without much coordination with other states where the same languages are spoken.

With regard to the last of these, it is interesting to note that following independence, a number of African countries in collaboration with UNESCO began serious consideration of aspects of language policy, including the possibility of common rules for transcription of the many languages that cross borders. Among these efforts, the experts' meetings in Bamako in 1966 (which made reference to the Africa Alphabet mentioned above) and Niamey in 1978 (which produced its own "African Reference Alphabet") deserve mention.33

This process has yielded a certain level of standardisation which benefits current localisation efforts, at least in West Africa. There have been other efforts along these lines, such as a conference in Okahandja, Namibia in 1996 and current work in several parts of the continent by CASAS.

In most of the numerous countries where writing systems for indigenous languages existed before colonisation, Latin transcriptions were adopted "officially" in their place. The exceptions were Arabic script for the Arabic language and Ethiopic/Ge'ez for several languages in what are now Ethiopia and Eritrea. This meant that use of several scripts was marginalised from the outset. These include Arabic script (as "Ajami") for several languages of the Sahel, Yoruba, and Swahili, the Tifinagh script for Berber languages, and minority scripts such as Vai, Mende Kikakui, and Bamum. All of these continue in use to some degree, and Ajami transcriptions in particular are apparently in widespread use in some areas. However, language policy has by and large ignored these practices.34

There was an effort sponsored by the Islamic Educational, Scientific and Cultural Organization (ISESCO) in the 1980s to develop a standardised Ajami for several languages in Africa south of the Sahara (see Chtatou 1992). This was based to a certain extent on usage in non-Arabic languages of the Middle East and has apparently not gained wide acceptance among speakers of the African languages it was intended for.35

To this picture must be added African scripts of more recent origin. While some of these have become popular – the N'Ko script for Manding languages has been adopted by an increasing number of people over the half century since its creation and is now included in Unicode, and the Mandombe script for Kongo has spread in the three decades since its creation – others have not enjoyed much success.36

The relationship of the issues of writing systems, orthographies and Unicode is discussed below (section 6.2).

Educational policy and languages of instruction

There is increasing attention to the issue of use of first languages in education and bilingual pedagogy in Africa, on the country and inter-African levels.37 An extensive discussion of the rationale for first language and bilingual (or multilingual) education is beyond the scope of this paper, but it is generally agreed that this is beneficial for schoolchildren. However, implementation is not simple. This is a set of issues that on the one hand can arouse debate within countries, and on the other involves problems with teacher training, availability of materials, etc.

How this might affect localisation is another issue. One interesting case mentioned above is how Morocco's decision in 2003 to use Tifinagh in Tamazight education spurred efforts to complete encoding of the script in the Unicode standard. It may be that first language education and localisation efforts are mutually supportive, especially to the extent that ICT for education programs are introduced (such as computers for schools, the One Laptop Per Child (OLPC) project, or the involvement of ICT4D centres in adult literacy).

4.4 Basic literacy, pluriliteracy, and user skills

Among the basic factors that contribute to the intractability of the digital divide is literacy (some others are discussed in section 5). In multilingual contexts, such as what one finds over most of the African continent, the subject is perhaps more appropriately put as "pluriliteracy" – being literate in more than one language38 – though it is seldom discussed officially in these terms. User skills in terms of literacy include several possible profiles:

  • Fully pluriliterate – that is, able to read and write in all the languages they speak that have a writing system
  • Literate in an ELWC but not their mother tongue – this being the usual outcome of schooling being conducted entirely in the ELWC
  • School-leavers with varying but not complete levels of literacy in the language of schooling
  • People with little or no schooling who have learned to read their mother tongue or a local lingua franca to some level of proficiency from literacy classes given by national programs, development projects, or traditional education (such as Koranic schools)
  • Illiterates and functional illiterates.

The potential user communities of localised content and software in Africa, therefore, tend to be quite uneven in their ability to take advantage of the opportunities that these present. Moreover, this fact points to a fundamental link between education (including both literacy training and policy as pertains to languages of instruction) and localisation. Localisation efforts may do well to associate themselves with, for instance, literacy training programs – both traditional and using ICT where public telecentres have been set up for local development – and computers-for-schools projects, so that students have the opportunity to encounter software and content in their first languages. In terms of localisation ecology, the factors of language, technology and education each have relevance in developing user skills.

There is also a relationship between orthography and developing reading and writing skills. This is a subject matter that is beginning to get more attention39 and has a bearing on computer use in the numerous languages for which writing systems are not well established.

All this said, literacy (pluriliteracy) rates will take time to increase in many countries of Africa. This would seem to indicate the desirability of making more effective use of audio and image in content and user interfaces.

4.5 Terminology and accommodation of ICT concepts

One aspect of language change and planning that has particular relevance to localisation is terminology. The broader area of terminology development concerns many fields of which ICT is only one. There is some attention to computer and internet terminology in this field, though it is often left to technical specialists and not linguists to find or develop terms necessary for localisation.

The ways that languages develop or borrow terminology for new and foreign concepts, the process Coulmas (1992) refers to as "language adaptation," involve several considerations. In some cases terms arise from within the community of speakers, but where the technology or details of it are not familiar to most people, terms are either borrowed from another language or invented, often by individuals or groups either self-appointed or designated by some authority. Most of the theory of this is beyond the concern of people who are working on translating software and need only to have ways of referring to various concepts.

Putting together terminology is a process that usually relies on experts in the language who have some familiarity with the technical areas for which the terminology is needed. Indeed, Microsoft has, for its localisation efforts in major African languages, used panels of experts to develop terminology and dictionaries.

The focus of localisation projects with respect to terminology is somewhat narrow – as it should be to address their specific needs. However, the efforts of localisation initiatives should be informed by, and in turn participate in, larger terminology efforts. At the same time it should be noted that there is some debate among linguistic experts about the value of efforts to develop terminology for less widely spoken languages in all scientific domains.

The development of terminology for localisation is also a subset of the larger concern with dictionaries for use with software.
