A Survey of Localisation in African Languages, and its Prospects: A Background Document


Section 1 - Introduction

1. Aside from being the maternal language for a large population in northern Africa, Arabic is also a major world language with significant speakership outside the continent, so some localisation issues implicate large markets and can draw on significant and diverse resources.

Section 2 - Background

2. This observation is frequently made. Herbert (1992:1) is among the recent sources.
3. The term "European language of wider communication" (ELWC) was introduced by Eyamba G. Bokamba (1995). "Europhone" is a more recent coinage, sometimes used to refer to European languages and speakers of them in Africa. "Language of wider communication" (LWC or sometimes LOWC) is an established term that refers to any language used vehicularly, generally in contexts where it is a second or additional language. Many African languages including Arabic serve as LWCs or as local linguae francae. ELWCs of course dominate in web content and software worldwide.
4. According to one estimate, up to 90% of the people in some countries do not speak the official languages (Mackey 1989:5, quoted in Robinson 1996:5).
5. ISO is the International Organization for Standardization. In effect, this standards organisation and the Unicode Consortium, which began as an industry association, coordinated their efforts in the mid-nineties to arrive at a single coding system. It is sometimes called the "Universal Character Set" (UCS) but is commonly referred to simply as Unicode. This paper will follow the latter practice.
6. This is a subject that cannot be treated in depth here but merits a brief elaboration. Minimisation of the value of all aspects of indigenous cultures in Africa was a fundamental feature of European and North American interaction with Africa for centuries during which the slave trade and colonisation were rationalised. But while such attitudes are no longer acceptable today, and indeed there is a greater appreciation of African cultures elsewhere in the world today, African languages have had little value attributed to them outside the limited circles of linguistic specialists. As late as the 1970s, a major introductory text on Africa gave little attention to African languages other than to suggest their future was in doubt (Bohannan and Curtin 1971; this statement was modified in later editions – see Bohannan and Curtin 1995). Chaudenson (2004) notes that the subject of language has been almost entirely absent from the discourse on development in Africa. And Brock-Utne (2005) calls attention to the negative attitude of foreign donors towards multilingualism in Africa, who see it as a "hindrance" to development.
7. In education and literacy training in Africa, one strategy has been to use instruction in first languages primarily as a "bridge" to learning in the official language (this is sometimes called a "subtractive bilingual" approach). Localisation of ICT in this report is not conceived with such a limited end in mind, although it is certainly true that people who learn computer use in a more familiar language would be able to acquire computer skills in an additional language more readily.
8. The focus here is mainly on the written languages, but it is important to acknowledge the importance of audio and non-text images – whether alone or in combination with text – in localisation and multilingual computing. These include some applications that will be discussed later.
9. We will mention keyboards briefly in this section. A more in-depth treatment, including discussion of speech recognition and speech-to-text, appears below (section 7.6).
10. Even more broadly, on a "meta" level, one might also include development of tools to facilitate the process of localisation. This is different than the internationalisation of the technology. Such tools are discussed below, Section 5.4.
11. This is not to suggest that ELWCs in Africa have no such connection, but rather that the connection is different and, for obvious reasons, less deep.

Section 3 - Introducing "Localisation Ecology"

12. Some examples include models proposed by Duncan (1959), Rambo (1983), and Campbell and Olson (1991).
13. One might note that the South African, Jan Smuts, articulated the concept of "holism" in 1926.
15. In French: "aménagement linguistique intégré."
16. Ghana News Agency. 2005. "About 230 rural communities to get ICT centres." http://www.ghanaweb.com/GhanaHomePage/NewsArchive/artikel.php?ID=94999
17. Nsengiyumva and Stork, "Rwanda" in Gillwald (2005).
18. The marketing for the Konyin keyboard, for instance, includes the phrase “Does not change how you type! No cryptic codes to remember! No training required!” - an explicit recognition of the importance of this "sociocultural" factor.

Section 4 - Linguistic Context

19. Macrolanguages, which "joiners" might in some cases simply call "languages" but which in other cases may approximate language clusters, are a category that arose in the process of reconciling different parts of the ISO-639 standard for codes representing languages.
20. This process involves in effect a blurring of dialect differences due to factors like marriage, movement of people, and broadcast media.
21. This is the phenomenon of speakers not mastering the language fully and, in the extreme, of no speaker or group of speakers mastering the full range of the language.
22. Among recent sources that survey language change in contemporary Africa is one by Batibo (2005).
23. H. Russell Bernard (1996) mentions such diversity of opinions in a discussion of whether linguists should work to preserve indigenous languages.
24. The author encountered the opinion that there is "no huge demand" in Ghana for Ghanaian language interfaces or software from at least two sources. The expectation that there must be large scale demand manifest before providing interfaces or beginning localisation work for various languages fails to understand the issue of latent demand.
25. The author has encountered this attitude among some development professionals.
26. The UNESCO Red Book of Endangered Languages lists over 180 languages in Africa it considers endangered at http://www.tooyoo.l.u-tokyo.ac.jp/Redbook/Africa/AF_index.cgi .
27. There have been references for instance to Igbo – a language of Nigeria spoken by at least 18 million people – as being "endangered" based on perceptions of how the language is and is not being used and passed on (see Daily Champion 2004, Lotanna 2005). This obviously stretches the definition of endangered too far, but it also reflects popular interest and concern among many Igbo speakers.
28. A third area might be suggested in the context of localization for "digital divide" or ICT4D projects, and that is language in development more broadly. This has been treated only to a limited degree in the literature, for instance by Robinson (1996), Prah (2000), Simala (2002), and Ongarora (2002). For this study, however, language in development will be considered as part of the broader issue of language policy.
29. Many African countries do not have a legislated official language (Gadelli 1999). This fact is borne out by country-by-country research on language policy (see the site L'aménagement linguistique dans le monde http://www.tlfq.ulaval.ca/axl/afrique/afracc.htm , which was one of the references used in compiling the country profiles in Appendix 3 [12.3] of this document). This is not particular to Africa, as numerous countries elsewhere (such as the United States) have not found it necessary to legislate any official language.
30. The website of the African Academy of Languages (ACALAN) has a recapitulation of how many of the declarations and plans of action issued by conferences and meetings in Africa have not been acted on. See http://www.acalan.org .
31. See for example John E. Philips (2000) on the history of Hausa orthographies. In terms of developing alphabets for multiple languages in a country, particular note should be made of the process in Cameroon where an effort to develop an alphabet has apparently met with some success (see Tadadjeu and Sadembouo 1984; Tadadjeu 1993).
32. Roger Blench (personal correspondence, 2006) notes, for instance, that much of what Kay Williamson (1984) compiled on orthographies for several Nigerian languages may not be in current use.
33. Some of the documents from these conferences are available online at http://www.bisharat.net/Documents/
34. For instance Naira currency notes in Nigeria include the amount of the note in Hausa, written in Ajami. It is the only indigenous language represented on the currency.
35. There are a number of experts who Fallou Ngom (personal correspondence, 2006).
36. See Appendix 2 (section 12.2) for more information on major scripts. Concerning the unsuccessful proposals, there have been for instance at least three writing systems proposed over the years for Hausa but not widely used, and in 2005 there was a retired professor in Senegal (Agence de Presse Sénégalaise 2005) and a merchant in the Gambia (Secka 2005) who each announced they had created new scripts for African languages.
37. It is worth noting that there have been numerous conferences and meetings over the years to discuss aspects of use of African languages in education. Two of the earliest were in 1964 in Abidjan and Ibadan (Sow 1977). Two of the most recent include one on bilingual education in Windhoek, Namibia in August 2005 and one on languages and education in Africa scheduled for Oslo, Norway in June 2006. A partial list is available at http://www.bisharat.net/Documents .
38. There are actually two terms used for this. One, "multiliteracy," is also and perhaps more frequently used to describe literacy in multiple media. The other, "pluriliteracy," has been used in some European literature in the more strict sense of the ability to read more than one language. The latter term is used here.
39. For example, see Joshi and Aaron (2005). Ethnologue has a list of resources in this topic area at http://www.ethnologue.com/LL_docs/index/Orthography(Literacy).asp .

Section 5 - Technical Context

40. Access is also an important issue where disabilities are involved, but this report will not address that dimension of access in Africa.
41. This presentation is no longer on their site, but can be viewed at http://web.archive.org/web/20041119054155/http://www.bridges.org/digitaldivide/realaccess.html .
42. It is, of course, remembered that skilled users may also have an interest in or preference for localized interfaces.
43. Other, non-technical, factors that impinge on levels of ICT usage in Africa include literacy (mentioned above, section 4.4) and income level.
44. The Simputer project began several years ago in India as a way to address the digital divide. See http://www.simputer.org/
45. Spearheaded by Nicholas Negroponte and the Massachusetts Institute of Technology Media Laboratory, this project has a web presence at http://laptop.media.mit.edu/ and http://laptop.org/
46. The Leland Initiative "Africa Global Information Infrastructure Project" formally began in 1995 with a target of extending "full internet connectivity" to 20 or more African countries. See http://www.usaid.gov/leland/ . The IIA was founded in 1996. The two coordinated their efforts to extend connectivity to the maximum number of countries possible (Okpaku 2003).
47. The Balancing Act (2004-2005) reports on the internet in Africa discuss these cables, as does the IDRC (2005) Acacia Atlas.
48. See http://l10n.openoffice.org/languages.html . In general, open source software and operating systems have been localised to a greater degree than proprietary software (see "Open Source's Local Heroes." The Economist 4 Dec. 2003).
49. A brief internet article entitled "A Brief History of Free/Open Source Software Movement" at http://www.openknowledge.org/writing/open-source/scb/brief-open-source-history.html gives some background.

Section 6 - Africa and the Internationalisation of ICT

51. There are also three-letter and three-number codes. See http://userpage.chemie.fu-berlin.de/diverse/doc/ISO_3166.html . In addition there is a draft for country territories. See http://en.wikipedia.org/wiki/ISO_3166 for more information.
53. See http://www.ietf.org/rfc/rfc4646.txt . Earlier versions were RFC-1766 and RFC-3066. [NB- RFC 4646 was replaced by RFC 5646 in Sept 2009; see http://tools.ietf.org/html/rfc5646 ]
54. American Standard Code for Information Interchange.
55. American National Standards Institute. ANSI is a bit of a misnomer as the institute never formally adopted drafts of this standard. Nevertheless they were used as "Windows ANSI" and the term is commonly used.
56. There are fifteen in all. See http://en.wikipedia.org/wiki/ISO_8859 ; an older detailed description is available at http://czyborra.com/charsets/iso8859.html .
57. ISO-6438 is copyrighted and not available for viewing online. Uncopyrighted versions can be viewed at http://www.itscj.ipsj.or.jp/ISO-IR/039.pdf (before ISO-6438 was adopted in 1983) and http://anubis.dkuug.dk/jtc1/sc2/open/02n3129.pdf (after 1983). (NB- This is not in current use.)
58. Although largely the same, there were a number of differences in character form between the two. See for instance http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=AfrGlyphVars ; also footnote 65. The forms in ISO-6438 were retained in Unicode.
59. One exception is the Senegalese non-governmental organization ARED, headed by Sonja Fagerberg-Diallo. An early example of the kind of use possible with Macintosh computers of that era was a learning manual for the Pular of Guinea in the extended-Latin orthography that Dr. Fagerberg-Diallo produced in 1986.
60. A web-based presentation entitled "Arabic on the Internet" (2004) gives a succinct history of this development along with other information. See http://baheyeldin.com/arabization/introduction-to-arabic-on-the-internet.html
61. Information from Yacob (personal correspondence 2006).
62. Unicode Transformation Format. There are also other UTFs, such as UTF-16 and UTF-32 (the number indicates the size in bits of the code unit each format uses). Some background is given at http://en.wikipedia.org/wiki/UTF-8 .
63. Réseau international francophone d'aménagement linguistique (International Francophone Network for Language Management). See http://www.rifal.org .
64. The author is indebted to Mark Davis, Doug Ewell, and Steve Summit for their clarifications on this matter on the Unicode list (September 2006).
65. A recent example was the sample glyph used for the upper case Y with hook (used for the ejective y sound in Fula and Hausa), in which the side on which the hook is shown was changed to reflect local usage in West Africa. A discussion of this aspect of this character can be read at http://scripts.sil.org/HooktopYVariants . This was apparently an inheritance from the divergence years before between what the current practice was in Africa (as reflected in the African Reference Alphabet) and the glyph form retained in ISO documents (per ISO-6438). See http://en.wikipedia.org/wiki/%C6%B3 .
67. This outline benefitted from information from Cunningham (personal communication, 2006) and Hoskins (2003).
68. An example of different assignment of keys is the set of differences between the QWERTY and AZERTY keyboards. The placement of the A, Z, Q, and W keys, among others, differs between the two layouts. Similarly, in a customised keyboard layout one can use a keyboard driver to reassign keys without changing what is printed on them.
70. This description is from a webpage on the IBM site entitled "Globalize Your On Demand Business," http://www-306.ibm.com/software/globalization/topics/keyboards/iso.jsp .
71. According to the website at http://www.artlebedev.com/portfolio/optimus/ : "Every key of the Optimus keyboard is a stand-alone display showing exactly what it is controlling at this very moment."
72. Each of the profiles in the Major Languages section of this report (Appendix I) includes information on ISO-639 codes for that language.
75. Its director, Dr. Nii Quaynor of Ghana, also served from 2000-3 as At-Large Director of ICANN.
76. An organisational meeting was held in Dakar on 7 September 2005 to launch this effort. Mouhamet Diop of the Senegalese company Next SA organised the meeting.
77. This involves testing of two main alternative ways of handling non-ASCII characters and scripts (Crawford 2006).
79. A list of such resources is available at http://opensourcegis.org/
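
As a small illustration of note 62, the following Python sketch encodes the upper case Y with hook discussed in note 65 (U+01B3) in three UTF forms. The helper name `encoded_lengths` is hypothetical and introduced here only for illustration; the big-endian codec variants are chosen so that no byte-order mark is added to the output.

```python
# Illustrative sketch only: compare how many bytes one character occupies
# in UTF-8, UTF-16 and UTF-32. The "-be" codecs emit no byte-order mark.

def encoded_lengths(ch):
    """Return the encoded length in bytes of `ch` for three UTF forms."""
    codecs = ("utf-8", "utf-16-be", "utf-32-be")
    return {name: len(ch.encode(name)) for name in codecs}

y_hook = "\u01b3"  # LATIN CAPITAL LETTER Y WITH HOOK

# UTF-8 uses 8-bit code units; this character needs two of them (C6 B3),
# which is the "%C6%B3" seen in the URL cited in note 65.
print(y_hook.encode("utf-8").hex())  # c6b3

# UTF-16 uses one 16-bit unit here; UTF-32 always uses one 32-bit unit.
print(encoded_lengths(y_hook))
print(encoded_lengths("A"))
```

A plain ASCII letter such as "A" takes a single byte in UTF-8, which is why existing English-language text did not need to change when UTF-8 was adopted, while extended Latin characters take two or more bytes.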

Section 7 - Current Localisation Activity

80. One of the hopes of this study is that continuing to gather progressively more specific information on the country level will facilitate more detailed cross-comparisons of technical possibilities and linguistic needs.
81. Several Yahoogroups with significant Hausa content are one example, and a Senegalese forum in which there is Pulaar and Wolof content is another.
82. There was even a "web-page by e-mail" service hosted for several years by Kabissa.org, in recognition of the fact that many people in Africa could not access the web but did have limited e-mail access.
83. It would be interesting to know more about the experience of these services. Unfortunately inquiries have yielded no replies.
84. A simple survey of websites by language done in 2000 by Vilaweb, the website of a Barcelona newspaper (Pastore 2000), listed no African languages among the 31. A follow-up to the Vilaweb survey which ranked the top 48 languages on the web found Afrikaans 42nd after languages such as Basque and Slovenian, and Swahili last following, among others, Frisian and Faeroese (Mas 2003).
85. A more recent survey by UNESCO (2005) on linguistic diversity on the internet recapitulates the information summarised in this section for Africa.
87. ALI stands for "Apprentissage des Langues africaines par l'Internet" (learning African languages on the internet). See http://www.kabissa.org/archives/a12n-forum/msg00187.html . This is not to be confused with another online program for second language learners of Akan called "ALI Akan" based in Switzerland (ALI there standing for African languages on the internet).
88. This effort uses a Yahoogroup at http://tech.groups.yahoo.com/group/afrophonewikis/
89. There are still occasionally new ones created, even though Unicode makes them unnecessary. For example in 2006 a new 8-bit font was announced for the Ewe language in Togo (Togocity.com 2006).
90. Williamson (1984:66) mentions some typewriter keyboards for Nigerian languages along with strategies for typing with English keyboards. In the 1980s, the IBM company developed some typeballs with what we now call extended Latin characters for its Selectric typewriter. Mann and Dalby (1987) proposed a lower-case only keyboard for typewriters and computers based on the Niamey African Reference Alphabet, but this never caught on; see http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=IntlNiameyKybd (there is actually one keyboard layout that is based on the Mann-Dalby Niamey keyboard, but it includes upper case characters as well).
91. See for instance the Tavultesoft site http://www.tavultesoft.com or the keyboard projects links at http://www.bisharat.net/A12N/Projects .
92. This was the case for instance in Mali where the 8-bit fonts Bambara Arial and Bambara Times were developed by a project facilitated by the French agency ACCT during the late 1990s.
93. These include several by Andrew Cunningham of the OpenRoad project at http://www.openroad.net.au/languages/files/ (these are also listed in Appendix 5).
94. See http://www.konyin.com/ . It is designed for use with Microsoft Windows software.
95. In one large cybercafé in Bamako in 2000, for instance, the author encountered French, English, and German language keyboards.
101. Nokia has localised "menu text and predictive input" for at least one phone model in Afrikaans, Arabic, and Swahili, and "menu text only" in Zulu, Xhosa, Sesotho, Yoruba, Hausa, and Igbo. See http://www.europe.nokia.com/A4160009 .
102. Markus Neteler (personal communication, 2005). For more information on GRASS, see http://grass.itc.it/ .
103. The International Association for Machine Translation (IAMT), for instance, is composed of three regional associations, one each for the Americas, Europe, and the Asia-Pacific region, but none in Africa, a continent that by itself accounts for about a third of the world's languages.
105. Jeff Allen, personal communication, 2006.
106. These are accessible via http://mokennon2.albion.edu/language.htm
109. There are three e-mail lists, one each in the working languages of English, French, and Portuguese, and a machine translation mechanism to facilitate following all discussions in each language. See http://lists.panafril10n.org/mailman/listinfo/pal-en, http://lists.panafril10n.org/mailman/listinfo/pal-fr, and http://lists.panafril10n.org/mailman/listinfo/pal-pt .

Section 8 - Needs of Localisers and for Sustainable Localisation

110. In the framework of the PLETES model this would refer to two points in the localization dynamic.

Section 9 - Summary and Recommendations

111. Helen Ladd, professor of public policy and economics at Duke University, proposed a similar question regarding the South African government: "...part of the broader language policy they need to grapple with is should all eleven [official] languages remain as viable languages?" (Aziz 2004). In other words, she was not talking even about languages in danger of extinction, for which such questions cannot be escaped, but rather the official and widely used languages of the country. This report is proposing that similar questions be asked by other African countries.
113. It is our understanding from previous experience that many people working on localization of Arabic often use English or French as working languages. Nevertheless, the possibility of working in Arabic is considered.

Section 10 - Conclusion
