
Rendering African language text in ICTs

This section, adapted from Osborn (2001), discusses the pivotal issue of how text in African languages is handled in computing and on the internet. It is also included, in modified form, in the PAL project document.

Generating and transmitting text remains, of course, the primary use of computers and the internet, and an important one in mobile technology. Text means characters, and for many African languages this involves more than the common Western character set, or an entirely different set. The more characters a language needs from outside the simple Latin alphabet (see also LatinScript) used in the main languages of IT, the more complicated the problem becomes.

This has not yet been adequately addressed in the African context, even though internationalisation has supposedly resolved the issues. Although Unicode in theory (and increasingly in practice) supports the main writing systems used for African languages, a range of practices persists: the transition to Unicode is uneven, and in the meantime various workarounds are in use.

Since African languages differ in their orthographies, it is useful to group them into three categories in order to consider what is involved in generating and transmitting text electronically, and what is actually being done:

  1. those that use basically the same characters one finds in the major languages of West European origin;
  2. those which use basically that same Latin alphabet but with some added letters or "extended characters"; and
  3. those which use non-Latin alphabets.

Before discussing these, it is important to point out that in more than a few cases the orthographies are not yet fully settled or accepted.

1. For African languages of the first category, which use the Latin alphabet of European languages, there are no special technical problems in working with text, producing web content, or even localizing software. This is especially true for languages like Swahili, Somali, and many in Southern Africa that use only ASCII characters (i.e., no accents). Even languages such as Sango, which use several accented characters common to major European languages, can readily be used in word processing and on the web (see for example http://sango.free.fr/).

2. However, many African languages (and most West African ones), in their officially adopted orthographies, use the Latin alphabet with a few extra or different characters (letters), or less common digraphs, to represent sounds not found in the major European languages. The extended alphabet adopted by many countries for their maternal languages had its genesis in an alphabet proposed in 1930 and later discussed in detail at a conference of African language experts held in Bamako in 1966. Several approaches have been observed for using languages whose orthographies include these extended characters on computers and the internet:

(a) The "correct" one. In a word processor, this means having a font that includes these characters. These days that usually means a Unicode font, though many 8-bit fonts are still in use, often created to meet specific local needs or as part of a commercial line of multilingual software. Such 8-bit fonts, like the keyboard layouts made for them, are as a rule not compatible with one another.

For the web, it means having these added characters in the text with a standard code for each character, a single code set that includes them, and a standard set of glyphs on the receiving end that the browser calls up to display them. In other words, Unicode. However, missing fonts, incorrect settings, or old browsers may nullify the utility of Unicode (UTF-8). Look at the Fula (Peulh, Pulaar), Ewe, Kabye, and Maninka versions of the Universal Declaration of Human Rights at http://www.unhchr.ch/udhr/navigate/region.htm: if you get a lot of empty boxes in the texts, you can see why people still use workarounds such as those below to create and share text in these languages.
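To make the encoding point concrete, here is a minimal Python sketch (the character list is an illustrative selection of letters mentioned in this section, not an exhaustive inventory) that prints each extended letter's Unicode code point, official character name, and UTF-8 byte sequence, and checks whether the letter exists in the 8-bit Latin-1 (ISO-8859-1) set used by many Western European systems.

```python
import unicodedata

# Illustrative selection of extended-Latin letters discussed in this section
# (lowercase forms only), written as escapes so the source file stays ASCII.
EXTENDED_LETTERS = [
    "\u025b",  # open e
    "\u0254",  # open o
    "\u014b",  # eng
    "\u0272",  # n with left hook
    "\u0257",  # hooked d
    "\u0253",  # hooked b
    "\u0199",  # hooked k
]


def describe(ch: str) -> str:
    """Return the code point, Unicode name and UTF-8 bytes of one character."""
    utf8 = ch.encode("utf-8")
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {utf8.hex(' ')}"


def fits_latin1(ch: str) -> bool:
    """True if the character can be stored in 8-bit ISO-8859-1 (Latin-1)."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


for ch in EXTENDED_LETTERS:
    print(describe(ch), "| in Latin-1:", fits_latin1(ch))
# First line printed:
# U+025B LATIN SMALL LETTER OPEN E: c9 9b | in Latin-1: False
```

Every code point here lies above U+00FF, so none fits in Latin-1 and each takes two bytes in UTF-8. A correctly encoded UTF-8 page can still show empty boxes if the reader's fonts lack glyphs for these code points.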

(b) The "old-correct" or obsolete one. For some languages, such as Bambara, Ewe, or Fula (Pular/Fuuta Jalon), digraphs or accented characters used in European languages were employed before the special characters of the extended alphabet were officially adopted (e.g., "ny" or "n tilde" for the "n with left hook"; "o with grave accent" or "underlined o" for the "open o"; "dh" for the "hooked d"). This approach lets one produce and present text in almost any environment (word processor, e-mail), but it is not satisfactory to those who learned, or are used to using, the current orthography. Also, accents might be confused with the tone marks used in texts in some of the tonal languages (Bambara, Yoruba).

(c) The substitute solution: use something that stands for the extended characters. One option is capital letters in place of the special characters (e.g., "E" for the "open e"); an example in Bambara can be seen at an older site: http://callisto.si.usherb.ca/~malinet/index_ba.html . Another is digraphs for modified consonants, such as "’d" or "’k" for the "hooked d" and "hooked k", the approach used for text on a Hausa page (see especially the part named "Mawallafan Littattafan Hausa": http://www.gumel.com/Littattafan-Hausa.htm). Yet another is to substitute similar-looking letters from other alphabets, such as the Greek letter "β" for the "hooked b" used in Fula and Hausa; some of the texts on the Fula (Pular/Fuuta Jalon) site mentioned in (b) above show this.
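Because a consistent substitute convention keeps each special letter distinct from the plain one, text written that way can in principle be converted mechanically to the proper orthography once Unicode support is available. A minimal Python sketch, assuming the Hausa apostrophe convention described above (an ASCII ' or a typographic ’ before d or k marks the hooked letter); the table and function names are hypothetical, and a real converter would also have to cope with apostrophes that are not part of the convention.

```python
import re

# Hypothetical conversion table for the apostrophe convention:
# "'d"/"’d" -> hooked d, "'k"/"’k" -> hooked k (capitals included).
HOOKED = {
    "d": "\u0257",  # LATIN SMALL LETTER D WITH HOOK
    "k": "\u0199",  # LATIN SMALL LETTER K WITH HOOK
    "D": "\u018a",  # LATIN CAPITAL LETTER D WITH HOOK
    "K": "\u0198",  # LATIN CAPITAL LETTER K WITH HOOK
}

# An ASCII apostrophe or a right single quotation mark (U+2019) before d/k.
APOSTROPHE_DIGRAPH = re.compile("['\u2019]([dkDK])")


def restore_hooked_letters(text: str) -> str:
    """Replace each apostrophe digraph with the corresponding hooked letter."""
    return APOSTROPHE_DIGRAPH.sub(lambda m: HOOKED[m.group(1)], text)


# Hypothetical input strings, not real Hausa words.
converted = restore_hooked_letters("\u2019da 'Ka")
# converted == "\u0257a \u0198a"
```

Note that any apostrophe directly before d or k is converted, so the approach only works if the convention is applied consistently in the source text.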

(d) The "little image file" solution where little image files are used for the special characters inserted as needed in the text. This is very cumbersome except for short texts. A site where that was done, for Bambara, is http://www.djembe.com/bambara_1.cfm .

(e) The "big image file" solution, in which text in the proper orthography is turned into image files (.jpg or .pdf), usually for the web.

(f) The "whatever works easiest" (or "fast & dirty") solution. That is, just use the closest standard Latin letter for each special character (e.g., "e" for the "open e"). This was done with Bambara at: http://www.bok.net/pajol/index.ba.html. Examples for Hausa include http://www.unhchr.ch/udhr/lang/gej.htm and most of the site http://www.gumel.com/. The advantage is that it gets the material out quickly in readable form, without waiting for technical solutions or for agreement on a substitute solution. As a consequence, it is apparently the method most used for e-mail in African languages (and even sometimes for French text, which some e-lists/groups and at least one e-newsletter disseminate without accents). The disadvantage, of course, is that many words can thus be misread.
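The risk of misreading follows from the fact that "closest letter" substitution is many-to-one, so it cannot be reversed. A minimal Python sketch with a hypothetical mapping table (the pairs follow the examples in this section, such as "e" for the "open e") makes this explicit by listing, for each plain letter, every character that ends up as that letter.

```python
from collections import defaultdict

# Hypothetical "closest standard Latin letter" table for approach (f).
CLOSEST_LATIN = str.maketrans({
    "\u025b": "e",  # open e
    "\u0254": "o",  # open o
    "\u014b": "n",  # eng
    "\u0272": "n",  # n with left hook
    "\u0257": "d",  # hooked d
    "\u0253": "b",  # hooked b
    "\u0199": "k",  # hooked k
})


def flatten(text: str) -> str:
    """Replace each extended letter with its closest plain Latin letter."""
    return text.translate(CLOSEST_LATIN)


def collisions() -> dict:
    """Map each plain letter to all characters that flatten to it."""
    sources = defaultdict(set)
    for code_point, plain in CLOSEST_LATIN.items():
        sources[plain].update({chr(code_point), plain})
    return {plain: sorted(chars) for plain, chars in sources.items()}


# Two distinct (illustrative, not real-word) strings become identical:
print(flatten("s\u025b") == flatten("se"))  # True
# A plain "n" could stand for n, eng, or n with left hook:
print([f"U+{ord(c):04X}" for c in collisions()["n"]])
# ['U+006E', 'U+014B', 'U+0272']
```

Once several source letters share one output letter, no program (or reader) can tell from the flattened text alone which original was meant. That is the loss of information behind the misreadings mentioned above.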

(g) "Hybrid" solutions mix two or more of the above. For example, Wolof text at http://www.bok.net/pajol/index.wo.html uses accented characters but not the letter "eng." Likewise, two Fula (Pular/Fuuta Jalon) sites, the one cited in (b) above and http://www.ibamba.net/pular/default.htm, handle the transition from the old transcription to the new one in different ways.

3. For Arabic and for African languages with their own scripts, such as Ethiopic/Ge'ez (used for Ethiopian and Eritrean languages) or Tifinagh (used for Tamasheq and Amazigh/Berber), special coding is necessary. For many of these, 8-bit encodings (generally more than one per script, and mutually incompatible) were developed some years ago. Unicode encoding has been completed for Arabic, Ethiopic/Ge'ez, and, recently, Tifinagh. Among the less widely used alphabets, encoding for N'ko is nearly complete and accepted, and encoding for Vai is in process. In any event, unlike languages written in extended Latin alphabets, these have no shortcut solutions: either you have the full orthography in text (or in an image file), or you substitute a transcription or transliteration in Latin characters.
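As a small illustration of what Unicode encoding means for these scripts, the Python sketch below tallies the letters of a string by script. Python's standard library does not expose the Unicode Script property, so this sketch assumes the first word of each character's Unicode name (e.g. "ETHIOPIC SYLLABLE HA") is a usable rough proxy; it is a heuristic for illustration, not a complete script detector. The Unicode data shipped with current Python 3 versions also covers N'Ko and Vai, which have been encoded since this text was written.

```python
import unicodedata
from collections import Counter


def script_profile(text: str) -> Counter:
    """Count letters by the first word of their Unicode name.

    This is a rough stand-in for the Unicode Script property, which the
    standard library does not provide.
    """
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            counts[unicodedata.name(ch, "UNKNOWN").split()[0]] += 1
    return counts


# One sample letter from each script discussed here, plus a Latin "a".
sample = (
    "\u0628"  # ARABIC LETTER BEH
    "\u1200"  # ETHIOPIC SYLLABLE HA
    "\u2d30"  # TIFINAGH LETTER YA
    "\u07ca"  # NKO LETTER A
    "\ua500"  # VAI SYLLABLE EE
    "a"       # LATIN SMALL LETTER A
)
print(sorted(script_profile(sample)))
# ['ARABIC', 'ETHIOPIC', 'LATIN', 'NKO', 'TIFINAGH', 'VAI']
```

Each of these scripts occupies its own Unicode block, so a single UTF-8 text can mix them without any of the mutually incompatible 8-bit encodings mentioned above.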

The technical issues of producing and sharing text in extended-Latin or non-Latin characters are, of course, not the only impediments to wider use of African languages on computers and the internet. To a certain degree, these problems can be worked around in one way or another when people have the will (and the means) to do so. However, as increasing numbers of people in Africa encounter the new technologies, greater attention should be given to fuller use of Unicode.