Normalisation and equivalence have to do with how characters that look the same but are encoded as different code points (or code-point sequences) are treated.

"Unicode equivalence is the specification by the Unicode character encoding standard that some sequences of code points represent essentially the same character. This feature was introduced in the standard to allow compatibility with preexisting standard character sets, which often included similar or identical characters." (Wikipedia)

"The Unicode Standard defines two formal types of equivalence between characters: canonical equivalence and compatibility equivalence. Canonical equivalence is a fundamental equivalency between characters or sequences of characters which represent the same abstract character, and which when correctly displayed should always have the same visual appearance and behavior. ... Compatibility equivalence is a weaker type of equivalence between characters or sequences of characters which represent the same abstract character (or sequence of abstract characters), but which may have distinct visual appearances or behaviors." (Unicode Standard Annex 15)
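The two kinds of equivalence can be seen directly in Python, since string comparison works on raw code points: "é" can be the single precomposed code point U+00E9 or the sequence U+0065 U+0301 (canonical equivalence), and the ligature "ﬁ" (U+FB01) is compatibility-equivalent to the two letters "fi". A quick sketch:

```python
# Canonical equivalence: same abstract character, different code-point sequences.
composed = "\u00e9"        # é as one precomposed code point
decomposed = "e\u0301"     # e + combining acute accent; renders identically

# Code-point comparison knows nothing about equivalence:
print(composed == decomposed)   # False

# Compatibility equivalence: same abstract content, distinct appearance.
ligature = "\ufb01"        # ﬁ ligature
print(ligature == "fi")    # False
```

Normalisation (below) is what makes such strings comparable.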

"Unicode normalization is a form of text normalization that transforms equivalent sequences of characters into the same representation, called a normalization form in the Unicode standard.... For each of the two equivalence notions, Unicode defines two canonical forms, one fully composed, and one fully decomposed, resulting in four normal forms, abbreviated NFC, NFD, NFKC, and NFKD...." (Google)

  • NFC: Normalization Form C (canonical decomposition, then canonical composition; precomposed characters fall under this form)
  • NFD: Normalization Form D (canonical decomposition)
  • NFKC: Normalization Form KC (compatibility decomposition, then canonical composition)
  • NFKD: Normalization Form KD (compatibility decomposition)
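The four forms above can be applied in Python with the standard library's `unicodedata.normalize`, a minimal sketch:

```python
import unicodedata

# Canonical forms: NFC composes, NFD decomposes.
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"   # one code point
assert unicodedata.normalize("NFD", "\u00e9") == "e\u0301"   # base + combining mark

# Canonically equivalent strings become identical in the same form:
assert unicodedata.normalize("NFC", "\u00e9") == unicodedata.normalize("NFC", "e\u0301")

# Compatibility forms (NFKC/NFKD) additionally fold compatibility
# characters such as the ﬁ ligature; canonical forms leave them alone.
assert unicodedata.normalize("NFKC", "\ufb01") == "fi"
assert unicodedata.normalize("NFC", "\ufb01") == "\ufb01"
```

NFC is the usual choice for storage and comparison; the K forms are lossy (the ligature is gone for good), so they are typically reserved for searching and matching.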

References & links