The Nordic graphemes FAQ (the s.c.nordic FAQ)
The home pages for the Usenet newsgroup soc.culture.nordic
The Nordic graphemes FAQ

This section is obsoleted and removed from the s.c.nordic FAQ,
...but kept online (at least for a while) as other documents are linked to it.

 



Subject: 1.8 

What are Nordic graphemes?

(by Tor Slettnes)

Nordic graphemes can in this context be described as:

Graphical representations of the letters that exist in the various Nordic (i.e. Icelandic, Norwegian, Danish, Swedish and Finnish) alphabets, beyond those that exist in the English alphabet.
Each of the Nordic written languages uses some additional letters compared to English. These are, in order of appearance in the alphabets:

    Letter:        Languages used:      Pronounced like:  character:
    ________________________________________________________________

    a acute        is                   'ou' in "loud"          á
    eth            is                   'th' in "there"         ð
    e acute        is (dk, no, se, fi)  'ea' in "yeah"          é
    i acute        is                   'e'  in "he"            í
    o acute        is                   'o'  in "home"          ó
    u acute        is                   'ou' in "you"           ú
    y acute        is                   'e'  in "he"            ý
    thorn          is                   'th' in "thumb"         þ
    ae             is                   'i'  in "hi"            æ
                   dk, no               'a'  in "bad"           æ
    o-slash        dk, no               'i'  in "bird"          ø
    a-ring         dk, no, se (fi)      'o'  in "bored"         å 
    a diaeresis    se, fi               'a'  in "bad"           ä
    o diaeresis    se, fi, is           'i'  in "bird"          ö
    u diaeresis    (se, fi, dk, no)     'ue' in french "rue"    ü

A set of parentheses around the country code indicates that the letter is rarely used in the corresponding language, typically only for loan words or names originating from another language. Other accents, such as the circumflex and the grave accent, are now and then used in foreign names and words in all Nordic languages.

In Denmark and Norway the alphabet is ordered:
a b c d e f g h i j k l m n o p q r s t u v w x y z æ ø å

In Icelandic, eth is regarded as a variant of 'd',
while thorn, the ae-ligature and o-diaeresis come at the end:
a á b c d ð e é f g h i í j k l m n o ó p q r s t u ú v w x y ý z þ æ ö

For Finland and Sweden the order is:
a b c d e f g h i j k l m n o p q r s t u v w x y z å ä ö

If your curiosity isn't satisfied by the pronunciation guide above, there are more extensive comments in the various language sections of this FAQ.

 

1.8.1 How are these represented in Usenet postings and E-mail?

The "mother" of all modern character sets for computers is the original ASCII character set, now renamed to US-ASCII. (ASCII = "American Standard Code for Information Interchange"). This is a 7-bit set containing the characters needed to write American English without accents or special letters, and little more. No "foreign" letters are included.

Various standards exist for representing extra characters, some of which are: Digraph, LaTeX, ISO-646, ISO-8859-1, and the IBM codepages 437, 850, and 865. All of these sets, except the IBM codepages, are usually considered acceptable on soc.culture.nordic, e-mail, and the internet in general.

Digraphs are two-character combinations used for simplicity, and are often the most universally understood notation on soc.culture.nordic. However, when using these with non-Nordic readers, one should be careful to explain that these are digraphs, not two separate characters. Also, some information may get lost by using digraphs, since a filtering program will not be able to determine whether a given pair is really a digraph or two separate characters.
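
The ambiguity can be illustrated with a small Python sketch. The table `DIGRAPHS` and the function `expand_digraphs` are hypothetical names made up for this example, and only the Danish/Norwegian digraphs are covered; the point is merely that a naive filter cannot tell a digraph from two ordinary letters.

```python
# Hypothetical digraph table (Danish/Norwegian letters only), for illustration.
DIGRAPHS = {"AE": "Æ", "ae": "æ", "OE": "Ø", "oe": "ø", "AA": "Å", "aa": "å"}


def expand_digraphs(text: str) -> str:
    """Naively replace every digraph with the corresponding Nordic letter."""
    for digraph, letter in DIGRAPHS.items():
        text = text.replace(digraph, letter)
    return text


# Works when every pair really is a digraph ...
print(expand_digraphs("blaabaer"))   # blåbær

# ... but mangles words where the two letters are simply adjacent.
print(expand_digraphs("poesi"))      # pøsi   (should stay "poesi")
print(expand_digraphs("Israel"))     # Isræl  (should stay "Israel")
```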

LaTeX notation comes from the typesetting program of the same name, where a sequence starting with '\' may be substituted with a given character. For instance, the a-ring is written as "\aa" or "{\aa}" in LaTeX.

ISO-646 (really ISO-646-NO and ISO-646-SE) is a family of 7-bit sets similar to US-ASCII, but with national characters substituted in place of the following characters: {, |, }, [, \, ]. This is the oldest of the "true representation" standards mentioned here; it was used in e.g. the Nordic versions of the CP/M operating system, prior to MS-DOS. Today, it is mostly used in Sweden and Finland (although the ordering of the letters, for the sake of compatibility with the Danish/Norwegian/German equivalents, is not correct in these languages).
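
As a rough sketch of what such a substitution means in practice, the following Python snippet shows the same 7-bit text the way a Danish/Norwegian or a Finnish/Swedish setup would display it. The table names and `decode_iso646` are assumptions made for this example; the character assignments are the ones listed in the notation table of this section (the u-diaeresis positions are left out).

```python
# Replacement characters taken from the notation table in this section.
ISO646_NO = str.maketrans("[\\]{|}", "ÆØÅæøå")   # Denmark/Norway
ISO646_SE = str.maketrans("[\\]{|}", "ÄÖÅäöå")   # Finland/Sweden


def decode_iso646(text: str, table: dict) -> str:
    """Show 7-bit text the way a national ISO-646 display would render it."""
    return text.translate(table)


print(decode_iso646("bl}b{r", ISO646_NO))        # blåbær
print(decode_iso646("sm|rg}s", ISO646_SE))       # smörgås

# The price: brackets, braces, pipe and backslash are no longer visible.
print(decode_iso646("x[i] = {1|2}", ISO646_NO))  # xÆiÅ = æ1ø2å
```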

ISO-8859-1, also called ISO Latin-1, is the first of several 8-bit character sets described in the International Organization for Standardization's document 8859. (ISO also publishes the international standards for the use of units such as the meter and the kilogram.) This set includes all characters needed for all West European languages, except Sámi and Esperanto. Latin-1 is a superset of US-ASCII, hence all ASCII characters maintain their original position in this set. Rather than trying to accommodate positioning in any specific language, the letters in ISO-8859-1 are ordered according to the alphabetical position of their US-ASCII lookalikes. Latin-1 is supported through modern standardizations like MIME (RFC 1521).
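
The superset relationship can be checked with Python's built-in codecs. This is only a minimal sketch; the helper name `latin1_code` is an assumption for the example.

```python
def latin1_code(letter: str) -> int:
    """Return the single-byte ISO-8859-1 code of one character."""
    return letter.encode("iso-8859-1")[0]


# Plain ASCII text is byte-for-byte identical in both sets.
text = "soc.culture.nordic"
assert text.encode("ascii") == text.encode("iso-8859-1")

# The Nordic letters all fit in the upper half (codes 128-255).
for letter in "ÆØÅæøå":
    print(letter, latin1_code(letter))

# US-ASCII itself has no room for them.
try:
    "å".encode("ascii")
except UnicodeEncodeError:
    print("å is not representable in US-ASCII")
```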

The IBM codepages 437, 850, 861 and 865 are used on personal computers in "text" mode, and are also the default set in many MS-Windows® communication programs. Coming from IBM ("Big Blue"), they were created to give text-based PC programs a means of drawing low-cost graphics, and the addition of extra characters came as a nice side effect. (Certain Nordic characters were not represented in the original codepage 437, with the consequence that in Iceland, Denmark and Norway, computers would occasionally be sold with cp 861 or 865 in the hardware. Today, alternative codepages can be downloaded to the video card via software.) The Danish/Norwegian character o-slash is not represented in cp 437, and in 850/861/865 its lowercase form sits at the dangerous code 155 (9B hex), "Upper Escape". Certain terminal types will interpret this code as the initial character of an escape command, and may e.g. clear the screen depending on the next letter. Further, these codepages are incompatible with the established 8-bit standard Latin-1, and should be avoided.
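
The position of the lowercase o-slash can be looked up with the codepage codecs that ship with Python. This is a small sketch; `codepage_byte` is a hypothetical helper written for this example.

```python
def codepage_byte(letter: str, codepage: str):
    """Return the byte value of `letter` in `codepage`, or None if it is absent."""
    try:
        return letter.encode(codepage)[0]
    except UnicodeEncodeError:
        return None


for cp in ("cp437", "cp850", "cp861", "cp865", "iso-8859-1"):
    print(cp, codepage_byte("ø", cp))

# In Latin-1 the same byte value 0x9B is not a letter at all but a control
# code, which is why an unconverted codepage posting can confuse a terminal.
assert codepage_byte("ø", "cp865") == 0x9B
assert codepage_byte("ø", "iso-8859-1") == 0xF8
```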

The various notations of the Nordic graphemes follow:

    Letter      Digraph LaTeX  ISO-646 ISO-8859-1
                                       Alt code   HTML             Octal  Char
    __________________________________________________________________________

    a acute       A'    \'{A}     -    alt-0193   &#193; &Aacute;  \301   Á
                  a'    \'{a}     -    alt-0225   &#225; &aacute;  \341   á
    eth           TH              -    alt-0208   &#208; &ETH;     \320   Ð
                  th              -    alt-0240   &#240; &eth;     \360   ð
    e acute       E'    \'{E}     -    alt-0201   &#201; &Eacute;  \311   É
                  e'    \'{e}     -    alt-0233   &#233; &eacute;  \351   é
    i acute       I'    \'{I}     -    alt-0205   &#205; &Iacute;  \315   Í
                  i'    \'{i}     -    alt-0237   &#237; &iacute;  \355   í
    o acute       O'    \'{O}     -    alt-0211   &#211; &Oacute;  \323   Ó
                  o'    \'{o}     -    alt-0243   &#243; &oacute;  \363   ó
    u acute       U'    \'{U}     -    alt-0218   &#218; &Uacute;  \332   Ú
                  u'    \'{u}     -    alt-0250   &#250; &uacute;  \372   ú
    y acute       Y'    \'{Y}     -    alt-0221   &#221; &Yacute;  \335   Ý
                  y'    \'{y}     -    alt-0253   &#253; &yacute;  \375   ý
    thorn         TH              -    alt-0222   &#222; &THORN;   \336   Þ
                  th              -    alt-0254   &#254; &thorn;   \376   þ

    u diaeresis   U"    \"{U}    ^     alt-0220   &#220; &Uuml;    \334   Ü
                  u"    \"{u}    ~     alt-0252   &#252; &uuml;    \374   ü
    ae            AE    {\AE}    [     alt-0198   &#198; &AElig;   \306   Æ
                  ae    {\ae}    {     alt-0230   &#230; &aelig;   \346   æ
    o-slash       OE    {\O}     \     alt-0216   &#216; &Oslash;  \330   Ø
                  oe    {\o}     |     alt-0248   &#248; &oslash;  \370   ø
    a-ring        AA    {\AA}    ]     alt-0197   &#197; &Aring;   \305   Å
                  aa    {\aa}    }     alt-0229   &#229; &aring;   \345   å
    a diaeresis   A"    \"{A}    [     alt-0196   &#196; &Auml;    \304   Ä
                  a"    \"{a}    {     alt-0228   &#228; &auml;    \344   ä
    o diaeresis   O"    \"{O}    \     alt-0214   &#214; &Ouml;    \326   Ö
                  o"    \"{o}    |     alt-0246   &#246; &ouml;    \366   ö

The ISO-646 charsets for Denmark/Norway [ iso-646-NO ] and Finland/Sweden [ iso-646-SE ] are in practice obsolete, and there never existed one for Icelandic, but you may run into older 7-bit text files using them. Note that 'Ü' is not represented in iso-646-NO for Denmark/Norway.
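
The ISO-8859-1 columns of the table above are all different spellings of the same byte value, so they can be derived mechanically. The following Python sketch does that with the standard `html.entities` module; the function name `notations` and the dictionary keys are assumptions made for this example.

```python
from html.entities import codepoint2name


def notations(letter: str) -> dict:
    """Derive the ISO-8859-1 notations of the table for a single letter."""
    code = letter.encode("iso-8859-1")[0]
    return {
        "decimal": code,                            # the number after "alt-0"
        "html_numeric": f"&#{code};",
        "html_named": f"&{codepoint2name[code]};",
        "octal": f"\\{code:o}",
    }


for letter in "åÞ":
    print(letter, notations(letter))
```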

 



1.8.2 Pros and cons of the different representations

If you have been a reader of this group for a while, you may have noticed that discussion about characters and their representations occasionally accounts for quite a bit of bandwidth. It often does not take more than a question about the issue from a new reader, or someone posting an article with an IBM character set, to get a new thread going on the issue. Some want to keep 7-bit ISO-646 (be aware that they may call it "true ASCII", although strictly speaking it is not), since 7-bit codes will always get through with any setup; others want ISO-Latin-1 since it is more universal; and yet others promote digraphs as the greatest common denominator between the two.

Some pros and cons for each set:

    Character set:    Advantages:             Disadvantages:
    __________________________________________________________________

    Digraphs          * Requires 7-bit only   * Ambiguous
                                                ("oe" or "o-slash"?)
                                              * Non-optimal compromise

    LaTeX             * Non-ambiguous 7-bit   * Made for typesetting;
                        representation.         somewhat cryptic for
                                                regular text.
                                              * Non-optimal compromise

    ISO-646-SE,       * Only 7-bit "true"     * Different standards
    ISO-646-DK          representation.         for each language
    <[\]{|}>          * No data loss even     * Getting harder to
                        with old hardware/      find font support
                        software/setup.         (Dying out).
                                              * Shadows the brace,
                                                square bracket, pipe,
                                                and backslash chars.

    ISO Latin 1       * Utilizes all 8 bits   * Requires 8-bit clean
    (ISO-8859-1)        in a byte; yet avoids   connection; older
    <ÐÞÆØÅÄÖðþæøåäö..>  dangerous codes.        systems may cause
                      * Universal for all       data loss.
                        Western European      * May require some
                        languages.              setup.
                      * Supported by ISO and  * In case of stripping,
                        MIME; true subset of    becomes "FXEDVfxedv";
                        Unicode.                difficult to read.

    IBM CodePages     * Uses all 256 codes;   * Uses all 256 codes;
    Macintosh set       more characters         incl. dangerous ones.
    <Unacceptable>    * Often used in PC      * Incompatible with
                        environments such as    the "de-facto" 8-bit
                        BBS'es.                 standard ISO-8859-1

    __________________________________________________________________
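
The "FXEDVfxedv" effect mentioned in the table is what happens when a connection that is not 8-bit clean drops the top bit of every byte. A minimal Python sketch of that stripping follows; `strip_high_bit` is a hypothetical name used only for this example.

```python
def strip_high_bit(text: str) -> str:
    """Simulate a 7-bit channel: clear bit 8 of every ISO-8859-1 byte."""
    raw = text.encode("iso-8859-1")
    return bytes(b & 0x7F for b in raw).decode("ascii")


print(strip_high_bit("ÆØÅÄÖæøåäö"))   # FXEDVfxedv
print(strip_high_bit("blåbær"))       # blebfr
```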

 



1.8.3 How do I set up support for 7-bit ISO-646 representation?
({|}, [\])

The ISO-646 sets are still supported via various fonts and translation filters. If you set up such a filter, the character '[' should not be translated, because it is used in ANSI escape sequences.

Note that if you use this kind of translation, you will no longer see any of the characters '\]{|}'; in most cases this outweighs the benefits from seeing the national letters.  
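
To see why '[' is best left alone, consider a display filter applied to a stream that also carries ANSI escape sequences. The sketch below assumes a Danish/Norwegian setup; `DISPLAY_FILTER` and `filter_for_display` are hypothetical names for this example, not part of any existing tool.

```python
# Hypothetical Danish/Norwegian display filter. '[' is deliberately left out
# of the table so that ANSI escape sequences (ESC followed by '[') survive.
DISPLAY_FILTER = str.maketrans("\\]{|}", "ØÅæøå")

CLEAR_SCREEN = "\x1b[2J"   # ANSI "erase display" sequence


def filter_for_display(text: str) -> str:
    """Translate ISO-646 national positions, except '['."""
    return text.translate(DISPLAY_FILTER)


shown = filter_for_display(CLEAR_SCREEN + "R|dgr|d med fl|de")
assert shown.startswith(CLEAR_SCREEN)   # the escape sequence is intact
print(repr(shown))
```

The trade-off is that an uppercase Æ typed as '[' stays a bracket on screen.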



1.8.4 How do I set up support for 8-bit ISO-8859-1 representation?
(æøåäö, ÆØÅÄÖ)

The ISO-8859-1 (Latin 1) set is currently the most common character representation standard on soc.culture.nordic, and is also quite frequent in e.g. soc.culture.german, personal e-mail etc. However, on many systems, the ability to view these characters is not provided as "default", so you may need to configure some things on your own.

 



1.8.5 References

For an index to other literature on internationalization, try: <http://www.vlsivie.tuwien.ac.at/mike/i18n.html>

I am: Tor Slettnes. 



- Is the text above really reliable?
- See the discussion in section 1.2.2!

© Copyright 1996-2001 by Tor Slettnes.
You are free to quote this page as long as you mention the URL.
The line of flags is modified after a picture at det Åländska skoldatanätet.
This page was last updated March the 19th in the year of 2001.
