



From kaihsu@ugcs.caltech.edu Sat Sep 14 22:21:06 1996
Return-Path: <kaihsu@ugcs.caltech.edu>
Received: from envy.ugcs.caltech.edu (root@envy.ugcs.caltech.edu [131.215.128.135]) 
        by CS.UTK.EDU with ESMTP (cf v2.9s-UTK)
	id WAA21293; Sat, 14 Sep 1996 22:21:03 -0400
Received: from nanigani.dabney.caltech.edu by envy.ugcs.caltech.edu with SMTP 
	(8.7.5/UGCS:4.43) id TAA07959; Sat, 14 Sep 1996 19:20:58 -0700 (PDT)
Message-ID: <323B6814.32B2@ugcs.caltech.edu>
Date: Sat, 14 Sep 1996 19:21:08 -0700
From: Kai-hsu Tai <kaihsu@ugcs.caltech.edu>
Reply-To: kaihsu@ugcs.caltech.edu
Organization: California Institute of Technology
X-Mailer: Mozilla 3.0Gold (Win95; I)
MIME-Version: 1.0
Newsgroups: tw.bbs.soc.taiwanese,tw.bbs.soc.hakka
CC: teoh@cs.utk.edu
Subject: Unicode & Taiwan/Tai-oan Hak-fa & Ho-lo-oe
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Status: RO

Tai5-oan5 Piau1-chun2 Po3-ko3: 
Tai5-oan5 Hak6-fa4 kap4 Ho7-lo2-oe7 sou2 
iong7 e5 ji7-goan5 kap4 in1 e5 Unicode ho7-be2 
**********************************************

Tai-oan Piau-chun Report: 
Characters used in Taiwanese Hak-fa 
and Ho-lo-oe and their Unicode encodings
****************************************

TE3, Khai2-su7
kaihsu@ugcs.caltech.edu
1996-09-14


Soat4-beng5
===========
Thau5-cheng5 sia2 chit8-e5 "k" e5 si7 tai7-piau2 chit4-e5 ji7-goan5 
ai3 the5-chhut4 kng3--jip8-khi3 Unicode lai7-te2 e5 iau1-kiu5; sia2
chit8-e5 "ch" e5 si7 chit4-ma2 to7 e7-sai2 iong7 Unicode e5 cho1-hap8 e5
hong1-hoat4 (combining) sia2--chhut4-lai5 e5 ji7-goan5 (ma7 ai3
the5-chhut4, kng3--jip8-khi3 Unicode).

Na7-si7 u7 sim2-mih8 m7-tioh8 iah8-si7 lau3-kau1 e5 sou2-chai7, chhiaN2
ka7 goa2 kong2.

Introduction
============
Characters marked with "k" should be proposed for encoding in Unicode.
Those marked with "ch" can already be written in Unicode as combining
sequences, but should still be proposed as precomposed characters.

Please tell me if I missed anything or got anything wrong.
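
As a small illustration of the "ch" category, the Python sketch below
writes A WITH VERTICAL BAR as a base letter followed by U+030D and
checks that normalization leaves it as two code points, whereas a with
acute collapses to the single code point U+00E1.  The variable names
are illustrative and not part of the report; the result depends on the
Unicode data bundled with the Python runtime.

```python
import unicodedata

# "ch"-style character: base letter + COMBINING VERTICAL LINE ABOVE (U+030D).
a_vertical = "a" + "\u030D"
# Already-encoded character for comparison: base letter + COMBINING ACUTE ACCENT.
a_acute = "a" + "\u0301"

# NFC composes a sequence only when a precomposed code point exists.
composed_vertical = unicodedata.normalize("NFC", a_vertical)
composed_acute = unicodedata.normalize("NFC", a_acute)

print(len(composed_vertical))                         # still a combining sequence
print(len(composed_acute), hex(ord(composed_acute)))  # a single code point
```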


Si7 an2-choaN2 beh4 the5-chhut4 cho1-hap8-ho2 e5 ji7-goan5--neh4?  
=================================================================
In1-ui7 Ho7-lo2-oe7 kap4 Hak6-fa4 teh4 iong7 siaN1-tiau7 ki3-ho7, m7-si7
kap4 Au1-chiu1 gu2-gian5 kang5-khoan2, sam1-put4-go7-si5 chiah4 iong7,
hoan2-tng3-si7 chha1-put4-to1 10-e5 ji7-goan5 to7 u7 chit8-e5
siaN1-tiau7 ki3-ho7.  ChhiuN7 chit4-toaN7 bun5-ji7, tu5-liau2 te7 
1 siaN1 kap4 te7 4 siaN1 i2-goa7, long2 ai3 iong7 siaN1-tiau7 ki3-ho7. 
Na7-si7 bo5 hou7 cho1-hap8-ho2 e5 ji7-goan5 ho7-be2, kng3 chu1-liau7 e5
sou2-chai7 to7 ai3 cheng1-ka7 khong1-kan1.

The reason for proposing precomposed characters to be encoded:
==============================================================
Ho-lo-oe and Hak-fa differ from European languages, in which
diacritics are used only occasionally.  Ho-lo-oe and Hak-fa use
diacritics to mark the tone of syllables, so roughly one character in
ten carries a tone mark.  For example, every syllable in the preceding
Ho-lo-oe passage, except those in tones 1 and 4, would need a
diacritic in place of its tone numeral.  Without precomposed
characters, each such mark must be stored as a separate combining
character, which increases the space needed for data storage
considerably.
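
The storage argument can be made concrete with a rough sketch.  The
Python snippet below stores the two letters "oe" with a macron on the
e in two ways, once with the precomposed U+0113 and once with the
combining U+0304, and compares the encoded sizes.  The helper name
encoded_sizes and the choice of UTF-8 and UTF-16 are assumptions made
for illustration only.

```python
import unicodedata

def encoded_sizes(text):
    """Return (UTF-8 bytes, UTF-16 bytes) needed to store text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))

precomposed = "o\u0113"   # e with macron as one code point (U+0113)
combining = "oe\u0304"    # e followed by COMBINING MACRON (U+0304)

# Both spellings are canonically equivalent...
assert unicodedata.normalize("NFC", combining) == precomposed

# ...but the combining spelling costs one extra code point per tone mark.
print(encoded_sizes(precomposed))  # (3, 4)
print(encoded_sizes(combining))    # (4, 6)
```

In a 16-bit encoding, every tone mark that lacks a precomposed form
therefore adds two bytes to the stored text.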


------------------------------------------------------------------
Ho7-be2 Mia5
Code    Name
------ -----------------------------------------------------------
Combining Diacritical Marks
===========================
U+0301 COMBINING ACUTE ACCENT
U+0300 COMBINING GRAVE ACCENT
U+0302 COMBINING CIRCUMFLEX ACCENT
U+0304 COMBINING MACRON
U+030D COMBINING VERTICAL LINE ABOVE
U+0324 COMBINING DIAERESIS BELOW
 k     COMBINING RIGHT DOT ABOVE

Precomposed Characters
======================
U+0000 -> U+007F Basic Latin [some of these are also listed below]

U+0061 LATIN SMALL LETTER A
U+00E1 LATIN SMALL LETTER A WITH ACUTE
U+00E0 LATIN SMALL LETTER A WITH GRAVE
U+00E2 LATIN SMALL LETTER A WITH CIRCUMFLEX
U+0101 LATIN SMALL LETTER A WITH MACRON
 ch    LATIN SMALL LETTER A WITH VERTICAL BAR

U+0065 LATIN SMALL LETTER E
U+00E9 LATIN SMALL LETTER E WITH ACUTE
U+00E8 LATIN SMALL LETTER E WITH GRAVE
U+00EA LATIN SMALL LETTER E WITH CIRCUMFLEX
U+0113 LATIN SMALL LETTER E WITH MACRON
 ch    LATIN SMALL LETTER E WITH VERTICAL BAR

U+0069 LATIN SMALL LETTER I 
U+00ED LATIN SMALL LETTER I WITH ACUTE
U+00EC LATIN SMALL LETTER I WITH GRAVE
U+00EE LATIN SMALL LETTER I WITH CIRCUMFLEX
U+012B LATIN SMALL LETTER I WITH MACRON
 ch    LATIN SMALL LETTER I WITH VERTICAL BAR

U+006F LATIN SMALL LETTER O
U+00F3 LATIN SMALL LETTER O WITH ACUTE
U+00F2 LATIN SMALL LETTER O WITH GRAVE
U+00F4 LATIN SMALL LETTER O WITH CIRCUMFLEX
U+014D LATIN SMALL LETTER O WITH MACRON
 ch    LATIN SMALL LETTER O WITH VERTICAL BAR

 k     LATIN SMALL LETTER O WITH RIGHT DOT ABOVE
 k     LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH ACUTE
 k     LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH GRAVE
 k     LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH CIRCUMFLEX
 k     LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH MACRON
 k     LATIN SMALL LETTER O WITH RIGHT DOT ABOVE WITH VERTICAL BAR

U+0075 LATIN SMALL LETTER U
U+00FA LATIN SMALL LETTER U WITH ACUTE
U+00F9 LATIN SMALL LETTER U WITH GRAVE
U+00FB LATIN SMALL LETTER U WITH CIRCUMFLEX
U+016B LATIN SMALL LETTER U WITH MACRON
 ch    LATIN SMALL LETTER U WITH VERTICAL BAR

U+006D LATIN SMALL LETTER M
U+1E3F LATIN SMALL LETTER M WITH ACUTE
 ch    LATIN SMALL LETTER M WITH GRAVE
 ch    LATIN SMALL LETTER M WITH CIRCUMFLEX
 ch    LATIN SMALL LETTER M WITH MACRON
 ch    LATIN SMALL LETTER M WITH VERTICAL BAR

U+006E LATIN SMALL LETTER N
U+0144 LATIN SMALL LETTER N WITH ACUTE
 ch    LATIN SMALL LETTER N WITH GRAVE
 ch    LATIN SMALL LETTER N WITH CIRCUMFLEX
 ch    LATIN SMALL LETTER N WITH MACRON
 ch    LATIN SMALL LETTER N WITH VERTICAL BAR

U+1E73 LATIN SMALL LETTER U WITH DIAERESIS BELOW
 ch    LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH ACUTE 
 ch    LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH GRAVE
 ch    LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH CIRCUMFLEX
 ch    LATIN SMALL LETTER U WITH DIAERESIS BELOW WITH VERTICAL BAR

U+0041 LATIN CAPITAL LETTER A
U+00C1 LATIN CAPITAL LETTER A WITH ACUTE
U+00C0 LATIN CAPITAL LETTER A WITH GRAVE
U+00C2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
U+0100 LATIN CAPITAL LETTER A WITH MACRON
 ch    LATIN CAPITAL LETTER A WITH VERTICAL BAR

U+0045 LATIN CAPITAL LETTER E
U+00C9 LATIN CAPITAL LETTER E WITH ACUTE
U+00C8 LATIN CAPITAL LETTER E WITH GRAVE
U+00CA LATIN CAPITAL LETTER E WITH CIRCUMFLEX
U+0112 LATIN CAPITAL LETTER E WITH MACRON
 ch    LATIN CAPITAL LETTER E WITH VERTICAL BAR

U+0049 LATIN CAPITAL LETTER I
U+00CD LATIN CAPITAL LETTER I WITH ACUTE
U+00CC LATIN CAPITAL LETTER I WITH GRAVE
U+00CE LATIN CAPITAL LETTER I WITH CIRCUMFLEX
U+012A LATIN CAPITAL LETTER I WITH MACRON
 ch    LATIN CAPITAL LETTER I WITH VERTICAL BAR

U+004F LATIN CAPITAL LETTER O
U+00D3 LATIN CAPITAL LETTER O WITH ACUTE
U+00D2 LATIN CAPITAL LETTER O WITH GRAVE
U+00D4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U+014C LATIN CAPITAL LETTER O WITH MACRON
 ch    LATIN CAPITAL LETTER O WITH VERTICAL BAR

 k     LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE
 k     LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH ACUTE
 k     LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH GRAVE
 k     LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH CIRCUMFLEX
 k     LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH MACRON
 k     LATIN CAPITAL LETTER O WITH RIGHT DOT ABOVE WITH VERTICAL BAR

U+0055 LATIN CAPITAL LETTER U
U+00DA LATIN CAPITAL LETTER U WITH ACUTE
U+00D9 LATIN CAPITAL LETTER U WITH GRAVE
U+00DB LATIN CAPITAL LETTER U WITH CIRCUMFLEX
U+016A LATIN CAPITAL LETTER U WITH MACRON
 ch    LATIN CAPITAL LETTER U WITH VERTICAL BAR

U+004D LATIN CAPITAL LETTER M
U+1E3E LATIN CAPITAL LETTER M WITH ACUTE
 ch    LATIN CAPITAL LETTER M WITH GRAVE
 ch    LATIN CAPITAL LETTER M WITH CIRCUMFLEX
 ch    LATIN CAPITAL LETTER M WITH MACRON
 ch    LATIN CAPITAL LETTER M WITH VERTICAL BAR

U+004E LATIN CAPITAL LETTER N
U+0143 LATIN CAPITAL LETTER N WITH ACUTE
 ch    LATIN CAPITAL LETTER N WITH GRAVE
 ch    LATIN CAPITAL LETTER N WITH CIRCUMFLEX
 ch    LATIN CAPITAL LETTER N WITH MACRON
 ch    LATIN CAPITAL LETTER N WITH VERTICAL BAR

U+1E72 LATIN CAPITAL LETTER U WITH DIAERESIS BELOW 
 ch    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH ACUTE
 ch    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH GRAVE
 ch    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH CIRCUMFLEX
 ch    LATIN CAPITAL LETTER U WITH DIAERESIS BELOW WITH VERTICAL BAR

U+207F SUPERSCRIPT LATIN SMALL LETTER N
------------------------------------------------------------------
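
A table like the one above can be checked mechanically.  The sketch
below is one possible approach, not the method used to prepare this
report: it asks Python's unicodedata module whether each base letter
plus tone mark has a single precomposed code point, and prints "ch"
otherwise.  MARKS, precomposed_code and status_rows are hypothetical
names.  The answers reflect the Unicode version shipped with the
Python runtime, so some rows may differ from the 1996 status listed
here.

```python
import unicodedata

# Tone marks from the "Combining Diacritical Marks" list above.
MARKS = {
    "ACUTE": "\u0301",
    "GRAVE": "\u0300",
    "CIRCUMFLEX": "\u0302",
    "MACRON": "\u0304",
    "VERTICAL BAR": "\u030D",
}

def precomposed_code(base, mark):
    """Return 'U+XXXX' if base+mark composes to one code point, else None."""
    composed = unicodedata.normalize("NFC", base + mark)
    if len(composed) == 1:
        return "U+%04X" % ord(composed)
    return None

def status_rows(bases):
    """Build (base, mark name, code or 'ch') rows for each base letter."""
    rows = []
    for base in bases:
        for name, mark in MARKS.items():
            code = precomposed_code(base, mark)
            rows.append((base, name, code if code else "ch"))
    return rows

for base, name, status in status_rows("aeioumn"):
    print("%-6s %s WITH %s" % (status, base, name))
```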


Thong2-ke3
==========
U7 kui2-e5 ch:                       34
U7 kui2-e5 k:                        13
Ch kap4 k long2-chong2 u7 kui2-e5:   47

Statistics
==========
Total number of ch's:                34
Total number of k's:                 13
Total number of proposed characters: 47
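
The totals can be re-derived from the table with a few lines of
Python.  The grouping below is my own reading of the table and the
variable names are illustrative.

```python
# Number of "ch" entries per letter case, grouped as in the table.
ch_per_case = {
    "a e i o u (vertical bar only)": 5 * 1,
    "m and n (grave, circumflex, macron, vertical bar)": 2 * 4,
    "u with diaeresis below (acute, grave, circumflex, vertical bar)": 4,
}
k_per_case = 6          # o with right dot above: bare form + five tone marks
k_combining_marks = 1   # COMBINING RIGHT DOT ABOVE itself

total_ch = 2 * sum(ch_per_case.values())   # small + capital letters
total_k = 2 * k_per_case + k_combining_marks
print(total_ch, total_k, total_ch + total_k)  # 34 13 47
```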
-- 
hlo: TE3, Khai2-su7 | hak: TAI4, Khai3-si4
http://nanigani.caltech.edu