
kana, hira full/half size

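That is because UTF-8 uses 3 bytes for practically every Japanese character, whether it is kanji, half-width kana, or full-width kana. All of them lie in the range U+0800 to U+FFFF, which UTF-8 always encodes in 3 bytes (only rare kanji outside that range, such as those in CJK Extension B, take 4 bytes). You can see the difference if you use the Japanese encoding Shift_JIS, which uses 1 byte for half-width kana and 2 bytes for kanji and full-width kana.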
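To make the UTF-8 rule concrete, here is a small sketch that computes the UTF-8 length of a string purely from its code points and compares it with what Java actually produces. The class and method names (`Utf8Length`, `bytesForCodePoint`, `utf8Length`) are hypothetical, made up for this illustration; the byte-count thresholds are the standard UTF-8 ranges.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Length {

    // Number of bytes UTF-8 needs for a single Unicode code point.
    static int bytesForCodePoint(int cp) {
        if (cp < 0x80) return 1;    // U+0000..U+007F: ASCII
        if (cp < 0x800) return 2;   // U+0080..U+07FF: e.g. accented Latin, Greek, Cyrillic
        if (cp < 0x10000) return 3; // U+0800..U+FFFF: kana, half-width kana, common kanji
        return 4;                   // U+10000..U+10FFFF: e.g. rare kanji in CJK Extension B
    }

    // Sums per-code-point sizes; iterating code points counts a surrogate pair once.
    static int utf8Length(String s) {
        return s.codePoints().map(Utf8Length::bytesForCodePoint).sum();
    }

    public static void main(String[] args) {
        // "A", full-width NA, half-width NA, a kanji, and a CJK Extension B code point (U+2000B)
        String[] samples = {"A", "\u30CA", "\uFF85", "\u4F7F", "\uD840\uDC0B"};
        for (String s : samples) {
            int computed = utf8Length(s);
            int actual = s.getBytes(StandardCharsets.UTF_8).length;
            System.out.printf("U+%04X computed=%d actual=%d%n", s.codePointAt(0), computed, actual);
        }
    }
}
```

Full-width and half-width katakana both fall in the 3-byte range, which is why UTF-8 cannot tell them apart by size.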

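-> To see this in more detail, you can use the Notepad++ editor and save the same Japanese characters in 3 text files:

- 1 file with UTF-8 encoding (choose UTF-8 without BOM, since a BOM adds 3 extra bytes)

- 1 file with Shift_JIS encoding

- 1 file with ANSI encoding (the Windows system code page, which on Japanese Windows is CP932, a Shift_JIS variant)

Then compare the file sizes in Windows Explorer and you will see the difference.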
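If you would rather not use an editor, the same experiment can be done in code. This is a sketch, assuming a standard OpenJDK/Oracle JDK where the `Shift_JIS` and `windows-31j` (MS932) charsets are available; `windows-31j` is used here as a stand-in for "ANSI" on a Japanese Windows system. The class name `FileSizeCheck` and method `sizeInCharset` are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileSizeCheck {

    // Writes text to a temporary file in the given charset and returns the file size in bytes.
    static long sizeInCharset(String text, Charset cs) {
        try {
            Path file = Files.createTempFile("kana-", ".txt");
            try {
                Files.write(file, text.getBytes(cs)); // no BOM is written
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Full-width NA, half-width NA, and the kanji 使
        String text = "\u30CA\uFF85\u4F7F";
        // windows-31j is assumed to represent "ANSI" on Japanese Windows
        Charset[] charsets = {
            StandardCharsets.UTF_8, Charset.forName("Shift_JIS"), Charset.forName("windows-31j")
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + sizeInCharset(text, cs) + " bytes");
        }
    }
}
```

With these three characters, UTF-8 needs 3 + 3 + 3 = 9 bytes, while Shift_JIS and windows-31j need 2 + 1 + 2 = 5 bytes.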

-> Another way: I am a Java programmer, so here is my code to check the byte count in different encodings:

    import java.io.UnsupportedEncodingException;

    public class EncodingByteCount {

        public static void main(String[] args) {
            String fullKana = "ナ"; // full-width katakana NA (U+30CA)
            String halfKana = "ﾅ";  // half-width katakana NA (U+FF85)
            String kanji = "使";    // kanji

            try {
                // Shift_JIS: 1 byte for half-width kana, 2 bytes for full-width kana and kanji
                System.out.println("=====fullKana==Shift_JIS===" + fullKana.getBytes("Shift_JIS").length);
                System.out.println("=====halfKana==Shift_JIS===" + halfKana.getBytes("Shift_JIS").length);
                System.out.println("=====kanji==Shift_JIS===" + kanji.getBytes("Shift_JIS").length);

                // UTF-8: 3 bytes for every character in U+0800..U+FFFF
                System.out.println("=====fullKana==UTF8===" + fullKana.getBytes("UTF-8").length);
                System.out.println("=====halfKana==UTF8===" + halfKana.getBytes("UTF-8").length);
                System.out.println("=====kanji==UTF8===" + kanji.getBytes("UTF-8").length);
            } catch (UnsupportedEncodingException e) {
                // Thrown if this JVM does not support the named charset
                e.printStackTrace();
            }
        }
    }
My results are:

    =====fullKana==Shift_JIS===2
    =====halfKana==Shift_JIS===1
    =====kanji==Shift_JIS===2
    =====fullKana==UTF8===3
    =====halfKana==UTF8===3
    =====kanji==UTF8===3
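To see why the counts differ, it can help to look at the actual bytes rather than just their number. The sketch below prints each character's bytes in hex for both encodings; the class name `HexDump` and its `hex` helper are hypothetical. In Shift_JIS, half-width katakana occupy the single-byte range 0xA1 to 0xDF, which is why half-width kana take only 1 byte there.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class HexDump {

    // Encodes s with the given charset and formats each byte as two hex digits, space-separated.
    static String hex(String s, Charset cs) {
        StringJoiner joiner = new StringJoiner(" ");
        for (byte b : s.getBytes(cs)) {
            joiner.add(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        // Full-width NA, half-width NA, and the kanji 使
        String[][] samples = {{"fullKana", "\u30CA"}, {"halfKana", "\uFF85"}, {"kanji", "\u4F7F"}};
        for (String[] sample : samples) {
            System.out.println(sample[0] + " Shift_JIS=" + hex(sample[1], sjis)
                    + " UTF-8=" + hex(sample[1], StandardCharsets.UTF_8));
        }
    }
}
```

For the half-width kana this shows a single byte `C5` in Shift_JIS but three bytes `EF BE 85` in UTF-8.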
