That is because UTF-8 uses 3 bytes to store almost every Japanese character, whether it is kanji, half-width kana, or full-width kana (only rare kanji outside the Basic Multilingual Plane need 4 bytes). You can see the difference if you use the Japanese encoding Shift_JIS, which uses 1 byte for half-width kana and 2 bytes for kanji and full-width kana.
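The 3-byte figure follows from how UTF-8 assigns a length to each range of code points. Here is a minimal sketch of that rule; the class and method names are my own, made up for illustration, and the helper ignores invalid input such as lone surrogates. Kana and common kanji all sit between U+0800 and U+FFFF, so they land in the 3-byte branch.

```java
public class Utf8Length {
    // Hypothetical helper (not a JDK API): number of bytes UTF-8 needs
    // to encode a single Unicode code point.
    public static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;     // ASCII
        if (codePoint < 0x800) return 2;    // U+0080 to U+07FF
        if (codePoint < 0x10000) return 3;  // rest of the BMP: kana, common kanji
        return 4;                           // supplementary planes: some rare kanji
    }

    public static void main(String[] args) {
        // Characters are written as escapes so the result does not depend
        // on the encoding of this source file.
        System.out.println(utf8Length("\u30CA".codePointAt(0))); // full-width katakana: 3
        System.out.println(utf8Length("\uFF85".codePointAt(0))); // half-width katakana: 3
        System.out.println(utf8Length("\u4F7F".codePointAt(0))); // kanji: 3
        System.out.println(utf8Length('A'));                     // ASCII letter: 1
    }
}
```

Half-width kana look "smaller" on screen, but their code points (U+FF61 to U+FF9F) are high in the BMP, so UTF-8 still spends 3 bytes on each of them.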
-> To see this in more detail, use the Notepad++ editor to save the same Japanese characters in 3 text files:
- 1 file encoded as UTF-8
- 1 file encoded as Shift-JIS
- 1 file encoded as ANSI
Then compare the file sizes in Windows Explorer and you will see the difference.
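The same file-size comparison can also be scripted instead of done by hand. This is only a sketch under my own assumptions: the class name, method name, and temp-file prefix are invented for the example, and it covers UTF-8 and Shift_JIS only, because what Notepad++ calls "ANSI" depends on the Windows code page of the machine.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileSizeCheck {
    // Hypothetical helper: writes the text to a temporary file in the given
    // charset, returns the size of that file in bytes, then deletes the file.
    public static long sizeOnDisk(String text, Charset charset) {
        try {
            Path file = Files.createTempFile("encoding-check", ".txt");
            try {
                Files.write(file, text.getBytes(charset));
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Full-width katakana, half-width katakana, kanji, written as escapes
        // so the result does not depend on the encoding of this source file.
        String text = "\u30CA\uFF85\u4F7F";
        System.out.println("UTF-8:     " + sizeOnDisk(text, StandardCharsets.UTF_8) + " bytes");
        System.out.println("Shift_JIS: " + sizeOnDisk(text, Charset.forName("Shift_JIS")) + " bytes");
    }
}
```

For these three characters the UTF-8 file should come out at 9 bytes (3 + 3 + 3) and the Shift_JIS file at 5 bytes (2 + 1 + 2). If you save from Notepad++ with the "UTF-8-BOM" option instead, expect 3 extra bytes for the byte order mark.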
-> Another way: I am a Java programmer, so here is my code to check the byte counts in different encodings:
import java.io.UnsupportedEncodingException;

// The class name is arbitrary; main needs some class to live in.
public class ByteLengthCheck {
    public static void main(String[] args) {
        // One sample of each kind of Japanese character
        String fullKana = "ナ"; // full-width katakana, U+30CA
        String halfKana = "ﾅ"; // half-width katakana, U+FF85
        String kanji = "使"; // kanji, U+4F7F
        try {
            // Encode each string and print how many bytes it takes
            System.out.println("=====fullKana==Shift_JIS===" + fullKana.getBytes("Shift_JIS").length);
            System.out.println("=====halfKana==Shift_JIS===" + halfKana.getBytes("Shift_JIS").length);
            System.out.println("=====kanji==Shift_JIS===" + kanji.getBytes("Shift_JIS").length);
            System.out.println("=====fullKana==UTF8===" + fullKana.getBytes("UTF-8").length);
            System.out.println("=====halfKana==UTF8===" + halfKana.getBytes("UTF-8").length);
            System.out.println("=====kanji==UTF8===" + kanji.getBytes("UTF-8").length);
        } catch (UnsupportedEncodingException e) {
            // Thrown if this JVM does not provide the named charset
            e.printStackTrace();
        }
    }
}
My results are:
=====fullKana==Shift_JIS===2
=====halfKana==Shift_JIS===1
=====kanji==Shift_JIS===2
=====fullKana==UTF8===3
=====halfKana==UTF8===3
=====kanji==UTF8===3
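To see which bytes are behind those counts, you can print them in hex. This is a small sketch, not part of my original test; the class and method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Hypothetical helper: encodes the text in the given charset and
    // returns the bytes as space-separated two-digit hex values.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset sjis = Charset.forName("Shift_JIS");
        // Escapes keep the result independent of the source file's encoding:
        // full-width katakana, half-width katakana, kanji.
        String[] samples = {"\u30CA", "\uFF85", "\u4F7F"};
        for (String s : samples) {
            System.out.println(s + "  Shift_JIS: " + hex(s, sjis)
                    + "  UTF-8: " + hex(s, StandardCharsets.UTF_8));
        }
    }
}
```

In the output, the half-width kana is the single Shift_JIS byte C5, which falls in the range A1 to DF that Shift_JIS reserves for one-byte half-width katakana. The full-width kana and the kanji each start with a lead byte outside that range and take two bytes, while all three UTF-8 sequences start with a byte of the form Ex, which marks a 3-byte sequence.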