How to Build a GBK-to-Unicode Lookup Table

Some time ago, while working on a project, I ran into a failed conversion between Unicode and GB: a few uncommon Chinese characters were encoded as "??" and would not display. I did some research on the issue and eventually solved it. Building on the basics of Unicode and GB covered in the previous two articles, this article introduces a method for building a GBK-to-Unicode lookup table.

Java's String class is powerful: beyond basic string operations, it can also construct a string from bytes in a specified character set. The method introduced here takes advantage of that. The main idea is:

1. Traverse all the Chinese characters in the GBK code table and construct a string from each character's GB code. Most Chinese characters sit in fairly orderly segments of the GBK code table, so the traversal is easy.

2. Use the getBytes() method to obtain the byte array for that character. Because Java represents characters in Unicode, the Unicode code of the Chinese character is among those bytes.
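
The two steps above can be sketched for a single character as follows. This is only a minimal illustration: the class name GbkToUnicodeDemo and the helper gbkToCodePoint are hypothetical, and it assumes the JDK in use ships the GBK charset. It also reads the code point directly from the decoded string instead of going through getBytes().

```java
import java.nio.charset.Charset;

class GbkToUnicodeDemo {
    // Hypothetical helper: decode one two-byte GBK code and return
    // the Unicode code point of the resulting character.
    static int gbkToCodePoint(int lead, int trail) {
        byte[] gbkBytes = { (byte) lead, (byte) trail };
        // Assumes the GBK charset is available in this JDK.
        String s = new String(gbkBytes, Charset.forName("GBK"));
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // 0xB0 0xA1 is the first Chinese character of the GBK/2 area.
        System.out.printf("GBK B0A1 -> U+%04X%n", gbkToCodePoint(0xB0, 0xA1));
    }
}
```

Running main should print the Unicode code for the character at GBK B0A1, which is U+554A.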

Here is a piece of sample code:

    {
        // osw is a Writer (for example an OutputStreamWriter) opened earlier.
        int count = 0;
        for (int segIndex = 0xb0; segIndex <= 0xf7; segIndex++) {
            for (int charIndex = 0xa1; charIndex <= 0xfe; charIndex++) {
                byte[] gbkBytes = new byte[] {(byte) segIndex, (byte) charIndex};
                byte[] unicodeBytes;
                String str = new String(gbkBytes, "GBK");
                unicodeBytes = str.getBytes("unicode");
                if (unicodeBytes.length == 4) {
                    count++;
                    String buffer = "";
                    // The two GBK bytes, as unsigned decimal values
                    for (int i = 0; i < gbkBytes.length; i++)
                        buffer += (int) (0x00ff & gbkBytes[i]) + " ";
                    // The two Unicode bytes, last byte first
                    for (int i = 3; i > 1; i--)
                        buffer += (int) (0x00ff & unicodeBytes[i]) + " ";
                    buffer += "\n";
                    osw.write(buffer);
                }
            }
        }
    }

This code processes the Chinese characters of the GBK/2 area one by one. In the GBK/2 area the first byte lies in the range [0xb0, 0xf7] and the second byte in the range [0xa1, 0xfe]. The character set used when constructing the string is GBK:

    String str = new String(gbkBytes, "GBK");

The byte array returned by getBytes() has 4 elements. I was not sure what the first two are for; they are most likely the byte order mark that Java's encoder writes in front of the text. The last two bytes are the actual Unicode code. In my output these two bytes were in reversed order, so they have to be read starting from the last byte. This is a matter of big-endian versus little-endian byte order, which I will not go into here.
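
To see what the four bytes are, it can help to print them for one character under different charsets. The sketch below is an illustration under assumptions: the class name Utf16BytesDemo and the helper hex are hypothetical, and it uses the standard charsets UTF_16, UTF_16BE and UTF_16LE rather than the "unicode" name from the sample code. On current JDKs, UTF_16 encodes in big-endian order with a leading byte order mark; a different JDK or charset name may behave differently, which may explain the reversed order described above.

```java
import java.nio.charset.StandardCharsets;

class Utf16BytesDemo {
    // Hypothetical helper: format a byte array as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "\u554A"; // the character at GBK B0A1
        // UTF_16 writes a byte order mark (FE FF) before the character.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 55 4A
        // Naming the byte order explicitly leaves the mark out.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 55 4A
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 4A 55
    }
}
```

Choosing UTF_16BE or UTF_16LE explicitly removes both the two extra bytes and any doubt about which byte comes first.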

At the end of each loop iteration, the first two numbers in the buffer string are the GB code and the last two numbers are the Unicode code. The string is then written to the file.

Once this file has been generated, another program can read it, load the Unicode values into an array, and use the GB code as the index. Looking up the Unicode code for a given GB code is then very convenient.
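
The loading step might look like the sketch below. It is only an assumption about how such a program could be written: the class name GbkTableLoader and the method load are hypothetical, the lines would normally come from the generated file (for example via Files.readAllLines), and each line is assumed to hold four decimal numbers, the two GB bytes followed by the high and then the low byte of the Unicode code. If your file stores the two Unicode bytes the other way round, swap the last two fields.

```java
import java.util.Arrays;
import java.util.List;

class GbkTableLoader {
    // Builds a table indexed by the 16-bit GB code (lead byte << 8 | trail byte).
    // Assumed line format: "gbLead gbTrail uniHigh uniLow", all decimal.
    static char[] load(List<String> lines) {
        char[] table = new char[0x10000];
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length != 4) {
                continue; // skip blank or malformed lines
            }
            int gb = (Integer.parseInt(f[0]) << 8) | Integer.parseInt(f[1]);
            int uni = (Integer.parseInt(f[2]) << 8) | Integer.parseInt(f[3]);
            table[gb] = (char) uni;
        }
        return table;
    }

    public static void main(String[] args) {
        // Two sample lines standing in for the generated file.
        List<String> sample = Arrays.asList("176 161 85 74", "214 208 78 45");
        char[] table = load(sample);
        System.out.printf("GBK B0A1 -> U+%04X%n", (int) table[0xB0A1]);
    }
}
```

A flat array of 65536 entries wastes some space, but it keeps the lookup to a single index operation, which matches the idea of using the GB code directly as the index.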
