java - Convert UTF-8 code (e.g., E052E472E04F) to text


I'm working with a resource I found online that contains UTF-8 codes instead of real text (since it's Arabic), and I have no idea how to convert them to real text in Java.

For example:

e052e472e04fe46ce04ee051e46f0020e027e04be43ee052e484e04ee4370020e052e027e47ee04fe478e050e473e412e04ee4630020e052e472e04fe46ce050e051e421e04ee051e0310020e476e050e4730020e050e051e466e04ee434e052e46fe41ee050e4210020e04fe044e47ee04fe443e04ee051e43ee46fe0270020e04fe472e04fe46be021e41ee04ee42f0020e052e43ae04ee4670020e04fe033e41ee04ee051e478e46fe0270020e41ee04ee47ce04fe051e483e04ee0230020e41ee04ee483

Thank you.

Edit:

I reverse engineered the source code, and here's what I found:

    public char[] getUnicodeString(String paramString) {
        // Every 4 hex digits become one 16-bit char.
        int j = paramString.length() / 4;
        char[] arrayOfChar = new char[j];
        int i = 0;
        for (; ; ) {
            if (i >= j) {
                return arrayOfChar;
            }
            arrayOfChar[i] = ((char) Integer.parseInt(paramString.substring(i * 4, i * 4 + 4), 16));
            i += 1;
        }
    }

Would this help?

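That is not UTF-8. A UTF-8 encoding consists of bytes that are either a single byte in the range 00-7F, or a multi-byte sequence whose first byte is in the range C2-F4, followed by 1 to 3 bytes in the range 80-BF. The sequence shown does not match that pattern (the very first byte, E0, is followed by 52, which is not in the range 80-BF), so it cannot be UTF-8.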
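The pattern check can also be done mechanically. Below is a minimal sketch, assuming the hex string is interpreted as raw bytes, that asks Java's strict UTF-8 decoder whether the bytes are well-formed. The class name `Utf8Check` and the helpers `hexToBytes` and `isValidUtf8` are hypothetical names chosen for illustration; they are not part of the original source.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Hypothetical helper: turn a hex string such as "e052e472" into raw bytes.
    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(i * 2, i * 2 + 2), 16);
        }
        return out;
    }

    // Returns true only if the bytes form well-formed UTF-8.
    // REPORT makes the decoder throw instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // First three code units from the question.
        System.out.println(isValidUtf8(hexToBytes("e052e472e04f")));
        // D8 A7 is the UTF-8 encoding of U+0627 (ARABIC LETTER ALEF), for comparison.
        System.out.println(isValidUtf8(hexToBytes("d8a7")));
    }
}
```

Real Arabic text in UTF-8 would mostly consist of two-byte sequences starting with D8-DB, which is nothing like the data in the question.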

It appears to be a 2-byte encoding, given the 0020 values, which appear to be Unicode space characters. If we show the 2-byte hex codes separated by spaces, and break lines after each 0020 space, we get a more human-readable sequence:

    e052 e472 e04f e46c e04e e051 e46f 0020
    e027 e04b e43e e052 e484 e04e e437 0020
    e052 e027 e47e e04f e478 e050 e473 e412 e04e e463 0020
    e052 e472 e04f e46c e050 e051 e421 e04e e051 e031 0020
    e476 e050 e473 0020
    e050 e051 e466 e04e e434 e052 e46f e41e e050 e421 0020
    e04f e044 e47e e04f e443 e04e e051 e43e e46f e027 0020
    e04f e472 e04f e46b e021 e41e e04e e42f 0020
    e052 e43a e04e e467 0020
    e04f e033 e41e e04e e051 e478 e46f e027 0020
    e41e e04e e47c e04f e051 e483 e04e e023 0020
    e41e e04e e483
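The grouping above can be reproduced with a few lines of Java. This is only a sketch, assuming the input length is a multiple of 4 hex digits; the class name `HexGrouper` and the method `group` are hypothetical names used for illustration.

```java
public class HexGrouper {
    // Splits a hex string into 4-digit units separated by spaces,
    // starting a new line after every 0020 unit.
    static String group(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            String unit = hex.substring(i, i + 4);
            sb.append(unit);
            // Add a separator only if another full unit follows.
            if (i + 8 <= hex.length()) {
                sb.append(unit.equalsIgnoreCase("0020") ? '\n' : ' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A short prefix of the data from the question.
        System.out.println(group("e052e472e04fe46ce04ee051e46f0020e027e04be43e"));
    }
}
```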

So, although 0020 appears to be a space, the rest of the values are all Exxx, and the entire E000-F8FF range is defined in Unicode as "Private Use".

So, I don't know what it is, but it's not ordinary Unicode text in either the UTF-8 or UTF-16 encoding.

It may be an old DBCS (Double-Byte Character Set) code page, I guess.
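To check the Private Use observation programmatically, one option is to parse each 4-digit group as a 16-bit value and ask `Character.getType` for its general category. This is a minimal sketch under the assumption that each group is one UTF-16 code unit; `PrivateUseCheck` and `countPrivateUse` are hypothetical names.

```java
public class PrivateUseCheck {
    // Counts how many 4-hex-digit units fall into Unicode's Private Use category.
    static int countPrivateUse(String hex) {
        int count = 0;
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            char c = (char) Integer.parseInt(hex.substring(i, i + 4), 16);
            if (Character.getType(c) == Character.PRIVATE_USE) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // First eight units from the question: seven Exxx values and one 0020.
        String sample = "e052e472e04fe46ce04ee051e46f0020";
        System.out.println(countPrivateUse(sample) + " of " + (sample.length() / 4)
                + " units are Private Use");
    }
}
```

Private Use code points have no standard meaning, so converting them to `char` values, as the decompiled method does, will not by itself produce readable Arabic unless the matching custom font or mapping table is also available.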

