Encodings
Definitions
- Symbol encoding
- establishes the rule by which symbols/pictures correspond to numbers (code points).
(e.g. Unicode)
- Character encoding
- establishes the rule by which those numbers (each signifying some character) are encoded into bytes (to be stored or transmitted) and decoded back.
(e.g. UTF-8, UTF-16, …)
Many legacy encodings (e.g. cp1251, …) blur the two concepts: each is a single table that maps characters straight to byte values, so it acts as both a symbol encoding and a character encoding.
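To make the two layers concrete, here is a minimal Python sketch (the helper name `layers` is hypothetical). It shows the code point a character gets from Unicode, then the bytes two different character encodings produce for that same code point, and finally a legacy code page that goes from character to byte in one step.

```python
def layers(ch: str) -> tuple[int, bytes, bytes]:
    """Return (code point, UTF-8 bytes, UTF-16-LE bytes) for one character.

    Hypothetical helper, for illustration only.
    """
    code_point = ord(ch)               # symbol encoding: character -> Unicode number
    utf8 = ch.encode("utf-8")          # character encoding: number -> UTF-8 bytes
    utf16 = ch.encode("utf-16-le")     # same number -> UTF-16 little-endian bytes (no BOM)
    return code_point, utf8, utf16

for ch in ("ü", "Ж"):
    cp, u8, u16 = layers(ch)
    print(f"{ch}: U+{cp:04X}, utf-8={u8.hex(' ')}, utf-16-le={u16.hex(' ')}")

# cp1251 maps the letter directly to one byte, with no separate code-point layer
print("Ж".encode("cp1251").hex())
```

The same code point (U+0416 for `Ж`) yields different bytes in UTF-8 and UTF-16, while cp1251 simply stores it as the single byte `c6`.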
Different encoding types
- URL encode
- a URL may contain only a limited set of printable ASCII characters; any other byte is written as `%` followed by its two-digit hex value.
Hello World –> Hello%20%57%6f%72%6c%64 (ordinary ASCII characters may optionally be encoded too)
` ` –> `+` (in form data and query strings) or `%20`
non-ASCII symbols: ü –> %C3%BC (hex of its UTF-8 bytes)
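The rules above can be tried with Python's standard `urllib.parse` module. This is a small sketch: `quote`/`quote_plus` encode only unsafe characters, while the hypothetical helper `encode_all` encodes every byte, which is also valid and matches the `Hello%20%57...` style.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

print(quote("Hello World"))        # Hello%20World
print(quote_plus("Hello World"))   # Hello+World  (form / query-string style)
print(quote("ü"))                  # %C3%BC  (UTF-8 bytes of ü in hex)

def encode_all(text: str) -> str:
    """Percent-encode every UTF-8 byte of text (hypothetical helper)."""
    return "".join(f"%{b:02X}" for b in text.encode("utf-8"))

print(encode_all("Hello World"))   # %48%65%6C%6C%6F%20%57%6F%72%6C%64

# Decoding accepts a mix of encoded and plain characters, in either hex case
print(unquote("Hello%20%57%6f%72%6c%64"))  # Hello World
print(unquote_plus("Hello+World"))          # Hello World
```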
- HTML entities
- Any symbol can be encoded by its code point, in decimal (`&#123;` renders as `{`) or in hex (`&#x123;` renders as `ģ`).
Encoded symbols are not interpreted by the browser as special (markup) characters.
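| Symbol | Description | Entity name | Entity number |
| --- | --- | --- | --- |
| ` ` | non-breaking space | `&nbsp;` | `&#160;` |
| < | less than | `&lt;` | `&#60;` |
| > | greater than | `&gt;` | `&#62;` |
| & | ampersand | `&amp;` | `&#38;` |
| ¢ | cent | `&cent;` | `&#162;` |
| £ | pound | `&pound;` | `&#163;` |
| ¥ | yen | `&yen;` | `&#165;` |
| € | euro | `&euro;` | `&#8364;` |
| © | copyright | `&copy;` | `&#169;` |
| ® | registered trademark | `&reg;` | `&#174;` |
etc.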
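A short Python sketch of the entity forms above, using the standard `html` module. The two `to_*_entities` helpers are hypothetical and simply encode every character by its code point; `html.escape` only replaces the markup-significant characters, and `html.unescape` decodes named, decimal and hex entities alike.

```python
import html

def to_decimal_entities(text: str) -> str:
    """Encode every character as &#NNN; (hypothetical helper)."""
    return "".join(f"&#{ord(c)};" for c in text)

def to_hex_entities(text: str) -> str:
    """Encode every character as &#xHHH; (hypothetical helper)."""
    return "".join(f"&#x{ord(c):x};" for c in text)

print(to_decimal_entities("<b>"))  # &#60;&#98;&#62;
print(to_hex_entities("<b>"))      # &#x3c;&#x62;&#x3e;

# Only &, <, >, " and ' are replaced (quote=True is the default)
print(html.escape('<a href="x">Tom & Jerry</a>'))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# Named, decimal and hex entities all decode back to characters
print(html.unescape("&#123; &#x123; &euro; &copy;"))  # { ģ € ©
```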
Encoding tricks
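- Encodings latin1, gbk and character escaping
In latin1 the string `%BF%27` = `¿'`. Escaping the quote `'` (`%27`) with a backslash `\` (`%5C`) gives the string `%BF%5C%27`. In gbk, however, `%BF%5C%27` = `縗'`: the bytes `BF 5C` form a single two-byte character, so the backslash is swallowed and the quote is left unescaped.
If MySQL `SET NAMES gbk;` was executed, this encoding trick helps to bypass the `mysql_real_escape_string` PHP function (`SET NAMES` does not tell the client library about the new charset, so the function keeps escaping byte by byte in the old one). Similar tricks can be done with the following encodings: big5, cp932, gb2312, gbk and sjis.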
- `\x90`: opcode of the x86 NOP (no-operation) instruction.
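The latin1/gbk escaping trick described above can be reproduced with plain byte strings. In this sketch `naive_escape` is a hypothetical stand-in for any escaper that works byte by byte as if the input were a single-byte encoding; it is not the real MySQL or PHP implementation.

```python
def naive_escape(data: bytes) -> bytes:
    """Prefix every ' and \\ byte with a backslash.

    Hypothetical helper: mimics an escaper that assumes a single-byte
    encoding such as latin1 and knows nothing about gbk.
    """
    out = bytearray()
    for b in data:
        if b in (0x27, 0x5C):   # ' or \
            out.append(0x5C)    # insert a backslash before it
        out.append(b)
    return bytes(out)

payload = bytes.fromhex("BF27")     # %BF%27
escaped = naive_escape(payload)     # %BF%5C%27

print(payload.decode("latin1"))     # ¿'
print(escaped.decode("latin1"))     # ¿\'  (quote is escaped when read as latin1)

# Read as gbk, BF 5C is one two-byte character, so the quote stands alone
decoded = escaped.decode("gbk")
print(len(decoded), repr(decoded[-1]))  # 2 "'"
```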
Special characters
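Special Unicode symbols:
- Unicode replacement character - `\ufffd` (U+FFFD), substituted by decoders for bytes they cannot decode
- RTLO / RLO, Right-To-Left Override - U+202E, makes the text that follows it display right to left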
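Both symbols are easy to observe from Python. This is a sketch: the byte string and the file name `photo_\u202egpj.exe` are made-up examples, and how the name actually looks depends on a bidi-aware renderer.

```python
import unicodedata

# With errors="replace", undecodable bytes become U+FFFD
broken = b"caf\xe9"  # "café" in latin1; 0xE9 alone is invalid UTF-8
print(repr(broken.decode("utf-8", errors="replace")))  # 'caf\ufffd'
print(unicodedata.name("\ufffd"))                      # REPLACEMENT CHARACTER

# Everything after U+202E is displayed reversed, so a bidi-aware
# renderer shows this name as "photo_exe.jpg", yet it ends in ".exe"
name = "photo_\u202egpj.exe"
print(unicodedata.name("\u202e"))  # RIGHT-TO-LEFT OVERRIDE
print(name.endswith(".exe"))       # True
```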