008. H5 Special Characters
Special symbols required in HTML
Half-width (English) space ( ): &nbsp; &#160;
Ampersand (&): &amp; &#38;
Less-than sign (<): &lt; &#60;
Greater-than sign (>): &gt; &#62;
Half-width double quotation mark ("): &quot; &#34;
Half-width single quotation mark ('): &apos; &#39;
&#0000; is the decimal form of a UCS-2 character code and &#x0000; is the hexadecimal form, where 0000 stands for the code of the character. UCS-2 is compatible with ASCII and is a subset of UTF-16.
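As an illustration of the entity list above, here is a minimal sketch of a helper that replaces the five special symbols with their entities before text is placed into HTML. The function name escapeHtml is hypothetical and not part of any library; the single quote uses the numeric form &#39;.

```javascript
// Hypothetical helper (not a library API): escape the special symbols
// listed above so the text can be placed safely inside HTML.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  // One pass over the string, so an already produced "&" is not escaped twice.
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

console.log(escapeHtml('<a title="x">Tom & Jerry\'s</a>'));
// &lt;a title=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

Doing the replacement in a single pass avoids escaping the ampersand of an entity that was just produced.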
Special symbols required in JS
Half-width double quotation mark ("): \u0022
Half-width single quotation mark ('): \u0027
In JS, \u followed by four hexadecimal digits represents a UCS-2 character.
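A small sketch of these escapes in use. The helper toJsEscape is a hypothetical name introduced only for illustration; it builds the \u form for a single two-byte character.

```javascript
// The escapes are ordinary characters once the string literal is parsed.
const doubleQuote = "\u0022";
const singleQuote = "\u0027";
console.log(doubleQuote === '"'); // true
console.log(singleQuote === "'"); // true

// Hypothetical helper: build the \uXXXX escape for one two-byte character.
function toJsEscape(ch) {
  const hex = ch.charCodeAt(0).toString(16).padStart(4, "0").toUpperCase();
  return "\\u" + hex;
}

console.log(toJsEscape('"')); // \u0022
console.log(toJsEscape("'")); // \u0027
```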
Special symbols required in CSS
Half-width double quotation mark ("): \0022
Half-width single quotation mark ('): \0027
In CSS, \ followed by hexadecimal digits represents a UCS-2 character.
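To stay in one language, the CSS escapes can be sketched from JavaScript. The helper toCssEscape is a hypothetical name, and the rule text below is only an example of where such an escape might be used.

```javascript
// Hypothetical helper: build the CSS escape (backslash + hex code) for a character.
function toCssEscape(ch) {
  const hex = ch.codePointAt(0).toString(16).padStart(4, "0").toUpperCase();
  return "\\" + hex;
}

console.log(toCssEscape('"')); // \0022
console.log(toCssEscape("'")); // \0027

// Example rule text: a double quotation mark inside a CSS string.
const rule = `q::before { content: "${toCssEscape('"')}"; }`;
console.log(rule); // q::before { content: "\0022"; }
```

One caveat for this sketch: a CSS escape may contain up to six hexadecimal digits, so if the escaped character is followed directly by a character that is itself a hexadecimal digit, a space after the escape is needed to end it.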
With these codes for the special symbols, an HTML page can represent any content.
In JavaScript and CSS, characters are encoded with UCS-2, which is a subset of UTF-16 rather than the complete UTF-16 encoding scheme. Characters that cannot be represented in two bytes are represented in JavaScript by surrogate pairs: two UCS-2 code units that together stand for one character. This is the UCS-2 + surrogate pair encoding method, which is in effect UTF-16. The surrogate range starts at 0xD800.
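The surrogate pair arithmetic can be sketched as follows. The function name toSurrogatePair is hypothetical; the calculation itself is the standard UTF-16 one (subtract 0x10000, then split the remaining 20 bits into two halves of 10 bits).

```javascript
// Hypothetical helper: split a code point above 0xFFFF into its surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("code point does not need a surrogate pair");
  }
  const offset = codePoint - 0x10000;      // 20 bits remain
  const high = 0xd800 + (offset >> 10);    // upper 10 bits
  const low = 0xdc00 + (offset & 0x3ff);   // lower 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f4a9); // code point of 💩
console.log(high.toString(16), low.toString(16));     // d83d dca9
console.log(String.fromCharCode(high, low) === "💩"); // true
```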
In native JavaScript (before ES6), a surrogate pair is treated as two characters, which often produces incorrect results.
ES6 string iteration treats a surrogate pair as a single character, so ES6 can process any character accurately.
For example, native JS processes extended characters as follows:
"bytes:💩".split("")
The result is:
["b", "y", "t", "e", "s", ":", "\ud83d", "\udca9"]
The ES6 method for processing extended characters is (spread syntax, expanding the string into an array):
[..."bytes:💩"]
The result is:
["b", "y", "t", "e", "s", ":", "💩"]
That is, native JS handles a surrogate pair as two characters, while ES6 handles a surrogate pair as a single character.
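The same difference shows up when counting and reading characters. A short sketch using only built-in String methods:

```javascript
const text = "bytes:💩";

// length counts two-byte code units; spreading counts whole characters.
console.log(text.length);      // 8
console.log([...text].length); // 7

// charCodeAt reads one half of the pair; codePointAt reads the whole character.
console.log(text.charCodeAt(6).toString(16));  // d83d
console.log(text.codePointAt(6).toString(16)); // 1f4a9
```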
So searching and splitting strings in JS needs special care: 4-byte characters must be taken into account.
For example, the following processing method is incorrect:
"bytes:💩".substring(0,7)
The result is:
"bytes:\ud83d" (the string ends with only the first half of the surrogate pair)
The correct method is:
[..."bytes:💩"].slice(0,7).join("");
The result is:
"bytes:💩"
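The correct method can be wrapped in a small reusable function. This is only a sketch, and substringByCodePoint is a hypothetical name; it counts positions in whole characters instead of two-byte units.

```javascript
// Hypothetical helper: substring that counts whole characters,
// so a surrogate pair is never cut in half.
function substringByCodePoint(str, start, end) {
  return Array.from(str).slice(start, end).join("");
}

console.log(substringByCodePoint("bytes:💩", 0, 7)); // bytes:💩
console.log(substringByCodePoint("💩No", 1));        // No
```

Array.from and the spread syntax both use the string iterator, so either can be used here.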
ES6 also extended regular expressions by adding the u flag. With this flag, a 4-byte character (a surrogate pair whose first unit is 0xD800 or above) is matched as a single character.
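A brief sketch of the effect of the u flag, using the same sample string:

```javascript
// Without the u flag, "." matches one two-byte unit, so a 4-byte
// character does not fit the pattern "exactly one character".
console.log(/^.$/.test("💩"));  // false
console.log(/^.$/u.test("💩")); // true

// Counting matches shows the same difference.
console.log("bytes:💩".match(/./g).length);  // 8
console.log("bytes:💩".match(/./gu).length); // 7
```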
The correct way in ES6 to take the surname from a name written with international extended characters:
var surname=[..."💩No"][0];
The result is:
"💩"
-- www.v-signon-com Learner Encouragement