3.1.python

https://stackoverflow.com/questions/5552555/unicodedecodeerror-invalid-continuation-byte

Because UTF-8 is multibyte and there is no char corresponding to your combination of \xe9 plus following space.

Why should it succeed in both utf-8 and latin-1?

Here is how the same sentence should look in UTF-8:

>>> o.decode('latin-1').encode("utf-8")
'a test of \xc3\xa9 char'
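The quoted session is Python 2 (`str.decode`). A minimal Python 3 sketch of the same situation follows; the sample sentence is an assumption reconstructed from the quoted output, and the variable names are illustrative only.

```python
# Latin-1 stores 'é' (U+00E9) as the single byte 0xE9.
raw = "a test of \u00e9 char".encode("latin-1")  # b'a test of \xe9 char'

# In UTF-8, 0xE9 starts a three-byte sequence, so the next byte must be a
# continuation byte (0x80-0xBF). A space (0x20) is not one, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# Decoding with the codec the bytes were actually written in succeeds, and
# re-encoding produces the UTF-8 byte sequence for the same sentence.
as_utf8 = raw.decode("latin-1").encode("utf-8")

print(utf8_error)
print(as_utf8)
```

Latin-1 maps every byte value to a character, so decoding with it never raises; that does not mean it is the right codec for a given file.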

https://stackoverflow.com/questions/3942888/unicodeencodeerror-latin-1-codec-cant-encode-character

http://blog.xuite.net/ebeaoi/beast/9836928-%E7%B7%A8%E7%A2%BC%E5%95%8F%E9%A1%8C--charset%E5%8F%8Acodepage

https://stackoverflow.com/questions/878972/windows-cmd-encoding-change-causes-python-crash

https://www.ptt.cc/bbs/Python/M.1303532664.A.3D6.html

https://www.v2ex.com/t/104648

Indicate a vertex component is detached or not

=================Check 'charmap' codec can't decode byte 0x8f in position 17: character maps to <undefined>

=================Check 'charmap' codec can't decode byte 0x90 in position 4559: character maps to <undefined>

https://stackoverflow.com/questions/30750843/python-3-unicodedecodeerror-charmap-codec-cant-decode-byte-0x9d

In Python 3, files are opened as text (decoded to Unicode) for you; you don't need to tell BeautifulSoup what codec to decode from.

If decoding of the data fails, that's because you didn't tell the open() call what codec to use when reading the file; add the correct codec with an encoding argument:
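A minimal sketch of that advice, assuming the file is really UTF-8 and the platform default would have been cp1252 (the usual source of the 'charmap' message on Windows). The file name and contents are hypothetical.

```python
import os
import tempfile

# Hypothetical sample file, written as UTF-8. The right curly quote U+201D
# becomes the bytes E2 80 9D.
path = os.path.join(tempfile.mkdtemp(), "page.html")
with open(path, "w", encoding="utf-8") as f:
    f.write("<p>\u201cquoted\u201d</p>")

# cp1252 has no character assigned to byte 0x9D, so reading the file with
# that codec raises the 'charmap' UnicodeDecodeError.
try:
    with open(path, encoding="cp1252") as f:
        f.read()
    failed = False
except UnicodeDecodeError:
    failed = True

# Naming the codec the file was actually written in fixes the read.
with open(path, encoding="utf-8") as f:
    text = f.read()

print(failed, text)
```

Passing `errors="replace"` to `open()` is an alternative when the real encoding is unknown, at the cost of losing the undecodable bytes.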

=================Check 'utf-8' codec can't decode byte 0xc7 in position 17: invalid continuation byte

=======================building line
[2017-09-11 18:47:56]Export 'D:\DB_FILE\125098_75810_2017911_184748\line.txt' begins...
[2017-09-11 18:47:56]Total Line count: 175
[2017-09-11 18:47:56]FormDBFormat end, Time spend: 0:00:00.080714
'gbk' codec can't encode character '\xf4' in position 262: illegal multibyte sequence

=======================building line
[2017-09-11 19:06:30]Export 'D:\DB_FILE\125098_75812_2017911_19621\line.txt' begins...
[2017-09-11 19:06:30]Total Line count: 175
[2017-09-11 19:06:30]FormDBFormat end, Time spend: 0:00:00.091773
'latin-1' codec can't encode characters in position 17-19: ordinal not in range(256)
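The two encode failures logged above can be reproduced in isolation. This is only a sketch: the sample strings are assumptions (the real contents of line.txt are not shown), and `try_encode` is a hypothetical helper.

```python
def try_encode(s, codec):
    """Return the encoded bytes, or a short description of the failure."""
    try:
        return s.encode(codec)
    except UnicodeEncodeError as exc:
        return f"{exc.encoding}: {exc.reason}"

# 'ô' (U+00F4) has no GBK representation.
gbk_result = try_encode("\u00f4", "gbk")

# Chinese characters lie outside Latin-1's 0-255 range.
latin1_result = try_encode("\u4e2d\u6587", "latin-1")

# UTF-8 can encode every Unicode character, so a mixed string round-trips.
mixed = "c\u00f4te \u4e2d\u6587"
utf8_bytes = mixed.encode("utf-8")

# If the target codec is fixed, an error handler trades data for robustness.
lossy = "\u00f4".encode("gbk", errors="replace")

print(gbk_result)
print(latin1_result)
print(utf8_bytes)
print(lossy)
```

When the output format allows it, writing with `open(path, "w", encoding="utf-8")` avoids both errors; `errors="replace"` keeps the legacy codec but substitutes `?` for anything it cannot represent.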
