Guessing the Current Encoding from Mojibake | Blurred code

2022/04/14

LastMod:2022/04/14

Categories: win32

The test uses the sentence below, encoded in UTF-8. The string contains an emoji, which cannot be encoded in ANSI code pages such as GBK or Big5.

这是编码测试fontTest👿
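As a quick check of that claim, the minimal sketch below tries a strict encode of the test string in a few encodings. The helper name `can_encode` is hypothetical and not part of the original script.

```python
test_str = "这是编码测试fontTest👿"


def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)  # strict mode: raises on the first unencodable character
        return True
    except UnicodeEncodeError:
        return False


for name in ("utf-8", "gbk", "big5"):
    print(name, can_encode(test_str, name))
```

Without the emoji the string still encodes to GBK, which is why the GBK rows of the results only lose the last character.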

The encode/decode results are written to a UTF-8 encoded txt file and viewed in an editor that supports UTF-8.

# -*- coding: utf-8 -*-
test_str = "这是编码测试fontTest👿"


def print_test(encoding, decoding):
    print("{} encoding {} decoding".format(encoding, decoding))
    # Characters that cannot be encoded are replaced with '?'.
    # Bytes that cannot be decoded are replaced with � (U+FFFD).
    print(test_str.encode(encoding, errors='replace').decode(decoding, errors='replace'))


print("UTF-8 bytes misinterpreted as another encoding")
print_test('utf-8','gbk')
print_test('utf-8','utf-16le')
print_test('utf-8','utf-16be')
print_test('utf-8','big5')
print_test('utf-8','euc-jp')
print_test('utf-8','ascii')

print("Other encodings misinterpreted as UTF-8")
print_test('gbk','utf-8')
print_test('utf-16le','utf-8')
print_test('utf-16be','utf-8')
print_test('big5','utf-8')
print_test('euc-jp','utf-8')
print_test('ascii','utf-8')

print("GBK bytes misinterpreted as another encoding")
print_test('gbk','utf-8')
print_test('gbk','utf-16le')
print_test('gbk','utf-16be')
print_test('gbk','big5')
print_test('gbk','euc-jp')
print_test('gbk','ascii')
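The script only prints its results. On a Windows console that uses a legacy code page, printing these strings may itself fail or be re-garbled, so one way to get the UTF-8 txt file mentioned earlier is to write to it directly. This is a minimal sketch under that assumption; `write_results` and the output file name are hypothetical.

```python
import os
import tempfile

test_str = "这是编码测试fontTest👿"


def write_results(path, pairs):
    """Write one header line and one garbled line per (encoding, decoding) pair."""
    # encoding="utf-8" keeps the file independent of the console code page.
    with open(path, "w", encoding="utf-8") as out:
        for encoding, decoding in pairs:
            garbled = test_str.encode(encoding, errors="replace").decode(
                decoding, errors="replace"
            )
            print("{} encoding {} decoding".format(encoding, decoding), file=out)
            print(garbled, file=out)


# Hypothetical output location; any writable path works.
result_path = os.path.join(tempfile.gettempdir(), "encoding_test_result.txt")
write_results(result_path, [("utf-8", "gbk"), ("gbk", "utf-8")])
```

Redirecting stdout to a file can also work, provided the interpreter's output encoding is set to UTF-8.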

Output

UTF-8 bytes misinterpreted as another encoding

utf-8 encoding gbk decoding
杩欐槸缂栫爜娴嬭瘯fontTest馃懣

utf-8 encoding utf-16le decoding
뿨꾘볧膠뗦閯潦瑮敔瑳鿰뾑

utf-8 encoding utf-16be decoding
駦颯雧ꂁ诨꾕景湴呥獴醿

utf-8 encoding big5 decoding
餈���舐�����瘚�霂�fontTest����

utf-8 encoding euc-jp decoding
菴����膽����羌�莚�fontTest����

utf-8 encoding ascii decoding
������������������fontTest����

Other encodings misinterpreted as UTF-8

gbk encoding utf-8 decoding
���DZ������fontTest?

utf-16le encoding utf-8 decoding
ُ/fxKmՋf o n t T e s t =��

utf-16be encoding utf-8 decoding
��f/xmK�� f o n t T e s t�=�

big5 encoding utf-8 decoding
?�O????fontTest?

euc-jp encoding utf-8 decoding
?��????fontTest?

ascii encoding utf-8 decoding
??????fontTest?

GBK bytes misinterpreted as another encoding

gbk encoding utf-8 decoding
���DZ������fontTest?

gbk encoding utf-16le decoding
쟊퓊潦瑮敔瑳�

gbk encoding utf-16be decoding
헢쫇뇠싫닢쫔景湴呥獴�

gbk encoding big5 decoding
涴岆晤鎢聆彸fontTest?

gbk encoding euc-jp decoding
宸頁園鷹霞編fontTest?

gbk encoding ascii decoding
������������fontTest?
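Read in the other direction, these results suggest a way to guess the encoding: re-encode the garbled text with the encoding it was wrongly displayed in, then decode it with each candidate and see which pairs round-trip cleanly. The following is a minimal sketch and not part of the original script. `candidate_repairs` and the `CANDIDATES` list are hypothetical names, and strict error handling is assumed so that lossy pairs are dropped.

```python
from itertools import permutations

# Assumed candidate list, taken from the encodings tested above.
CANDIDATES = ["utf-8", "gbk", "big5", "euc-jp", "utf-16le", "utf-16be"]


def candidate_repairs(garbled, encodings=CANDIDATES):
    """Map (shown_as, really) to the repaired text for every pair that round-trips."""
    repairs = {}
    for shown_as, really in permutations(encodings, 2):
        try:
            # Undo the wrong decoding, then apply the candidate decoding.
            repaired = garbled.encode(shown_as).decode(really)
        except UnicodeError:
            continue  # this pair cannot have produced the garbled text losslessly
        repairs[(shown_as, really)] = repaired
    return repairs


original = "这是编码测试fontTest👿"
garbled = original.encode("utf-8").decode("gbk")  # the first case in the results
repairs = candidate_repairs(garbled)
print(repairs[("gbk", "utf-8")])
```

More than one pair may survive, so the candidates still have to be judged by eye. Text that already contains ? or � cannot be repaired this way, because the original bytes were lost.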

A few observations: