The Difference Between bytes, str, and unicode

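Why write this post? I've been reading a Python book, and it covers some details that I tend to overlook but that seem worth recording, so I'm writing them down as a blog post. I also just realized that I haven't added a technical article to my WizNote (为知笔记) notes in a long time; these days everything goes on the blog.
I'm filing this post under the Python Advanced column because it counts as a start of sorts. It really is time for me to level up.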

Differences

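Python 3 represents character sequences with two types: bytes (raw 8-bit values) and str (Unicode characters).
Python 2 represents character sequences with two types: str (raw 8-bit values) and unicode (Unicode characters).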
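
Here is a small Python 3 sketch of how the two types differ in practice. The variable names are illustrative and not from the book: indexing a bytes object yields integers, while indexing a str yields one-character strings, and the two lengths differ as soon as a non-ASCII character is involved.

```python
# Python 3 only: bytes holds raw 8-bit values, str holds Unicode characters.
raw = b"h\xc3\xa9llo"      # 6 bytes: the UTF-8 encoding of "héllo"
text = "h\u00e9llo"        # 5 Unicode characters

raw_first = raw[0]         # an int (the byte value of "h")
text_first = text[0]       # a one-character str

print(type(raw).__name__, len(raw), raw_first)
print(type(text).__name__, len(text), text_first)
```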

Key points

To convert a Unicode string into binary data, you must use the encode method.
To convert binary data into Unicode characters, you must use the decode method.
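
A minimal Python 3 round trip, assuming UTF-8 as the encoding (the sample string is made up for illustration). It also shows that decoding with a codec that cannot represent the bytes raises an error instead of silently guessing.

```python
text = "caf\u00e9"                 # a str containing a non-ASCII character (é)
data = text.encode("utf-8")        # str -> bytes
restored = data.decode("utf-8")    # bytes -> str

print(data)
print(restored == text)

# Decoding with the wrong codec fails: these bytes are not valid ASCII.
try:
    data.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
print(ascii_failed)
```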

Converting between string types in Python 2 and 3

Python 3: accepts str or bytes, always returns str

def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode("utf-8")
    else:
        value = bytes_or_str
    return value

Python 3: accepts str or bytes, always returns bytes

def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode("utf-8")
    else:
        value = bytes_or_str
    return value

Python 2: accepts str or unicode, always returns unicode

def to_unicode(unicode_or_str):
    if isinstance(unicode_or_str, str):
        value = unicode_or_str.decode("utf-8")
    else:
        value = unicode_or_str
    return value

Python 2: accepts str or unicode, always returns str

def to_str(unicode_or_str):
    if isinstance(unicode_or_str, unicode):
        value = unicode_or_str.encode("utf-8")
    else:
        value = unicode_or_str
    return value

Differences in open

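In Python 3, a file handle obtained from the built-in open defaults to text mode, which works with str and a default encoding (UTF-8 on most systems, although strictly speaking the default follows the platform locale). In Python 2, file operations default to binary. For example, the following code runs in Python 2 but raises an error in Python 3: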

import os

with open("/tmp/test.txt", "w") as f:
    f.write(os.urandom(10))

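The error occurs because Python 3 added a new encoding parameter to open. A file opened in text mode uses that encoding (UTF-8 on most systems), so read and write on the handle work with str instances containing Unicode characters and reject bytes instances containing binary data. To fix this, open the file in binary write mode ("wb") instead of "w" as before. That works in both Python 2 and 3: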

import os

with open("/tmp/test.txt", "wb") as f:
    f.write(os.urandom(10))

By the same reasoning, use "rb" mode rather than "r" when reading binary data from a file.
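
The following Python 3 sketch puts both modes side by side. It writes to a scratch file created with tempfile (an assumption made here so the example does not depend on /tmp existing), and the variable names are illustrative only.

```python
import os
import tempfile

payload = os.urandom(10)  # 10 random bytes

# Hypothetical scratch location; the post itself uses /tmp/test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.bin")

# Text mode ("w") rejects bytes in Python 3 with a TypeError.
try:
    with open(path, "w") as f:
        f.write(payload)
    text_mode_failed = False
except TypeError:
    text_mode_failed = True

# Binary modes ("wb" / "rb") write and read the bytes unchanged.
with open(path, "wb") as f:
    f.write(payload)
with open(path, "rb") as f:
    round_trip = f.read()

print(text_mode_failed, round_trip == payload)
```

Reading the same file with "r" would try to decode the random bytes as text, which can fail with a UnicodeDecodeError or return garbled characters depending on the default encoding.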