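Python Style Guide Notes

Function and Method Decorators

  • Pros: Decorators elegantly specify a transformation on a function. The transformation can eliminate repetitive code, enforce invariants, and so on.
  • Cons: A decorator can perform arbitrary operations on a function's arguments or return value, which can lead to surprising implicit behavior. In addition, decorators execute at import time, and failures in decorator code are almost impossible to recover from.
  • Decision: Use decorators judiciously, and only when there is a clear advantage. Decorators should follow the same import and naming guidelines as functions. The decorator's docstring should clearly state that the function is a decorator. Write unit tests for decorators. Avoid external dependencies in the decorator itself (that is, do not rely on files, sockets, database connections, and so on).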
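
As a minimal sketch of the advice above, the decorator below documents itself as a decorator, has no external dependencies, and is small enough to unit test. The names log_calls and add are hypothetical and are not taken from the style guide.

```python
import functools


def log_calls(func):
    """Decorator that records every call made to the wrapped function.

    The call history is kept on the wrapper itself (wrapper.calls), so the
    decorator does not depend on files, sockets or databases.
    """
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls  # hypothetical usage, for illustration only
def add(a, b):
    """Returns the sum of a and b."""
    return a + b


result = add(1, 2)
```

Because the history lives on the wrapper, a test only needs to call add and inspect add.calls.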

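Threads

Prefer the Queue data type from the Queue module (named queue in Python 3) for passing data between threads. Otherwise, use the threading module and its locking primitives. Learn the proper use of condition variables so that you can use threading.Condition instead of lower-level locks.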
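
A small sketch of queue-based communication between threads, written for Python 3. The worker function and the None sentinel are illustrative choices, not something the guide prescribes.

```python
import queue      # this module is called Queue in Python 2
import threading


def worker(tasks, results):
    """Squares numbers taken from tasks until a None sentinel arrives."""
    while True:
        item = tasks.get()
        if item is None:      # sentinel: no more work
            break
        results.put(item * item)


tasks = queue.Queue()
results = queue.Queue()
thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()

for n in [1, 2, 3]:
    tasks.put(n)
tasks.put(None)               # tell the worker to stop
thread.join()

squares = [results.get() for _ in range(3)]
```

The queues do all the locking internally, so neither thread touches a lock directly.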

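Power Features

  • Tip: Avoid these features.
  • Pros: These are powerful language features that can make your code more compact.
  • Cons: It is very tempting to use these "cool" features when they are not absolutely necessary. Code that relies on unusual tricks is harder to read and debug. It may not seem that way at first, but when you revisit the code it tends to be harder to understand than code that is slightly longer but more straightforward.
  • Decision: Avoid these features in your code.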

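Comments

Python has a distinctive way of documenting code: docstrings. A docstring is a string that is the first statement in a package, module, class or function. These strings can be extracted automatically through the object's __doc__ member and are used by pydoc.

A docstring is organized as follows:

  • A summary line ending in a period, question mark or exclamation point (or the docstring is simply a single line)
  • Then a blank line
  • Then the rest of the docstring, aligned with the first quote of the docstring's first line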
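
The layout described above can be sketched with a hypothetical function, area, whose docstring is then read back through __doc__:

```python
def area(width, height):
    """Returns the area of a rectangle.

    The remaining lines are aligned with the opening quote of the first
    line, after one blank line.
    """
    return width * height


# The docstring is an ordinary attribute of the function object.
doc_lines = area.__doc__.splitlines()
summary = doc_lines[0]
```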

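Modules

Every file should contain license boilerplate. Choose the boilerplate appropriate for the license used by the project (for example Apache 2.0, BSD, LGPL, GPL).

Functions and Methods

A function must have a docstring unless it meets all of the following criteria:

  1. Not externally visible
  2. Very short
  3. Obvious

A docstring should describe what the function does and give a detailed description of its inputs and outputs. In general it should not describe how the function does it, unless some complicated algorithm is involved. The docstring should give enough information for someone to write a call to the function without reading a single line of its code. For tricky code, comments alongside the code are more appropriate than the docstring.

Certain aspects of a function should be documented in special sections, listed below. Each section begins with a heading line that ends with a colon; apart from the heading, the contents of a section should be indented 2 spaces.

  • Args: List each parameter by name, followed by a colon and a space and then a description of the parameter. If the description is too long to fit on a single 80-character line, use a hanging indent of 2 or 4 spaces. The description should state the required type and the meaning of the argument. If a function accepts a variable-length argument list or arbitrary keyword arguments, list both explicitly.
  • Returns (or Yields, for generators): Describe the type and semantics of the return value. If the function returns None, this section can be omitted.
  • Raises: List all exceptions that are relevant to the interface.

Example: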

def fetch_bigtable_rows(big_table, keys, other_silly_variable=None):
    """Fetches rows from a Bigtable.

    Retrieves rows pertaining to the given keys from the Table instance
    represented by big_table.  Silly things may happen if
    other_silly_variable is not None.

    Args:
        big_table: An open Bigtable Table instance.
        keys: A sequence of strings representing the key of each table row
            to fetch.
        other_silly_variable: Another optional variable, that has a much
            longer name than the other args, and which does nothing.

    Returns:
        A dict mapping keys to the corresponding table row data
        fetched. Each row is represented as a tuple of strings. For
        example:

        {'Serak': ('Rigel VII', 'Preparer'),
         'Zim': ('Irk', 'Invader'),
         'Lrrr': ('Omicron Persei 8', 'Emperor')}

        If a key from the keys argument is missing from the dictionary,
        then that row was not found in the table.

    Raises:
        IOError: An error occurred accessing the bigtable.Table object.
    """
    pass

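A class should have a docstring below the class definition that describes the class. If your class has public attributes, they should be documented in an Attributes section that follows the same formatting as a function's Args section.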

class SampleClass(object):
    """Summary of class here.

    Longer class information....
    Longer class information....

    Attributes:
        likes_spam: A boolean indicating if we like SPAM or not.
        eggs: An integer count of the eggs we have laid.
    """

    def __init__(self, likes_spam=False):
        """Inits SampleClass with blah."""
        self.likes_spam = likes_spam
        self.eggs = 0

    def public_method(self):
        """Performs operation blah."""

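Block and Inline Comments

The parts of the code that most need comments are the tricky ones. Complicated operations get a few lines of comments before the operation begins; code that is not obvious at a glance gets a comment at the end of the line.

To improve readability, these comments should start at least 2 spaces away from the code.

Never describe the code itself. Assume that the person reading the code knows Python better than you do, but does not know what you are trying to do.

If a class does not inherit from any other class, explicitly inherit from object. This also applies to nested classes. (This matters in Python 2; in Python 3 every class is a new-style class.)

TODO Comments

A TODO comment should begin with the string TODO in all caps, followed in parentheses by your name, e-mail address or other identifier. It must then continue with a comment explaining what there is to do.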

Example:

# TODO(kl@gmail.com): Use a "*" here for string repetition.
# TODO(Zeke) Change this to use relations.

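If your TODO is of the form "at a future date do something", make sure that you include either a specific date or a specific event.

Imports Formatting

Imports should be on separate lines. Imports are always put at the top of the file, just after any module comments and docstrings and before module globals and constants. Imports should be grouped, ordered from most generic to least generic:

  1. Standard library imports
  2. Third-party imports
  3. Application-specific imports

Within each group, imports should be sorted lexicographically, ignoring case, according to each module's full package path.

Access Control

In Python, for trivial or unimportant accessor functions, you should use public variables directly instead, which avoids the overhead of an extra function call. When more functionality is added later, you can use a property to keep the syntax consistent.

On the other hand, if access is more complex, or the cost of accessing the variable is significant, you should use function calls such as get_foo() and set_foo(). If past behavior allowed access through a property, do not bind the new accessor functions to the property. That way, any code still trying to access the variable by the old method will break visibly, and its users are made aware of the change in complexity.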
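
A sketch of the first case above: an attribute that started out public later gains validation through a property, and callers keep the same syntax. The Circle class and its radius attribute are hypothetical examples.

```python
class Circle(object):
    """A circle whose radius started out as a plain public attribute."""

    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        """The radius of the circle; never negative."""
        return self._radius

    @radius.setter
    def radius(self, value):
        # Validation added later without changing how callers use radius.
        if value < 0:
            raise ValueError('radius must be non-negative')
        self._radius = value


circle = Circle(1.0)
circle.radius = 2.5   # same syntax as assigning to a public attribute
```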

Naming

Conventions recommended by Guido, the creator of Python:

2017/2/15 posted in  Python

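Setting Up a Python Scientific Computing Environment

I have recently been reading Python for Data Analysis and am posting some notes; this one covers setting up the environment. As an aside, the book contains quite a few errors. I have already found several syntax errors and will probably compile an errata list once I finish reading. This may be because the book was written in 2014 and the pandas library has changed somewhat in the two years since.

Installing the Virtual Environment

I don't want to clutter the system Python libraries (they are already a mess, to be honest), so I create a separate virtual environment dedicated to scientific computing. The details are covered in my virtualenv notes post. I recommend adding the command that activates the virtual environment to your shell configuration file (as an alias), so that you don't have to type a long command every time you open a terminal.

Because some libraries in the scientific computing community are still based on Python 2.x, the Python version used here is 2.7.

Then install the required libraries with a single command: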

(ENV2.7)$ pip install numpy pandas matplotlib jupyter scikit-learn

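If the installation fails, check whether your network can reach the package index; you may need a proxy.

IPython

Anyone familiar with Python should know this interpreter; the built-in Python interpreter is rather limited by comparison. Unlike the traditional "edit-compile-run" workflow, IPython encourages an "execute-explore" workflow, which makes it particularly suitable for computing and data analysis, where trial and error and iterative development are convenient. This post mainly covers its web-based interactive notebook (the command-line version works much the same).

Starting IPython Notebook

Use the following command to start IPython Notebook: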

(ENV2.7)$ jupyter notebook

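This starts the server, and the browser automatically opens a directory tree.

Note: Remember to run this command with the virtual environment activated; otherwise the system's IPython version is used.

Next, create a new IPython Notebook for the demonstration.

The command in In [1] makes figures drawn with matplotlib display inline in the IPython Notebook, so for scientific computing, run it before anything else.

Introspection

Putting a ? before or after a variable displays general information about the object.

This works on almost anything.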

The %run Command

%run runs a local Python script, after which all variables defined in the script are accessible in IPython.

If the script needs access to the IPython namespace, use %run -i.

Timing Code

%time and %timeit measure the execution time of code.
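
%time and %timeit are IPython magics and only work inside IPython. As a rough plain-Python counterpart (an assumption for illustration, not something this post uses), the standard library timeit module can time a statement:

```python
import timeit

# Run the statement 1000 times; the result is the total time in seconds.
elapsed = timeit.timeit('sum(range(100))', number=1000)

# Average time per run, in seconds.
per_run = elapsed / 1000
```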

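Example

The following concrete example demonstrates using IPython Notebook.

The data can be downloaded from Beyond the Top 1000 Names. It is a dataset containing the number of occurrences of each baby name for every year from 1880 to 2015.

Because the data is split by year into many files, the first step is to assemble all of it into a single DataFrame.

It doesn't matter if you don't understand the Python code involved; it is only here to demonstrate IPython Notebook.

Then compute the total number of births by sex and year.

Then plot the resulting table.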

2016/10/28 posted in  Python

virtualenv Notes

Some notes on using virtualenv.

Installation

Install it directly with pip:

$ sudo pip install virtualenv

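Purpose

virtualenv creates isolated Python development environments. For example, if one project needs libraries for Python 2.7 and another needs libraries for Python 3.0, virtualenv can create a separate virtual Python environment for each project, which effectively avoids conflicts.

virtualenv creates a Python environment with its own installation directories. That environment does not share modules with other virtualenv environments (and you can choose whether it has access to the globally installed packages).

Usage

Creating a Virtual Environment

Basic usage:

$ virtualenv ENV

where ENV is the directory in which the virtual environment is placed.

$ tree -L 1 ENV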
ENV
├── bin
├── include
├── lib
└── pip-selfcheck.json

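The lib and include directories hold the supporting library files for the new virtual Python environment. Packages are installed into lib/pythonX.X/site-packages/, and the bin directory contains the new Python interpreter. pip and setuptools are installed by default.

The activate script

Enter the virtual environment:

$ source ENV/bin/activate

(If the source command does not exist, use the . command instead.)

Leave the virtual environment: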

$ deactivate

Removing an Environment

(ENV)$ deactivate
$ rm -r /path/to/ENV

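The --system-site-packages Option

virtualenv --system-site-packages ENV creates an environment that inherits the globally installed packages. This option is not used very often.

Specifying the Python Version

Use the -p PYTHON_EXE option to specify the Python version when creating the virtual environment.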

Python 2.7:

$ virtualenv -p /usr/bin/python2.7 ENV2.7

Python 3.5:

$ virtualenv -p /usr/local/bin/python3.5 ENV3.5

Making an Environment Relocatable

In some cases you may need to use an already configured virtual environment somewhere else. virtualenv --relocatable makes ENV relocatable.

(ENV)$ virtualenv --relocatable ./

2016/7/17 posted in  Python

The Python Tutorial Reading Notes

The Python Tutorial is a good resource for learning Python. I had skimmed it before but did not focus on the details. This time I am reading it again and taking notes.

Data Structures

More on Lists

  • list.append(x): Equivalent to a[len(a):] = [x]
  • list.extend(L): Extend the list by appending all the items in the given list. Equivalent to a[len(a):] = L
  • list.insert(i, x): The first argument is the index of the element before which to insert. a.insert(len(a), x) is equivalent to a.append(x).
  • list.remove(x): Remove the first item from the list whose value is x.
  • list.pop([i]): Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list. (The square brackets around i mean that the parameter is optional.)
  • list.clear(): Remove all items from the list. Equivalent to del a[:]
  • list.index(x): Return the index in the list of the first item whose value is x.
  • list.count(x): Return the number of times x appears in the list.
  • list.sort(key=None, reverse=False): Sort the items of the list in place.
  • list.reverse(): Reverse the elements of the list in place.
  • list.copy(): Return a shallow copy of the list. Equivalent to a[:].
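
A short sketch that exercises several of the methods above on a throwaway list; the values are arbitrary.

```python
a = [3, 1, 2]
a.append(4)            # same effect as a[len(a):] = [4]
a[len(a):] = [5]       # slice-assignment form of append
a.extend([6, 7])       # same effect as a[len(a):] = [6, 7]
a.insert(0, 0)         # insert before index 0
a.remove(3)            # removes the first item equal to 3
last = a.pop()         # removes and returns the last item
shallow = a.copy()     # same as a[:]
a.sort(reverse=True)   # sorts in place; shallow is not affected
```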

Using Lists as Stacks

Use append and pop.

Using Lists as Queues

Lists are not efficient for this purpose. While appends and pops from the end of a list are fast, inserts or pops at the beginning of a list are slow, because all of the other elements have to be shifted by one.

Better to use collections.deque.
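
A minimal sketch of a first-in, first-out queue built on collections.deque; the names in the queue are placeholders.

```python
from collections import deque

waiting = deque(['Eric', 'John', 'Michael'])
waiting.append('Terry')        # Terry arrives at the back
first = waiting.popleft()      # the first to arrive leaves first
second = waiting.popleft()
remaining = list(waiting)
```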

List Comprehensions

  • x = [item for item in series]
  • x = [do_something(item) for item in series if expression]

Nested List Comprehensions

The initial expression in a list comprehension can be any arbitrary expression, including another list comprehension.

Example: [[row[i] for row in matrix] for i in range(4)].
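
The example above transposes rows and columns. A sketch with a concrete 3x4 matrix (the values are arbitrary), alongside the zip() form that does the same job:

```python
matrix = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]

# The inner comprehension builds one column; the outer one loops over columns.
transposed = [[row[i] for row in matrix] for i in range(4)]

# The built-in zip() gives the same result with less nesting.
with_zip = [list(column) for column in zip(*matrix)]
```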

The del statement

The del statement removes an item from a list given its index instead of its value. Unlike pop(), it does not return a value. It can also remove slices from a list.

del can also be used to delete entire variables: del a.

Tuples and Sequences

Tuples are immutable, and usually contain a heterogeneous sequence of elements that are accessed via unpacking or indexing. Lists are mutable, and their elements are usually homogeneous and are accessed by iterating over the list.

  • Empty tuples are constructed by an empty pair of parentheses: empty = ()
  • A tuple with one item is constructed by following a value with a comma: singleton = 'hello',

The statement t = 1, 2, 'hello' is an example of tuple packing: the values are packed together in a tuple. The reverse operation is also possible: x, y, z = t.

Sets

Curly braces or the set() function can be used to create sets. Note: to create an empty set you have to use set(), not {}; the latter creates an empty dictionary.

Example:

a = set('abracadabra')
b = set('alacazam')
  • a - b: letters in a but not in b
  • a | b: letters in either a or b
  • a & b: letters in both a and b
  • a ^ b: letters in a or b but not both

Similarly to list comprehensions, set comprehensions are also supported.
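
A sketch that spells out the four set operations on the two example sets, followed by a set comprehension:

```python
a = set('abracadabra')
b = set('alacazam')

difference = a - b        # letters in a but not in b
union = a | b             # letters in a or b or both
intersection = a & b      # letters in both a and b
symmetric = a ^ b         # letters in a or b but not both

# Set comprehension: unique letters of the word that are not a, b or c.
rest = {x for x in 'abracadabra' if x not in 'abc'}

empty_set = set()         # {} would create an empty dict instead
```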

Dictionaries

Dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; a tuple that contains any mutable object, directly or indirectly, cannot be used as a key. You can't use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().

It is best to think of a dictionary as a set of key: value pairs in which the keys are unique. (Before Python 3.7 the order of the pairs was not guaranteed; since 3.7, dictionaries preserve insertion order.)

  • del can delete a key: value pair
  • list(d.keys()) on a dictionary returns a list of all the keys used in the dictionary (if you want it sorted, use sorted(d.keys()) instead).
  • To check whether a single key is in the dictionary, use the in keyword (in or not in).
  • Dict comprehensions can be used to create dictionaries from arbitrary key and value expressions: {x: x**2 for x in range(10)}
  • When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments: dict(sape=1, guido=2, jack=3) => {'sape': 1, 'jack': 3, 'guido': 2}
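
The dictionary operations listed above, sketched on a small phone-book style dictionary; the names and numbers are placeholders.

```python
tel = dict(sape=4139, guido=4127, jack=4098)   # keyword-argument form

del tel['sape']                 # delete a key: value pair
tel['irv'] = 4127               # add a new pair

keys_sorted = sorted(tel.keys())
has_guido = 'guido' in tel
has_sape = 'sape' in tel

# Dict comprehension from arbitrary key and value expressions.
squares = {x: x**2 for x in (2, 4, 6)}
```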

Looping Techniques

When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the items() method.

knights = {'gallahad': 'the pure', 'robin': 'the brave'}
for k, v in knights.items():
    print(k, v)

When looping through a sequence, the position index and corresponding value can be retrieved at the same time using the enumerate() function.

for i, v in enumerate(['tic', 'tac', 'toe']):
    print(i, v)

To loop over two or more sequences at the same time, the entries can be paired with the zip() function.

numbers = [1, 2, 3]
names = ['one', 'two', 'three']
for number, name in zip(numbers, names):
    print('Number: {0}, Name: {1}'.format(number, name))

To loop over a sequence in sorted order, use the sorted() function, which returns a new sorted list while leaving the source unaltered: for item in sorted(items).

It is sometimes tempting to change a list while you are looping over it; however, it is often simpler and safer to create a new list instead.
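
A sketch of the last point: rather than deleting unwanted items from a list while iterating over it, build a new list. The data values are arbitrary.

```python
import math

raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8]

# Keep only the real measurements; raw_data itself is left untouched.
filtered_data = [value for value in raw_data if not math.isnan(value)]

# sorted() also returns a new list instead of changing its argument.
ordered = sorted(filtered_data)
```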

More on Conditions

  • in and not in: check whether a value occurs (or not) in a sequence.
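  • is and is not: compare whether two objects are really the same object; this only matters for mutable objects like lists.
  • Comparisons can be chained: a < b == c tests whether a is less than b and, moreover, b equals c.
  • and and or are short-circuit operators: their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined.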
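
The operators above, sketched with throwaway values:

```python
a, b, c = 1, 2, 2
chained = a < b == c               # same as (a < b) and (b == c)

first, second = [1, 2], [1, 2]
equal = first == second            # same contents
identical = first is second        # two distinct list objects
present = 2 in first

# or returns its first truthy operand and skips the rest.
fallback = '' or 'default' or 'never evaluated'
```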

Comparing Sequences and Other Types

The comparison uses lexicographical ordering.
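
Lexicographical ordering compares the first two items, then the next two if those are equal, and so on; a shorter sequence that is a prefix of a longer one is the smaller. Every comparison in this sketch evaluates to True:

```python
checks = [
    (1, 2, 3) < (1, 2, 4),
    [1, 2, 3] < [1, 2, 4],
    'ABC' < 'C' < 'Pascal' < 'Python',
    (1, 2) < (1, 2, -1),                              # prefix is smaller
    (1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4),   # nested comparison
]
all_true = all(checks)
```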

Modules

A module is a file containing Python definitions and statements. The file name is the module name with the suffix .py appended. Within a module, the module's name (as a string) is available as the value of the global variable __name__.

More on Modules

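Note that in general the practice of importing * from a module is frowned upon, since it often causes poorly readable code. (It is OK to use it in interactive sessions.)

Each module is imported only once per interpreter session, so after changing a module you must either restart the interpreter or, if it is just one module you want to test interactively, use importlib.reload().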

import importlib
importlib.reload(modulename)

Executing modules as scripts

if __name__ == "__main__":
    code

This is often used either to provide a convenient user interface to a module, or for testing purposes (running the module as a script executes a test suite).

The Module Search Path

When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. sys.path is initialized from these locations:

  • The directory containing the input script (or the current directory when no file is specified).
  • PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
  • The installation-dependent default.

After initialization, Python programs can modify sys.path. The directory containing the script being run is placed at the beginning of the search path, ahead of the standard library path. This means that scripts in that directory will be loaded instead of modules of the same name in the library directory.

"Compiled" Python files

Python caches the compiled version of each module in the __pycache__ directory under the name module.version.pyc, where the version encodes the format of the compiled file; it generally contains the Python version number. This naming convention allows compiled modules from different releases and different versions of Python to coexist. (Example: in CPython 3.5, the compiled version of fib.py is cached as __pycache__/fib.cpython-35.pyc.)

Python checks the modification date of the source against the compiled version to see if it's out of date and needs to be recompiled.

Standard Modules

>>> import sys
>>> sys.ps1
'>>> '
>>> sys.ps2
'... '
>>> sys.ps1 = 'C> '
C> print('Yuck!')
Yuck!

The variable sys.path is a list of strings that determines the interpreter's search path for modules. You can modify it using standard list operations.
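
Since sys.path is an ordinary list, extending the search path is a plain list operation. In this sketch the directory name is hypothetical and does not need to exist:

```python
import sys

# Hypothetical directory, used only for illustration.
extra_dir = '/tmp/hypothetical_modules'

original_length = len(sys.path)
sys.path.append(extra_dir)      # searched last, after the existing entries
```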

The dir() Function

The built-in function dir() is used to find out which names a module defines.

Without arguments, dir() lists the names you have defined currently.

It lists all types of names: variables, modules, functions, etc.
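
A small sketch of dir() applied to a standard module; math is chosen arbitrarily.

```python
import math

math_names = dir(math)          # a sorted list of strings

# Names starting with an underscore are module internals such as __name__.
public_names = [name for name in math_names if not name.startswith('_')]
```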

Input and Output

Methods of File Objects

It is good practice to use the with keyword when dealing with file objects. This has the advantage that the file is properly closed after its suite finishes, even if an exception is raised on the way. It is also much shorter than writing equivalent try-finally blocks:

with open('workfile', 'r') as f:
    read_data = f.read()
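
For comparison, a sketch of the longer try-finally form that the with statement replaces. The temporary path is an assumption made so that the example does not depend on an existing workfile.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'workfile')
with open(path, 'w') as f:
    f.write('hello')

# Equivalent of the with block, written out by hand.
f = open(path, 'r')
try:
    read_data = f.read()
finally:
    f.close()                   # runs even if read() raises
```
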
2016/7/17 posted in  Python