Learn python from shadowsocks
Shadowsocks的libc版本貌似很庞大,最新版本的Shadowsocks和ShadowsocksR对我来说也不容易理解。于是我打算学习下早期的版本,顺便锻炼一下git命令。
2012年版的Shadowsocks只有local.py和server.py两个代码文件。
Server.py中有这样一段代码,现在不太理解。
def get_table(key):
m = hashlib.md5()
m.update(key)
s = m.digest()
(a, b) = struct.unpack('<QQ', s)#把密码的md5值分为两个unsi
table = [c for c in string.maketrans('', '')]#
for i in xrange(1, 1024):
table.sort(lambda x, y: int(a % (ord(x) + i) - a % (ord(y) + i)))
return table
在命令行下测试了一下,结果如下:
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import string
>>> import struct
>>> import hashlib
>>> key="foobar!"
>>> m=hashlib.md5()
>>> m.update(key)
>>> s=m.digest()
>>> (a,b)=struct.unpack('<QQ',s)
>>> table=[c for c in string.maketrans('','')]
>>> for i in xrange(1,1024):
... table.sort(lambda x,y: int(a %(ord(x)+i)-a%(ord(y)+i)))
...
>>> table
['<', '5', 'T', '\x8a', '\xd9', '^', 'X', '\x17', "'", '\xf2', '\xdb', '#', '\x0c', '\x9d', '\xa5', '\xb5', '\xff', '\x8f', 'S', '\xf7', '\xa2', '\x10', '\x1f', '\xd1', '\xbe', '\xab', 's', 'A', '&', ')', '\x15', '\xf5', '\xec', '.', 'y', '>', '\xa6', '\xe9', ',', '\x9a', '\x99', '\x91', '\xe6', '1', '\x80', '\xd8', '\xad', '\x1d', '\xf1', 'w', '@', '\xe5', '\xc2', 'g', '\x83', 'n', '\x1a', '\xc5', '\xda', ';', '\xcc', '8', '\x1b', '"', '\x8d', '\xdd', '\x95', '\xef', '\xc0', '\xc3', '\x18', '\x9b', '\xaa', '\xb7', '\x0b', '\xfe', '\xd5', '%', '\x89', '\xe2', 'K', '\xcb', '7', '\x13', 'H', '\xf8', '\x16', '\x81', '!', '\xaf', '\xb2', '\n', '\xc6', 'G', 'M', '$', 'q', '\xa7', '0', '\x02', 'u', '\x8c', '\x8e', 'B', '\xc7', '\xe8', '\xf3', ' ', '{', '6', '3', 'R', '9', '\xb1', 'W', '\xfb', '\x96', '\xc4', '\x85', '\x05', '\xfd', '\x82', '\x08', '\xb8', '\x0e', '\x98', '\xe7', '\x03', '\xba', '\x9f', 'L', 'Y', '\xe4', '\xcd', '\x9c', '`', '\xa3', '\x92', '\x12', '[', '\x84', 'U', 'P', 'm', '\xac', '\xb0', 'i', '\r', '2', '\xeb', '\x7f', '\x00', '\xbd', '_', 'b', '\x88', '\xfa', '\xc8', 'l', '\xb3', '\xd3', '\xd6', 'j', '\xa8', 'N', 'O', 'J', '\xd2', '\x1e', 'I', '\xc9', '\x97', '\xd0', 'r', 'e', '\xae', '\\', '4', 'x', '\xf0', '\x0f', '\xa9', '\xdc', '\xb6', 'Q', '\xe0', '+', '\xb9', '(', 'c', '\xb4', '\x11', '\xd4', '\x9e', '*', 'Z', '\t', '\xbf', '-', '\x06', '\x19', '\x04', '\xde', 'C', '~', '\x01', 't', '|', '\xce', 'E', '=', '\x07', 'D', 'a', '\xca', '?', '\xf4', '\x14', '\x1c', ':', ']', '\x86', 'h', '\x90', '\xe3', '\x93', 'f', 'v', '\x87', '\x94', '/', '\xee', 'V', 'p', 'z', 'F', 'k', '\xd7', 'd', '\x8b', '\xdf', '\xe1', '\xa4', '\xed', 'o', '}', '\xcf', '\xa0', '\xbb', '\xf6', '\xea', '\xa1', '\xbc', '\xc1', '\xf9', '\xfc']
>>> s
'\xbb\xc3\xd5/\xe5\xb7\xb3\x1d`c\xb1\xde\t\xb9\x8c\x80'
>>> a
2140256442909049787
>>> b
9262981985636279136L
>>>
计算md5值时,python中hexdigest()与命令行下echo "string" | md5sum
返回的值不一样,原因是echo在字符串后加了个换行符号\n,使用echo -n "string" | md5sum
的话与python中返回的一样。
>>> test=hashlib.md5()
>>> test.update("string")
>>> test.hexdigest()
'b45cffe084dd3d20d928bee85e7b0f21'
$ echo "string" | md5sum
b80fa55b1234f1935cea559d9efbc39a -
$ echo -n "string" | md5sum
b45cffe084dd3d20d928bee85e7b0f21 -
The line config = json.loads(f.read().decode(‘utf8’), object_hook=_decode_dict) in shadowsocks-2.6/shadowsocks/utils.py
I don’t understand what json.loads mean, what are f.read().decode(‘utf8’) doing, what is object_hook and what is _decode_dict. In this page, I find the following information.
json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]]) Deserialize s (a str or unicode instance containing a JSON document) to a Python object using this conversion table.
JSON | Python |
---|---|
object | dict |
array | list |
string | unicode |
number (int) | int, long |
number (real) | float |
true | True |
false | False |
null | None |
object_hook is an optional function that will be called with the result of any object literal decoded (a dict). The return value of object_hook will be used instead of the dict. This feature can be used to implement custom decoders (e.g. JSON-RPC class hinting).
We run shadowsocks server with ssserver -c config.json and run client with sslocal -c config.json, I think function json.loads() here must be trying to convert a JSON document to a python dict. And object_hook is an optional function, but I find no matter I use object_hook=_decode_dict or not, this line return the same value. So, I still don’t understand very well.
I try to test f.read().decode(‘utf8’) with the following line:
>>> with open('config.json', 'rb') as f:
... test=f.read().decode('utf8')
...
>>> test
u'{\n "server":"my_server_ip",\n "server_port":8388,\n "local_address": "127.0.0.1",\n "local_port":1080,\n "password":"mypassword",\n "timeout":300,\n "method":"aes-256-cfb",\n "fast_open": false,\n "workers": 1\n}'
>>>
The original code block:
My test file:
Related configure file config.json:
{
"server":"my_server_ip",
"server_port":8388,
"local_address": "127.0.0.1",
"local_port":1080,
"password":"mypassword",
"timeout":300,
"method":"aes-256-cfb",
"fast_open": false,
"workers": 1
}
When I run python test.py, I get the following result:
{u’server_port’: 8388, u’local_port’: 1080, u’workers’: 1, u’fast_open’: False, u’server’: ‘my_server_ip’, u’timeout’: 300, u’local_address’: ‘127.0.0.1’, u’password’: ‘mypassword’, u’method’: ‘aes-256-cfb’}
when I run python3 test.py, I will get this result:
{‘password’: b’mypassword’, ‘server’: b’my_server_ip’, ‘method’: b’aes-256-cfb’, ‘server_port’: 8388, ‘fast_open’: False, ‘local_address’: b’127.0.0.1’, ‘workers’: 1, ‘timeout’: 300, ‘local_port’: 1080}
文件名 | 功能 | 备注 |
---|---|---|
asyncdns.py | 异步处理dns请求 | 提供远端dns查询 |
common.py | 工具函数 | 进行header和addr数据包处理 |
utils.py | 工具函数 | 实现config配置检查 |
daemon.py | linux守护进程 | |
encrypt.py | 加密解密 | 简单易懂,不涉及到底层算法,封装很好 |
eventloop.py | 事件循环 | 实现IO复用,封装成类Eventloop,多平台 |
local.py | 客户端 | |
server.py | 服务端 | 支持多用户 |
lru_cache.py | 缓存 | 基于LRU的Key-Value字典 |
tcprelay.py | tcp的转达 | |
udprelay.py | udp的转达 | local <=> client <=> server <=> dest |
eventloop.py
事件循环类
使用select、epoll、kqueue等IO复用实现异步处理。
优先级为epoll\>kqueue\>select
Eventloop将三种复用机制的add,remove,poll,add_handler,remve_handler接口统一
程序员只需要使用这些函数即可,不需要处理底层细节。
当eventloop监测到socket的数据,程序就将所有监测到的socket和事件交给所有handler去处理
每个handler通过socket和事件判断自己是否要处理该事件,并进行相对的处理:
udprelay.py / tcprelay.py / asyndns.py
数据包的转发处理
三个文件分别实现用来处理udp的请求,tcp的请求,dns的查询请求
将三种请求的处理包装成handler。
对于tcp,udp的handler,它们bind到特定的端口,并且将socket交给eventloop,并且将自己的处理函数加到eventloop的handlers
对于dns的handler,它接受来自udp handler和tcp handler的dns查询请求,并且向远程dns服务器发出udp请求
协议解析和构建用的struct.pack()和struct.unpack()
lru_cache.py
实现缓存,英文:Least Recently Used Cache
先找访问时间_last_visits中超出timeout的所有键
然后去找_time_to_keys,找出所有可能过期的键
因为最早访问时间访问过的键之后可能又访问了,所以要_keys_to_last_time
找出那些没被访问过的,然后删除
其他
这个项目有一些比较有趣的代码,目的是兼容python3标准
- 每一个py文件前面都有导入3.x的package: future ,如果使用print “hello”,会提示出错,因此使用print(“hello”)
- 给字符串赋值时,总是带有一个前缀’b’,比如:v4addr=b’8.8.4.4’,是为了兼容py3标准,参考这里