A programmer's site
点点滴滴:生活,技术
点点滴滴:生活,技术
儿童节那天,对比了python内置的dict和binary array search查找的性能,dict大幅领先,有些出乎意料:
所以,6月2日,用go语言来对比一下,也熟悉一下go语言。 binary search有几种写法:
recusive
: 递归3-branch
: 迭代,每次迭代进行2次判断 ==
,<
(或者>
),三个分支。找到后,立刻退出。最好情况为O(1)2-branch
: 迭代,每次迭代进行1次判断 <
(或者>
),二个分支。最好情况也为O(log n)library
: go标准库,它是3的通用版本,需要提供一个函数,用来提供>
或<
的依据
3的想法比较有意思,它是先计算出对于要查找的数,如果放入这个已经排好序的数组array,下标是多少。如果在array里,有重复,返回的是则是第一个的下标。最后再对比一下数组里面的数,是否正是需要查找的key
func binary_search2(arr []int, key int) int {
lo, hi := 0, len(arr)-1
for lo < hi {
mid := (lo + hi) / 2
if arr[mid] < key {
lo = mid + 1
} else {
hi = mid
}
}
if lo == hi && arr[lo] == key {
return lo
} else {
return -(lo + 1)
}
}
4的做法很好玩,实现:
func Search(n int, f func(int) bool) int
go语言没有模版,没有像java那样的Comparator,没有运算符重载,还是静态类型。但这个接口完全不需要这些,并且非常通用(虽有一点性能的损失)
1和2是常规解法了。
// recusive
func binary_search0(arr []int, key int) int {
return binary_search_(arr, 0, len(arr), key)
}
func binary_search_(arr []int, begin, end, key int) int {
if begin > end {
return -1
}
mid := (begin + end) / 2
if arr[mid] == key {
return mid
} else if arr[mid] > key {
return binary_search_(arr, begin, mid-1, key)
} else {
return binary_search_(arr, mid+1, end, key)
}
}
// 3-branch
func binary_search(arr []int, key int) int {
lo, hi := 0, len(arr)-1
for lo <= hi {
mid := (lo + hi) / 2
if arr[mid] == key {
return mid
} else if arr[mid] < key { // search upper subarray
lo = mid + 1
} else {
hi = mid - 1
}
}
return -(lo + 1)
}
随机生成2k到500k个int,进行1500次查询,纪录下时间。并以内置的map做为对比
递归,看上去还行,但易于编写,不易出错。很好
代码,把结果画成比较好看的柱状图
ps:这个周末除了和朋友聚聚,为女朋友的侄子采购玩具,剩下就是折腾binary search了。
今天是国际儿童节。 跟我没有什么关系。 祝他们都能开心,快乐的成长。
dict可以用来判断一个元素是不是在集合里面,binary search也可以。一直想对比一下他们的性能:
测试代码在。其中binary search的实现为:
def binary_search(arr, key):
lo, hi = 0, (len(arr) - 1)
while lo <= hi:
mid = (lo + hi) >> 1
k = arr[mid]
if k == key:
return mid
elif k < key:
lo = mid + 1 # search upper subarray
else:
hi = mid - 1 # search lower subarray
# (-(insertion point) - 1)
return -(lo + 1)
随机生成测试数据的代码
COUNT = 1000 * 200
# key is string
key_str = {}
for i in range(COUNT):
key_str[random_string(random.randint(2, 15))] = True
# key is int
key_int = {}
for i in range(COUNT):
key_int[random.randint(0, 1 << 32)] = True
线程池接受task,并执行他们。
T1
T2 *
T3
T4
T5 *
T6
不做特殊处理,线程池不能保证T1-T6的执行顺序,有可能它们是并发的。但有特殊要求,比如需要T2和T5需要顺序执行,应该怎么实现呢?
问题来源于,在写http-kit的时候,我对它加入了WebSocket协议的支持。 处理WebSocket时,由于一个client可能发多个message给服务器,服务器应该需要保证它们的时序:先到先被处理,后到的要等待前面的被处理后才能被处理。
http-kit的线程模型是一个异步的IO线程,负责接受客户端的连接,读取并解析协议为request,扔到一个线程池,由它们计算response,然后IO线程负责把response写到客户端的TCP socket buffer。 http-kit需要处理HTTP和WebSocket协议,HTTP协议是单一的request => response,不实现Pipeline则没有这个问题(http-kit选择不实现HTTP的pipeline)。
问题变为:如何高效实现线程池中部分task有序执行
简单并高效的代码总是很招人喜欢,我甚至愿意用一点点效率来换取简单。http-kit用3000来行代码,从零高效实现HTTP server,支持HTTP长连推送和Websocket,异步的HTTP client,Timer,以及漂亮的API。实际经验告诉我,简单也会换来性能,比如http-kit具有和Nginx相似的性能特点,这里有一南非朋友做的测试,这里有一个C600K的试验,被推上了Hacker News的首页首行。
最先的考虑是需要在线程池上做文章。试着去实现一个特殊的线程池:多个线程,共享一个全局queue,每个还有一个单独的queue,通过单独的queue保证顺序。因为2个queue,需要解决Task被饿死的情况。 试着写了几个实现,总觉不太对,感觉自己能力有限,试着发了封邮件到,那里面有很多这方面的专家,比如大名鼎鼎的j.u.c的作者Doug Lea,希望能得到点指点。第二天,有几个人给了回复,其中Jeff Hain的回复让我茅塞顿开:
It seems to me you don’t need a particular thread pool to have some of your tasks ordered, if you can keep a reference to last submitted runnable for each ordered sequence of runnables (for example using a map in IO thread).
I’m thinking about using a specific “LinkingRunnable”, containing an “impl” Runnable and an AtomicReference to a “next” Runnable.
When LinkingRunnable.run() is called, it first runs impl.run(), and then does CAS(null,this) (using “this” as a tombstone): if CAS fails, that means someone added a runnable to run next, and it runs it.
When subsequent task arrives, you try to “enqueue” it doing CAS(null,next): If CAS succeeds, that means it’ll be executed just after previous task (and in same thread), and if it fails that means that the previous task already completed, so you can just submit your next task to the pool.
You can also test “get() == null” before doing CASes, which should make them rare, and reduce the usual overhead to a volatile read.
我立刻做了实现:
class LinkingRunnable implements Runnable {
private final Runnable impl;
AtomicReference<LinkingRunnable> next = new AtomicReference<LinkingRunnable>(null);
public LinkingRunnable(Runnable r) {
this.impl = r;
}
public void run() {
impl.run();
if (!next.compareAndSet(null, this)) { // has more job to run
next.get().run();
}
}
}
WSFrameHandler task = new WSFrameHandler(channel, frame);
LinkingRunnable job = new LinkingRunnable(task);
// channel.serialTask 为一个指针,先取出原来的,并更新
LinkingRunnable old = channel.serialTask;
channel.serialTask = job;
if (old == null) { // No previous job
execs.submit(job);
} else {
if (old.next.compareAndSet(null, job)) {
// successfully append to previous task
} else {
// previous message is handled, order is guaranteed.
execs.submit(job);
}
}
更详细的代码,以及上下文,请参看
简单高效的实现了局部有序,非常喜欢,符合我对好代码的要求:简单,高效
I and Peter were working on a new unified API for WebSocket, HTTP streaming and long-polling with protocol Channel
and macro with-channel
.
The new API is fully working now. I just get the time to write a page to explain it a little bit: http://http-kit.org/channel.html.
It has a few benefits:
If interested, please have a try, and tell me how do you think.
Original: http://http-kit.org/600k-concurrent-connection-http-kit.html
Edit(2013/1/29): http-kit is a event driven HTTP server & client for Clojure. It’s open source, https://github.com/http-kit
Inspired by Scaling node.js to 100k concurrent connections! and Node.js w/250k concurrent connections!. I did some test for http-kit!
http-kit manages to make 600k concurrent connections, on PC!
The server read the length param from the request, generate a string of that length.
;; main.clj
;; ~20k string
(def const-str (apply str (repeat 200 "http-kit is a http server & client
written from scrach for high performance clojure web applications,
support async and websocket")))
(defn handler [req]
(let [length (to-int (or (-> req :params :length) 1024))]
{:status 200
:headers {"Content-Type" "text/plain"}
:body (subs const-str 0 (max (min 10240 length) 1))}))
(defn -main [& args]
(run-server (-> handler wrap-keyword-params wrap-params)
{:port 8000})
(println (str "Server started. listen at 0.0.0.0")))
Start the server:
java -server -Xms3072m -Xmx3072m -cp `lein classpath` clojure.main -m main
The server need to set max allowed open file to a much larger value. The default value is ~1024
echo 9999999 | sudo tee /proc/sys/fs/nr_open
echo 9999999 | sudo tee /proc/sys/fs/file-max
# edit /etc/security/limits.conf, add the following line, need logout and login again
* - nofile 4999999
# set before run the server and test code
ulimit -n 4999999
More ports for test code to use
sudo sysctl -w net.ipv4.ip_local_port_range="1025 65535"
The server and test code are both run on my desktop:
A IP can only issue most 65536 connections to a server, since socket port is unsigned short. But we can bypass this limit. On Linux, it’s quite easy to set up virtual network interface:
for i in `seq 200 230`; do sudo ifconfig eth0:$i 192.168.1.$i up ; done
Then your computer have many IPs, from 192.168.1.200
to 192.168.1.230
. The server bind to 0.0.0.0
, the client can connect 192.168.1.200
, or 192.168.1.201
, etc. Per IP can have about 60K concurrent connections. Then the client can issue as many concurrent connections as it needed.
The client opens 600k concurrent keep-alived connections to the server, request the server to return string of length randomly between 1 ~ 4096 bytes, read the response, idle 5s ~ 45s (randomly pick a value between), request again.
time 0s, concurrency: 100, total requests: 0, thoughput: 0.00M/s, 0.00 requests/seconds
time 40s, concurrency: 164000, total requests: 230142, thoughput: 11.78M/s, 5688.28 requests/seconds
...
time 89s, concurrency: 340100, total requests: 788985, thoughput: 18.23M/s, 8812.23 requests/seconds
...
time 179s, concurrency: 595166, total requests: 2483174, thoughput: 28.61M/s, 13837.77 requests/seconds
time 180s, concurrency: 597853, total requests: 2506378, thoughput: 28.71M/s, 13888.67 requests/seconds
time 183s, concurrency: 600000, total requests: 2529020, thoughput: 28.52M/s, 13788.14 requests/seconds
time 185s, concurrency: 600000, total requests: 2537212, thoughput: 28.20M/s, 13680.42 requests/seconds
...
time 930s, concurrency: 600000, total requests: 17457773, thoughput: 38.64M/s, 18763.53 requests/seconds
time 931s, concurrency: 600000, total requests: 17477678, thoughput: 38.69M/s, 18764.73 requests/seconds
Issue this command from command line:
ab -n 100000 -c 10 -k http://127.0.0.1:8000/
output: ab output
Server Software: http-kit
Server Hostname: 127.0.0.1
Server Port: 8000
Document Path: /
Document Length: 1024 bytes
Concurrency Level: 10
Time taken for tests: 3.184 seconds
Complete requests: 100000
Failed requests: 0
Write errors: 0
Keep-Alive requests: 100000
Total transferred: 117000000 bytes
HTML transferred: 102400000 bytes
Requests per second: 31405.53 [#/sec] (mean)
Time per request: 0.318 [ms] (mean)
Time per request: 0.032 [ms] (mean, across all concurrent requests)
Transfer rate: 35883.27 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 0 0 9.3 0 913
Waiting: 0 0 9.3 0 913
Total: 0 0 9.3 0 913
Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 0
95% 0
98% 0
99% 0
100% 913 (longest request)
jvisualvm’s snapshot file.
The complete test code is available on github. Checkout and run it yourself!
To report a bug, or general discussion: https://github.com/http-kit/scale-clojure-web-app/issues