A programmer's site
About programming...
java -XX:+PrintFlagsFinal
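As a programmer, I work in an industry that demands endless learning, and I really enjoy it.
《深入理解Java虚拟机:JVM高级特性与最佳实践》 (Understanding the Java Virtual Machine in Depth: Advanced JVM Features and Best Practices): Either way, I really like Java and Clojure on the JVM, so it is worth studying in depth.
Computer Systems: A Programmer’s Perspective: I started reading it last year and hope to finish it this year.
The C Programming Language: I read this book in my first year of college, and now it is time to reread it.
A Global History: From Prehistory to the 21st Century: Recommended by Mr. Yuan. I bought it last year but have not read it yet. I like Mr. Yuan a lot.
Write a web server in C with epoll and my own memory allocation, handling timeouts and all kinds of unexpected situations properly. See how fast it can get.
Write a web server and an HTTP client from scratch in pure Java, for Rssminer to use. They should be usable in production: handle timeouts and unexpected situations properly, and keep latency and memory usage under control.
Finish Rssminer. I have been writing it for almost a year. It should be publicly available before June this year.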
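The projects I work on use a lot of JavaScript.
Relying on JavaScript has many advantages, which I won't go into. It also has plenty of drawbacks, such as the lack of namespaces and private members. These are the foundation of modularity and encapsulation, so I find their absence especially critical. package and private are reserved words in JavaScript, but their only effect is that the program breaks if you accidentally use them as a variable name or an object key. Still, when God closes a door, he leaves a window open: both can be implemented with functions (in a sense).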
// utils.js
(function () {
// private. given by closure
var private_var = 1;
var helper1 = function () { };
// namespace: YOUR_NS. given by javascript's global object
window.YOUR_NS = window.YOUR_NS || {};
window.YOUR_NS.utils = {
helper1: helper1 // export, public
};
// create an anonymous function, execute it immediately
})();
// app.js
(function () {
// like java's import, c++'s using namespace
var utils = window.YOUR_NS.utils;
var utils2 = window.YOUR_NS.utils; // rename
// direct import, java's import static
var helper1 = utils.helper1;
// your app's logic here
utils.helper1(); // sample usage
utils2.helper1(); // sample usage
helper1(); // sample usage
})();
<!-- app.html -->
<!-- browser load and execute them in order -->
<script src="utils.js"></script>
<script src="app.js"></script>
I wrote an online dictionary in pure C during the Spring Festival.
The dictionary data (about 8.2M, file dbdata) is compressed and loaded into memory using mmap. An index is built on top of it for fast lookup using binary search. The RES is about 9M when the number of concurrent connections is not high, say, below 1k.
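The index-plus-binary-search idea can be sketched in Java. This is only an illustration, not the C implementation: the record layout (word, NUL byte, definition, NUL byte, sorted by word), the class name DictIndex and the sample entries are assumptions, and compression is left out. In a real program the buffer could come from FileChannel.map instead of ByteBuffer.wrap.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical layout: "word\0definition\0" records, sorted by word.
class DictIndex {
    private final ByteBuffer data;  // raw dictionary bytes
    private final int[] offsets;    // offsets[i] = start of the i-th record

    DictIndex(ByteBuffer data) {
        this.data = data;
        List<Integer> starts = new ArrayList<>();
        int pos = 0;
        while (pos < data.limit()) {
            starts.add(pos);
            pos = fieldEnd(pos) + 1;  // skip the word
            pos = fieldEnd(pos) + 1;  // skip the definition
        }
        offsets = new int[starts.size()];
        for (int i = 0; i < offsets.length; i++) offsets[i] = starts.get(i);
    }

    static DictIndex fromString(String records) {
        return new DictIndex(ByteBuffer.wrap(records.getBytes(StandardCharsets.UTF_8)));
    }

    // Index of the NUL byte that terminates the field starting at pos.
    private int fieldEnd(int pos) {
        while (data.get(pos) != 0) pos++;
        return pos;
    }

    private String readField(int pos) {
        byte[] bytes = new byte[fieldEnd(pos) - pos];
        for (int i = 0; i < bytes.length; i++) bytes[i] = data.get(pos + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Binary search over the record offsets; returns the definition or null.
    String lookup(String word) {
        int lo = 0, hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = readField(offsets[mid]).compareTo(word);
            if (cmp == 0) return readField(fieldEnd(offsets[mid]) + 1);
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    public static void main(String[] args) {
        DictIndex index = fromString("about\0concerning\0new\0recent\0zoo\0animal park\0");
        System.out.println(index.lookup("new"));      // recent
        System.out.println(index.lookup("missing"));  // null
    }
}
```

Only the offsets array lives on the heap; the words themselves are read straight from the buffer during the search, which is what keeps the resident memory close to the size of the data file.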
I handcrafted the web server in pure C with epoll. It serves static files and word lookup requests. The performance is amazing: 54.3k req/s while 800k socket connections are kept open.
The server and the test app run on the same computer.
Mem: 16G
CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
OS: GNU/Linux Debian 3.1.0-1-amd64
Several config settings for Linux:
# set up virtual network interface,
# test client bind to these IP, then connect
for i in `seq 21 37`; do sudo ifconfig eth0:$i 192.168.1.$i up ; done
# more ports for testing
sudo sysctl -w net.ipv4.ip_local_port_range="1025 65535"
# edit /etc/security/limits.conf, add line
# * - nofile 999999
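A quick sanity check of these numbers, as a sketch. It assumes that every connection to a given server address and port needs its own local port, and it takes the 16 target addresses and 50,000 connections per address from the test client. The class and method names are made up for the example.

```java
// Back-of-the-envelope check of the settings above (names are hypothetical).
class ConnectionBudget {
    // Number of ports in an inclusive range such as ip_local_port_range.
    static int portsInRange(int low, int high) {
        return high - low + 1;
    }

    static int totalConnections(int addresses, int perAddress) {
        return addresses * perAddress;
    }

    public static void main(String[] args) {
        int ports = portsInRange(1025, 65535);
        int total = totalConnections(16, 50_000);
        System.out.println("local ports available: " + ports);          // 64511
        System.out.println("fits per address: " + (50_000 <= ports));   // true
        System.out.println("total connections: " + total);              // 800000
        System.out.println("fits under nofile: " + (total < 999_999));  // true
    }
}
```

Spreading the connections over several addresses is what gets past the per-address port limit: one address alone could hold at most 64,511 connections from the same client address.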
Test code, written in Java. Memory usage: RES 142M (from htop).
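import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Set;

public class MakeupIdelConnection {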
final static int STEPS = 10;
final static int connectionPerIP = 50000;
public static void main(String[] args) throws IOException {
final Selector selector = Selector.open();
InetSocketAddress locals[] = {
new InetSocketAddress("192.168.1.22", 9090),
new InetSocketAddress("192.168.1.23", 9090),
new InetSocketAddress("192.168.1.24", 9090),
new InetSocketAddress("192.168.1.25", 9090),
new InetSocketAddress("192.168.1.26", 9090),
new InetSocketAddress("192.168.1.27", 9090),
new InetSocketAddress("192.168.1.28", 9090),
new InetSocketAddress("192.168.1.29", 9090),
new InetSocketAddress("192.168.1.30", 9090),
new InetSocketAddress("192.168.1.31", 9090),
new InetSocketAddress("192.168.1.32", 9090),
new InetSocketAddress("192.168.1.33", 9090),
new InetSocketAddress("192.168.1.34", 9090),
new InetSocketAddress("192.168.1.35", 9090),
new InetSocketAddress("192.168.1.36", 9090),
new InetSocketAddress("192.168.1.37", 9090),
};
long start = System.currentTimeMillis();
int connected = 0;
int currentConnectionPerIP = 0;
while (true) {
if (System.currentTimeMillis() - start > 1000 * 60 * 10) {
break;
}
for (int i = 0; i < connectionPerIP / STEPS && currentConnectionPerIP < connectionPerIP; ++i, ++currentConnectionPerIP) {
for (InetSocketAddress addr : locals) {
SocketChannel ch = SocketChannel.open();
ch.configureBlocking(false);
Socket s = ch.socket();
s.setReuseAddress(true);
ch.register(selector, SelectionKey.OP_CONNECT);
ch.connect(addr);
}
}
int select = selector.select(1000 * 10); // 10s
if (select > 0) {
System.out.println("select return: " + select + " events ; current connection per ip: " + currentConnectionPerIP);
Set<SelectionKey> selectedKeys = selector.selectedKeys();
Iterator<SelectionKey> it = selectedKeys.iterator();
while (it.hasNext()) {
SelectionKey key = it.next();
if (key.isConnectable()) {
SocketChannel ch = (SocketChannel) key.channel();
if (ch.finishConnect()) {
++connected;
if (connected % (connectionPerIP * locals.length / 10) == 0) {
System.out.println("connected: " + connected);
}
key.interestOps(SelectionKey.OP_READ);
}
}
}
selectedKeys.clear();
}
}
}
}
With 800K connections kept open, CPU usage is about 65% of a single core.
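import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.Random;
import java.util.Set;

class SelectAttachment {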
private static final Random r = new Random();
private static final String[] urls = { "/d/aarp", "/d/about", "/d/zoo", "/d/throw", "/d/new", "/tmpls.js", "/mustache.js" };
String uri;
ByteBuffer request;
int response_length = -1;
int response_cnt = -1;
public SelectAttachment(String uri) {
this.uri = uri;
request = ByteBuffer.wrap(("GET " + uri + " HTTP/1.1\r\n\r\n").getBytes());
}
public static SelectAttachment next() {
return new SelectAttachment(urls[r.nextInt(urls.length)]);
}
}
public class PerformanceBench {
static final byte CR = 13;
static final byte LF = 10;
static final String CL = "content-length: ";
public static String readLine(ByteBuffer buffer) {
StringBuilder sb = new StringBuilder(64);
char b;
loop: for (;;) {
b = (char) buffer.get();
switch (b) {
case CR:
if (buffer.get() == LF)
break loop;
break;
case LF:
break loop;
}
sb.append(b);
}
return sb.toString();
}
public static void main(String[] args) throws IOException {
int concurrency = 1024 * 3;
long totalByteReceive = 0;
int total = 200000;
int remaining = total;
InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 9090);
ByteBuffer readBuffer = ByteBuffer.allocateDirect(1024 * 64);
Selector selector = Selector.open();
SelectAttachment att;
SocketChannel ch;
long start = System.currentTimeMillis();
for (int i = 0; i < concurrency; ++i) {
ch = SocketChannel.open();
ch.socket().setReuseAddress(true);
ch.configureBlocking(false);
ch.register(selector, SelectionKey.OP_CONNECT, SelectAttachment.next());
ch.connect(addr);
}
loop: while (true) {
int select = selector.select();
if (select > 0) {
Set<SelectionKey> selectedKeys = selector.selectedKeys();
Iterator<SelectionKey> it = selectedKeys.iterator();
while (it.hasNext()) {
SelectionKey key = it.next();
if (key.isConnectable()) {
ch = (SocketChannel) key.channel();
if (ch.finishConnect()) {
key.interestOps(SelectionKey.OP_WRITE);
}
} else if (key.isWritable()) {
ch = (SocketChannel) key.channel();
att = (SelectAttachment) key.attachment();
ByteBuffer buffer = att.request;
ch.write(buffer);
if (!buffer.hasRemaining()) {
key.interestOps(SelectionKey.OP_READ);
}
} else if (key.isReadable()) {
ch = (SocketChannel) key.channel();
att = (SelectAttachment) key.attachment();
readBuffer.clear();
int read = ch.read(readBuffer);
totalByteReceive += read;
if (att.response_length == -1) {
readBuffer.flip();
String line = readLine(readBuffer);
while (line.length() > 0) {
line = line.toLowerCase();
if (line.startsWith(CL)) {
String length = line.substring(CL.length());
att.response_length = Integer.valueOf(length);
att.response_cnt = att.response_length;
}
line = readLine(readBuffer);
}
att.response_cnt -= readBuffer.remaining();
} else {
att.response_cnt -= read;
}
if (att.response_cnt == 0) {
remaining--;
if (remaining > 0) {
if (remaining % (total / 10) == 0) {
System.out.println("remaining\t" + remaining);
}
key.attach(SelectAttachment.next());
key.interestOps(SelectionKey.OP_WRITE);
} else {
break loop;
}
}
}
}
selectedKeys.clear();
}
}
long time = (System.currentTimeMillis() - start);
long receiveM = totalByteReceive / 1024 / 1024;
double reqs = (double) total / time * 1000;
double ms = (double) receiveM / time * 1000;
System.out.printf("total time: %dms; %.2f req/s; receive: %dM data; %.2f M/s\n", time, reqs, receiveM, ms);
}
}
The code is on GitHub: https://github.com/shenfeng/dictionary
/server Server side code, in C.
/client JavaScript/HTML/CSS
/test/java Unit test and performance test code
/src Clojure and Java code to generate the dbdata file
Coding, editing -> an interesting function -> another interesting function -> …
“My god, where have I been, how can I jump right back to the last edit position”
Eclipse has a handy feature: Last Edit Location, bound to Ctrl+Q. It works globally, across all visited files, even those that were edited but are currently closed.
There is session-jump-to-last-change, but it works per buffer.
I’ve been using Emacs for several months. It serves me well, but it lacks this very handy feature. Thanks to elisp, I can implement it myself.
;;; record two different file's last change. cycle them
(defvar feng-last-change-pos1 nil)
(defvar feng-last-change-pos2 nil)
(defun feng-swap-last-changes ()
(when feng-last-change-pos2
(let ((tmp feng-last-change-pos2))
(setf feng-last-change-pos2 feng-last-change-pos1
feng-last-change-pos1 tmp))))
(defun feng-goto-last-change ()
(interactive)
(when feng-last-change-pos1
(let* ((buffer (find-file-noselect (car feng-last-change-pos1)))
(win (get-buffer-window buffer)))
(if win
(select-window win)
(switch-to-buffer-other-window buffer))
(goto-char (cdr feng-last-change-pos1))
(feng-swap-last-changes))))
(defun feng-buffer-change-hook (beg end len)
(let ((bfn (buffer-file-name))
(file (car feng-last-change-pos1)))
(when bfn
(if (or (not file) (equal bfn file)) ;; change the same file
(setq feng-last-change-pos1 (cons bfn end))
(progn (setq feng-last-change-pos2 (cons bfn end))
(feng-swap-last-changes))))))
(add-hook 'after-change-functions 'feng-buffer-change-hook)
;;; just quick to reach
(global-set-key (kbd "M-`") 'feng-goto-last-change)
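The long National Day holiday is almost over, and the web crawler I have been writing has reached a stopping point. It can download roughly 80,000 web pages per hour, and I am fairly satisfied with its CPU and memory usage. A screenshot after 40 minutes of running: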



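The crawler is part of Rss miner. According to git log, I have been working on it on and off for 5 months. Every weekend of those 5 months went into it, most of them into the crawler, with several major refactorings or rewrites.
The crawler is written mainly in Clojure and Java. Clojure makes the program very concise, and Java is good for organizing the multithreading. Object-oriented plus functional feels very nice.
At first, I wrapped the JDK's URLConnection in Clojure. Because it is blocking, multiple threads were needed to speed it up.
I looked for a non-blocking HTTP client and tried two, but was not satisfied with either, so I wrote my own, with a focus on performance and stability.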
public interface IHttpTask {
URI getUri();
Map<String, Object> getHeaders();
Object doTask(HttpResponse response) throws Exception;
Proxy getProxy();
}
public interface IHttpTaskProvder {
List<IHttpTask> getTasks();
}
I spend some of my spare time writing Rssminer, an intelligent RSS reader. I want it to be smart enough to highlight stories I like and to help me discover stories I may like.
I plan to do it by downloading as many web pages as possible from the Internet, extracting the RSS links they contain, downloading those feeds, then applying machine learning algorithms to them. It’s ambitious.
The first thing that needs to be solved is the HTTP client. The JDK’s URLConnection is blocking: even with 20 threads devoted to it, it is still not fast enough, and some keepalive timers get in the way. I tried the non-blocking AsyncHttpClient, which works great, but it lacks SOCKS proxy support, and I want to control everything.
So I wrote my own async HTTP client using a great library, Netty, which provides an async socket framework and an HTTP codec.
// Http client sample usage
HttpClientConfig config = new HttpClientConfig();
Map<String, Object> header = new HashMap<String, Object>();
HttpClient client = new HttpClient(config);
URI uri = new URI("http://onycloud.com");
final HttpResponseFuture future = client.execGet(uri, header);
// register a listener on the future; it runs when the response is ready
future.addListener(new Runnable() {
public void run() {
HttpResponse resp = future.get(); // async
}
});
HttpResponse resp = future.get(); // blocking
The source code is concise: about 1000 lines (about 600 excluding import statements and blank lines). It can be found on GitHub.
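The two ways of consuming a result, a callback and a blocking get, can be tried out with the JDK's own CompletableFuture. This is only an analogy under stated assumptions: it does not use the client described here, and fakeGet is a made-up stand-in that involves no network.

```java
import java.util.concurrent.CompletableFuture;

// Callback style vs. blocking style, shown with the JDK's CompletableFuture.
class FutureStyles {
    // Hypothetical stand-in for an asynchronous HTTP GET; no network involved.
    static CompletableFuture<String> fakeGet(String uri) {
        return CompletableFuture.supplyAsync(() -> "response for " + uri);
    }

    public static void main(String[] args) {
        CompletableFuture<String> future = fakeGet("http://onycloud.com");

        // Callback style: runs once the response is ready, the caller is not blocked.
        CompletableFuture<Void> callback =
                future.thenAccept(body -> System.out.println("callback: " + body));

        // Blocking style: the calling thread waits for the result.
        String body = future.join();
        System.out.println("blocking: " + body);

        callback.join(); // make sure the callback has run before the program exits
    }
}
```

A crawler favors the callback style because a handful of threads can keep many requests in flight, while the blocking style ties up one thread per pending request.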
These days I am experimenting with Apache Lucene. I need a way to extract text from HTML source and feed it to Lucene. I first came up with a solution using a regex in Clojure:
(defn extract [html]
(when html
(str/replace html #"(?m)<[^<>]+>|\n" "")))
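For readers more at home in Java, the same regex can be tried with String.replaceAll. The class name RegexExtract and the sample inputs are made up for this sketch. The second call shows what a tag-stripping regex does to an inline script.

```java
// Java rendering of the regex above (class name and inputs are hypothetical).
class RegexExtract {
    // Removes every tag and every newline, mirroring the Clojure function.
    static String extract(String html) {
        if (html == null) return null;
        return html.replaceAll("(?m)<[^<>]+>|\n", "");
    }

    public static void main(String[] args) {
        System.out.println(extract("<p>Hello <b>world</b></p>\n"));
        // Hello world
        System.out.println(extract("<script>var a = 1;</script><p>Hi</p>"));
        // var a = 1;Hi
    }
}
```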
Most of the time it works, and it is very fast. But it can’t ignore JavaScript and CSS, which I need. So I came up with another solution using Enlive.
Here is the Clojure code.
(defn- emit-str [node]
(cond (string? node) node
(and (:tag node)
(not= :script (:tag node))) (emit-str (:content node))
(seq? node) (map emit-str node)
:else ""))
(defn extract-text [html]
(when html
(let [r (html/html-resource (java.io.StringReader. html))]
(str/trim (apply str (flatten (emit-str r)))))))
It works: JavaScript is ignored. But it’s a little slow. On my machine, for a given HTML file, the regex takes 0.21ms, while extract-text takes 2.76ms.
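How the timings were taken is not stated here. One common way to get a per-call average is to warm up first and then time many runs, as in this sketch. The class name Timing, the iteration counts and the sample input are assumptions, and absolute numbers will differ from machine to machine.

```java
import java.util.function.Function;

// Minimal timing harness: average milliseconds per call after a warm-up.
class Timing {
    static long sink; // accumulates results so the JIT cannot drop the calls

    static double averageMillis(Function<String, String> f, String input,
                                int warmup, int runs) {
        for (int i = 0; i < warmup; i++) sink += f.apply(input).length();
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink += f.apply(input).length();
        return (System.nanoTime() - start) / 1e6 / runs;
    }

    public static void main(String[] args) {
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < 1000; i++) html.append("<p>paragraph ").append(i).append("</p>\n");
        Function<String, String> regex = s -> s.replaceAll("(?m)<[^<>]+>|\n", "");
        double ms = averageMillis(regex, html.toString(), 200, 1000);
        System.out.printf("regex: %.3f ms per call%n", ms);
    }
}
```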
Enlive is built on top of TagSoup, which is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty and brutish, though quite often far from short.
By calling
(html/html-resource (java.io.StringReader. html))
Enlive builds a tree for the HTML, which is overkill when all I need is the text. By using TagSoup directly, I can bypass this overhead. Here is the Java code:
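import java.io.IOException;
import java.io.StringReader;
import org.ccil.cowan.tagsoup.Parser;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class Utils {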
public static String extractText(String html) throws IOException,
SAXException {
Parser p = new Parser();
Handler h = new Handler();
p.setContentHandler(h);
p.parse(new InputSource(new StringReader(html)));
return h.getText();
}
}
class Handler extends DefaultHandler {
private StringBuilder sb = new StringBuilder();
private boolean keep = true;
public void characters(char[] ch, int start, int length)
throws SAXException {
if (keep) {
sb.append(ch, start, length);
}
}
public String getText() {
return sb.toString();
}
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
if (localName.equalsIgnoreCase("script")) {
keep = false;
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
keep = true;
}
}
After some experiments, I found that
Parser p = new Parser();
takes a lot of CPU time. By using a ThreadLocal, each thread creates its parser only once:
private static final ThreadLocal<Parser> parser = new ThreadLocal<Parser>() {
protected Parser initialValue() {
return new Parser();
}
};
With this change, it takes 0.38ms to extract text from the same HTML file. I am happy with the result.
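The effect of the ThreadLocal can be seen in isolation with a JDK-only sketch. ExpensiveParser is a made-up stand-in for a costly constructor, not the TagSoup Parser, and the counter exists only to show how often the constructor runs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// One expensive object per thread, created lazily and then reused.
class ParserCache {
    static final AtomicInteger created = new AtomicInteger();

    // Hypothetical stand-in for a parser that is costly to construct.
    static class ExpensiveParser {
        ExpensiveParser() { created.incrementAndGet(); }
        String parse(String input) { return input.trim(); }
    }

    private static final ThreadLocal<ExpensiveParser> PARSER =
            ThreadLocal.withInitial(ExpensiveParser::new);

    static String extract(String input) {
        return PARSER.get().parse(input); // reuses this thread's instance
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 1000; i++) extract("  some text  ");
        System.out.println("after 1000 calls on one thread: " + created.get());  // 1
        Thread other = new Thread(() -> extract("  more text  "));
        other.start();
        other.join();
        System.out.println("after a call on a second thread: " + created.get()); // 2
    }
}
```

Reuse is only safe if no per-document state stays on the shared object. The extractText method above follows that rule by creating a fresh Handler for every call.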
A Ring adapter implemented on top of Netty. Rssminer uses it to build its web server.
[me.shenfeng/async-ring-adapter "1.0.0"]
(use 'ring.adapter.netty)
(defn app [req]
{:status 200
:headers {"Content-Type" "text/html"}
:body (str "hello word")})
(run-netty app {:port 8080
:worker 4 ;; worker thread count
:netty {"reuseAddress" true}})
For more options, refer to the Netty documentation.
Netty is well designed and documented. It’s fun to read its code, and it performs very well.
Netty’s HTTP support is very different from the existing HTTP libraries. It gives you complete control over how HTTP messages are exchanged in a low level. Because it is basically the combination of HTTP codec and HTTP message classes, there is no restriction such as enforced thread model. That is, you can write your own HTTP client or server that works exactly the way you want. You have full control over thread model, connection life cycle, chunked encoding, and as much as what HTTP specification allows you to do.
The script ./scripts/start_server starts Netty on port 3333 and Jetty on port 4444. Here is the result on my machine:
(def resp {:status 200
:headers {"Content-Type" "text/plain"}
:body "Hello World"})
ab -n 300000 -c 50 http://localhost:4444/ #11264.90 [#/sec] (mean) jetty
ab -n 300000 -c 50 http://localhost:3333/ #12638.37 [#/sec] (mean) netty
This repo was forked from datskos.
The source code is on GitHub.
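The other three members of our team in Kunshan all use GNU Emacs. To make it easier to keep my code style consistent with theirs, I tried learning it too. That was three months ago.
The monitor I develop on has a resolution of 1920x1080, and Emacs can make good use of that space. With hjiang's smart-split, one Emacs frame can be split into 3 windows, and other-window (C-x o) switches between them. But that requires pressing Ctrl, which tortures the little finger, and it is not very convenient either.
I worked out the functions below, which switch quickly to a given window.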
(defun select-nth-window (n)
"Select the nth visible window of current frame,
window are ordered by top-left point"
(let* ((cmp (lambda (l r)
(if (= (second l) (second r))
(< (third l) (third r))
(< (second l) (second r)))))
(windows (sort
(mapcar (lambda (w)
(cons w (window-edges w)))
(window-list)) cmp))
(index (- (min n (length windows)) 1)))
(first (nth index windows))))
(defun select-first-window ()
"Select the top-left most window"
(interactive)
(select-window (select-nth-window 1)))
(defun select-second-window ()
(interactive)
(select-window (select-nth-window 2)))
(defun select-third-window ()
(interactive)
(select-window (select-nth-window 3)))
(global-set-key (kbd "M-1") 'select-first-window)
(global-set-key (kbd "M-2") 'select-second-window)
(global-set-key (kbd "M-3") 'select-third-window)
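The ordering used by select-nth-window (compare the left edge first, then the top edge, and clamp n to the number of windows) can be written out in Java for readers who do not read elisp. The Win class and the sample coordinates are made up for this sketch, and n is assumed to be at least 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Picks the n-th window after sorting by left edge, then by top edge.
class WindowOrder {
    // Hypothetical window record: a name plus its left and top edges.
    static final class Win {
        final String name;
        final int left;
        final int top;

        Win(String name, int left, int top) {
            this.name = name;
            this.left = left;
            this.top = top;
        }
    }

    static Win nth(int n, Win... windows) {
        List<Win> sorted = new ArrayList<>(Arrays.asList(windows));
        sorted.sort(Comparator.comparingInt((Win w) -> w.left)
                .thenComparingInt(w -> w.top));
        int index = Math.min(n, sorted.size()) - 1; // clamp n to the last window
        return sorted.get(index);
    }

    public static void main(String[] args) {
        Win a = new Win("a", 0, 0);
        Win b = new Win("b", 80, 0);
        Win c = new Win("c", 160, 0);
        System.out.println(nth(2, c, a, b).name); // b
        System.out.println(nth(5, c, a, b).name); // c
    }
}
```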
I modified hjiang's smart-split to better suit my needs:
(defun smart-split ()
"Split the window into near 80-column sub-windows, try to
equally size every window"
(interactive)
(defun compute-width-helper (w)
"More than 220 column, can be split to 3, else 2"
(if (> (window-width w) 220) 80
(+ (/ (window-width w) 2) 1)))
(defun smart-split-helper (w)
"Helper function to split a given window into two, the first of which has
80 columns."
(if (> (window-width w) 130)
(let* ((w2 (split-window w (compute-width-helper w) t))
(i 0))
(with-selected-window w2
(next-buffer)
(while (and (string-match "^*" (buffer-name)) (< i 20))
(setq i (1+ i))
(next-buffer)))
(smart-split-helper w2))))
(smart-split-helper nil))
;; bind to F12 for quick access
(global-set-key [f12] 'smart-split)
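The width arithmetic in smart-split can be traced with a small Java simulation. It is a sketch under one simplifying assumption: the columns taken by window dividers and scroll bars are ignored, so real Emacs widths will be slightly different. The class name and the 240-column frame are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the widths produced by the recursive split, ignoring divider columns.
class SplitWidths {
    // Mirrors compute-width-helper: 80 columns if wide enough for 3 windows, else half.
    static int firstWidth(int width) {
        return width > 220 ? 80 : width / 2 + 1;
    }

    // Mirrors smart-split-helper: keep splitting the remainder while it is wider than 130.
    static List<Integer> split(int width) {
        List<Integer> result = new ArrayList<>();
        while (width > 130) {
            int first = firstWidth(width);
            result.add(first);
            width -= first;
        }
        result.add(width);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(split(240)); // [80, 81, 79]
        System.out.println(split(200)); // [101, 99]
        System.out.println(split(120)); // [120]
    }
}
```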
I also bound F1 to delete-other-windows (default C-x 1)
(global-set-key [f1] 'delete-other-windows)
This way I can switch quickly with F1, F12, M-1, M-2 and M-3. It is quite convenient.