1. 03 Mar 2012

    A brief look at namespace and private in JavaScript

    Douglas Crockford

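    The projects I work on use a lot of JavaScript. For example:

    • 美味书签 (Chinese edition): the collection view page is rendered with JavaScript, and JS also plays an important part in creating collections
    • Track: a JavaScript app in its own right; much of the logic runs in the browser, including routing and generating HTML
    • Rssminer: JavaScript does even more here, such as generating HTML in the browser, removing ads, and implementing readability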

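    Relying on JavaScript has many benefits, which I won't go into. It also has plenty of drawbacks, such as having no namespace and no private. These are the foundation of modularity and encapsulation, so I find their absence especially critical. package and private are reserved words in JavaScript, but their only effect is this: accidentally use one as a variable name or an object key and the program breaks, nothing more. Still, when God closes a door, he leaves a window open: functions can implement both (in a sense).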

    // utils.js
    (function () {
      //  private. given by closure
      var private_var = 1;
      var helper1 = function () { };
      // namespace: YOUR_NS. given by javascript's global object
      window.YOUR_NS = window.YOUR_NS || {};
      window.YOUR_NS.utils = {
        helper1: helper1            // export, public
      };
      // create an anonymous function, execute it immediately
    })();
    // app.js
    (function () {
      // like java's import, c++'s using namespace
      var utils = window.YOUR_NS.utils;
      var utils2 = window.YOUR_NS.utils;  // rename
      // direct import, java's import static
      var helper1 = utils.helper1;
      // your app's logic here
      utils.helper1();              // sample usage
      utils2.helper1();             // sample usage
      helper1();                    // sample usage
    })();
    
    
    
    
  2. 01 Feb 2012

    How far can epoll push concurrent socket connections

    I wrote an online dictionary in pure C during the Spring Festival.

    The dictionary data (about 8.2M, file dbdata) is compressed and loaded into memory using mmap; an index is built on top of it for fast lookup using binary search. Resident memory (RES) is about 9M when the number of concurrent connections is not high, say below 1k.
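
    The lookup idea above (data mapped into memory, a sorted index on top, binary search to find a word) can be sketched as follows. The real server is written in C and the dbdata layout is not described here, so this is a Java sketch under assumptions: the fixed-width record layout, the class name MappedDict, and its method names are all hypothetical, and compression is left out.

    ```java
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.Arrays;

    // Hypothetical layout: sorted fixed-width records, each a 16-byte
    // zero-padded ASCII word followed by a 4-byte int (e.g. a definition offset).
    final class MappedDict {
        static final int KEY_LEN = 16;
        static final int RECORD_LEN = KEY_LEN + 4;

        // Build the sorted index bytes (words assumed ASCII, at most KEY_LEN bytes).
        static ByteBuffer build(String[] words, int[] values) {
            Integer[] order = new Integer[words.length];
            for (int i = 0; i < order.length; i++) order[i] = i;
            Arrays.sort(order, (a, b) -> words[a].compareTo(words[b]));
            ByteBuffer buf = ByteBuffer.allocate(words.length * RECORD_LEN);
            for (int idx : order) {
                byte[] key = Arrays.copyOf(words[idx].getBytes(StandardCharsets.US_ASCII), KEY_LEN);
                buf.put(key).putInt(values[idx]);
            }
            buf.flip();
            return buf;
        }

        // Binary search over the records; returns the stored int, or -1 if absent.
        static int lookup(ByteBuffer index, String word) {
            byte[] key = Arrays.copyOf(word.getBytes(StandardCharsets.US_ASCII), KEY_LEN);
            int lo = 0, hi = index.limit() / RECORD_LEN - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                int cmp = compareAt(index, mid * RECORD_LEN, key);
                if (cmp == 0) return index.getInt(mid * RECORD_LEN + KEY_LEN);
                if (cmp < 0) lo = mid + 1; else hi = mid - 1;
            }
            return -1;
        }

        // Compare the key stored at byte offset pos with key, byte by byte.
        private static int compareAt(ByteBuffer index, int pos, byte[] key) {
            for (int i = 0; i < KEY_LEN; i++) {
                int d = (index.get(pos + i) & 0xff) - (key[i] & 0xff);
                if (d != 0) return d;
            }
            return 0;
        }

        public static void main(String[] args) throws IOException {
            ByteBuffer built = build(new String[] {"zoo", "about", "throw"}, new int[] {30, 10, 20});
            Path file = Files.createTempFile("dbdata-sketch", ".bin");
            file.toFile().deleteOnExit();
            Files.write(file, built.array());
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Map the file read-only; the OS loads pages on demand.
                ByteBuffer mapped = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
                System.out.println(lookup(mapped, "throw"));
            }
        }
    }
    ```

    Because lookup only uses absolute reads, the same code works on a heap buffer and on a mapped file, which keeps the search logic easy to test without touching the disk.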

    I handcrafted the web server in pure C with epoll. It serves static files and word lookup requests. The performance is amazing: 57.3k req/s while 1600k socket connections are kept open.

    Test Machine

    Server and test app run on the same computer.

    • Mem: 16G
    • CPU: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
    • OS: GNU/Linux Debian 3.1.0-1-amd64

    Some Linux configuration:

    # set up virtual network interfaces;
    # the test client uses these IPs when it connects
    for i in `seq 21 87`; do sudo ifconfig eth0:$i 192.168.1.$i up ; done
    
    # more ports for testing
    sudo sysctl -w net.ipv4.ip_local_port_range="1025 65535"
    # tcp read buffer, min, default, maximum
    sudo sysctl -w net.ipv4.tcp_rmem="4096 4096 16777216"
    # tcp write buffer, min, default, maximum
    sudo sysctl -w net.ipv4.tcp_wmem="4096 4096 16777216"
    echo 9999999 | sudo tee /proc/sys/fs/nr_open
    echo 9999999 | sudo tee /proc/sys/fs/file-max
    
    # edit /etc/security/limits.conf, add line
    # * - nofile 9999999
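
    A quick check of the numbers behind this setup: a TCP connection is identified by source address, source port, destination address and destination port, so with a single server port each extra address adds at most one connection per ephemeral port. The sketch below (class and method names are hypothetical) works out that bound for the port range set above and for the 32 addresses and 50000 connections per address used in the test code further down.

    ```java
    // Back-of-the-envelope capacity check; names are illustrative only.
    final class ConnectionBudget {
        // Number of ephemeral ports in an inclusive ip_local_port_range.
        static int portsInRange(int low, int high) {
            return high - low + 1;
        }

        // Upper bound on connections to one server port across several addresses.
        static long maxConnections(int addresses, int portsPerAddress) {
            return (long) addresses * portsPerAddress;
        }

        public static void main(String[] args) {
            int ports = portsInRange(1025, 65535);   // from the sysctl line above
            long bound = maxConnections(32, ports);  // 32 addresses in the test client
            long target = 32L * 50000;               // connectionPerIP = 50000
            System.out.println("ports per address: " + ports);
            System.out.println("upper bound: " + bound + ", target: " + target);
        }
    }
    ```

    With 64511 ports per address, 50000 connections per address fits, and 32 addresses give the 1600K target with some headroom. The raised nr_open, file-max and nofile limits are what allow that many descriptors to be open at once.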

    Command to show status

    cat /proc/net/sockstat
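
    To watch the connection count from a test harness rather than by eye, the TCP line of that file can be parsed. A minimal sketch, assuming the usual "TCP: inuse N orphan N tw N alloc N mem N" line format; the class name SockStat is hypothetical.

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    final class SockStat {
        // Parse the "TCP:" line into name -> value pairs, e.g. inuse -> 35.
        static Map<String, Long> parseTcp(List<String> lines) {
            Map<String, Long> stats = new LinkedHashMap<>();
            for (String line : lines) {
                if (!line.startsWith("TCP:")) continue;
                String[] parts = line.substring(4).trim().split("\\s+");
                for (int i = 0; i + 1 < parts.length; i += 2) {
                    stats.put(parts[i], Long.parseLong(parts[i + 1]));
                }
            }
            return stats;
        }

        public static void main(String[] args) throws IOException {
            // Linux only: the file does not exist on other systems.
            List<String> lines = Files.readAllLines(Paths.get("/proc/net/sockstat"));
            System.out.println("TCP sockets in use: " + parseTcp(lines).get("inuse"));
        }
    }
    ```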

    1600K concurrent connections (C1600K)

    Test code, written in Java

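    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.Set;
    
    public class MakeupIdelConnection {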
        final static int STEPS = 10;
        final static int connectionPerIP = 50000;
        public static void main(String[] args) throws IOException {
            final Selector selector = Selector.open();
            InetSocketAddress locals[] = new InetSocketAddress[32];
            for (int i = 0; i < locals.length; i++) {
                locals[i] = new InetSocketAddress("192.168.1." + (21 + i), 9090);
            }
            long start = System.currentTimeMillis();
            int connected = 0;
            int currentConnectionPerIP = 0;
            while (true) {
                if (System.currentTimeMillis() - start > 1000 * 60 * 10) {
                    break;
                }
                for (int i = 0; i < connectionPerIP / STEPS && currentConnectionPerIP < connectionPerIP; ++i, ++currentConnectionPerIP) {
                    for (InetSocketAddress addr : locals) {
                        SocketChannel ch = SocketChannel.open();
                        ch.configureBlocking(false);
                        Socket s = ch.socket();
                        s.setReuseAddress(true);
                        ch.register(selector, SelectionKey.OP_CONNECT);
                        ch.connect(addr);
                    }
                }
    
                int select = selector.select(1000 * 10); // 10s
                if (select > 0) {
                    System.out.println("select return: " + select + " events ; current connection per ip: " + currentConnectionPerIP);
                    Set<SelectionKey> selectedKeys = selector.selectedKeys();
                    Iterator<SelectionKey> it = selectedKeys.iterator();
    
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        if (key.isConnectable()) {
                            SocketChannel ch = (SocketChannel) key.channel();
                            if (ch.finishConnect()) {
                                ++connected;
                                if (connected % (connectionPerIP * locals.length / 10) == 0) {
                                    System.out.println("connected: " + connected);
                                }
                                key.interestOps(SelectionKey.OP_READ);
                            }
                        }
                    }
                    selectedKeys.clear();
                }
            }
        }
    }

    57.3k req/s

    Measured while 1600K connections are kept open.

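    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.Random;
    import java.util.Set;
    
    class SelectAttachment {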
        private static final Random r = new Random();
        private static final String[] urls = { "/d/aarp", "/d/about", "/d/zoo", "/d/throw", "/d/new", "/tmpls.js", "/mustache.js" };
        String uri;
        ByteBuffer request;
        int response_length = -1;
        int response_cnt = -1;
        public SelectAttachment(String uri) {
            this.uri = uri;
            request = ByteBuffer.wrap(("GET " + uri + " HTTP/1.1\r\n\r\n").getBytes());
        }
        public static SelectAttachment next() {
            return new SelectAttachment(urls[r.nextInt(urls.length)]);
        }
    }
    
    public class PerformanceBench {
        static final byte CR = 13;
        static final byte LF = 10;
        static final String CL = "content-length: ";
    
        public static String readLine(ByteBuffer buffer) {
            StringBuilder sb = new StringBuilder(64);
            char b;
            loop: for (;;) {
                b = (char) buffer.get();
                switch (b) {
                case CR:
                    if (buffer.get() == LF)
                        break loop;
                    break;
                case LF:
                    break loop;
                }
                sb.append(b);
            }
            return sb.toString();
        }
    
        public static void main(String[] args) throws IOException {
            int concurrency = 1024 * 3;
            long totalByteReceive = 0;
            int total = 200000;
            int remaining = total;
            InetSocketAddress addr = new InetSocketAddress("127.0.0.1", 9090);
            ByteBuffer readBuffer = ByteBuffer.allocateDirect(1024 * 64);
            Selector selector = Selector.open();
            SelectAttachment att;
            SocketChannel ch;
            long start = System.currentTimeMillis();
            for (int i = 0; i < concurrency; ++i) {
                ch = SocketChannel.open();
                ch.socket().setReuseAddress(true);
                ch.configureBlocking(false);
                ch.register(selector, SelectionKey.OP_CONNECT, SelectAttachment.next());
                ch.connect(addr);
            }
            loop: while (true) {
                int select = selector.select();
                if (select > 0) {
                    Set<SelectionKey> selectedKeys = selector.selectedKeys();
                    Iterator<SelectionKey> it = selectedKeys.iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        if (key.isConnectable()) {
                            ch = (SocketChannel) key.channel();
                            if (ch.finishConnect()) {
                                key.interestOps(SelectionKey.OP_WRITE);
                            }
                        } else if (key.isWritable()) {
                            ch = (SocketChannel) key.channel();
                            att = (SelectAttachment) key.attachment();
                            ByteBuffer buffer = att.request;
                            ch.write(buffer);
                            if (!buffer.hasRemaining()) {
                                key.interestOps(SelectionKey.OP_READ);
                            }
                        } else if (key.isReadable()) {
                            ch = (SocketChannel) key.channel();
                            att = (SelectAttachment) key.attachment();
                            readBuffer.clear();
                            int read = ch.read(readBuffer);
                            totalByteReceive += read;
                            if (att.response_length == -1) {
                                readBuffer.flip();
                                String line = readLine(readBuffer);
                                while (line.length() > 0) {
                                    line = line.toLowerCase();
                                    if (line.startsWith(CL)) {
                                        String length = line.substring(CL.length());
                                        att.response_length = Integer.valueOf(length);
                                        att.response_cnt = att.response_length;
                                    }
                                    line = readLine(readBuffer);
                                }
                                att.response_cnt -= readBuffer.remaining();
                            } else {
                                att.response_cnt -= read;
                            }
                            if (att.response_cnt == 0) {
                                remaining--;
                                if (remaining > 0) {
                                    if (remaining % (total / 10) == 0) {
                                        System.out.println("remaining\t" + remaining);
                                    }
                                    key.attach(SelectAttachment.next());
                                    key.interestOps(SelectionKey.OP_WRITE);
                                } else {
                                    break loop;
                                }
                            }
                        }
                    }
                    selectedKeys.clear();
                }
            }
            long time = (System.currentTimeMillis() - start);
            long receiveM = totalByteReceive / 1024 / 1024;
            double reqs = (double) total / time * 1000;
            double ms = (double) receiveM / time * 1000;
            System.out.printf("total time: %dms; %.2f req/s; receive: %dM data; %.2f M/s\n", time, reqs, receiveM, ms);
        }
    }

    Source code

    It’s on GitHub: https://github.com/shenfeng/dictionary

    1. /server Server side code, in C.
    2. /client Javascript/HTML/CSS
    3. /test/java Unit test and performance test code
    4. /src Clojure and java code to generate the dbdata file
  3. 24 Dec 2011

    Elisp: jump to the last edit location across the whole session

    Scenario

    Coding, editing —> an interesting function —> another interesting function —> …

    “My god, where have I been? How can I jump right back to the last edit position?”

    How others handle it

    Eclipse has a handy feature, Last Edit Location, bound to Ctrl+Q. It works globally across all visited files, even files that were edited but are now closed.

    Existing solution

    Elisp to the rescue

    I’ve been using Emacs for several months. It serves me well, but it lacks this very handy feature. Thanks to Elisp, I can implement it myself.

    Code

    ;;; record the last change in two different files and cycle between them
    (defvar feng-last-change-pos1 nil)
    (defvar feng-last-change-pos2 nil)
    
    (defun feng-swap-last-changes ()
      (when feng-last-change-pos2
        (let ((tmp feng-last-change-pos2))
          (setf feng-last-change-pos2 feng-last-change-pos1
                feng-last-change-pos1 tmp))))
    
    (defun feng-goto-last-change ()
      (interactive)
      (when feng-last-change-pos1
        (let* ((buffer (find-file-noselect (car feng-last-change-pos1)))
               (win (get-buffer-window buffer)))
          (if win
              (select-window win)
            (switch-to-buffer-other-window buffer))
          (goto-char (cdr feng-last-change-pos1))
          (feng-swap-last-changes))))
    
    (defun feng-buffer-change-hook (beg end len)
      (let ((bfn (buffer-file-name))
            (file (car feng-last-change-pos1)))
        (when bfn
          (if (or (not file) (equal bfn file)) ;; change the same file
              (setq feng-last-change-pos1 (cons bfn end))
            (progn (setq feng-last-change-pos2 (cons bfn end))
                   (feng-swap-last-changes))))))
    
    (add-hook 'after-change-functions 'feng-buffer-change-hook)
    ;;; just quick to reach
    (global-set-key (kbd "M-`") 'feng-goto-last-change)
  4. 07 Oct 2011

    A web crawler, written for speed, in Java and Clojure

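    The National Day holiday is almost over, and the web crawler I have been writing has reached a stopping point. It downloads roughly 80,000 pages per hour, and I am fairly happy with its CPU and memory usage. Screenshots from a 40-minute run:

    Image: CPU and memory usage

    Image: network usage (4M bandwidth, already at its limit)

    Image: distribution by status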

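    The crawler is part of Rss miner. According to git log, I have been working on it on and off for five months. The weekends of those five months all went into it, most of them into the crawler, which went through several major refactorings or rewrites.

    The crawler is written mainly in Clojure and Java. Clojure keeps the program concise, and Java is good for organizing the multithreading. Object-oriented plus functional feels like a good combination.

    At first I wrapped the JDK's URLConnection in Clojure. Because it is blocking, going faster required multiple threads.

    That led to some problems, for example:

    1. With few threads it was slow; with many threads memory could not cope. I am sensitive to memory use, partly because I want to challenge myself and partly because my VPS has only 512M of memory and I want to run Rss miner on it, including a web server, an RSS fetcher, a web crawler, and an online real-time recommendation algorithm (still being planned).
    2. URLConnection wraps everything in a [Stream](http://en.wikipedia.org/wiki/Stream_(computing)), which is not very convenient.
    3. If each thread saves the data it downloads, the disk may have a hard time. If the data is sent through a queue to a single dedicated thread, that adds the overhead of an extra thread.

    I looked for a non-blocking HTTP client and tried two, but was not satisfied with either, so I wrote my own, focusing on performance and stability.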

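    Implementation:

    • Four threads, each running a loop; they relate to one another as producers and consumers and communicate through queues and events
    • Parts with a lot of state to manage are implemented in Java, such as extracting links and text with Tagsoup and excluding some URLs by rule
    • DNS prefetch, with Pdnsd as the DNS cache: the UDP query is sent ahead of time and its result is ignored
    • Java provides a simple framework exposing two interfaces, which are implemented in Clojure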
    public interface IHttpTask {
        URI getUri();
        Map<String, Object> getHeaders();
        Object doTask(HttpResponse response) throws Exception;
        Proxy getProxy();
    }
    public interface IHttpTaskProvder {
        List<IHttpTask> getTasks();
    }
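
    The loop-per-thread arrangement described above, with stages connected by queues, can be sketched with the JDK's BlockingQueue. This illustrates the pattern only and is not the crawler's code: the two stages, the stop marker used for shutdown, and every name here are assumptions, and the download is faked.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    final class PipelineSketch {
        private static final String STOP = "<stop>"; // marker that ends a loop

        // Run a fetch stage and a save stage, each on its own thread, joined by queues.
        static List<String> run(List<String> urls) {
            BlockingQueue<String> toFetch = new ArrayBlockingQueue<>(16);
            BlockingQueue<String> toSave = new ArrayBlockingQueue<>(16);
            List<String> saved = new ArrayList<>();

            Thread fetcher = new Thread(() -> {
                try {
                    for (String url = toFetch.take(); !url.equals(STOP); url = toFetch.take()) {
                        toSave.put("fetched:" + url); // stand-in for a real download
                    }
                    toSave.put(STOP);                 // pass the shutdown signal downstream
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            Thread saver = new Thread(() -> {
                try {
                    for (String page = toSave.take(); !page.equals(STOP); page = toSave.take()) {
                        saved.add(page);              // single writer: one stream to the disk
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            fetcher.start();
            saver.start();
            try {
                for (String url : urls) toFetch.put(url); // the caller acts as task provider
                toFetch.put(STOP);
                fetcher.join();
                saver.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return saved;
        }

        public static void main(String[] args) {
            System.out.println(run(Arrays.asList("http://a.example/", "http://b.example/")));
        }
    }
    ```

    Bounded queues give back-pressure for free: when the saver falls behind, the fetcher blocks on put instead of piling pages up in memory, which matters on a small VPS.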
  5. 18 Sep 2011

    Async Java HTTP client

    I spend some of my spare time writing Rssminer, an intelligent RSS reader. I want it to be smart enough to highlight stories I like and to help me discover stories I may like.

    I plan to do it by downloading as many web pages as possible from the Internet, extracting the RSS links they contain, downloading those feeds, then applying machine learning algorithms to them. It’s ambitious.

    The first thing that needs solving is an HTTP client. The JDK’s URLConnection is blocking: even with 20 threads devoted to it, it is still not fast enough, and some keep-alive timers get in the way. I tried a non-blocking client; it works great, but it lacks SOCKS proxy support, and I want to control everything.

    So I wrote my own async HTTP client on top of netty, a great library that provides an async socket framework and an HTTP codec.

    // Http client sample usage
       HttpClientConfig config = new HttpClientConfig();
       Map<String, Object> header = new HashMap<String, Object>();
       HttpClient client = new HttpClient(config);
       URI uri = new URI("http://onycloud.com");
       final HttpResponseFuture future = client.execGet(uri, header);
       // register the listener on the future; it runs once the response arrives
       future.addListener(new Runnable() {
           public void run() {
               HttpResponse asyncResp = future.get(); // async
           }
       });
       HttpResponse resp = future.get(); // blocking

    The source code is concise: about 1000 lines of code (about 600 excluding import statements and blank lines). It can be found on GitHub.
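
    The two ways of consuming the result shown in the sample (a listener that runs on completion, or a blocking get) follow the same pattern the JDK offers through CompletableFuture. The sketch below uses that JDK class as a stand-in, not the client's own HttpResponseFuture, and the fakeGet method and its response string are placeholders.

    ```java
    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.atomic.AtomicReference;

    final class FutureSketch {
        // Placeholder for an async request: completes on another thread.
        static CompletableFuture<String> fakeGet(String uri) {
            return CompletableFuture.supplyAsync(() -> "200 OK from " + uri);
        }

        public static void main(String[] args) {
            CompletableFuture<String> future = fakeGet("http://onycloud.com");

            // Listener style: the callback runs once the response is available.
            AtomicReference<String> seen = new AtomicReference<>();
            CompletableFuture<Void> listener = future.thenAccept(seen::set);

            // Blocking style: join() waits for completion and returns the value.
            String resp = future.join();
            listener.join(); // make sure the callback has finished before reading seen
            System.out.println(resp + " / listener saw: " + seen.get());
        }
    }
    ```

    The listener style is what keeps a crawler from tying up a thread per request; the blocking call is mostly useful in tests and scripts.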