Thursday, June 5, 2014

From Mozilla to NodeSource

Today I have officially handed in my resignation to Mozilla and will be starting work at NodeSource on June 16.

The decision to move was difficult. Dan Shaw can attest to that by the fact he'd taken every opportunity these last six months to bring me over. During that time I'd told him straight out that it would never happen. Working at Mozilla was about as good as any job could be. I mean, come on, Mozilla paid me to work on Node.js as a core maintainer but without any of the corporate oversight which could have required my efforts to focus on the company's needs over my own desires as a maintainer to improve Node. Doesn't get much better than that for someone who enjoys developing open source software. Because of Mozilla I've been able to dive into many of the gritty aspects of core code and make them better just for the sake of having better code.

What eventually changed my mind? First, I have to give Dan credit for meeting my many requirements. Even to the point that he personally called my old manager Mark Mayo and discussed how I could continue to be closely affiliated with Mozilla after the transition. I realize that with how open Mozilla is it wouldn't be a problem for me to continue contributing, but there is some difference. Like, being able to walk a few desks down and speak with the Rust devs vs. chatting on IRC.

After feeling confident in being able to continue contributing back to the community just as openly as I have, and continue my work as core maintainer the same as usual, the decision to make the move felt more appropriate.

At NodeSource I'll have greater opportunity to work with companies that are using Node in the wild. These new interactions will give additional insight on how to focus my efforts as a core maintainer. My mantra is "make all the things fast", but with limited development hours it helps to know what to prioritize. Similarly I'm hopeful it will spark more creative madness to take Node to new levels of performance awesomeness.

NodeSource has coalesced an awesome group of talent. They'll definitely help me help the community, and to be honest, the community will get more out of me than before. Simply because I have additional assistance to assemble my mess of thoughts and code into something legible. So, you can thank NodeSource for future blog posts on how to make your Node apps faster.

So, here I jump from one awesome company to another. Never found myself a sentimental person. Irrational, rageful and sometimes impertinent sure. But it's always been about the code. Though in this case I'll make an exception (or maybe it's the Yo-Yo Ma instrumental piece I'm listening to while writing this) and say this is both a sad and exciting time in my career.

Wednesday, February 5, 2014

Node.js, ES6 and me

On January 31, 2014 my second son was born. That evening, while my wife and newborn were getting some sleep, I decided to jump on my phone and try to respond to the never ending mound of issues that seem to pile up. One of which was about enabling the --harmony option by default. Unfortunately one of my posts was misinterpreted by some. I will gladly take the blame for being simultaneously overbroad in my description of features accepted into Node core, and nonspecific about what core means.

Bear in mind, though, that I never thought my comment would be taken so far as to lead some to believe Node would actually fork v8, or some such, to prevent ES6 features from being available in Node (especially since we had already done for one feature what this PR suggested we do for several others). Nor did I expect it given that, just prior to that paragraph, I had discussed my conditions for allowing --harmony to be enabled by default; it would have been a contradiction to then say it would never be allowed.

So first I'd like to clarify that I improperly referred to the entire ES6 spec. In that post I was specifically referring to iterators, generators and promises. Second I'd like to state for the record that I've never had the intention of preventing any Node developer from using new features coming to JavaScript, or from preventing these features from being exposed to the user if enabled in v8 by default. Last clarification is that the reference to "core" meant core code.

My Node Future

Some of you are probably thinking, "Why wouldn't he want to save us from callback hell and integrate a generator/promise based API?" The reasons are simple and pragmatic. First is because the key thing Node does (integrating JavaScript with the v8 C++ bindings/libuv) can only be done via callbacks. Second is that I want to remove abstraction and allow users to interface their code as close to the action as possible.

The community has proven two things. First is that none of us agree on how an API should be written. Second is that there are very talented developers that will push Node to the limits. My hopeful goal for Node is to introduce a standardized minimal API that sits very close to these C++ bindings. From there the community can create whatever interface they want with as close to zero unnecessary overhead as possible.

I want Node to be tremendously good at one thing. Being the kernel of a web platform for app developers who wish to write in JavaScript. That means writing an API that allows minimal abstraction and as close to zero overhead as possible. So why use the callback style? Because currently it's the only way to interface with the v8 C++ bindings. When the v8 team introduces an API for generators or promises I will give it honest consideration.

Moving at Node Speed

Some days I like to write little apps that sit directly on top of the internal C++ API (Max, promise I haven't forgotten about the router app). Every time I'm reminded of how much performance is left on the table due to abstraction.

Imagine you want to saturate a 100Gb pipe with TCP traffic. That means you'll be sending at least 204,800 packets a second. Which also means something needs to be executed at least 204,800 times per second to send that data. Small amounts of extra abstraction start to make a big difference in performance at this level.
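The post doesn't say where 204,800 comes from, but the number falls out exactly if you assume binary units (100 * 2^30 bits per second) and 64 KiB packets. Both of those are my assumptions, as are the variable names. A quick sketch of the arithmetic:

```javascript
// Assumptions (not stated in the post): binary gigabits and 64 KiB packets.
var linkBitsPerSecond = 100 * Math.pow(2, 30);
var linkBytesPerSecond = linkBitsPerSecond / 8;
var packetBytes = 64 * 1024;

// How many packets must go out each second to keep the pipe full.
var packetsPerSecond = linkBytesPerSecond / packetBytes;

// The time budget that leaves for handling a single packet.
var nanosecondsPerPacket = 1e9 / packetsPerSecond;

console.log(packetsPerSecond);                 // 204800
console.log(Math.round(nanosecondsPerPacket)); // 4883
```

Smaller packets only push the count higher, which is why it's an "at least". With a budget of under five microseconds per packet, every extra layer of abstraction takes a noticeable share.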

The v8 team has done some amazing stuff to improve performance in key areas that are important to Node (doubt they had Node in mind when making those changes, but I'll take it anyways). Things like making calls between C++ and JavaScript have become extremely cheap (we're talking double digit nanosecond cheap). I'm working on new performance patterns that can hopefully be integrated down the road that will allow techniques like using JavaScript to chain a series of C++ callbacks together (useful for high performance transforms).

There is still so much more to be done, but if we're really going to make Node fast and keep it lightweight then it must be done using a specific subset of the JavaScript specification. Just how far can we push Node? Let me just say you ain't seen nothing yet.

Tuesday, July 23, 2013

measuring node performance - part 1

This is the first of some unknown number of blog posts about tracking your Node application's performance. Today we're going to start off with a few simple ones. Though before we begin I'd just like to make sure everyone is familiar with the --v8-options flag. There are all sorts of magic here waiting to be found. Go ahead, take a quick look. I'll be waiting here for you... Now a lot of these won't be much use until we combine them with other tools, but we'll get to those in future posts. Right now we're going to focus on a few that can give us a quick peek into how our code is doing.

Let's start off with this tiny snippet of code:

function sumItUp(n) {
  for (var i = n + 1, m = n * 2; i <= m; i += 2)
    n += i;
  return n;
}

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = sumItUp(i);
  return tmp;
}

runTest(53509);

Now run it with --trace-deopt and --code-comments. See anything? What do you mean you don't see anything? Oh, yeah. Here, change 53509 to 53510 and run it again. Now see anything? Good. It should look something like this:

[deoptimizing (DEOPT eager): begin 0x1dfe78f8b109 runTest @7, FP to SP delta: 64]
            ;;; jump table entry 3: deoptimization bailout 7.
  translating runTest => node=40, height=24
    0x7fff7f678860: [top + 64] <- 0x5e3f1c5fe69 ; rdi 0x5e3f1c5fe69 <JS Global Object>
    0x7fff7f678858: [top + 56] <- 53510 ; rcx (smi)
    0x7fff7f678850: [top + 48] <- 0x2f3759173318 ; caller's pc
    0x7fff7f678848: [top + 40] <- 0x7fff7f678888 ; caller's fp
    0x7fff7f678840: [top + 32] <- 0x1dfe78f8b089; context
    0x7fff7f678838: [top + 24] <- 0x1dfe78f8b109; function
    0x7fff7f678830: [top + 16] <- 53509 ; r9 (smi)
    0x7fff7f678828: [top + 8] <- 0x5e3f1c04121 <undefined> ; literal
    0x7fff7f678820: [top + 0] <- 0x1dfe78f8b0c1 ; rdx 0x1dfe78f8b0c1 <JS Function sumItUp (SharedFunctionInfo 0x2911dc22c219)>
  translating sumItUp => node=38, height=16
    0x7fff7f678818: [top + 56] <- 0x5e3f1c5fe69 ; r8 0x5e3f1c5fe69 <JS Global Object>
    0x7fff7f678810: [top + 48] <- 2147409811 ; r11 (smi)
    0x7fff7f678808: [top + 40] <- 0x2f3759173526 ; caller's pc
    0x7fff7f678800: [top + 32] <- 0x7fff7f678848 ; caller's fp
    0x7fff7f6787f8: [top + 24] <- 0x1dfe78f8b089; context
    0x7fff7f6787f0: [top + 16] <- 0x1dfe78f8b0c1; function
    0x7fff7f6787e8: [top + 8] <- 107018 ; rax (smi)
    0x7fff7f6787e0: [top + 0] <- 107018 ; rbx (smi)
[deoptimizing (eager): end 0x1dfe78f8b0c1 runTest @7 => node=38, pc=0x2f37591736df, state=NO_REGISTERS, alignment=no padding, took 0.160 ms]
[removing optimized code for: runTest]

Wow. This looks all cryptic and scary, but there's a lot we can learn. It's basically the assembly output of what your JS became. The order of execution goes from the bottom up, so let's start there.

First let's pick out the values that we understand: 107018 ; rax (smi) is generated by the loop in sumItUp (remember, 53509 * 2 == 107018) and 2147409811 ; r11 (smi) is the running total in n just before that last value is added (2147409811 + 107018 == 2147516829).

What's that magical smi you ask? Well I'm glad you did. It stands for "small integer" and while all numbers in JS are technically type double, an optimization technique v8 uses is to detect when only smi's are used. On my x64 machine that's a full 32 bit integer. So, the largest smi would be... 2147483647.

Working our way up, notice 53510 ; rcx (smi) and then the deoptimization. Grab the return value there with a simple console.log(runTest(53510)); and the output should be 2147516829. Applying our newfound knowledge of smi's we can see that 2^32 / 2 - 1 - 2147516829 == -33182. Oops, overflow. Sometimes this may not be avoidable, but at least it's good to know why code is deoptimizing.
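To check the overflow without reading deopt logs, here's a small sketch that reuses sumItUp from above and compares the last two results against the 32 bit smi limit described in this post. SMI_MAX, lastSafe and firstOverflow are names made up for the example, and the limit itself depends on the v8 build and platform.

```javascript
// Largest value of a signed 32 bit integer, the smi limit assumed in this post.
var SMI_MAX = Math.pow(2, 31) - 1;

function sumItUp(n) {
  for (var i = n + 1, m = n * 2; i <= m; i += 2)
    n += i;
  return n;
}

// runTest(53509) calls sumItUp with at most 53508.
var lastSafe = sumItUp(53508);
// runTest(53510) also calls sumItUp(53509).
var firstOverflow = sumItUp(53509);

console.log(lastSafe <= SMI_MAX);     // true
console.log(firstOverflow);           // 2147516829
console.log(firstOverflow - SMI_MAX); // 33182
```

That explains why bumping the argument by one was enough to trigger the deopt: the extra call is the first to produce a sum that no longer fits in a smi.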

Another useful one is --trace-opt. Using that with our working code it'll output:

[marking 0x1dc7f8f8b011 <JS Function sumItUp (SharedFunctionInfo 0x6815302c219)> for recompilation, reason: small function, ICs with typeinfo: 5/5 (100%)]
[optimizing 0x1dc7f8f8b011 <JS Function sumItUp (SharedFunctionInfo 0x6815302c219)> - took 0.025, 0.114, 0.038 ms]
[marking 0x1dc7f8f8b059 <JS Function runTest (SharedFunctionInfo 0x6815302c2a9)> for recompilation, reason: small function, ICs with typeinfo: 2/2 (100%)]
[optimizing 0x1dc7f8f8b059 <JS Function runTest (SharedFunctionInfo 0x6815302c2a9)> - took 0.044, 0.109, 0.021 ms]

Nice to know and all, but where I really find it advantageous is to find problems like the following:

function retSumFn() {
  function sumItUp(n) {
    for (var i = n + 1, m = n * 2; i <= m; i += 2)
      n += i;
    return n;
  }
  return sumItUp;
}

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = retSumFn()(i);
  return tmp;
}

runTest(1e3);

Run this and prepare yourself for a wall of text. See, v8 needs to optimize every function returned. So keep functions at the top level. If your function needs something, pass it in. Here's a very simple modification that greatly reduces the stress on your application:

function sumItUp(n) {
  for (var i = n + 1, m = n * 2; i <= m; i += 2)
    n += i;
  return n;
}

function retSumFn() { return sumItUp; }

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = retSumFn()(i);
  return tmp;
}

runTest(53509);
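What "pass it in" can look like, as a rough sketch: suppose the inner function needed a step value. That's a hypothetical requirement I'm adding for illustration, and makeSummer and sumItUpBy are made-up names. Instead of building a closure around the value, make it an argument:

```javascript
// Closure version: every call to makeSummer creates a new function object.
function makeSummer(step) {
  return function sumItUp(n) {
    for (var i = n + 1, m = n * 2; i <= m; i += step)
      n += i;
    return n;
  };
}

// Top level version: one function, and the extra value is passed in.
function sumItUpBy(n, step) {
  for (var i = n + 1, m = n * 2; i <= m; i += step)
    n += i;
  return n;
}

var viaClosure = makeSummer(2)(10);
var viaArgument = sumItUpBy(10, 2);

console.log(viaClosure, viaArgument);           // 85 85
console.log(makeSummer(2) === makeSummer(2));   // false
```

Both give the same answer, but only the second keeps a single function object around for v8 to work with.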

And finally, the mother of all optimization --trace-inlining. This is code that can be not just optimized, but the resulting assembly can be inline'd into the function making the call. For this to happen the function body must be small. Under 600 characters actually. Variable types cannot change. So if you're expecting a number, then go ahead and coerce it on entry (e.g. +'0xff' == 255). Also limit the types of arguments that are passed.
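A minimal sketch of coercing on entry. The scale function is a made-up example, not something from Node. Unary plus turns whatever came in into a number, so the rest of the body only ever sees numbers:

```javascript
// Hypothetical helper: callers may hand us numbers or numeric strings.
function scale(value, factor) {
  // Coerce once on entry so the arithmetic below only deals with numbers.
  value = +value;
  factor = +factor;
  return value * factor;
}

var fromStrings = scale('0xff', '2');
var fromNumbers = scale(255, 2);

console.log(fromStrings, fromNumbers); // 510 510
```

Better still is to have every caller pass numbers in the first place; the coercion is a safety net for when you can't control that.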

Let's throw --trace-inlining onto our original code and see the output. Actually pretty basic: Inlined sumItUp called from runTest. This is great. It means internally it placed the assembly generated from sumItUp directly into runTest. Now, we have reached a new height in optimization!

Welcome to the basics of optimization. Here are a couple links with more in-depth material:

Understanding v8
Optimizing for v8

As always, if I screwed up please let me know. Don't want to propagate bad information, but guess since no one reads this anyways that won't be a problem :)

Wednesday, July 10, 2013

node with threads

Put this together while reviewing someone else's code. It's a simple example of how to use uv_queue_work. I made the example to work with Node's v0.10 branch, as master has upgraded to v8 3.20 which introduced non-trivial API changes. Also, this code is academic and non-optimal. I promise your code won't be faster implementing this as-is.

Starting at the bottom, we'll bind a native method to a JavaScript function.

Handle<Value> Run(const Arguments& args) {
  HandleScope scope;
  assert(args[0]->IsFunction());

  js_work* work = new js_work;
  work->req.data = work;
  work->callback = Persistent<Function>::New(args[0].As<Function>());

  // pretty simple, right?
  uv_queue_work(uv_default_loop(), &work->req, run_work, run_callback);
  return Undefined();
}

void Init(Handle<Object> target) {
  HandleScope scope;
  target->Set(String::New("run"),
      FunctionTemplate::New(Run)->GetFunction());
}

NODE_MODULE(basics, Init)

Pretty standard setup for binding a native method to the return object. In Run() the first thing that happens is to make sure the passed argument is a function, which will be run from run_callback when the work is complete. Next we're setting up a small struct containing necessary information that will propagate through the call. Finally we're making the call to uv_queue_work. This will run run_work asynchronously in the thread pool then call run_callback.

Now let's start defining some of these mystic variables. Starting from the top:

#include <v8.h>
#include <node.h>
#include <node_buffer.h>
#include <assert.h>

#define SIZE 8

using namespace v8;

struct js_work {
  uv_work_t req;
  Persistent<Function> callback;
  char* data;
  size_t len;
};

Referring back to the first section you'll see work->req.data = work;. The uv_work_t has a data field where we can store a void pointer. So by creating this circular reference to work we'll be able to get at either later on.

void run_work(uv_work_t* req) {
  js_work* work = static_cast<js_work*>(req->data);
  char* data = new char[SIZE];
  for (int i = 0; i < SIZE; i++)
    data[i] = 97;
  work->data = data;
  work->len = SIZE;
}

This snippet clarifies why it's useful storing a void pointer to our js_work struct. For this simple example we're just allocating a small chunk of memory and filling it with char code 97 (i.e. 'a'). Then we're able to assign it back to the struct we created earlier.

Awesome. We've created a thread and had it execute some work. Time to wrap it all up:

void run_callback(uv_work_t* req, int status) {
  HandleScope scope;

  js_work* work = static_cast<js_work*>(req->data);
  char* data = work->data;
  node::Buffer* buf = node::Buffer::New(data, work->len);

  Handle<Value> argv[1] = { buf->handle_ };

  // proper way to reenter the js world
  node::MakeCallback(Context::GetCurrent()->Global(),
                     work->callback,
                     1,
                     argv);

  // properly cleanup, or death by millions of tiny leaks
  work->callback.Dispose();
  work->callback.Clear();
  // unfortunately in v0.10 Buffer::New(char*, size_t) makes a copy
  // and we don't have the Buffer::Use() api yet
  delete[] data;
  delete work;
}

Note: This is a sketch and not implementation ready. In the case that node::MakeCallback() throws then data wouldn't be free'd.

To start we unravel our data and make a Buffer out of it. In v0.10 we're still stuck with the old way of doing things so it's not too pretty, but it'll get the job done. Afterwards we need to create an array of Value's that'll be passed to the callback.

Now finally, time to call our callback! MakeCallback is the proper way to reenter js-land from an asynchronous native method. So first we pass the global scope (any object would do actually), the callback function and then the callback argument count and array. Last thing necessary is some proper cleanup. Don't forget to wash behind the ears!

To complete the whole thing, here's the script I used to test:

var basics = require('./build/Release/basics');

function runMe(buf) {
  console.log(buf.toString());
}

basics.run(runMe);

console.log('called');

// output:
// called
// aaaaaaaa

Simple tutorial done. Though be warned, today was the first time I actually did this. So trolls may be lurking.

Tuesday, July 9, 2013

child process multiple file descriptors

Not sure why I never put these together, but a recent post on a GitHub issue from Ben demonstrated how simple it is to use file descriptors to pipe data between parent/child processes:

var child_process = require('child_process');

// null means to use the default setting (pipe) for that fd
var options = { stdio: [null, null, null, 'pipe'] };
var args = [ /* ... */ ];
// cmd (the program to run) is left for you to fill in
var proc = child_process.spawn(cmd, args, options);
var pipe = proc.stdio[3];
pipe.write(Buffer('awesome'));

// now setup your child process like so
var net = require('net');

var pipe = new net.Socket({ fd: 3 });
pipe.on('data', function(buf) {
  // do whatever
});

Great. Now we can pipe data back and forth without possibly colliding with output of standard I/O.
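For something you can run as a single file, here's a sketch of the same idea in the other direction (child to parent). It swaps in child_process.spawnSync so the result is available right away, which means this is not the async spawn flow shown above. The child program is a hypothetical one-liner passed with -e, and I'm assuming a Unix-like system where the child can write to fd 3 with fs.writeSync.

```javascript
var child_process = require('child_process');

// Hypothetical child program: it writes to fd 3 and leaves stdout alone.
var childSource = "require('fs').writeSync(3, 'awesome');";

// stdin is ignored; stdout, stderr and fd 3 are each captured through a pipe.
var result = child_process.spawnSync(process.execPath, ['-e', childSource], {
  stdio: ['ignore', 'pipe', 'pipe', 'pipe']
});

// result.output is indexed by file descriptor.
var fromFd3 = result.output[3].toString();
var fromStdout = result.stdout.toString();

console.log(JSON.stringify(fromFd3));
console.log(JSON.stringify(fromStdout));
```

The data on the extra descriptor comes back separately from whatever the child prints, so it won't collide with normal output.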