Friday, February 6, 2015

AsyncWrap Tutorial - An Introduction

Some recent changes have removed an API people were looking forward to, but they have also added another that, while undocumented, exposes the hooks used internally so anyone can implement their own tracing or profiling API.

Because there is no official documentation on the new internal API, I've written this short tutorial on how to use AsyncWrap for fun and profit. I'd encourage everyone who's willing to try it out and give feedback on what more they'd like to see. Instead of exposing a complex tracing API, I'd rather give you the ability to create your own.

Please note that these features are experimental and may change. While I don't expect any existing APIs to break in the future, there are definitely features that will be added. Also note that the functionality listed here only works with Node v0.12.1 and io.js after v1.1.1.

Prepping the World

The first thing we have to do is tell AsyncWrap which init, before and after callbacks we want fired. Here's the boilerplate to get everything set up:

// Grab the JS hooks to AsyncWrap.
var async_wrap = process.binding('async_wrap');

// Setup needed variables and callbacks.
var kCallInitHook = 0;
var asyncHooksObject = {};

function asyncInit() {
  // This property must be set to an Object.
  this._asyncQueue = {};
}

function asyncBefore() { }

function asyncAfter() { }

// Set all the necessary callbacks.
async_wrap.setupHooks(asyncHooksObject, asyncInit, asyncBefore, asyncAfter);

If you're like me you would have taken the above and run with it before reading on, probably adding some logging to the callbacks to see when they get called. Though you'd have also noticed that nothing was happening.

There's actually one more important line that needs to be added, but I left it out because you need to understand how to use it before attempting to log from any of the async callbacks. Here's a continuation of the script:

// Tell AsyncWrap we want callbacks to be called.
asyncHooksObject[kCallInitHook] = 1;

This tells AsyncWrap that we want to start listening for asynchronous events. And it can just as easily be turned off by setting it equal to zero.

Now, do not try to log information from any of the callbacks just yet, unless you want the application to abort and core dump all over the place. Before doing so, AsyncWrap should be told not to listen for anything; otherwise it may get caught in an infinite loop. Here's a partial re-implementation of the script:

function asyncInit() {
  this._asyncQueue = {};

  // Cache the current setting.
  var was_listening = asyncHooksObject[kCallInitHook];
  // Disable listening for any calls to AsyncWrap.
  asyncHooksObject[kCallInitHook] = 0;

  // Safely make other calls.
  console.log('in asyncInit');

  // Restore the previous setting.
  asyncHooksObject[kCallInitHook] = was_listening;
}

So why is it important to cache the previous state of kCallInitHook? Because if a handle or request was created while AsyncWrap was active, then its callbacks will always run. In other words, kCallInitHook only affects whether the init callback is called, and if init did run then the before/after callbacks must also run.
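
For completeness, here's the same guard applied to one of the other callbacks. This is just a sketch following the rule above, reusing the asyncHooksObject and kCallInitHook from earlier:

function asyncBefore() {
  // Same caching trick as in asyncInit, since before/after still run
  // for any handle whose init callback ran.
  var was_listening = asyncHooksObject[kCallInitHook];
  asyncHooksObject[kCallInitHook] = 0;

  console.log('in asyncBefore');

  asyncHooksObject[kCallInitHook] = was_listening;
}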

Propagating Information

So, we have this powerful API at our fingertips, but how do we leverage it? Let's first start with a simple example to see the order in which things are triggered:

var async_wrap = process.binding('async_wrap');
var uid = 0;

var kCallInitHook = 0;
var asyncHooksObject = {};

function asyncInit() {
  this._asyncQueue = { uid: ++uid };
  // Internal mechanism to write directly to stderr.
  process._rawDebug('init    ' + this._asyncQueue.uid);
}

function asyncBefore() {
  process._rawDebug('before: ' + this._asyncQueue.uid);
}

function asyncAfter() {
  process._rawDebug('after:  ' + this._asyncQueue.uid);
  process.exit();
}

async_wrap.setupHooks(asyncHooksObject, asyncInit, asyncBefore, asyncAfter);
asyncHooksObject[kCallInitHook] = 1;

var net = require('net');
var s = net.createServer();

s.on('connection', function onConnection(c) {
  c.end('bye\n');
});

s.listen(8000, function() {
  // Don't want to trace the request making the connection, only the
  // request received by the server.
  asyncHooksObject[kCallInitHook] = 0;

  // Create a connection for logging, then exit app.
  net.connect(8000, function() { });
});

// Output:
// init    1
// before: 1
// init    2
// after:  1

The important point to pay attention to is that the init call for 2 was made in between the before and after calls for 1. This allows us to set up and tear down state for any newly created async requests, and it will come in handy shortly when we create a very simple long stack trace script.

AsyncWrap in Action

The best way I can teach is by example, so let's write a simple script that creates long stack traces for requests.

var async_wrap = process.binding('async_wrap');

var kCallInitHook = 0;
var asyncHooksObject = {};
// Keep track of our current stack.
var currentStack = [];

function asyncInit() {
  // Generate a new stack for this call site.
  var stack = {};
  Error.captureStackTrace(stack, asyncInit);

  // Add a copy of the current stack plus new stack to propagate.
  var newStack = currentStack.slice(0);
  newStack.push(stack);
  this._asyncQueue = { stack: stack, stack_arr: newStack };
}

function asyncBefore() {
  // Add this stack to the current stack.
  currentStack.push(this._asyncQueue.stack);
}

function asyncAfter() {
  // Remove this stack.
  currentStack.pop();
}

async_wrap.setupHooks(asyncHooksObject, asyncInit, asyncBefore, asyncAfter);
asyncHooksObject[kCallInitHook] = 1;


var net = require('net');
var s = net.createServer();

s.on('connection', function onConnection(c) {
  c.end('bye\n', function() {
    // Print long stack and exit app.
    var stack_arr = this._handle._asyncQueue.stack_arr;
    for (var i = stack_arr.length - 1; i >= 0; i--)
      console.log(i + '\n' + stack_arr[i].stack.substr(6));
    process.exit();
  });
});

s.listen(8000, function() {
  asyncHooksObject[kCallInitHook] = 0;
  net.connect(8000, function() { });
});

The output of this is rather interesting:

1

0
    at createTCP (net.js:32:10)
    at exports._createServerHandle (net.js:1077:14)
    at Server._listen2 (net.js:1121:14)
    at listen (net.js:1178:10)
    at Server.listen (net.js:1265:5)
    at Object.<anonymous> (/tmp/async-wrap-post.js:46:3)
    at Module._compile (module.js:446:26)
    at Object.Module._extensions..js (module.js:464:10)
    at Module.load (module.js:341:32)
    at Function.Module._load (module.js:296:12)

The space below 1 is empty, meaning the incoming connection had no stack trace. This makes sense because it was created from the native layer, not by any specific call from JS. So, at the least, we'll be able to find the reason the connection was made in the first place.

Future Stuff

There are several key features that will be included in the future.

First is having proper support for timers. Currently timers are stored in a linked list in JavaScript, so C++ isn't notified of every timer created.

Second is error tracing. This was a feature included in async listener, but was removed for expediency of getting the patch landed. With this you'll be able to be notified of any errors that happen within the stack of an asynchronous event you are watching. The error will bubble from the origin point through normal handling mechanisms, and will allow the error to be captured and handled just before the 'uncaughtException' event.

Third is subsystem filters. Instead of getting notified for all events, you can filter out when the init callback is called (e.g. TCP, PIPE).

Fourth is the ability to add hooks from the native layer (e.g. C++) so as to further reduce overhead of collecting metrics.

I look forward to working with the community to make Node's internal tracing and profiling hooks as robust as possible. Thanks for your help.

Friday, August 22, 2014

Intercepting the HTTP Data Stream

Word of warning: This post only works with v0.11+ of Node, and a few things here are not official API but based on implementation details. Even so, I figured this would be a worthwhile post for some.

Callback in the Middle Attack

What we'll be doing is hijacking the data stream between being received by the TCP socket and being processed by the HTTP parser. This allows manipulation of the data with no module the wiser. Let's get started:

var http = require('http');

// First start by adding a property to the Socket prototype that
// will be used to store the old data event callback(s).
require('net').Socket.prototype._data_fn_ = undefined;

var server = http.createServer().listen(8000);

server.on('connection', onConnection);

function onConnection(conn) {
  // The use of ._events is a complete implementation detail of
  // the EventEmitter.
  conn._data_fn_ = conn._events.data;

  // Now replace the data event with our own. This is to ensure
  // the custom data handler always runs first.
  conn._events.data = interceptData;
}

function interceptData(chunk) {
  // This is where the magic will happen, and be explained later.

  // After the magic, call the data event callback(s). For ease of
  // the example we'll assume there was only one data event callback.
  this._data_fn_(chunk);
}

server.on('request', onRequest);

function onRequest(req, res) {
  res.end('bye\n');
}

Running the above should produce a normally running HTTP server, though now there's an entry point into the data stream. The function interceptData() is the first place data is received by the user, even before it reaches the HTTP parser. Take note: trying to do this on the readable event will fail. This is an implementation detail; there is an ever so slight difference between pushing data immediately to the user via the data event and buffering the data to be read by the user.

Take special notice of the fact that we're storing the old 'data' event callback and running it directly after the data stream manipulation. The reason for this is twofold. First, it is the surest way to guarantee interceptData() runs first. Second, in the case of an error the full stack trace is available, whereas simply adding another event listener would hide the fact that the data stream had been manipulated.

Exploiting the Data Stream

When data is received, the amount of data read is accumulated on the bytesRead property. If we change the length of the data then the HTTP parser may stall or botch the parsing, having expected more or less data. Luckily getting past this is trivial, and once the countermeasure is in place we can begin to manipulate the data however we want.

function interceptData(chunk) {
  // Roll back how much data was actually read.
  this.bytesRead -= chunk.length;

  // Make a change to the data stream.
  chunk = manipulateData(chunk);

  // Now add back the actual length propagated.
  this.bytesRead += chunk.length;

  this._data_fn_(chunk);
}

An important point to keep in mind is that this example oversimplifies a key issue: all the data may not arrive in a single TCP packet. It will be up to you to buffer data accordingly until all the necessary data has arrived.
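
As a rough sketch of what that buffering could look like (the _buffered_ property is a name made up for this example, and checking for '\r\n\r\n' only covers the simplest case of waiting for a complete header block):

function interceptData(chunk) {
  // Accumulate chunks until the end of the headers has arrived.
  this._buffered_ = this._buffered_ ?
      Buffer.concat([this._buffered_, chunk]) : chunk;
  if (this._buffered_.toString().indexOf('\r\n\r\n') === -1)
    return;

  var data = this._buffered_;
  this._buffered_ = undefined;

  // Same countermeasure as before: keep bytesRead consistent.
  this.bytesRead -= data.length;
  data = manipulateData(data);
  this.bytesRead += data.length;

  this._data_fn_(data);
}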

For a quick example, let's remove the header customHeader:

function manipulateData(chunk) {
  var arr = chunk.toString().split(/\r\n/);
  var customHeader;
  for (var i = 0; i < arr.length; i++)
    if (arr[i].indexOf('customHeader:') === 0)
      customHeader = arr.splice(i, 1);
  // Return new Buffer that doesn't contain customHeader.
  return new Buffer(arr.join('\r\n'));
}
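
A quick way to verify it's working is to log the parsed headers in the request handler and confirm customHeader never shows up, no matter what the client sends:

function onRequest(req, res) {
  // customHeader should be absent, since it was stripped before the
  // HTTP parser ever saw the data.
  console.log(req.headers);
  res.end('bye\n');
}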

Fairly straightforward really. The basic template here should allow for any type of data manipulation to the http module. Though keep in mind I do not recommend this technique. I feel the need to say that since I'm a core maintainer, and should never publicly advocate using anything other than the "official" API. More important is that multiple modules don't attempt to do this in the same application. If for no other reason, the implications of each one stepping on the other's toes could lead to unpredictable results that are very difficult to track down. But, with those two caveats in mind, it could be very helpful in cases like wanting to pre-parse headers before the http module has a chance.

I haven't thought of an appropriate module-type API to encapsulate this functionality. If you have any ideas, please feel free to share in the comments and maybe I'll make one (unofficial of course ;-). There would also be a way to make this backwards compatible with v0.10. Though that would require a substantially deeper hack. But that's one thing I love about code. Anything is possible as long as you know how.

Thursday, June 5, 2014

From Mozilla to NodeSource

Today I have officially handed in my resignation to Mozilla and will be starting work at NodeSource on June 16.

The decision to move was difficult. Dan Shaw can attest to that by the fact he'd taken every opportunity these last six months to bring me over. During that time I'd told him straight out that it would never happen. Working at Mozilla was about as good as any job could be. I mean, come on, Mozilla paid me to work on Node.js as a core maintainer but without any of the corporate oversight which could have required my efforts to focus on the company's needs over my own desires as a maintainer to improve Node. Doesn't get much better than that for someone who enjoys developing open source software. Because of Mozilla I've been able to dive into many of the gritty aspects of core code and make them better just for the sake of having better code.

What eventually changed my mind? First, I have to give Dan credit for meeting my many requirements. Even to the point that he personally called my old manager Mark Mayo and discussed how I could continue to be closely affiliated with Mozilla after the transition. I realize that with how open Mozilla is it wouldn't be a problem for me to continue contributing, but there is some difference. Like, being able to walk a few desks down and speak with the Rust devs vs. chatting on IRC.

After feeling confident in being able to continue contributing back to the community just as openly as I have, and continue my work as core maintainer the same as usual, the decision to make the move felt more appropriate.

At NodeSource I'll have greater opportunity to work with companies that are using Node in the wild. These new interactions will give additional insight on how to focus my efforts as a core maintainer. My mantra is "make all the things fast", but with limited development hours it helps to know what to prioritize. Similarly I'm hopeful it will spark more creative madness to take Node to new levels of performance awesomeness.

NodeSource has coalesced an awesome group of talent. They'll definitely help me help the community, and to be honest, the community will get more out of me than before. Simply because I have additional assistance to assemble my mess of thoughts and code into something legible. So, you can thank NodeSource for future blog posts on how to make your Node apps faster.

So, here I jump from one awesome company to another. I've never found myself a sentimental person. Irrational, rageful and sometimes impertinent, sure. But it's always been about the code. Though in this case I'll make an exception (or maybe it's the Yo-Yo Ma instrumental piece I'm listening to while writing this) and say this is both a sad and exciting time in my career.

Wednesday, February 5, 2014

Node.js, ES6 and me

On January 31, 2014 my second son was born. That evening, while my wife and newborn were getting some sleep, I decided to jump on my phone and try to respond to the never ending mound of issues that seem to pile up. One of which was about enabling the --harmony option by default. Unfortunately one of my posts was misinterpreted by some. I will gladly take the blame for being simultaneously overbroad in my description of features accepted into Node core, and nonspecific about what core means.

Though bear in mind I never thought my comment would be taken so far as to lead some to believe Node would actually fork v8, or some such, to prevent ES6 features from being available in Node (especially since we had already done for one feature what this PR suggested we do for several others). Nor did I expect it because, just prior to that paragraph, I had discussed my conditions for allowing --harmony to be enabled by default, and it would have been a contradiction to then say it would never be allowed.

So first I'd like to clarify that I improperly referred to the entire ES6 spec. In that post I was specifically referring to iterators, generators and promises. Second I'd like to state for the record that I've never had the intention of preventing any Node developer from using new features coming to JavaScript, or from preventing these features from being exposed to the user if enabled in v8 by default. Last clarification is that the reference to "core" meant core code.

My Node Future

Some of you are probably thinking, "Why wouldn't he want to save us from callback hell and integrate a generator/promise based API?" The reasons are simple and pragmatic. First, the key thing Node does (integrating JavaScript with the v8 C++ bindings/libuv) can only be done via callbacks. Second, I want to remove abstraction and allow users to interface their code as close to the action as possible.

The community has proven two things. First is that none of us agree on how an API should be written. Second is that there are very talented developers that will push Node to the limits. My hopeful goal for Node is to introduce a standardized minimal API that sits very close to these C++ bindings. From there the community can create whatever interface they want with as close to zero unnecessary overhead as possible.

I want Node to be tremendously good at one thing. Being the kernel of a web platform for app developers who wish to write in JavaScript. That means writing an API that allows minimal abstraction and as close to zero overhead as possible. So why use the callback style? Because currently it's the only way to interface with the v8 C++ bindings. When the v8 team introduces an API for generators or promises I will give it honest consideration.

Moving at Node Speed

Some days I like to write little apps that sit directly on top of the internal C++ API (Max, I promise I haven't forgotten about the router app). Every time I'm reminded of how much performance is left on the table due to abstraction.

Imagine you want to saturate a 100Gb pipe with TCP traffic. Even at the maximum packet size of 64KB, that means you'll be sending at least 204,800 packets a second. Which also means something needs to be executed at least 204,800 times per second to send that data. Small amounts of extra abstraction start to make a big difference in performance at this level.
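
Just to show the back of the envelope math (assuming 100Gb means 100 * 1024^3 bits and the maximum packet size of 64KB):

// 100Gb per second, 8 bits per byte, 65536 bytes per packet.
var bitsPerSec = 100 * Math.pow(1024, 3);
console.log(bitsPerSec / 8 / 65536);  // 204800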

The v8 team has done some amazing stuff to improve performance in key areas that are important to Node (I doubt they had Node in mind when making those changes, but I'll take it anyways). Things like calls between C++ and JavaScript have become extremely cheap (we're talking double digit nanosecond cheap). I'm working on new performance patterns that can hopefully be integrated down the road, allowing techniques like using JavaScript to chain a series of C++ callbacks together (useful for high performance transforms).

There is still so much more to be done, but if we're really going to make Node fast and keep it lightweight then it must be done using a specific subset of the JavaScript specification. Just how far can we push Node? Let me just say you ain't seen nothing yet.

Tuesday, July 23, 2013

measuring node performance - part 1

This is the first of some unknown number of blog posts about tracking your Node application's performance. Today we're going to start off with a few simple ones. Though before we begin I'd just like to make sure everyone is familiar with the --v8-options flag. There is all sorts of magic here waiting to be found. Go ahead, take a quick look. I'll be waiting here for you... Now a lot of these won't be much use until we combine them with other tools, but we'll get to those in future posts. Right now we're going to focus on a few that can give us a quick peek into how our code is doing.
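
If you haven't taken that look yet, the full list is one command away:

node --v8-options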

Let's start off with this tiny snippet of code:

function sumItUp(n) {
  for (var i = n + 1, m = n * 2; i <= m; i += 2)
    n += i;
  return n;
}

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = sumItUp(i);
  return tmp;
}

runTest(53509);

Now run it with --trace-deopt and --code-comments. See anything? What do you mean you don't see anything? Oh, yeah. Here, change 53509 to 53510 and run it again. Now see anything? Good. It should look something like this:

[deoptimizing (DEOPT eager): begin 0x1dfe78f8b109 runTest @7, FP to SP delta: 64]
            ;;; jump table entry 3: deoptimization bailout 7.
  translating runTest => node=40, height=24
    0x7fff7f678860: [top + 64] <- 0x5e3f1c5fe69 ; rdi 0x5e3f1c5fe69 <JS Global Object>
    0x7fff7f678858: [top + 56] <- 53510 ; rcx (smi)
    0x7fff7f678850: [top + 48] <- 0x2f3759173318 ; caller's pc
    0x7fff7f678848: [top + 40] <- 0x7fff7f678888 ; caller's fp
    0x7fff7f678840: [top + 32] <- 0x1dfe78f8b089; context
    0x7fff7f678838: [top + 24] <- 0x1dfe78f8b109; function
    0x7fff7f678830: [top + 16] <- 53509 ; r9 (smi)
    0x7fff7f678828: [top + 8] <- 0x5e3f1c04121 <undefined> ; literal
    0x7fff7f678820: [top + 0] <- 0x1dfe78f8b0c1 ; rdx 0x1dfe78f8b0c1 <JS Function sumItUp (SharedFunctionInfo 0x2911dc22c219)>
  translating sumItUp => node=38, height=16
    0x7fff7f678818: [top + 56] <- 0x5e3f1c5fe69 ; r8 0x5e3f1c5fe69 <JS Global Object>
    0x7fff7f678810: [top + 48] <- 2147409811 ; r11 (smi)
    0x7fff7f678808: [top + 40] <- 0x2f3759173526 ; caller's pc
    0x7fff7f678800: [top + 32] <- 0x7fff7f678848 ; caller's fp
    0x7fff7f6787f8: [top + 24] <- 0x1dfe78f8b089; context
    0x7fff7f6787f0: [top + 16] <- 0x1dfe78f8b0c1; function
    0x7fff7f6787e8: [top + 8] <- 107018 ; rax (smi)
    0x7fff7f6787e0: [top + 0] <- 107018 ; rbx (smi)
[deoptimizing (eager): end 0x1dfe78f8b0c1 runTest @7 => node=38, pc=0x2f37591736df, state=NO_REGISTERS, alignment=no padding, took 0.160 ms]
[removing optimized code for: runTest]

Wow. This looks all cryptic and scary, but there's a lot we can learn. It's basically the assembly output of what your JS became. The order of execution goes from the bottom up, so let's start there.

First let's pick out the values that we understand: 107018 ; rax (smi) is generated by the loop in sumItUp (remember, 53509 * 2 == 107018) and 2147409811 ; r11 (smi) is the return value.

What's that magical smi you ask? Well I'm glad you did. It stands for "small integer" and while all numbers in JS are technically type double, an optimization technique v8 uses is to detect when only smi's are used. On my x64 machine that's a full 32 bit integer. So, the largest smi would be... 2147483647.
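
A quick sanity check on that number (nothing v8 specific, just arithmetic):

// Largest signed 32 bit integer: 2^31 - 1.
console.log(Math.pow(2, 31) - 1);  // 2147483647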

Working our way up, notice 53510 ; rcx (smi) and then the deoptimization. Grab the return value there with a simple console.log(runTest(53510)); and the output should be 2147516829. Applying our newfound knowledge of smi's we can see that 2^32 / 2 - 1 - 2147516829 == -33182. Oops, overflow. Sometimes this may not be avoidable, but at least it's good to know why code is deoptimizing.

Another useful one is --trace-opt. Using that with our working code it'll output:

[marking 0x1dc7f8f8b011 <JS Function sumItUp (SharedFunctionInfo 0x6815302c219)> for recompilation, reason: small function, ICs with typeinfo: 5/5 (100%)]
[optimizing 0x1dc7f8f8b011 <JS Function sumItUp (SharedFunctionInfo 0x6815302c219)> - took 0.025, 0.114, 0.038 ms]
[marking 0x1dc7f8f8b059 <JS Function runTest (SharedFunctionInfo 0x6815302c2a9)> for recompilation, reason: small function, ICs with typeinfo: 2/2 (100%)]
[optimizing 0x1dc7f8f8b059 <JS Function runTest (SharedFunctionInfo 0x6815302c2a9)> - took 0.044, 0.109, 0.021 ms]

Nice to know and all, but where I really find it advantageous is in finding problems like the following:

function retSumFn() {
  function sumItUp(n) {
    for (var i = n + 1, m = n * 2; i <= m; i += 2)
      n += i;
    return n;
  }
  return sumItUp;
}

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = retSumFn()(i);
  return tmp;
}

runTest(1e3);

Run this and prepare yourself for a wall of text. See, every call to retSumFn() creates a new function object that v8 has to optimize separately. So keep functions at the top level. If your function needs something, pass it in. Here's a very simple modification that greatly reduces the stress on your application:

function sumItUp(n) {
  for (var i = n + 1, m = n * 2; i <= m; i += 2)
    n += i;
  return n;
}

function retSumFn() { return sumItUp; }

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = retSumFn()(i);
  return tmp;
}

runTest(53509);

And finally, the mother of all optimizations: --trace-inlining. This shows code that was not just optimized, but whose resulting assembly was inlined into the function making the call. For this to happen the function body must be small. Under 600 characters actually. Variable types cannot change. So if you're expecting a number, then go ahead and coerce it on entry (e.g. +'0xff' == 255). Also limit the types of arguments that are passed.
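
Here's a contrived sketch of those rules in action (the function names are made up for illustration): a tiny function body, types coerced once at the boundary, and only numbers ever passed to the hot function.

function square(n) {
  return n * n;
}

function run(input) {
  // Coerce on entry so square only ever sees numbers.
  var n = +input;
  var sum = 0;
  for (var i = 0; i < 1e6; i++)
    sum += square(n);
  return sum;
}

run('0xff');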

Let's throw --trace-inlining onto our original code and see the output. Actually pretty basic: Inlined sumItUp called from runTest. This is great. It means internally it placed the assembly generated from sumItUp directly into runTest. Now we have reached a new height in optimization!

Welcome to the basics of optimization. Here are a couple links with more in-depth material:

Understanding v8
Optimizing for v8

As always, if I screwed up please let me know. Don't want to propagate bad information, but I guess since no one reads this anyways that won't be a problem :)