Friday, February 6, 2015

AsyncWrap Tutorial - An Introduction

Some recent changes have both removed an API people were looking forward to and added another that isn't documented, but that does expose the hooks used so anyone can implement their own tracing or profiling API.

Because there is no official documentation on the new internal API I've written this short tutorial on how to use AsyncWrap for fun and profit. I'd encourage as many as are willing to try it out and give feedback on what more they'd like to see. Instead of exposing a complex tracing API, I'd rather give you the ability to create your own.

Please note that these features are experimental and may change. While I don't expect any existing APIs to break in the future, there are definitely features that will be added. Also that the functionality listed here will only work with Node v0.12.1 and post io.js v1.1.1.

Prepping the World

First thing we have to do is tell AsyncWrap what init, before and after callbacks we want fired. Here's the boilerplate to have everything setup:

// Grab the JS hooks to AsyncWrap.
var async_wrap = process.binding('async_wrap');

// Setup needed variables and callbacks.
var kCallInitHook = 0;
var asyncHooksObject = {};

function asyncInit() {
  // This propery must be set to an Object.
  this._asyncQueue = {};
}

function asyncBefore() { }

function asyncAfter() { }

// Set all the necessary callbacks.
async_wrap.setupHooks(asyncHooksObject, asyncInit, asyncBefore, asyncAfter);

If you're like me you would have taken the above and ran with it before reading on. Probably done something like added some logging to the callbacks to see when they get called. Though you'd have also noticed that nothing was happening.

There's actually one more important line that needs to be added, but I left it out because you need to understand how to use it before attempting to log from any of the async callbacks. Here's a continuation of the script:

// Tell AsyncWrap we want callbacks to be called.
asyncHooksObject[kCallInitHook] = 1;

This tells AsyncWrap that we want to start listening for asynchronous events. And it can just as easily be turned off by setting it equal to zero.

Now do not try to log information from any of the callbacks just yet. Unless you want the application to abort and core dump all over the place. Before doing so AsyncWrap should be told to not listen for anything. Otherwise it may get caught in an infinite loop. Here's a partial re-implementation of the script:

function asyncInit() {
  this._asyncQueue = {};

  // Cache the current setting.
  var was_listening = asyncHooksObject[kCallInitHook];
  // Disable listening for any calls to AsyncWrap.
  asyncHooksObject[kCallInitHook] = 0;

  // Safely make other calls.
  console.log('in asyncInit');

  // Restore the previous setting.
  asyncHooksObject[kCallInitHook] = was_listening;
}

So why is it important to cache the previous state for kCallInitHook? Because if a handle or request was created while AsyncWrap was active then the callbacks will always run. In other words kCallInitHook only affects the call to the init callback, and if it did run then the before/after callbacks must also run.

Propagating Information

So, we have this power API at our fingertips, but how do we leverage it? Let's first start with a simple example to see the ordering of how things are triggered:

var async_wrap = process.binding('async_wrap');
var uid = 0;

var kCallInitHook = 0;
var asyncHooksObject = {};

function asyncInit() {
  this._asyncQueue = { uid: ++uid };
  // Internal mechanism to write directly to stderr.
  process._rawDebug('init    ' + this._asyncQueue.uid);
}

function asyncBefore() {
  process._rawDebug('before: ' + this._asyncQueue.uid);
}

function asyncAfter() {
  process._rawDebug('after:  ' + this._asyncQueue.uid);
  process.exit();
}

async_wrap.setupHooks(asyncHooksObject, asyncInit, asyncBefore, asyncAfter);
asyncHooksObject[kCallInitHook] = 1;

var net = require('net');
var s = net.createServer();

s.on('connection', function onConnection(c) {
  c.end('bye\n');
});

s.listen(8000, function() {
  // Don't want to trace the req making the conntion. Only the received
  // req by the server.
  asyncHooksObject[kCallInitHook] = 0;

  // Create a connection for logging, then exit app.
  net.connect(8000, function() { });
});

// Output:
// init    1
// before: 1
// init    2
// after:  1

The important point to pay attention to is that the init call to 2 was made in between calls to before and after on 1. Allowing us to setup and tear down state for any newly created async requests. This will come in handy in a short while to create a very simple long stack trace script.

AsyncWrap in Action

Best way I can teach is by example. So let's write a simple example script that creates long stack traces for requests.

var async_wrap = process.binding('async_wrap');

var kCallInitHook = 0;
var asyncHooksObject = {};
// Keep track of our current stack.
var currentStack = [];

function asyncInit() {
  // Generate a new stack for this call site.
  var stack = {};
  Error.captureStackTrace(stack, asyncInit);

  // Add a copy of the current stack plus new stack to propagate.
  var newStack = currentStack.slice(0);
  newStack.push(stack);
  this._asyncQueue = { stack: stack, stack_arr: newStack };
}

function asyncBefore() {
  // Add this stack to the current stack.
  currentStack.push(this._asyncQueue.stack);
}

function asyncAfter() {
  // Remove this stack.
  currentStack.pop();
}

async_wrap.setupHooks(asyncHooksObject, asyncInit, asyncBefore, asyncAfter);
asyncHooksObject[kCallInitHook] = 1;


var net = require('net');
var s = net.createServer();

s.on('connection', function onConnection(c) {
  c.end('bye\n', function() {
    // Print long stack and exit app.
    var stack_arr = this._handle._asyncQueue.stack_arr;
    for (var i = stack_arr.length - 1; i >= 0; i--)
      console.log(i + '\n' + stack_arr[i].stack.substr(6));
    process.exit();
  });
});

s.listen(8000, function() {
  asyncHooksObject[kCallInitHook] = 0;
  net.connect(8000, function() { });
});
The output of this is rather interesting:
1

0
    at createTCP (net.js:32:10)
    at exports._createServerHandle (net.js:1077:14)
    at Server._listen2 (net.js:1121:14)
    at listen (net.js:1178:10)
    at Server.listen (net.js:1265:5)
    at Object. (/tmp/async-wrap-post.js:46:3)
    at Module._compile (module.js:446:26)
    at Object.Module._extensions..js (module.js:464:10)
    at Module.load (module.js:341:32)
    at Function.Module._load (module.js:296:12)

The space below 1 is empty. Meaning the incoming connection had no stack trace. This makes sense because it was created coming from the native layer, and no specific call from JS. So, in the least we'll be able to fine the reason the connection was made in the first place.

Future Stuff

There are several key features that will be included in the future.

First is having proper support for timers. Currently timers are stored in a linked list in JavaScript, so C++ isn't notified of every timer created.

Second is error tracing. This was a feature included in async listener, but was removed for expediency of getting the patch landed. With this you'll be able to be notified of any errors that happen within the stack of an asynchronous event you are watching. The error will bubble from the origin point through normal handling mechanisms, and will allow the error to be captured and handled just before the 'uncaughtException' event.

Third is subsystem filters. Instead of getting notified for all events, you can filter out when the init callback is called (e.g. TCP, PIPE).

Fourth is the ability to add hooks from the native layer (e.g. C++) so as to further reduce overhead of collecting metrics.

I look forward to working with the community to making Node's internal tracing and profiling hooks as robust as possible. Thanks for your help.