Tuesday, July 23, 2013

measuring node performance - part 1

This is the first of some unknown number of blog posts about tracking your Node application's performance. Today we're going to start off with a few simple ones. Though before we being I'd just like to make sure everyone is familiar with the --v8-options flag. There is all sorts of magic here waiting to be found. Go ahead, take a quick look. I'll be waiting here for you... Now a lot of these won't be much use until we combine them with other tools, but we'll get to those in future posts. Right now we're going to focus on a few that can give us a quick peek into how our code is doing.

Let's start off with this tiny snippet of code:

function sumItUp(n) {
  for (var i = n + 1, m = n * 2; i <= m; i += 2)
    n += i;
  return n;
}

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = sumItUp(i);
  return tmp;
}

runTest(53509);

Now run it with --trace-deopt and --code-comments. See anything? What do you mean you don't see anything? Oh, yeah. Here, change 53509 to 53510 and run it again. Now see anything? Good. It should look something like this:

[deoptimizing (DEOPT eager): begin 0x1dfe78f8b109 runTest @7, FP to SP delta: 64]
            ;;; jump table entry 3: deoptimization bailout 7.
  translating runTest => node=40, height=24
    0x7fff7f678860: [top + 64] <- 0x5e3f1c5fe69 ; rdi 0x5e3f1c5fe69 <JS Global Object>
    0x7fff7f678858: [top + 56] <- 53510 ; rcx (smi)
    0x7fff7f678850: [top + 48] <- 0x2f3759173318 ; caller's pc
    0x7fff7f678848: [top + 40] <- 0x7fff7f678888 ; caller's fp
    0x7fff7f678840: [top + 32] <- 0x1dfe78f8b089; context
    0x7fff7f678838: [top + 24] <- 0x1dfe78f8b109; function
    0x7fff7f678830: [top + 16] <- 53509 ; r9 (smi)
    0x7fff7f678828: [top + 8] <- 0x5e3f1c04121 <undefined> ; literal
    0x7fff7f678820: [top + 0] <- 0x1dfe78f8b0c1 ; rdx 0x1dfe78f8b0c1 <JS Function sumItUp (SharedFunctionInfo 0x2911dc22c219)>
  translating sumItUp => node=38, height=16
    0x7fff7f678818: [top + 56] <- 0x5e3f1c5fe69 ; r8 0x5e3f1c5fe69 <JS Global Object>
    0x7fff7f678810: [top + 48] <- 2147409811 ; r11 (smi)
    0x7fff7f678808: [top + 40] <- 0x2f3759173526 ; caller's pc
    0x7fff7f678800: [top + 32] <- 0x7fff7f678848 ; caller's fp
    0x7fff7f6787f8: [top + 24] <- 0x1dfe78f8b089; context
    0x7fff7f6787f0: [top + 16] <- 0x1dfe78f8b0c1; function
    0x7fff7f6787e8: [top + 8] <- 107018 ; rax (smi)
    0x7fff7f6787e0: [top + 0] <- 107018 ; rbx (smi)
[deoptimizing (eager): end 0x1dfe78f8b0c1 runTest @7 => node=38, pc=0x2f37591736df, state=NO_REGISTERS, alignment=no padding, took 0.160 ms]
[removing optimized code for: runTest]

Wow. This looks all cryptic and scary, but there's a lot we can learn. It's basically the assembly output of what your JS became. The order of execution goes from the bottom up, so let's start there.

First let's pick out the values that we understand: 107018 ; rax (smi) is generated by the loop in sumItUp (remember since 53509 * 2 == 107018) and 2147409811 ; r11 (smi) is the return value.

What's that magical smi you ask? Well I'm glad you did. It stands for "small integer" and while all numbers in JS are technically type double, an optimization technique v8 uses is to detect when only smi's are used. On my x64 machine that's a full 32 bit integer. So, the largest smi would be... 2147483647.

Working our way up notice 53510 ; rcx (smi) and then the deoptimization. Grab the return value there with a simple console.log(runTest(53510)); and the output should be 2147516829. Applying our new found knowledge of smi's we can see that 2^32 / 2 - 1 - 2147516829 == -33182. Oops, overflow. Sometimes this may not be avoidable, but at least it's good to know why code is deoptimizing.

Another useful one is --trace-opt. Using that with our working code it'll output:

[marking 0x1dc7f8f8b011 <JS Function sumItUp (SharedFunctionInfo 0x6815302c219)> for recompilation, reason: small function, ICs with typeinfo: 5/5 (100%)]
[optimizing 0x1dc7f8f8b011 <JS Function sumItUp (SharedFunctionInfo 0x6815302c219)> - took 0.025, 0.114, 0.038 ms]
[marking 0x1dc7f8f8b059 <JS Function runTest (SharedFunctionInfo 0x6815302c2a9)> for recompilation, reason: small function, ICs with typeinfo: 2/2 (100%)]
[optimizing 0x1dc7f8f8b059 <JS Function runTest (SharedFunctionInfo 0x6815302c2a9)> - took 0.044, 0.109, 0.021 ms]

Nice to know and all, but where I really find it advantageous is to find problems like the following:

function retSumFn() {
  function sumItUp(n) {
    for (var i = n + 1, m = n * 2; i <= m; i += 2)
      n += i;
    return n;
  }
  return sumItUp;
}

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = retSumFn()(i);
  return tmp;
}

runTest(1e3);

Run this and prepare yourself for a wall of text. See, v8 needs to optimize every function returned. So keep functions at the top level. If your function needs something, pass it in. Here's a very simple modification that greatly reduces the stress on your application:

function sumItUp(n) {
  for (var i = n + 1, m = n * 2; i <= m; i += 2)
    n += i;
  return n;
}

function retSumFn() { return sumItUp; }

function runTest(max) {
  for (var i = 0; ++i < max;)
    var tmp = retSumFn()(i);
  return tmp;
}

runTest(53509);

And finally, the mother of all optimization --trace-inlining. This is code that can be not just optimized, but the resulting assembly can be inline'd into the function making the call. For this to happen the function body must be small. Under 600 characters actually. Variable types cannot change. So if you're expecting a number, then go ahead and coerce it on entry (e.g. +'0xff' == 255). Also limit the types of arguments that are passed.

Let's throw --trace-inling onto our original code and see the output. Actually pretty basic: Inlined sumItUp called from runTest. This is great. It means internally it placed the assembly generated from sumItUp directly into runTest. Now, we have reached a new height in optimization!

Welcome to the basics of optimization. Here are a couple links with more in-depth material:

Understanding v8
Optimizing for v8

As always, if I screwed up please let me know. Don't want to propagate bad information, but guess since no one reads this anyways that won't be a problem :)

No comments:

Post a Comment