
Programmer Productivity: Weekends

Posted on 14/8/10 by Felix Geisendörfer

Weekends are a weird thing when you love what you are doing. Most weekends I have one voice in my head that pushes me to work, and another one that tells me to relax and have some fun.

It's tough, because I always have a lot of ideas for interesting projects, and weekends are about the only time when I can just hack on something for the fun of it. And of course, we are also bootstrapping a startup, so that is very hard to do without putting in some hours over the weekend.

I generally don't have an upfront plan for my weekends, so I mostly just do what feels right at any given moment, be it some hours of coding, some beach volleyball, or playing forkmaster with a few friends. There is usually also one night of hanging out with friends drinking.

This is not something I'm proud of. It means 2 out of 7 days a week go by without a strong vision. These weekends feel more like a "break" than a vacation. There is little adventure or heroic work ethic that I can look back on when starting out on Monday.

So for next weekend, I will try to work out a clear plan that makes it a wonderful experience, including some well-defined time for serious work. By having an upfront plan, I expect to be more immersed in both the fun and the work, with less confusion in my head about what I should be doing at any given moment. Ideally I'll leave the weekend more refreshed than usual, while still having made a nice dent in my backlog.

After that I might also try a weekend with no computer / iPhone access. If that yields considerably better results, I just need to get my weekly productivity levels high enough to make this possible.

Yesterday's Productivity

  1. Productive work: 2.80 h
  2. Busy work: 4.40 h
  3. Procrastination: 0.37 h

Time after 6pm: 0 h

271 lines of code added, 172 deleted in 2 projects

  • Busy work is still high because the box this site is running on is acting up, and I'm working on migrating everything to vps.net.
  • There was a beach volleyball tournament right after work, so no after-hours work.
  • Procrastination levels are still excellent. I played one short game of SC2 and one game of foosball; that's reasonable for my taste.

--fg

PS: How are you spending your weekends? What would you like to change?

 

Programmer Productivity: Measuring Results

Posted on 13/8/10 by Felix Geisendörfer

This is post #2 of my programmer productivity series.

Before I embark on trying out various productivity strategies, I need to establish a set of metrics that will help me understand what works, and what doesn't.

For now I decided to keep track of two things:

  1. My time, using a daily journal where I enter every task I am working on, and the time spent on it.
  2. My commits, as well as the lines of code I add / delete throughout my day

If you want to follow along, I am using a simple notebook for my time tracking. I write one task per row, using the following format:

  • 09:23 - 09:26 Coffee & Planning today's tasks (3m)
  • 09:26 - 10:30 Work on setting up new server (1h 4m)

By writing out the time I start and end my tasks, it's pretty hard to miss an item.

As far as commits are concerned, I whipped up a simple node.js script that you can download here. To use the script, copy it anywhere on your machine (maybe you like /usr/local/bin/gitloc ?), and then add it to your git post-commit template hook like so:

$ cat /usr/local/git/share/git-core/templates/hooks/post-commit
#!/bin/sh
/usr/local/bin/gitloc

This will automatically track the commits in all new repositories you create. If you want to track an existing repository as well, simply do "git init" in the project's root directory, and the template hook will be copied in.

The CSV file created by the script looks like this:

2010-08-12 17:00:09,8eb212e,Implemented initial map display,142,130,/Library/WebServer/Documents/tvype/portal
2010-08-12 18:34:22,1fa8859,Initial billing code,86,2,/Library/WebServer/Documents/transloader.transload.it

The columns after the commit subject are the lines added / removed. The last column is the path of the project the commit was made to.
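
The actual gitloc script is linked above; the sketch below is only meant to illustrate the general idea and is not the real thing. It asks git for the last commit, sums up the --numstat counts, and appends a CSV row similar to the ones above (the ~/gitloc.csv path, and all of this code, are assumptions on my part):

#!/usr/bin/env node
// Hypothetical sketch, not the actual gitloc script: log the last commit as a
// CSV row like the ones shown above.
var exec = require('child_process').exec
  , fs = require('fs');

var cmd = 'git log -1 --date=iso --pretty=format:"%ad%n%h%n%s" --numstat';

exec(cmd, function(err, stdout) {
  if (err) throw err;

  var lines = stdout.split('\n')
    , date = lines[0].substr(0, 19) // "2010-08-12 17:00:09 +0200" -> "2010-08-12 17:00:09"
    , sha = lines[1]
    , subject = lines[2]
    , added = 0
    , deleted = 0;

  // --numstat rows look like "<added>\t<deleted>\t<file>"
  lines.slice(3).forEach(function(line) {
    var cols = line.split('\t');
    if (cols.length == 3) {
      added += parseInt(cols[0], 10) || 0;
      deleted += parseInt(cols[1], 10) || 0;
    }
  });

  var row = [date, sha, subject, added, deleted, process.cwd()].join(',') + '\n';
  fs.appendFile(process.env.HOME + '/gitloc.csv', row, function(err) {
    if (err) throw err;
  });
});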

Analysis

As far as evaluating my time spent, I have come up with a simple system of classifying my task log entries into three categories:

  1. Productive work: I'm creating value for other people (clients, transloadit customers, etc.). If I did this all day, I'd reach all of my goals.
  2. Busy work: Answering emails, meetings, staring at the screen with little results, fixing bugs, even writing this blog post series (it's not productive work unless I plan to make a living selling productivity advice, which I don't). If I did this all day, I'd end up getting very average results.
  3. Procrastination: Reading news, lunch (if longer than 30 min), checking emails in an unproductive way, playing games. If I did this all day I'd be homeless in no time : ).

Of course the lines are always a bit blurry. My advice is to assign a task the category you intuitively think of first. If you must, split the task 50/50 between two categories. Skip to the bottom of this article to see my summary for yesterday.

Regarding the commit log, for now I'll simply count:

  1. Lines added
  2. Lines removed
  3. Projects worked on

Ideally I will strike a balance here. Never < 100 lines / day, but >= 1000 is way too much as well. This might be a "programmer productivity" series, but many of us have other tasks that would definitely get neglected if we only wrote code all day.
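
To turn the CSV into the kind of daily totals reported below, a small helper script is enough. Here is a rough sketch (the summarize function, the column order, and the ~/gitloc.csv path are assumptions on my part, not part of the actual script):

// Hypothetical helper: summarize one day's rows from the gitloc CSV.
// Assumes the column order date,sha,subject,added,deleted,project and that
// commit subjects contain no commas (a real version would parse CSV properly).
var fs = require('fs');

function summarize(csvPath, day) {
  var added = 0
    , deleted = 0
    , projects = {};

  fs.readFileSync(csvPath, 'utf8').split('\n').forEach(function(line) {
    if (line.indexOf(day) !== 0) return;

    var cols = line.split(',');
    added += parseInt(cols[3], 10) || 0;
    deleted += parseInt(cols[4], 10) || 0;
    projects[cols[5]] = true;
  });

  return {added: added, deleted: deleted, projects: Object.keys(projects).length};
}

console.log(summarize(process.env.HOME + '/gitloc.csv', '2010-08-12'));
// for the two example rows shown earlier: {added: 228, deleted: 132, projects: 2}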

Yesterday's Productivity

And here comes the promised analysis from yesterday's logs:

  1. Productive work: 4.03 h
  2. Busy work: 4.60 h
  3. Procrastination: 0.45 h

Time after 6pm: 1.5 h

274 lines of code added, 186 deleted in 3 projects

A few thoughts on those numbers:

  • There was a 2.5-hour meeting with our client yesterday. That's the first time we had a meeting that long, and hopefully the last ; )
  • I won't add "Procrastination" that happens after 6pm to the numbers. If I choose to work after 6pm, fine, but if I like to play instead that's cool. There is no reward for burning out.
  • The 0.45 h of procrastination are from an SC2 game I played with Tim after lunch. Damn you, Blizzard ... There were also 5 minutes of unproductive email checking / following a link to Facebook.

Coming next

My next post will look into productivity-killing patterns and habits, and how to fight them.

I won't start setting specific goals until I have collected a little more data about my current productivity.

--fg

PS: My friend Mark Grabanski wrote a post on various productivity factors. Good thoughts, especially on the importance of "tools".

 

Programmer Productivity

Posted on 12/8/10 by Felix Geisendörfer

90% of the code is written by 10% of the programmers.

-- Robert C. Martin

I have decided to go on a journey for greater productivity. This is my first post on the subject, and I intend to write about it every day for at least the next 30 days.

Just in case you wonder, this whole series won't be a magic guide of universal wisdom that you just have to inhale in order to become a Super Mutant Ninja Developer. Nor do I plan on an artificial motivational boost that will stop as soon as the series is over.

No, my plan is to go deep and fix certain anti-patterns I have developed over the years. I want to look into time / task management, flow, sleep, music, diet, tools, email, work environment, free time, social life, projects, vacations, meditation, habits, money, etc. and achieve a greater understanding of how these things affect my daily output.

A lot of what I write might be very individual, so should you be inclined to read it, please draw your own conclusions. This whole thing is going to be more about the process and analysis than about any particular strategy I might derive from it.

To put some science into the whole thing, my quest for today will be to determine a few good metrics for measuring my productivity and to implement tools that will help collect them. After that I will set some initial goals for improvement and perform various productivity experiments throughout the coming weeks.

--fg

PS: If you are interested in joining me, just link me to any blog posts or leave comments about your own productivity experiences, I will gladly highlight them.

 

Announcing transloadit.com

Posted on 13/7/10 by Felix Geisendörfer

Today we are thrilled to announce the commercial availability of transloadit.com.

If you have a web (or mobile) application that needs file uploading you should consider integrating transloadit. Transloadit will handle the upload process, resizing of images, encoding of videos, and final storage of your content on Amazon S3 for you.

Our plans start at $19 / month, which includes 3.5 GB of usage. This is enough for ~72 video encodings or ~717 image resizes per month.

This project has been almost two years in the making, with over 150 people participating in testing various versions. The version we are shipping now has already executed 55,000 internal jobs, each spawning 2-5 command line scripts on our servers.

We are also one of the first commercial software / infrastructure as a service products built on node.js. After experimenting with various technologies, we found it to be the perfect fit for our uploading and processing requirements.

Another thing we are very proud of is the ~95% test coverage of the service's code base. We have an extensive suite of unit, integration and system tests that have already proven incredibly reliable for detecting problems, be it in our code or from changes to our stack.

If you are a long-time reader of this blog, we would be incredibly grateful if you would spread the word about our service to your boss, co-workers and geek friends.

Either way, we would be very happy to hear any feedback, ideas and questions you can come up with!

--fg

PS: I also want to use this opportunity to thank my co-founders Tim and Kevin for being the best partners in this business I can imagine. I love you guys.

 

Parsing file uploads at 500 mb/s with node.js

Posted on 31/5/10 by Felix Geisendörfer

A few weeks ago I set out to create a new multipart/form-data parser for node.js. We need this parser for the new version of transloadit that we have been working on since our setback last month.

The result is a new library called formidable, which, on a high level, makes receiving file uploads with node.js as easy as:

var formidable = require('formidable')
  , http = require('http')
  , sys = require('sys');

http.createServer(function(req, res) {
  if (req.url == '/upload' && req.method.toLowerCase() == 'post') {
    // parse a file upload
    var form = new formidable.IncomingForm();
    form.parse(req, function(fields, files) {
      res.writeHead(200, {'content-type': 'text/plain'});
      res.write('received upload:\n\n');
      res.end(sys.inspect({fields: fields, files: files}));
    });
    return;
  }

  // show a file upload form
  res.writeHead(200, {'content-type': 'text/html'});
  res.end
    ( '<form action="/upload" enctype="multipart/form-data" method="post">'
    + '<input type="text" name="title"><br>'
    + '<input type="file" name="upload" multiple="multiple"><br>'
    + '<input type="submit" value="Upload">'
    + '</form>'
    );
});

Essentially this works similarly to other platforms, where file uploads are saved to disk before your script is invoked with a path to the uploaded file.

What's nice about this, however, is that you can hook into the whole thing on a lower level:

form.parse(req);
form.addListener('file', function(field, file) {
  // file looks like this:
  // {path: '...' , filename: '...', mime: '...'}
});

We use that interface for processing HTML5 multi-file uploads as they come in, rather than waiting for the entire upload to finish.

You could even overwrite the onPart handler, which gives you direct access to the raw data stream:

form.onPart = function(part) {
  part.addListener('data', function(chunk) {
    // do cool stuff, like streaming incoming files somewhere else
  });
};
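
For example, here is a rough sketch of streaming each incoming file part straight to disk (it assumes a part also emits an 'end' event and carries a filename property, as suggested by the file objects above; none of this is the library's own code):

// Rough sketch: pipe each incoming file part straight to disk.
// Assumes parts emit 'data' and 'end' events and have a filename property.
var fs = require('fs');

form.onPart = function(part) {
  if (!part.filename) {
    // not a file upload; regular fields are ignored in this sketch
    return;
  }

  var target = fs.createWriteStream('/tmp/' + part.filename);
  part.addListener('data', function(chunk) {
    target.write(chunk);
  });
  part.addListener('end', function() {
    target.end();
  });
};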

All of this is possible thanks to the underlying multipart parser, which makes heavy use of node.js buffers.

Buffers in node are basically just (efficient) arrays of raw memory that you can access byte by byte. The parser works by looping over each incoming buffer chunk, while maintaining as little state as possible to do its work:

// simplified excerpt from MultipartParser.write
// chunk = Buffer of incoming data

for (var i = 0; i < chunk.length; i++) {
  var character = chunk[i];
  switch (this.state) {
    case 'BOUNDARY-BEGIN':
      if (character != this.boundary[this.boundaryIndex]) {
        // unexpected character, abort parsing
        return 0;
      }

      // this.boundaryIndex tracks how far into the boundary we have matched
      this.boundaryIndex++;
      if (this.boundaryIndex == this.boundary.length) {
        // boundary fully matched: emit event, advance to next state
        this.boundaryIndex = 0;
        this.onPartHeaderBegin();
        this.state = 'HEADER-BEGIN';
      }
      break;
    case 'HEADER-BEGIN':
      // ...
      break;
  }
}

But, as you can imagine, this approach turned out to be somewhat limited in speed. I was only able to get about 16-20 mb/s out of this. My goal however was to get somewhere around 125 mb/s, enough to saturate a 1 gbit network connection.

So I started to look for ways to optimize. The data I was parsing looks like this:

--AaB03x
content-disposition: form-data; name="title"

A picture of me on my unicycle!
--AaB03x
content-disposition: form-data; name="pic"; filename="unicycle.jpg"
Content-Type: image/jpeg

... binary data ...
--AaB03x--

The sequence here being:

  1. Series of boundary characters (--AaB03x), announcing the beginning of a new part
  2. Headers for that part
  3. \r\n\r\n
  4. Data for this part
  5. Series of boundary characters, announcing the end of the part or end of the stream

What stands out is the fact that there is no parsing needed for step #4. All the data of a part itself is a plain octet stream. So I talked to Ryan about it, and he recommended looking into the Boyer-Moore algorithm to speed things up.

The algorithm is usually the fastest method to find a substring within a string of text. It basically works by analyzing the needle string and building a lookup table of its characters to efficiently skip over parts of the haystack.
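
To get a rough feel for it, here is a minimal sketch of the "bad character" rule at the heart of those skips (a simplification of my own, and not the code the parser uses):

// Minimal sketch of Boyer-Moore's "bad character" rule (not the parser's code):
// record the last position of every needle character, then on a mismatch jump
// so that the offending haystack character lines up with that position.
function lastOccurrence(needle) {
  var table = {};
  for (var i = 0; i < needle.length; i++) {
    table[needle[i]] = i;
  }
  return table;
}

function indexOf(haystack, needle) {
  var last = lastOccurrence(needle)
    , i = 0;

  while (i + needle.length <= haystack.length) {
    var j = needle.length - 1;
    while (j >= 0 && haystack[i + j] === needle[j]) {
      j--;
    }
    if (j < 0) {
      return i; // full match found
    }

    // shift by at least 1, further if the mismatching character occurs early
    // in the needle (or not at all)
    var occ = (haystack[i + j] in last) ? last[haystack[i + j]] : -1;
    i += Math.max(1, j - occ);
  }
  return -1;
}

// indexOf('... binary data ...--AaB03x--', '--AaB03x') -> position of the boundary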

But implementing it was not exactly easy. The algorithm is not trivial, and many of the example implementations I found were wrong. That however was not the problem. The real challenge was that I was working with a stream of data, rather than a big string I had full access to.

This meant keeping lots of additional state in the parser, as well as creating a very complicated piece of code. I like challenges, but I also like efficiently using my time, so I started looking for a shortcut.

And then it hit me. Most of the Boyer-Moore algorithm is designed to improve the performance of the worst-case scenario. The worst-case scenario for this problem is the case where you hit a character in your haystack that is also a character in the needle string. Boyer-Moore deals with this case by knowing the offset of each character in the needle string, so it can maximize the number of characters to skip in any case.

But file uploads rarely cause these worst-case scenarios! With human text, character repetition is pretty high. But file uploads are binary data, so most bytes are likely to fall outside the ASCII range of characters usually used for the boundary.

That made the solution much simpler. All I had to do was generate a list of all characters in the boundary; whenever I hit a character that was not in that list, I knew I could safely skip ahead by the full length of the boundary:

while (i + boundary.length <= chunk.length) {
  if (chunk[i + boundary.length - 1] in boundaryChars) {
    // worst case, go back to byte by byte parsing until a non-matching char occurs
    break;
  }
  i += boundary.length;
}

This resulted in an incredible speedup, allowing uploads to be parsed at 500 mb/s. The parsing can be even faster if a longer boundary sequence is used; short boundaries and frequent worst-case hits will slow things down.

The benchmark I created uses an actual boundary generated by Mozilla Firefox. Your mileage may vary slightly.

The whole thing could still be optimized further, but at this point I believe it is fast enough to make other parts of the system more likely to become the bottleneck.

Formidable is licensed under the MIT license. Questions & suggestions regarding the library, node.js or the parser would be most welcome.

--fg

 