debuggable

 
 

Streaming file uploads with node.js

Posted on 28/9/09 by Felix Geisendörfer

Update: I just updated the code so it works with node v0.1.18.

Not excited by hello world in node.js? No problem.

Let's say you are a startup focusing on upload technology and you want the maximum level of control over your file uploads. In our case that means being able to interact directly with the multipart data stream as it comes in, so we can abort the upload if something isn't right. That beats the hell out of letting users wait an hour, only to tell them about the problem after the upload has finished.

Here is a complete example of how to accomplish this in node.js (you'll need the bleeding edge git version):

var http = require('http');
var multipart = require('multipart');
var sys = require('sys');

var server = http.createServer(function(req, res) {
  switch (req.uri.path) {
    case '/':
      display_form(req, res);
      break;
    case '/upload':
      upload_file(req, res);
      break;
    default:
      show_404(req, res);
      break;
  }
});
server.listen(8000);

function display_form(req, res) {
  res.sendHeader(200, {'Content-Type': 'text/html'});
  res.sendBody(
    '<form action="/upload" method="post" enctype="multipart/form-data">'+
    '<input type="file" name="upload-file">'+
    '<input type="submit" value="Upload">'+
    '</form>'
  );
  res.finish();
}

function upload_file(req, res) {
  req.setBodyEncoding('binary');

  var stream = new multipart.Stream(req);
  stream.addListener('part', function(part) {
    part.addListener('body', function(chunk) {
      var progress = (stream.bytesReceived / stream.bytesTotal * 100).toFixed(2);
      var mb = (stream.bytesTotal / 1024 / 1024).toFixed(1);

      sys.print("Uploading "+mb+"mb ("+progress+"%)\015");

      // chunk could be appended to a file if the uploaded file needs to be saved
    });
  });
  stream.addListener('complete', function() {
    res.sendHeader(200, {'Content-Type': 'text/plain'});
    res.sendBody('Thanks for playing!');
    res.finish();
    sys.puts("\n=> Done");
  });
}

function show_404(req, res) {
  res.sendHeader(404, {'Content-Type': 'text/plain'});
  res.sendBody('You r doing it rong!');
  res.finish();
}

The code is rather straightforward. First of all we include the multipart.js parser, a library that has just been added to node.js.

Next we create a server listening on port 8000 that dispatches incoming requests to one of our three functions: display_form, upload_file or show_404. Again, very straightforward.

display_form serves a very short (and invalid) piece of HTML that will render a file upload form with a submit button. You can get to it by running the example (via: node uploader.js) and pointing your browser to http://localhost:8000/.

upload_file kicks in as soon as you select a file and hit the submit button. It tells the request object to expect binary data and then passes the work on to the multipart Stream parser. The result is a new stream object that emits two kinds of events: 'part' and 'complete'. 'part' is emitted whenever a new element is found within the multipart stream; you can find all the information about it by looking at the first argument's headers property. To get the actual contents of this part we attach a 'body' listener to it, which gets called for each chunk of bytes being uploaded. In our example we just use this event to render a progress indicator on the command line, but we could also append each chunk to a file, which would eventually become the entire file as uploaded from the browser. Finally, the 'complete' listener sends a response to the browser indicating the file has been uploaded.
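To make the "append this chunk to a file" idea a bit more concrete, here is a minimal sketch. It assumes a current Node.js release rather than v0.1.18 and is built on fs.createWriteStream. The names createChunkSaver, write and end are made up for this illustration and are not part of node or the multipart parser.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of node): collects uploaded chunks into a file.
// write() would be called from a 'body' listener, end() from 'complete'.
function createChunkSaver(targetPath) {
  const out = fs.createWriteStream(targetPath);
  let bytesWritten = 0;

  return {
    // Append one chunk (a Buffer) to the file and keep a running total.
    // Note: this sketch ignores backpressure (the return value of out.write).
    write(chunk) {
      bytesWritten += chunk.length;
      out.write(chunk);
    },
    // Flush and close the file, then report how many bytes were saved.
    end(callback) {
      out.end(() => callback(bytesWritten));
    },
  };
}
```

In the example above, the 'body' listener would call write(chunk) and the 'complete' listener would call end() before sending the response. A production version should also handle the stream's 'error' event and pause the upload when the disk cannot keep up.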

show_404 handles all unknown URLs by returning an error response.

As you can see, the entire process is pretty simple, yet gives you a ton of control. You can also easily use this technique to show an AJAX progress bar for the upload to your users. The multipart parser also works with non-HTTP requests: just pass the {boundary: '...'} option into the constructor and use stream.write() to pass it some data to parse. Check out the source of the parser if you're curious how it works internally.

-- Felix Geisendörfer aka the_undefined

 

wirtsi  said on Oct 01, 2009:

Wow, wicked stuff ... just so I get this straight, this runs on the server, right?

So the advantage of this is that you can check for wrong file uploads before the upload is complete. But then on the other hand implementing all these jpg/png format checkers in javascript doesn't look like much fun.

In general the whole concept of developing your webserver in javascript seems a bit odd ... or is the idea to push all this to a client based storage (gears etc) and upload it via ajax in the background?

wirtsi

Felix Geisendörfer said on Oct 01, 2009:

wirtsi: Yes, this runs on the server. Parsing a JPG by hand is no fun, but other stuff like using the first few bytes of a file to check its MIME type can easily be done.
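As a rough sketch of that idea: many file formats start with a fixed byte signature, so the first chunk of an upload is often enough for a plausibility check. The function name sniffMimeType and the small signature table are assumptions made for this illustration. The table covers only a handful of formats, and a matching signature does not prove the rest of the file is valid.

```javascript
// Well-known leading byte signatures ("magic numbers") for a few formats.
const SIGNATURES = [
  { mime: 'image/jpeg', bytes: [0xff, 0xd8, 0xff] },
  { mime: 'image/png', bytes: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a] },
  { mime: 'image/gif', bytes: [0x47, 0x49, 0x46, 0x38] },       // "GIF8"
  { mime: 'application/pdf', bytes: [0x25, 0x50, 0x44, 0x46] }, // "%PDF"
];

// Hypothetical helper: returns a MIME type guess for the first chunk of a
// file (a Buffer), or null if no known signature matches.
function sniffMimeType(firstChunk) {
  for (const sig of SIGNATURES) {
    const longEnough = firstChunk.length >= sig.bytes.length;
    if (longEnough && sig.bytes.every((b, i) => firstChunk[i] === b)) {
      return sig.mime;
    }
  }
  return null;
}
```

An upload handler could call this on the first chunk of a part and abort the request right away if the result is not one of the types it accepts.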

What's odd about developing a web service in JavaScript? I think it's wonderful as it makes it so much easier to write great AJAX applications with a desktop-like experience!

Wirtsi  said on Oct 02, 2009:

Hey Felix

I still can't quite see the advantage of doing this in js ... apart from the cutting edge geek factor you are missing out on so much cool server stuff you get when using a proper framework (database access, file validation, you name it).

But perhaps I haven't quite grasped it yet ... I suppose the interesting part comes when you connect this code to the client side. Instead of just printing out the progress bar on the server console you'd probably have some sort of event handling system that the client picks up. Does it help that the server speaks the client's native language (js)? And if so, how did you implement the communication between them...

I hope I don't come off as too nosy :) It's just that I'm working on a system where you can upload files on the client to a gears storage (which is blazing fast since the data is only transferred locally). Gears then should push the data onto the server which turns out to be quite a pain ... with upload resumes over unreliable internet connections. So what you are showing here could really help since you'd have two js servers speaking to each other.

wirtsi

Felix Geisendörfer said on Oct 02, 2009:

wirtsi: Thanks for the great comments! The disadvantages you listed are all very valid at this point, but I'm involved with node.js at this early stage because of the potential I see:

- Share code between browser & server
- More productivity because of reduced context-switching when using just 1 programming language
- Easier data exchange between browser & server (JSON's natural home is JavaScript, no serialization boundaries)

This applies to all server-side JS solutions. What is especially exciting about node.js is its asynchronous programming model. Each and every I/O operation that you execute is non-blocking. That means you never have to wait for function calls to return, never have to perform actions in series that could be parallelized. Yes, you can do that with some other programming languages, but I've never seen it in such a natural and easy-to-use fashion. So if I have to guess what will allow my services to scale in future, it's exactly this kind of technology vs. the traditional blocking stuff. What ya think?
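To illustrate the series vs. parallel point, here is a small sketch. It uses Promises and async/await from current Node.js releases, which did not exist in node at the time of this post, and it simulates I/O with timers. The names delay, inSeries and inParallel are made up for this illustration.

```javascript
// Simulates a non-blocking I/O call that completes after `ms` milliseconds.
function delay(ms, value) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Runs two independent operations one after the other:
// the second only starts once the first has finished.
async function inSeries() {
  const a = await delay(50, 'a');
  const b = await delay(50, 'b');
  return [a, b];
}

// Starts both operations immediately and waits for both to finish,
// so the total time is roughly that of the slower one.
function inParallel() {
  return Promise.all([delay(50, 'a'), delay(50, 'b')]);
}
```

Both functions resolve to the same result. The difference is only in how long they take, since neither operation blocks the process while it is pending.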

Wirtsi  said on Oct 05, 2009:

Hi Felix

very interesting points ... especially for heavy ajax apps it probably helps a lot if the server works non blocking. But I wonder if you really can pull this whole thing through, there are some limits ...

* The browser's ajax limit (2 afaik) limits the need for massive server parallelism
* Data integrity: It's quite easy to get into all the pitfalls of thread programming ... especially with a dynamically typed language like js I can imagine really nasty interferences or deadlocks.
* So the server works in parallel but your database still stores everything sequentially.
* Speaking of databases, isn't the scaling of them the trickiest part? For the webserver you throw in a few new webservers and a loadbalancer, but getting a db scaled out can be quite a pain (as you can see in all this ongoing NoSql movement)

But don't get me wrong, I'm really thrilled by the node.js approach you are taking. If you need a massively scalable system then this is probably the way to go. For your transcoding service I could imagine the load is more heavy on the video/audio compressors but these things should scale very well.

Perhaps this really is the dawning of a new era in web programming. When I started coding PHP, everything was poorly documented and everybody would resort to these really vile ways of coding. Now there are frameworks like CakePHP that help you get work done properly ... perhaps in a few years' time there'll be something like this for JavaScript.

wirtsi

Felix Geisendörfer said on Oct 05, 2009:

Wirtsi: Node.js was not a scaling decision for us. We use it because we were actually able to replace a few thousand lines of PHP as well as a message queue system with a few lines of very effective JavaScript : ). When we receive a file upload, we deal with it over the course of several minutes, triggering several actions on it. It turns out the easiest way to achieve that is in one non-blocking upload-server that guides the uploaded file through the entire process. We could not have done this with PHP.

Of course this is not a general-purpose use case, I still think PHP kicks node.js ass for web apps and will for a while. But there are so many other things your web app needs. For example I have a few projects that need realtime updates (a game and chat) right now, so I'm writing a cross-domain comet server in node.js - something that is very easy to do with node but would be very hard with all other technologies I know of.

Anyway, I do think node.js will bring a lot to the table in terms of web applications in the near future. Some very bright people are working on frameworks for it and it will be interesting to see what they come up with : ). I'll keep you posted.

Wirtsi  said on Oct 07, 2009:

Hey Felix ... I picked up some ideas we discussed here and stuffed them into a blog post of my own :)

http://blog.mykita.com/2009/10/enqueue-form-uploads-for-better-usability/

Let me know what you think ...

Felix Geisendörfer said on Oct 07, 2009:

wirtsi: I can't comment on your blog (wordpress error), so I will here:

Have you tried using a hidden iframe? That also allows you to submit your forms in a non-blocking manner!

Wirtsi  said on Oct 07, 2009:

Stupid mod_security, I fixed that :)

I had a look at the iframes solution ... it's not quite the same though, I wanted a possibility to enqueue multiple form submits from different forms. With iframes I can't see how to do that.

Felix Geisendörfer said on Oct 07, 2009:

wirtsi: true, if that's your use case the local storage idea is pretty neat!

iceeer  said on Oct 09, 2009:

like Lotus Domino 8.5 server javascript

Bakyt Niyazov  said on Nov 24, 2009:

tried to post a message without any html tags and getting: Only simple Html tags like: a, b, i strong, em are allowed.
Please notify me about new comments

Bakyt Niyazov  said on Nov 24, 2009:

(trying again) Thank you Felix! I found Node by your tweet, that is really cool! Btw I think the code must be updated slightly in order to work with the latest Node:

add:

var sys = require('sys');

and then rename the print function calls to use sys.print, and the same for puts


Sveinung  said on Nov 24, 2009:

I really hope CommonJS will be a success.

You need to change

print() to sys.print()

as well.


Roberto Saccon said on Dec 07, 2009:

Felix, I took your great example and tried to extend it so files get saved on the server, but the uploaded files come out completely wrong. I think this is of general interest, so I posted my attempt to the google group:

http://groups.google.com/group/nodejs/browse_thread/thread/2643e1b474b85a15

Do you have any recommendation how to approach that ?

junior programmer  said on Dec 15, 2009:

what's your view on node.js vs erlang? A lot of the concepts in node.js seem to be very similar to that of Erlang.

Matt Kukowski said on Dec 17, 2009:

With the proliferation of the Web and Web Apps versus traditional local Apps, using JavaScript on the server is a very natural fit.

Why? Because, if you're going to program JavaScript in the browser, translating those skills onto the Server will make for better JavaScript programmers.

Google Chrome OS, HTML5 and all these maturing and powerful Web technologies are going to transform the entire computing industry. The browser will be (and already is) 'The Killer App'.

So, using JavaScript with V8 should help push the speed limit there as well, regarding the V8 engine.

Also, if there is a way to do it in Python or Ruby, I would probably take the JavaScript route, since mastering it on the Server only allows me to master it in the browser.

As far as the non-blocking issue goes when speaking to Databases, I'm sure these 'problems' will get ironed out, by making commit style injections into them, or adding a blocking thread for just that connection. I don't know the details.

Anyhow, JavaScript is a very interesting language, being Event based, a clean language and Object Orientated.

It will definitely be interesting to see how far NODE.JS will go.

Felix Geisendörfer said on Dec 17, 2009:

junior programmer: I think node will be able to compete with erlang on common protocols such as HTTP. However, erlang has some really incredible features for dealing with binary streams which we may or may not be able to bring to node directly. We'll see, but it's certainly an interesting comparison from multiple points of view (ease of use, performance, etc.)

Vladimir Grichina said on Jan 29, 2010:

While your post shows well how to receive an uploaded file, it doesn't show at all how to save it (and it is not really trivial with Node, as event-based I/O is a bit unusual). So I wrote a post about handling uploads and writing to file with Node.js.

I think it should be useful to the readers of this post.

Felix Geisendörfer said on Jan 30, 2010:

Vladimir Grichina: Cool. If you want to make your code even shorter check out the file module:

http://github.com/ry/node/blob/master/lib/file.js

It is currently not documented (the API might change), but it takes care of queuing your posix calls.

HTH,
-- Felix

Vladimir Grichina said on Feb 04, 2010:

Felix, thanks, this is a great suggestion.

I think I would try it and write another blog post.

Hendrik  said on Mar 13, 2010:

Hi Felix,
it seems stream.Bytes* is no longer available in the latest node release.

Is there any alternative way to get the current uploaded file's size?

Felix Geisendörfer said on Mar 13, 2010:

Hendrik: yeah, req.headers['content-length'] has the total size and you have to sum up the chunks yourself as they come in.

Hendrik  said on Mar 13, 2010:

Thanks for the quick reply.
The headers['content-length'] seems to be different from the uploaded file size.

When I upload a file I sum the bytesWritten in a file.write() callback and the number matches the actual file size, but the headers['content-length'] always comes up with a higher value, most likely due to including the boundary and other data as well(?), whereas the previous stream.bytesTotal seems to have been the exact file size as well.

I got the upload working and files are saved properly, but I am still trying to figure out how to get an uploaded file's size in order to calculate the remaining data.

Felix Geisendörfer said on Mar 14, 2010:

Yes, content-length includes the full size of the HTTP message, boundaries and all.

But no, the previous stream.bytesTotal did not address that issue specifically. I'd actually advise you to just emit a fake 100% progress event once you hit the end of the multipart stream.
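A minimal sketch of that approach, assuming the total comes from req.headers['content-length'] and the chunk sizes are summed by hand. The names createProgressTracker, onChunk and onEnd are made up for this illustration and are not part of node.

```javascript
// Hypothetical helper: tracks upload progress against the Content-Length
// header. Because that header also counts boundaries and part headers, the
// summed file bytes never quite reach it, so onEnd() simply reports 100.
function createProgressTracker(contentLength) {
  let bytesSeen = 0;

  return {
    // Call for every chunk (a Buffer); returns the progress in percent.
    onChunk(chunk) {
      bytesSeen += chunk.length;
      return Math.min(100, (bytesSeen / contentLength) * 100);
    },
    // Call once the multipart stream has ended: the "fake" 100% event.
    onEnd() {
      return 100;
    },
  };
}
```

In a request handler the tracker could be created with Number(req.headers['content-length']), with onChunk called for each chunk of file data and onEnd called when the multipart stream completes.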

Vladimir Grichina said on Mar 25, 2010:

"Vladimir Grichina: Cool. If you want to make your code even shorter check out the file module:

http://github.com/ry/node/blob/master/lib/file.js"

Felix, as there is no such code in Node now, and event handling was also revised, what would you suggest to do now?

Felix Geisendörfer said on Mar 26, 2010:

Vladimir Grichina: Use fs.createWriteStream.
