Blog » Debuggable - Node.js Consulting

The biggest CakeFest to be held in Berlin

I will make this short. The 3rd and so far biggest CakeFest will be held in Berlin, home of Debuggable! From July 9-12, people from all over Europe and the rest of the world will travel to Germany in order to celebrate and learn about the best PHP framework there is.

If you have not attended a CakeFest so far, here are some good ideas of what to expect:

The event consists of two parts:

July 9-10: CakePHP Workshop - (Led by the CakePHP core team)
July 11-12: CakePHP Conference - (Presented by the core team + community)

The 2-day workshop is packed with a series of tutorials designed to give developers, both new and seasoned, a solid understanding of building reliable CakePHP applications. Veterans and experts can skip over stuff they already know and use this time for 1 on 1 sessions with the non-presenting developers. The Workshop + Conference ticket is 599 EUR (499 EUR if you do not attend the conference).

The conference itself is going to be packed with talks, delivered by both the CakePHP core team as well as interested community members. At just 199 EUR (student discounts to be announced soon) there is no excuse for not attending.

I will talk on JavaScript for (Cake)PHP developers as well as share my experience in project management using GitHub + Lighthouse. Tim is still making up his mind, but will probably talk about Advanced Debugging.

The location for the conference is a few streets down in my neighborhood, so be assured that we will have more than enough opportunity to gather & celebrate in the evenings.

If you have any questions, please feel free to post them here or email them to me at felix@debuggable.com.

Otherwise go ahead and sign up, you will not regret it as the conference will also feature some major announcements exclusively made there.

-- Felix Geisendörfer aka the_undefined

Git alias for displaying the GitHub commit url

Posted on 18/3/09 by Felix Geisendörfer

If you often find yourself pointing your team members to commit urls in GitHub, this might be fun for you.

I created a git alias called 'hub' that guesses the GitHub repository url for the repository you are currently in:

$ git hub
https://github.com/felixge/my-project

Based on that I created a second alias called 'url', which gives you the url to the HEAD commit:

$ git url
https://github.com/felixge/my-project/commit/0bdc57323a1ffec7ffe10bf83147cab5d6838d45

You can however also provide another sha1 you want to link to:

$ git url 22db8914220b717b0954b84365030ae3c9602a17
https://github.com/felixge/my-project/commit/22db8914220b717b0954b84365030ae3c9602a17

If you find those aliases useful, here are my ~/.gitconfig alias definitions for them:

[alias]
  hub =! echo "https://github.com/"`git config remote.origin.url` | sed -E s/[a-z]+@github\.com:// | sed s/\.git$//
  url =!sh -c 'HEAD=`git rev-parse HEAD` && SHA1=`[ "$0" = "sh" ] && echo $HEAD || echo $0` && echo `git hub`"/commit/"${SHA1}'

(Bash gurus: I am sure you can do the above much more elegantly, wanna give it a try?)
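Taking up that invitation, here is one possible way to do the same url rewriting in plain bash, without sed. This is only a sketch: the function names github_web_url and github_commit_url are made up for this example, and it only handles the git@github.com:user/repo.git and https://github.com/user/repo.git remote forms.

```shell
#!/bin/bash
# Turn a git remote url into the matching GitHub web url.
# Hypothetical helper for illustration; only the SSH ("git@github.com:...")
# and HTTPS ("https://github.com/...") remote forms are handled.
github_web_url() {
  local remote="$1"
  remote="${remote%.git}"   # drop a trailing ".git" if there is one
  case "$remote" in
    *@github.com:*)       echo "https://github.com/${remote#*@github.com:}" ;;
    https://github.com/*) echo "$remote" ;;
    *)                    echo "not a GitHub remote: $1" >&2; return 1 ;;
  esac
}

# Build a commit url from a remote url and a sha1.
github_commit_url() {
  local base
  base="$(github_web_url "$1")" || return 1
  echo "$base/commit/$2"
}

github_commit_url "git@github.com:felixge/my-project.git" 0bdc57323a1ffec7ffe10bf83147cab5d6838d45
```

Inside a repository you would feed it real values, for example: github_commit_url "$(git config remote.origin.url)" "$(git rev-parse HEAD)".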

Further I also have this little bash script in ~/bin/tiny:

#!/bin/bash
curl "http://tinyurl.com/api-create.php?url=$1"
echo ""

This allows me to make tiny urls for my links if I need to paste them on long-url unfriendly territory:

$ git url | xargs tiny
http://tinyurl.com/cva44t

Or if you are on OSX, you can use this to open the git url in your browser:

$ git url | xargs open

Now all of the above could probably be done much smarter, but as far as I am concerned it works great ; ).

-- Felix Geisendörfer aka the_undefined

PS: What command line tricks are part of your daily workflow?

Muscles on demand - Clean a large git repository the cloud way

Posted on 13/3/09 by Felix Geisendörfer

Hey folks,

don't you hate it when you sometimes have to stop your work because your dev machine is ultra-busy doing some CPU or I/O heavy operations that will take hours?

Even though it doesn't happen to me a lot, I actually ran into such a case last night while trying to fix the Git repository of a project we are working on. The repository itself was not corrupted, but it became so fat that git-index-pack would explode on many of the team members. How did that happen? Well, it turns out that over time some of the image directories of the project were committed into the repository by accident. This ended up being an insane 1.7 GB of '.git' objects.

With SVN, this is when you realize you made a poor choice in version control software and it is time to start the repository over - losing all history.

Not so much with Git. Git has an excellent tool called git-filter-branch that you can use to rewrite an entire repository's history.

In our case we wanted to pretend app/webroot/files had never existed:

git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch app/webroot/files' -- --all

However, our repository is full of commits (3.5k) and as I mentioned has some incredibly huge and ugly blobs in it. This means the operation above is slow like crazy.

So instead of having my poor laptop tortured with it for an hour, I decided to hire some muscle. I knew the operation I planned to do was fairly CPU and I/O heavy. For that reason I fired up a c1.xlarge EC2 instance featuring 8 cores and 7 GB RAM.

Launching one of Eric Hammond's excellent Ubuntu EC2 AMIs and installing git took < 5 minutes. Add a few minutes to transfer the 1.7GB repository over and I was ready to go.

Having recently read Kevin's excellent article on tmpfs, I just put the repository in /dev/shm. This simply meant that the repository was now fully stored in memory - 30x faster than HDD!

Even with all this power, the whole process still took 15 minutes to complete, but the result was impressive. Instead of 1.7GB the repository was shrunk down to 80MB and little angels were dancing & singing around it. It was beautiful : ).
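The post does not list the exact steps used to get from the rewritten history to the smaller repository. When a repository is rewritten in place, git keeps the old objects reachable through backup refs and the reflog, so the size only drops after those are cleaned up. The following is a sketch of one common way to do that, not necessarily what was run here; purge_path_from_history is a made-up function name, and it should only be run in a clone you can afford to throw away.

```shell
#!/bin/bash
# Sketch: remove a path from all history, then reclaim the disk space.
# Assumption: the current directory is the top level of a disposable clone.
purge_path_from_history() {
  local path="$1"

  # Rewrite every ref. --ignore-unmatch keeps "git rm" from failing on
  # commits that do not contain the path at all.
  FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --index-filter "git rm -r -q --cached --ignore-unmatch -- '$path'" \
    -- --all || return 1

  # filter-branch leaves backups of the old refs under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do git update-ref -d "$ref"; done

  # Expire the reflog and repack, so the unreferenced blobs get deleted.
  git reflog expire --expire=now --all
  git gc --prune=now --aggressive
}
```

Called as purge_path_from_history app/webroot/files, this should leave a '.git' directory that no longer contains the removed files. A fresh clone of the rewritten repository gets you the same lean result without the manual cleanup.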

I pushed the lean and mean clone up to github using 'git push -f' and then switching over the local clones of each team member was just a matter of:

git checkout -b backup-master
git branch -D master
git fetch
git checkout -b master origin/master

Of course the new master branch and its backup wouldn't be very nice to each other as far as merges are concerned, but cherry picking the most recent commits worked great.

As you can see cloud computing is not only for the application level, but it can also be a great tool for your development process as a whole. After all it is incredible what can be accomplished for $0.80 this way ; ).

-- Felix Geisendörfer aka the_undefined

How to render fixed length rows of items

Posted on 9/3/09 by Felix Geisendörfer

Hey folks,

if you find yourself in a situation where you need to write a template that splits up a list of items in multiple rows with each row having a fixed amount of items, here is a simple solution:

This will put 4 items in each row and a smile on your face for not having to write very ugly code to achieve the same : ).
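The code sample that belongs to this post is not part of this copy. As a stand-in, and only to illustrate the general chunking idea rather than the original template code, here is a small bash sketch. The function name render_rows is made up, and items within a row are simply separated by tabs.

```shell
#!/bin/bash
# Print the given items in rows of a fixed size.
# Hypothetical helper, not the template code from the original post.
# Usage: render_rows ITEMS_PER_ROW ITEM...
render_rows() {
  local per_row="$1"; shift
  local count=0 item
  for item in "$@"; do
    # Separate items within a row with a tab.
    if [ $((count % per_row)) -ne 0 ]; then printf '\t'; fi
    printf '%s' "$item"
    count=$((count + 1))
    # End the row after every per_row-th item.
    if [ $((count % per_row)) -eq 0 ]; then printf '\n'; fi
  done
  # Close a final, partially filled row.
  if [ $((count % per_row)) -ne 0 ]; then printf '\n'; fi
  return 0
}

render_rows 4 a b c d e f g h i j
```

With 10 items and 4 per row, the last row holds the 2 leftover items.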

-- Felix Geisendörfer aka the_undefined

Queues in the cloud - Debuggable PHP SQS Library

Posted on 6/3/09 by Felix Geisendörfer

Hey folks,

the cloud promises an abundance of CPU cycles and huge opportunities for parallel processing. However, before you can tap into this potential, you need to have a good mechanism for distribution. Hashing can help, but there are other options as well. One of them is to use a queue.

So how does it work? Well, essentially a queue is a big list where new items are added to the bottom and existing items are read from the top. So let's say you want to encode 10,000 videos using 100 machines. The way a queue can help with this is by having one machine loop over all videos and put a message in the queue for each one of them. This message is a text string (JSON is nice for this) that represents 1 job of converting 1 video. So you end up with a queue that has 10,000 messages in it, each representing one job.

Now you can use EC2 to launch 100 High CPU instances at $0.20 / hour. Each one of these workers now runs a program in an infinite loop that queries the queue for job messages. As soon as a message is fetched it becomes invisible to all other workers querying the queue. This is called the visibility timeout and lasts for a number of seconds that you can define. So during this time the worker who fetched the message is responsible for completing the job. If it does, it sends a request to the queue for deleting the message. If the worker fails (for example because it crashes), the message will appear in the queue again after the timeout, and other workers can process it.
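To make the visibility timeout a bit more concrete, here is a toy in-memory queue in bash. It is not SQS and does not talk to any service. The function names (queue_send, queue_receive, queue_delete) are made up for this sketch, and the current time is passed in as a plain number so the example stays deterministic.

```shell
#!/bin/bash
# Toy in-memory queue illustrating the visibility timeout.
# Not SQS: no network, no persistence, all names are hypothetical.
MESSAGES=()         # message bodies, indexed by message id
INVISIBLE_UNTIL=()  # per message: time until which it is hidden
DELETED=()          # per message: 1 once a worker confirmed completion

queue_send() {
  MESSAGES+=("$1"); INVISIBLE_UNTIL+=(0); DELETED+=(0)
}

# queue_receive NOW TIMEOUT
# Puts the id of the first visible message into RECEIVED_ID and hides that
# message for TIMEOUT time units. Returns 1 if nothing is visible.
queue_receive() {
  local now="$1" timeout="$2" id
  RECEIVED_ID=""
  for id in "${!MESSAGES[@]}"; do
    if [ "${DELETED[$id]}" -eq 0 ] && [ "${INVISIBLE_UNTIL[$id]}" -le "$now" ]; then
      INVISIBLE_UNTIL[$id]=$((now + timeout))
      RECEIVED_ID="$id"
      return 0
    fi
  done
  return 1
}

# Called by a worker after it has completed the job.
queue_delete() { DELETED[$1]=1; }

queue_send '{"video": "a.mov"}'
queue_send '{"video": "b.mov"}'

queue_receive 0 30 && echo "t=0  worker 1 got message $RECEIVED_ID"
queue_receive 1 30 && echo "t=1  worker 2 got message $RECEIVED_ID"
queue_delete 1   # worker 2 finishes its job and deletes the message
# Worker 1 crashes and never deletes message 0. Once the timeout has
# passed, the message is visible again and another worker picks it up.
queue_receive 31 30 && echo "t=31 worker 3 got message $RECEIVED_ID"
```

In the demo, message 0 is handed out twice (to worker 1 and later to worker 3), while message 1 is handed out once and then deleted.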

So as you can see there are a lot of advantages to this approach:

The worker machines don't need to be aware of each other
Fault tolerant, no job gets lost because a worker machine fails
Effectiveness, the queue tries to make sure no message gets processed twice
The system itself can never get overloaded and it always works as fast as it can
Queue length can serve as a very nice measurement of load on a given system

Of course all of those advantages depend on the queue service not failing which of course is very difficult to achieve. So in a scenario as the one described above, Amazon's SQS service is a very interesting solution. Why?

Well, first of all it is cheap. While testing the service I put 1.7 million messages in my queue - Amazon charged me $1.74, so it's roughly $1 per million messages + bandwidth. The next good thing is that it's highly scalable. Whether you put 10 or 10 million messages in your queue, Amazon says they'll sort it out for you. And last but not least, there are already AWS monitoring tools out there to monitor your queues hosted with Amazon.

So far so good. There are also things that suck about SQS. First of all, the latency is pretty high. My tests confirmed what Wikipedia and others say about SQS: It takes ~2-10s for a message added to a queue to be available for reading. If you need a very responsive queue, that rules SQS out for you. Also very stupid is the lack of a "flush" function. So while you are developing you have to write your own tool for flushing a queue. And last but not least is the fact that SQS requires your system to be idempotent. SQS does not guarantee that a message cannot be fetched by 2 workers at the same time and thus be processed twice. Idempotent means that your app needs to be prepared for that: processing anything twice needs to lead to the same result. But of course, SQS tries to avoid this scenario as much as possible.
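As a small illustration of what idempotent can look like in practice, here is a bash sketch of a job handler that can safely see the same message twice. Everything in it is an assumption for the example: the handle_job name, the use of a job id, and a ".done" file as the stored result.

```shell
#!/bin/bash
# Hypothetical job handler: handling the same job id twice leaves the
# same result behind as handling it once.
# Usage: handle_job JOB_ID OUTPUT_DIR
handle_job() {
  local job_id="$1" out_dir="$2"
  local result="$out_dir/$job_id.done"

  # A duplicate delivery finds the finished result and does nothing.
  if [ -e "$result" ]; then
    echo "job $job_id already done, skipping"
    return 0
  fi

  # Stand-in for the real work (e.g. encoding a video). Write to a temp
  # file first and rename it, so a crash never leaves a partial result.
  echo "result of job $job_id" > "$result.tmp.$$"
  mv "$result.tmp.$$" "$result"
  echo "job $job_id processed"
}

out_dir="$(mktemp -d)"
handle_job 42 "$out_dir"   # first delivery
handle_job 42 "$out_dir"   # duplicate delivery of the same message
```

If two workers really do run the same job at the same moment, both will do the work, but since both write the same result the outcome is still the same.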

Anyway, if you come to the conclusion that SQS can solve more problems for you than it creates, it is an amazing service. So how do you use it in your apps? Well I first tried using the PHP library Amazon provides, but I have come to hate it. I mean it is very comprehensive and does the job. But the people who wrote it were clearly Java engineers forced to write PHP. I feel sorry for them.

For a long time my searches for an alternative came up empty, but last night I discovered at least 2 viable options. The first is called php-aws and provides very clean, easy to use classes for S3, SQS, EC2 and AWIS. From their project page I also found another project called Tarzan. PHP-AWS recommends Tarzan as a super-robust and comprehensive alternative. And from my analysis of it, it indeed looks like a very mature project and I encourage everybody to check it out.

Well - too bad for me that I was already way into implementing my own class last night when discovering those two options : ). But nevertheless I am very proud to present the Debuggable.com SQS PHP5 Library. Besides a very easy to use and intuitive interface, it features the following attributes:

Exponential backoff on failures and retry maximum
Uses CURL for reliable HTTP communication
It is completely unit tested, which was an interesting challenge
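The library's own retry code is not shown in this post. As a rough sketch of what exponential backoff with a retry maximum generally looks like, here is a bash version. The function name retry_with_backoff and its parameters are made up for this example and are not the library's API.

```shell
#!/bin/bash
# Run a command until it succeeds, doubling the wait after each failure
# and giving up after a maximum number of attempts.
# Hypothetical helper, not part of the SQS library.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while true; do
    "$@" && return 0
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))        # exponential backoff: 1s, 2s, 4s, ...
    attempt=$((attempt + 1))
  done
}

# Demo: a command that fails twice before it succeeds.
# A base delay of 0 keeps the demo instant.
calls=0
flaky() { calls=$((calls + 1)); [ "$calls" -ge 3 ]; }
retry_with_backoff 5 0 flaky && echo "succeeded after $calls calls"
```

With a base delay of 1 and a maximum of 5 attempts, the waits between attempts would be 1, 2, 4 and 8 seconds.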

The lib itself is as simple as they come:

$queue = new SqsQueue('my_queue', array('key' => '...', 'secretKey' => '...'));
$queue->sendMessage(array('autoJsonSerialize' => 'is fantastic'));

$lookMaIAmUsingSPL = count($queue);

I do not recommend that anybody use the library for production purposes right now, but if you want to get started with SQS or study the implementation I think you have found an excellent library. Writing the library certainly has been a great experience and opportunity for me to study SQS in detail.

Anyway, enough said. Back to my cloud bed.

-- Felix Geisendörfer aka the_undefined
