
XPath on PHP Arrays (Set::extract)

Posted on 12/9/08 by Felix Geisendörfer

Hey folks,

This has been something I wanted to write about for quite a while. Back when I was writing the post on the original Set::extract method by nate, the idea of writing a version that supports XPath came up.

One of the requirements was that the new method would need to be faster than, or at least as fast as, the old implementation. My first attempts were big failures. Not only did the solutions I came up with contain tons of bugs, they were also a lot slower than the old extract function.

A few benchmarks later, I discovered the biggest bottleneck in my implementation: recursion.

If you need a function to have the highest performance, try to express it non-recursively. It can make a 500% difference. So that is what I did. After lots of trial and error I was able to find an algorithm that would resolve XPath expressions without the use of recursion. The only exception is traversal via the '..' token, which still uses recursion for memory and simplicity reasons.
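To illustrate the general idea of resolving a path without recursion, here is a minimal sketch. It is not the CakePHP implementation: the function name extractSimplePath is hypothetical, it only understands plain paths such as /User/id (no predicates, no '..'), and it assumes PHP 7 or newer. Instead of calling itself for every nesting level, it walks the path tokens in a loop and carries a flat list of "current contexts" from one token to the next.

```php
<?php
/**
 * Hypothetical helper for illustration only (not part of CakePHP).
 * Resolves simple paths such as "/User/id" against a list of rows
 * by iterating over the path tokens instead of recursing.
 */
function extractSimplePath(string $path, array $rows): array
{
    // Split "/User/id" into the tokens ['User', 'id'].
    $tokens = array_values(array_filter(explode('/', $path), 'strlen'));

    // Every row starts out as a candidate context.
    $contexts = $rows;

    foreach ($tokens as $token) {
        $next = [];
        foreach ($contexts as $context) {
            if (!is_array($context)) {
                continue; // scalars cannot be descended into
            }
            // A numerically indexed list stands for "each of its items".
            $items = isset($context[0]) ? $context : [$context];
            foreach ($items as $item) {
                if (is_array($item) && array_key_exists($token, $item)) {
                    $next[] = $item[$token];
                }
            }
        }
        // The matches for this token become the contexts for the next one.
        $contexts = $next;
    }

    return $contexts;
}
```

Calling extractSimplePath('/User/id', $rows) on rows shaped like array('User' => array('id' => ...)) collects the id of every User. The nesting depth only affects how many loop iterations run, not the depth of the call stack.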

If you are curious, feel free to check out the XPath implementation in the CakePHP core.

Anyway, if you wonder what this whole thing can do for you, I recommend checking out the doc block:

Currently implemented selectors:

/User/id (similar to the classic {n}.User.id)
/User[2]/name (selects the name of the second User)
/User[id>2] (selects all Users with an id > 2)
/User[id>2][<5] (selects all Users with an id > 2 but < 5)
/Post/Comment[author_name=john]/../name (Selects the name of all Posts that have at least one Comment written by john)
/Posts[name] (Selects all Posts that have a 'name' key)
/Comment/.[1] (Selects the contents of the first comment)
/Comment/.[:last] (Selects the last comment)
/Comment/.[:first] (Selects the first comment)
/Comment[text=/cakephp/i] (Selects all comments that have a text matching the regex /cakephp/i)
/Comment/@* (Selects all key names of all comments)
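
To make a few of these selectors more concrete, the sketch below expresses roughly the same selection logic in plain PHP (7.4 or newer), without CakePHP. The sample data and variable names are made up for illustration, and the exact shape of the arrays returned by Set::extract may differ from what these plain-PHP equivalents produce.

```php
<?php
// Made-up sample data, shaped like a CakePHP find('all') result.
$users = [
    ['User' => ['id' => 1, 'name' => 'alice']],
    ['User' => ['id' => 3, 'name' => 'bob']],
    ['User' => ['id' => 7, 'name' => 'carol']],
];

// Roughly what /User/id expresses: collect the id of every User.
$ids = array_map(fn ($row) => $row['User']['id'], $users);

// Roughly what /User[id>2][<5] expresses: keep Users with 2 < id < 5.
$between = array_values(array_filter(
    $users,
    fn ($row) => $row['User']['id'] > 2 && $row['User']['id'] < 5
));

// Roughly what /User[2]/name expresses: the name of the second User
// (the selector counts from 1, PHP arrays count from 0).
$secondName = $users[1]['User']['name'];
```

The appeal of the path syntax is that each of these hand-written loops or callbacks collapses into a single string.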

Usage is as simple as:

$users = $this->User->find('all', array('contain' => 'Comment'));
$bakers = Set::extract('/User/Comment[text=/cakephp/i]/..', $users);

While the implementation does not support full XPath (and probably won't in the future), feel free to make suggestions on additional selectors or the idea in general.

-- Felix Geisendörfer aka the_undefined



Peter Robinett said on Sep 12, 2008:

Thanks, Felix, I'm already making use of it! For people new to it, as the doc block shows, the XPath implementation can do a lot more advanced searches than the traditional form of Set::extract, so I highly recommend switching. Now I just want to see the XPath support added to Set::insert! =)

Christoph Dorn said on Sep 14, 2008:

Hi Felix. Great work on providing a great function to extract data from arrays. Have you heard of JSONPath? I wrote a little follow-up on my blog which has more info on it as well as some benchmarking numbers for the two libraries.

Alan Blount said on Sep 24, 2008:

I got a demo of this in action at the 08 workshop in NC and I can tell you that this really is powerful. Especially because you can combine it with Set::combine().


Matt Huggins said on Jan 27, 2009:

Let's say I have an XML object formatted as such:

<!-- ... -->











<!-- ... -->


I want to do a Set::extract to get all the user IDs. I can't figure out what XPath to use though. Instinctively, I'd think something like this should work, but it doesn't seem to be doing the job:

$ids = Set::extract('/users/user/user_id', $xml);
