String substitution using UUIDs

Hey folks,

This is post #2 of my 30-day challenge. If you've ever written any non-trivial string processing code, you've probably run into situations where you wanted to exclude certain parts of a string from a given operation. Usually that means you have to tokenize the string, or adjust the operation so it doesn't touch the parts you want to exclude. Both of those solutions can be fairly time-intensive, so I was looking for a shortcut and found one.

Let me give a practical example so this sounds less like hypothetical BS: say you have written your own Textile-inspired parser, but you also want to allow sections that the parser leaves untouched. My solution: replace the parts you want to exclude from Textile processing with UUIDs, and put them back afterwards:

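class Common {
    // Replaces every match of $regex in $text with a UUID and returns
    // array($text, $uuids, $originals) so the matches can be restored later.
    // String::uuid() comes from CakePHP.
    static function substitute($regex, $text) {
        preg_match_all($regex, $text, $matches);
        if (empty($matches[0])) {
            return array($text, array(), array());
        }
        $r = array();
        foreach ($matches[0] as $match) {
            $uuid = String::uuid();
            $r[1][] = $uuid;   // the placeholder
            $r[2][] = $match;  // the original text it stands for
            $text = str_replace($match, $uuid, $text);
        }
        $r[0] = $text;
        return $r;
    }
}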

$html = 'This *is* my sample [literal]text for *this* post[/literal]!';
list($html, $uuids, $values) = Common::substitute('/\[literal\].+\[\/literal]/sU', $html);
// $html is now: 'This *is* my sample 8ad0fe9-ec3c-4b0d-b314-7a111030b5da!';
$html = Textile::parse($html);
// $html is now: 'This <strong>is</strong> my sample 8ad0fe9-ec3c-4b0d-b314-7a111030b5da!';
$html = str_replace($uuids, $values, $html);
// $html is now: 'This <strong>is</strong> my sample [literal]text for *this* post[/literal]!';
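
If you are not using CakePHP, String::uuid() won't be available. Below is a minimal standalone sketch of the same substitute-then-restore round trip, assuming PHP 7.1+ (for random_bytes() and short array destructuring). It swaps in preg_replace_callback() instead of preg_match_all() plus str_replace(), and restores with strtr(). The names uuid4(), substituteWithUuids() and fakeTextile() are hypothetical; fakeTextile() only turns *word* into <strong>word</strong> and stands in for a real Textile parser.

```php
<?php
// Standalone sketch (assumes PHP 7.1+): protect regex matches with UUID placeholders,
// run another transformation, then restore the originals.

// Generates a random RFC 4122 version-4 UUID.
function uuid4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Hypothetical helper: swaps each match of $regex for a fresh UUID.
// Returns [text with placeholders, list of UUIDs, list of original matches].
function substituteWithUuids(string $regex, string $text): array
{
    $uuids = [];
    $values = [];
    $text = preg_replace_callback($regex, function (array $m) use (&$uuids, &$values): string {
        $uuid = uuid4();
        $uuids[] = $uuid;
        $values[] = $m[0];
        return $uuid;
    }, $text);
    return [$text, $uuids, $values];
}

// Hypothetical stand-in for a real Textile parser: *word* -> <strong>word</strong>
function fakeTextile(string $text): string
{
    return preg_replace('/\*(.+?)\*/', '<strong>$1</strong>', $text);
}

$html = 'This *is* my sample [literal]text for *this* post[/literal]!';
[$html, $uuids, $values] = substituteWithUuids('/\[literal\].+\[\/literal]/sU', $html);
$html = fakeTextile($html);
// strtr() with an array does a single pass, so restored text is never re-scanned.
$html = strtr($html, array_combine($uuids, $values));
echo $html, "\n"; // This <strong>is</strong> my sample [literal]text for *this* post[/literal]!
```

Because every match gets its own UUID at the moment it is replaced, two identical [literal] sections are handled independently, and nothing in a UUID looks like Textile markup.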

I've found this little substitution method useful a few times in the past, so I thought I'd share it ; ). Suggestions for improvements and other approaches are more than welcome!

-- Felix Geisendörfer aka the_undefined

PS: I have decided to allow myself to write my posts for the weekend the night before and have them auto-published. Otherwise my girlfriend is going to kill me ; ).

oli said on Aug 21, 2008:

You've surely written all 30 in advance already :-)

Tim Koschützki said on Aug 21, 2008:

"You've surely written all 30 [posts] in advance already :-)"

No, he hasn't, oli ; ). I got up at 8:30am today, and the first thing Felix did was bug me about reviewing his post. In the end we published it 15 seconds before the 9:00am deadline. Hehehe!

Christoph Tavan said on Aug 21, 2008:

Nice trick! Just one thing: am I wrong, or should it be Common::substitute('/\[literal\].+\[\/literal]/sU');?

Fabio said on Aug 21, 2008:

$html = 'This *is* my sample [literal]text for *this* post[/literal]!';
$regex = '/\[literal\].+\[\/literal]/sU';

Simple & fast:

preg_match_all($regex, $html, $matches);
$html = preg_replace($regex, '%s', $html);

$html = Textile::parse($html);

$html = vsprintf($html, $matches[0]);

Anum said on Aug 21, 2008:

@Fabio, you read my mind! :)

Dardo Sordi said on Aug 23, 2008:

@Fabio, you may want to replace % with %% before doing the preg_replace()

Felix Geisendörfer said on Aug 25, 2008:

Fabio: Excellent idea. My only concern is that user-submitted content could contain sprintf() format specifiers, which makes UUIDs better suited as placeholders.

Christoph Tavan: Shit, that's what you get for last-minute changes ; ). Fixed now.

Dardo Sordi said on Aug 26, 2008:

Felix, if you apply Fabio's solution and replace % with %% before doing the preg_replace(), you shouldn't have any problems with occurrences of % in the users' content.
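
Combining Fabio's vsprintf() approach with Dardo's %-escaping might look like the sketch below. The ordering matters: the protected sections are captured from the original text, so their own % signs are never escaped, while everything else is escaped before the %s placeholders go in. fakeTextile() is a hypothetical stand-in for the parser, and the sketch assumes the parser leaves "%%" and "%s" untouched. Real Textile has a %span% syntax, so that assumption is worth checking before relying on it.

```php
<?php
// Sketch of the vsprintf() approach with %-escaping applied.
// Assumes the parser leaves "%%" and "%s" intact.

// Hypothetical stand-in for a real Textile parser: *word* -> <strong>word</strong>
function fakeTextile(string $text): string
{
    return preg_replace('/\*(.+?)\*/', '<strong>$1</strong>', $text);
}

$html  = 'Save 100% on *this* [literal]50% off *that*[/literal]!';
$regex = '/\[literal\].+\[\/literal]/sU';

// 1. Capture the protected sections from the original text, so their % signs stay as-is.
preg_match_all($regex, $html, $matches);

// 2. Escape every % so vsprintf() will treat it as a literal character.
$escaped = str_replace('%', '%%', $html);

// 3. Replace each protected section with a %s placeholder.
$escaped = preg_replace($regex, '%s', $escaped);

// 4. Run the parser on the placeholder text.
$escaped = fakeTextile($escaped);

// 5. vsprintf() fills the %s slots in order and turns each %% back into %.
$html = vsprintf($escaped, $matches[0]);
echo $html, "\n"; // Save 100% on <strong>this</strong> [literal]50% off *that*[/literal]!
```

Without step 2, the "% o" in "100% on" would be read as a format specifier and consume one of the arguments meant for the %s slots.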

debuggable