Title to url slug conversion
Posted on 19/10/06 by Felix Geisendörfer
 
Deprecated post
The authors of this post have marked it as deprecated. This means the information displayed is most likely outdated, inaccurate, boring or a combination of all three.
Policy: We never delete deprecated posts, but they are no longer listed in our categories and no longer show up in the search.
As those of you who run a WordPress install probably already know, WP has a nice feature that converts the title of any post you write into a more URL-friendly version, a so-called post slug. The method it uses is pretty simple: lowercase everything, replace whitespace with hyphens, and convert characters that aren't URL-friendly into ones that are. Now, as I mentioned in a post a while back, I'm using pretty URLs that are RESTful these days. So in the early phase of the app that I'm finishing up right now, I simply had a field called URL Slug where I had to enter this URL suffix manually. But since neither I nor the client this app will ship to wants to fill in this field all the time, I made it optional and created a WP-like function that builds the URL slug from the title if the user leaves the field blank.
After talking to Nate, I decided to put this function into a CommonComponent. It's not strictly OOP to do so, but I had felt the need for a namespace within CakePHP for stand-alone functions for a while, and this seemed like a reasonable candidate to go in there:
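class CommonComponent extends Object
{
    // Converts a (German) title into a lowercase, hyphenated URL slug.
    function stringToUrlSlug($string)
    {
        // Umlauts and ß are transliterated; -, _, / and \ (with one optional whitespace character
        // on either side) become a hyphen; other whitespace becomes a hyphen; quotes are dropped.
        // '\\\\' in a single-quoted string yields the regex \\, i.e. a literal backslash.
        $unPretty = array('/ä/', '/ö/', '/ü/', '/Ä/', '/Ö/', '/Ü/', '/ß/', '/\s?-\s?/', '/\s?_\s?/', '/\s?\/\s?/', '/\s?\\\\\s?/', '/\s/', '/"/', '/\'/');
        $pretty = array('ae', 'oe', 'ue', 'Ae', 'Oe', 'Ue', 'ss', '-', '-', '-', '-', '-', '', '');
        // low() is CakePHP's shorthand for strtolower()
        return low(preg_replace($unPretty, $pretty, $string));
    }
}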
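For anyone who wants to try the function outside CakePHP, here is a minimal standalone sketch of the same idea, assuming PHP 7 or later. It keeps the same replacement table but uses strtolower() in place of CakePHP's low(), and it writes the backslash pattern with five backslashes in the single-quoted literal so that the regex matches a literal backslash. As an extra step that the function above does not have (an assumption, not part of the original), it squeezes runs of hyphens and trims them from both ends, so "Foo  Bar" with two spaces becomes foo-bar instead of foo--bar. The name urlSlug() is hypothetical.

```php
<?php
// Hypothetical standalone port of stringToUrlSlug() (does not need CakePHP).
function urlSlug(string $string): string
{
    // Same table as above: umlauts/ß transliterated, separators (plus surrounding
    // whitespace) turned into hyphens, remaining whitespace hyphenated, quotes dropped.
    $unPretty = ['/ä/', '/ö/', '/ü/', '/Ä/', '/Ö/', '/Ü/', '/ß/',
                 '/\s?-\s?/', '/\s?_\s?/', '/\s?\/\s?/', '/\s?\\\\\s?/', '/\s/', '/"/', "/'/"];
    $pretty = ['ae', 'oe', 'ue', 'Ae', 'Oe', 'Ue', 'ss', '-', '-', '-', '-', '-', '', ''];
    $slug = strtolower(preg_replace($unPretty, $pretty, $string));
    // Extra step (not in the original): collapse repeated hyphens, strip edge hyphens.
    $slug = preg_replace('/-+/', '-', $slug);
    return trim($slug, '-');
}

echo urlSlug('Schöne Grüße aus Köln'), "\n"; // schoene-gruesse-aus-koeln
echo urlSlug('Foo  Bar'), "\n";              // foo-bar
```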
And here comes a usage example from my application:
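// In a controller using the Common component, before saving a Page:
// fill in the slug from the title when the user left url_suffix blank.
if (empty($this->data['Page']['url_suffix']) && !empty($this->data['Page']['title'])) {
    $this->data['Page']['url_suffix'] = $this->Common->stringToUrlSlug($this->data['Page']['title']);
}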
Well, one thing I have to mention is that this function is pretty German-biased and does not contain a complete list of possible replacements. I'm sure there are characters in other languages, such as French, that are not URL-suitable either and could be replaced with standard Latin ones too, but I'm no expert on this topic. So if you have things to add, feel free to do so.
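One way to extend the table beyond German is a plain lookup map passed to strtr(). With an array argument, strtr() replaces each key as a whole string (trying longer keys first), so a two-byte UTF-8 character is never split. This is a sketch assuming PHP 7+, UTF-8 input and a UTF-8 source file; transliterateForSlug() is a hypothetical name, and the French entries are only a sample, not a complete list. The result can then be passed through the slug function.

```php
<?php
// Hypothetical transliteration step to run before slugging.
// Assumes UTF-8 input and a UTF-8 source file; the map is a sample only.
function transliterateForSlug(string $s): string
{
    $map = [
        // German
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'Ä' => 'Ae', 'Ö' => 'Oe', 'Ü' => 'Ue', 'ß' => 'ss',
        // French (lowercase)
        'à' => 'a', 'â' => 'a', 'æ' => 'ae', 'ç' => 'c', 'é' => 'e', 'è' => 'e', 'ê' => 'e',
        'ë' => 'e', 'î' => 'i', 'ï' => 'i', 'ô' => 'o', 'œ' => 'oe', 'ù' => 'u', 'û' => 'u', 'ÿ' => 'y',
        // French (uppercase)
        'À' => 'A', 'Â' => 'A', 'Æ' => 'Ae', 'Ç' => 'C', 'É' => 'E', 'È' => 'E', 'Ê' => 'E',
        'Ë' => 'E', 'Î' => 'I', 'Ï' => 'I', 'Ô' => 'O', 'Œ' => 'Oe', 'Ù' => 'U', 'Û' => 'U',
    ];
    // Each key is matched as a complete byte sequence, so multi-byte characters stay intact.
    return strtr($s, $map);
}

echo transliterateForSlug('Crème brûlée à la française'), "\n"; // Creme brulee a la francaise
```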
Oh and while I'm already talking about url's in CakePHP, here is another little pattern I adopted for my RESTful Url's:
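define('PAGE_PUBLISHED', 1);

class Page extends AppModel
{
    var $name = 'Page';
    var $validate = array('title'      => VALID_NOT_EMPTY,
                          'url_suffix' => VALID_NOT_EMPTY,
                          'text'       => VALID_NOT_EMPTY);

    // Returns '/page/<id>:<url_suffix>'. When called statically (Page::getUrl($page)),
    // $page must be passed; the $this->data fallback only works on a model instance.
    function getUrl($page = null)
    {
        if (empty($page)) {
            $page = $this->data;
        }
        return '/page/'.$page['Page']['id'].':'.$page['Page']['url_suffix'];
    }
}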
As you can see, I added a getUrl() function to my Page model. It can then be called statically from within the view, like Page::getUrl($page), to create a URL for HtmlHelper::link(). Now you could argue that the URL logic has nothing to do with the page itself. That's correct, but because every Page has the field url_suffix, it becomes part of the model, and therefore I think it's ethically correct to let the model handle URL generation ; ).
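The /page/<id>:<url_suffix> format that getUrl() produces is easy to take apart again on the way in, because only the numeric id needs to be trusted; the slug can be stale or even missing. Below is a minimal standalone sketch assuming PHP 7.1+ (for the nullable return type). pageUrl() and parsePageUrl() are hypothetical helpers, not CakePHP functions, and the regex assumes slugs contain only lowercase letters, digits and hyphens.

```php
<?php
// Hypothetical helper mirroring Page::getUrl(): '/page/<id>:<url_suffix>'.
function pageUrl(array $page): string
{
    return '/page/' . $page['Page']['id'] . ':' . $page['Page']['url_suffix'];
}

// Hypothetical inverse: pull the id (and the slug, if present) out of such a path.
// Returns null when the path does not have the expected shape.
function parsePageUrl(string $path): ?array
{
    if (!preg_match('#^/page/(\d+)(?::([a-z0-9-]*))?$#', $path, $m)) {
        return null;
    }
    // An unmatched trailing group is absent from $m, hence the ?? fallback.
    return ['id' => (int) $m[1], 'slug' => $m[2] ?? ''];
}

$url = pageUrl(['Page' => ['id' => 5, 'url_suffix' => 'my-pretty-title']]);
echo $url, "\n";                     // /page/5:my-pretty-title
echo parsePageUrl($url)['id'], "\n"; // 5
```

Looking the record up by id alone means a renamed page keeps working under its old URL, since the slug is effectively decoration.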
Alright, I hope this helps some folks out there, and I'd be happy to hear your thoughts on the techniques I use.
--Felix Geisendörfer aka the_undefined
Why didn't you use urlencode?
OK, it's less friendly, but it isn't "German-biased".
ben: The URL slug does not contain all the information required to convert it back into the original title, that's correct. But as I mentioned earlier, my URLs look like /page/5:my-pretty-title, so I already have the id of the database entry, and the URL slug only serves as search-engine food. I really do believe that you should not use titles as secondary primary keys; it's a bad idea.
wluigi: I thought about using urlencode, and if it helps you, feel free to add it. However, I discovered that urlencode replaces characters that the browser itself leaves alone when following a link containing them, and the reason I wrote this function was to avoid %21 and similar codes in my URLs, so I decided not to use urlencode at all.
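To see the %-codes in question, compare what urlencode() does to a title with umlauts and punctuation. urlencode() leaves ASCII letters, digits and the characters -, _ and . untouched, turns spaces into +, and encodes every other byte as %XX, so each two-byte UTF-8 umlaut becomes two codes. This small sketch assumes the title is UTF-8.

```php
<?php
// urlencode() percent-encodes each byte of a multi-byte UTF-8 character separately,
// and punctuation such as '!' becomes %21.
$title = 'Grüße aus Köln!';
$encoded = urlencode($title);
echo $encoded, "\n"; // Gr%C3%BC%C3%9Fe+aus+K%C3%B6ln%21
```

The encoding is lossless (urldecode() gives the title back), which is exactly what a slug gives up in exchange for readability.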
I've borrowed your implementation above for my app. In this case it is for matching attribute names, where people may mess up case, punctuation and whitespace. Seems to work well for the moment, except I'm thinking it might be worth removing all whitespace rather than reducing and converting to hyphens, as our users may miss out spaces altogether.
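For the matching use case described here, one option is to lowercase and then drop every character that is not a letter or digit, so that "Screen Size", "screen-size" and "ScreenSize" all compare equal. A minimal sketch assuming PHP 7+; attributeKey() is a hypothetical name, and it assumes the input has already been transliterated to ASCII (any remaining non-ASCII letters are simply removed).

```php
<?php
// Hypothetical normaliser for fuzzy attribute-name matching:
// lowercase, then strip whitespace, punctuation and anything else non-alphanumeric.
function attributeKey(string $name): string
{
    return preg_replace('/[^a-z0-9]/', '', strtolower($name));
}

var_dump(attributeKey('Screen Size') === attributeKey('screensize')); // bool(true)
```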
/////////////////
	function title_slug( $title )
	{
		$slug = $title;
		$bad = array(	'Š','Ž','š','ž','Ÿ','À','Á','Â','Ã','Ä','Å','Ç','È','É','Ê','Ë','Ì','Í','Î','Ï','Ñ',
					'Ò','Ó','Ô','Õ','Ö','Ø','Ù','Ú','Û','Ü','Ý','à','á','â','ã','ä','å','ç','è','é','ê',
					'ë','ì','í','î','ï','ñ','ò','ó','ô','õ','ö','ø','ù','ú','û','ü','ý','ÿ',
					'Þ','þ','Ð','ð','ß','Œ','œ','Æ','æ','µ',
					'"',"'",'“','”',"\n","\r",'_');
		$good = array(	'S','Z','s','z','Y','A','A','A','A','A','A','C','E','E','E','E','I','I','I','I','N',
					'O','O','O','O','O','O','U','U','U','U','Y','a','a','a','a','a','a','c','e','e','e',
					'e','i','i','i','i','n','o','o','o','o','o','o','u','u','u','u','y','y',
					'TH','th','DH','dh','ss','OE','oe','AE','ae','u',
					'','','','','','','-');
		// replace strange characters with alphanumeric equivalents
		$slug = str_replace( $bad, $good, $slug );
		$slug = trim($slug);
		// remove any duplicate whitespace, and ensure all characters are alphanumeric
		$bad_reg = array('/\s+/','/[^A-Za-z0-9\-]/');
		$good_reg   = array('-','');
		$slug = preg_replace($bad_reg, $good_reg, $slug);
		// and lowercase
		$slug = strtolower($slug);
		return $slug;
	}
Doesn't work for me. Something with my encoding seems to be wrong, but I've checked everything twice now: UTF-8 is set :( Nevertheless, umlauts etc. are replaced with strange characters, or not replaced at all because they're treated as the strange characters that appear; I don't know. Grant Cox's solution doesn't insert such characters, but ö becomes "a" instead of "o" :S
I am having the same problem. When I try it in a normal PHP file it works OK, but in CakePHP, when it saves to the DB, it gets saved as an "a".
It does not work because you are using UTF-8 characters. Try converting them to ISO-8859-1 with utf8_decode() (i.e. $slug = utf8_decode($title);) and it'll work.
I had the same problem as Chris and Christian and tried the solution suggested by Ignacio, but it still doesn't work.
I solved it simply by adding the HTML tag
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
This way the encoding is correct and the script works fine.
hope this helps,
cheers
/n.
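Most of the encoding reports above fit one explanation: the replacement tables are written as UTF-8 byte sequences (assuming the PHP file is saved as UTF-8), so they only match input that also arrives as UTF-8. Declaring the page charset, as in the last comment, makes the browser submit forms as UTF-8. As an extra guard on the server side, here is a minimal sketch assuming PHP 7+ and assuming that any input which is not valid UTF-8 is ISO-8859-1 (browsers may also send Windows-1252, which this does not handle). isValidUtf8() and ensureUtf8() are hypothetical helpers.

```php
<?php
// True when $s is valid UTF-8: with the /u modifier, preg_match() returns false on invalid input.
function isValidUtf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Hypothetical fallback: treat non-UTF-8 input as ISO-8859-1 and re-encode it as UTF-8.
function ensureUtf8(string $s): string
{
    if (isValidUtf8($s)) {
        return $s;
    }
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        // ISO-8859-1 bytes map 1:1 to U+0000..U+00FF; bytes >= 0x80 need two UTF-8 bytes.
        $out .= $b < 0x80 ? $s[$i] : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

echo ensureUtf8("Gr\xFC\xDFe"), "\n"; // Grüße
```

Running input through a guard like this before the slug function keeps the UTF-8 replacement table and the incoming bytes in the same encoding.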
Would be cool to have a reverse function to convert back from a slug. But that might require a different approach to the slug-making function in order to make it reversible.