Here is a snippet that escapes a string containing regex special characters.
<?php
// Single-quoted so the backslashes reach the regex engine unchanged.
// The pattern must stay on one line: a line break inside it would be matched literally.
$x = '/\[|\\\\|\^|\$|\.|\||\?|\*|\+|\(|\)|\{|\}/';
$escaped_string = preg_replace($x, '\\\\$0', $in_string);
?>
This is useful if you are generating dynamic regular expressions based on user input. After the call above, $escaped_string is ready to be used as the search text inside another regular expression. Note that this pattern does not escape ], - or the delimiter itself.
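PHP also ships a built-in for this job, preg_quote(). A minimal sketch, assuming a made-up $user_input value and / as the pattern delimiter:

```php
<?php
// Hypothetical user input that contains regex metacharacters.
$user_input = 'price (USD) is $5.00?';

// The second argument asks preg_quote() to escape the delimiter as well.
$escaped = preg_quote($user_input, '/');

// The escaped text now matches itself literally inside a larger pattern.
$found = preg_match('/' . $escaped . '/', 'The price (USD) is $5.00? Yes.');
```

Unlike a hand-written list of metacharacters, preg_quote() is maintained alongside the regex engine, so it is less likely to miss a character.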
preg_replace
(PHP 4, PHP 5)
preg_replace - Perform a regular expression search and replace
Description
Searches subject for matches to the regular expression pattern and replaces them with replacement.
Parameters
- pattern
The pattern to search for. It can be either a string or an array of strings.
The e modifier makes preg_replace() treat replacement as PHP code after the appropriate substitutions have been made. Tip: make sure that replacement is valid PHP code, because otherwise PHP will report a parse error on the line containing the preg_replace() call.
- replacement
The string or an array of strings to replace with. If this parameter is a string and the pattern parameter is an array, all patterns will be replaced by that string. If both pattern and replacement are arrays, each pattern will be replaced by its replacement counterpart. If replacement has fewer elements than pattern, an empty string is used for the rest of the replacement values.
replacement may contain references of the form \\n or (since PHP 4.0.4) $n. The latter form is recommended. Every such reference will be replaced by the text captured by the n-th capturing parenthesis of the pattern. n can range from 0 to 99, and \\0 or $0 refers to the text matched by the whole pattern. Opening parentheses are counted from left to right (starting from 1) to obtain the number of the capturing parenthesis.
When working with a replacement pattern where a backreference is immediately followed by a number (i.e. a literal number placed directly after a backreference), you cannot use the familiar \\1 notation for the backreference. \\11, for example, would confuse preg_replace(), since it cannot tell whether you want the \\1 backreference followed by a literal 1, or the \\11 backreference followed by nothing. In this case the solution is to use \${1}1. This creates an isolated $1 backreference, followed by the literal 1.
When using the e modifier, this function escapes some characters (namely ', ", \ and NULL) in the strings that replace the backreferences. This is done to ensure that no syntax errors arise from backreference usage with either single or double quotes (e.g. 'strlen(\'$1\')+strlen("$2")'). Make sure you are familiar with PHP's string syntax so that you know exactly what the interpreted string will look like.
- subject
The string or an array of strings to search and replace.
If subject is an array, the search and replace is performed on every element of the array, and an array is returned.
- limit
The maximum number of replacements for each pattern in each subject string. Defaults to -1 (no limit).
- count
If specified, this variable will be filled with the number of replacements done.
Return Values
preg_replace() returns an array if the subject parameter is an array, or a string otherwise.
If matches are found, the new subject will be returned; otherwise subject will be returned unchanged, or NULL if an error occurred.
Changelog
| Version | Description |
|---|---|
| 5.1.0 | Added the count parameter |
| 4.0.4 | Added the '$n' form for the replacement parameter |
| 4.0.1 | Added the limit parameter |
Examples
Example #1 Using backreferences followed by numeric literals
<?php
$string = 'April 15, 2003';
$pattern = '/(\w+) (\d+), (\d+)/i';
$replacement = '${1}1,$3';
echo preg_replace($pattern, $replacement, $string);
?>
The above example will output:
April1,2003
Example #2 Using indexed arrays with preg_replace()
<?php
$string = 'Le renard marron agile saute par dessus le chien paresseux.';
$patterns[0] = '/agile/';
$patterns[1] = '/marron/';
$patterns[2] = '/renard/';
$replacements[2] = 'grizzly';
$replacements[1] = 'brun';
$replacements[0] = 'lent';
echo preg_replace($patterns, $replacements, $string);
?>
The above example will output:
Le lent brun grizzly saute par dessus le chien paresseux.
By ksorting the patterns and the replacements, you should get the expected result.
<?php
ksort($patterns);
ksort($replacements);
echo preg_replace($patterns, $replacements, $string);
?>
The above example will output:
Le grizzly brun lent saute par dessus le chien paresseux.
Example #3 Replacing several values at once
<?php
$patterns = array ('/(19|20)(\d{2})-(\d{1,2})-(\d{1,2})/',
'/^\s*{(\w+)}\s*=/');
$replace = array ('\3/\4/\1\2', '$\1 =');
echo preg_replace($patterns, $replace, '{startDate} = 1999-5-27');
?>
The above example will output:
$startDate = 5/27/1999
Example #4 Using the 'e' modifier
<?php
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtoupper('\\2').'\\3'",
$html_body);
?>
This would capitalize all HTML tags in the input text.
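The same effect can be had without evaluating code from a string, which matters because the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0. A sketch using preg_replace_callback() with an anonymous function (this assumes PHP 5.3 or later; the sample $html_body value is made up for illustration):

```php
<?php
// Hypothetical input; any HTML fragment would do.
$html_body = '<p>Hello <b>world</b></p>';

$upper = preg_replace_callback(
    '/(<\/?)(\w+)([^>]*>)/',
    function ($m) {
        // $m[1] is "<" or "</", $m[2] the tag name, $m[3] the rest of the tag.
        return $m[1] . strtoupper($m[2]) . $m[3];
    },
    $html_body
);

echo $upper; // <P>Hello <B>world</B></P>
```

The callback receives the raw matched text, so none of the quote escaping described for the e modifier applies.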
Example #5 Stripping whitespace
This example strips excess whitespace from a string.
<?php
$str = 'foo   o';
$str = preg_replace('/\s\s+/', ' ', $str);
// This will now print 'foo o'
echo $str;
?>
Example #6 Using the count parameter
<?php
$count = 0;
echo preg_replace(array('/\d/', '/\s/'), '*', 'xp 4 to', -1 , $count);
echo $count; //3
?>
The above example will output:
xp***to3
Notes
Note: When using arrays with pattern and replacement, the keys are processed in the order they appear in the array. This is not necessarily the same as the numerical index order. If you use indexes to identify which pattern should be replaced by which replacement, you should perform a ksort() on each array before calling preg_replace().
25-Jun-2008 11:15
21-Jun-2008 08:09
A simple BB like thing..
function AddBB($var) {
$search = array(
'/\[b\](.*?)\[\/b\]/is',
'/\[i\](.*?)\[\/i\]/is',
'/\[u\](.*?)\[\/u\]/is',
'/\[img\](.*?)\[\/img\]/is',
'/\[url\](.*?)\[\/url\]/is',
'/\[url\=(.*?)\](.*?)\[\/url\]/is'
);
$replace = array(
'<strong>$1</strong>',
'<em>$1</em>',
'<u>$1</u>',
'<img src="$1" />',
'<a href="$1">$1</a>',
'<a href="$1">$2</a>'
);
$var = preg_replace ($search, $replace, $var);
return $var;
}
Let me know of any error(s) :)
17-Apr-2008 12:35
For filename tidying I prefer to only ALLOW certain characters rather than converting particular ones that we want to exclude. To this end I use ...
<?php
$allowed = "/[^a-z0-9\\040\\.\\-\\_\\\\]/i";
$str = preg_replace($allowed, "", $str);
?>
Allows letters a-z, digits, space (\\040), hyphen (\\-), underscore (\\_) and backslash (\\\\), everything else is removed from the string.
25-Mar-2008 03:45
This is in response to iasmin at amazingdiscoveries dot org's URL text to link function. Hope this is helpful to someone.
I played with it a bit and came up with this version (there were one or two little errors in the regex I think, also -- it didn't allow various necessary characters).
I start with a URL in brackets (this works for my case):
[http://www.site.com/path/that/may/be_long.php?fun=1]
It returns a link of the URL after the "http://":
www.site.com/path/that/may/b...
-----------
// Cuts off long URLs at $url_length, and appends "..."
function reduceurl($url, $url_length) {
$reduced_url = substr($url, 0, $url_length);
if (strlen($url) > $url_length) $reduced_url .= '...';
return $reduced_url;
}
// Makes URLs with brackets into links
// The regex searches for "http://" or equivalent, then various character possibilities (I don't know if it might be possible to exploit this if more characters were allowed). The "e" after the regex allows the reduceurl() to be evaluated.
function url2link($linktext) {
$linktext = preg_replace("#\[(([a-zA-Z]+://)([a-zA-Z0-9?&%.;:/=+_-]*))\]#e", "'<a href=\"$1\" target=\"_blank\">' . reduceurl(\"$3\", 30) . '</a>'", $linktext);
return $linktext;
}
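For comparison, here is a sketch of the same idea without the "e" modifier, using preg_replace_callback(). The name url2link_cb is hypothetical, the closure assumes PHP 5.3 or later, and the htmlspecialchars() calls are an added precaution rather than part of the original function:

```php
<?php
// Cuts off long URLs at $url_length, and appends "..."
function reduceurl($url, $url_length) {
    $reduced_url = substr($url, 0, $url_length);
    if (strlen($url) > $url_length) $reduced_url .= '...';
    return $reduced_url;
}

// Hypothetical callback-based variant of url2link().
function url2link_cb($linktext) {
    return preg_replace_callback(
        '#\[(([a-zA-Z]+://)([a-zA-Z0-9?&%.;:/=+_-]*))\]#',
        function ($m) {
            // $m[1] is the full URL, $m[3] the part after the scheme.
            return '<a href="' . htmlspecialchars($m[1]) . '" target="_blank">'
                . htmlspecialchars(reduceurl($m[3], 30)) . '</a>';
        },
        $linktext
    );
}

$link = url2link_cb('[http://www.site.com/path/that/may/be_long.php?fun=1]');
```

Because nothing is evaluated as PHP code, the question of whether extra characters in the URL could be exploited through the "e" modifier does not arise here.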
29-Feb-2008 09:02
Below is a function for converting Hebrew final characters to their
normal equivalents should they appear in the middle of a word.
The \b escape does not treat Hebrew letters as part of a word,
so I had to work around that limitation.
<?php
$text="עברית מבולגנת";
function hebrewNotWordEndSwitch ($from, $to, $text) {
$text=
preg_replace('/'.$from.'([א-ת])/u',$to.'$1',$text);
return $text;
}
do {
$text_before=$text;
$text=hebrewNotWordEndSwitch("ך","כ",$text);
$text=hebrewNotWordEndSwitch("ם","מ",$text);
$text=hebrewNotWordEndSwitch("ן","נ",$text);
$text=hebrewNotWordEndSwitch("ף","פ",$text);
$text=hebrewNotWordEndSwitch("ץ","צ",$text);
} while ( $text_before!=$text );
print $text; // עברית מסודרת!
?>
The do-while is necessary for multiple instances of letters, such
as "אנני" which would start off as "אןןי". Note that there's still the
problem of acronyms with gershiim but that's not a difficult one
to solve. The code is in use at http://gibberish.co.il which you can
use to translate wrongly-encoded Hebrew, transliterate, and some
other Hebrew-related functions.
To ensure that there will be no regular characters at the end of a
word, just convert all regular characters to their final forms, then
run this function. Enjoy!
15-Jan-2008 10:53
Jacob Fogg's clean_filename function is good, but there is a typo. Replace "\\x00-\\x40" with "\\x00-\\x20" or you will exclude too many characters.
Also keep in mind when checking file names to look for the special directory names ".." and ".". A user could potentially use those to reach an unexpected directory.
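Putting both remarks together, a corrected variant might look like the sketch below. The name clean_filename_fixed and the choice to return '_' for the special names are assumptions made for illustration, not part of the original function:

```php
<?php
// Hypothetical corrected variant of clean_filename().
function clean_filename_fixed($filename) {
    // Characters that are illegal on at least one major OS.
    $reserved = preg_quote('\/:*?"<>|', '/');
    // Replace control characters and space (\x00-\x20), DEL and all
    // high bytes (\x7f-\xff), and the reserved characters with "_".
    $clean = preg_replace("/[\\x00-\\x20\\x7f-\\xff{$reserved}]/", '_', $filename);
    // Refuse the special directory names and the empty string.
    if ($clean === '' || $clean === '.' || $clean === '..') {
        return '_';
    }
    return $clean;
}

echo clean_filename_fixed('my file:v2?.txt'); // my_file_v2_.txt
```

Since the pattern works on bytes, each multi-byte UTF-8 character is replaced by several underscores.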
14-Jan-2008 10:29
Here is my attempt at cleaning up a file name. It is similar to what someone else has done, but a little cleaner, with the addition of | to the reserved characters. I also replace any characters from x00 to x40 (all non-display characters and space), as well as everything from 7f upward (the Del character and non-English characters), with an '_'.
function clean_filename($filename){//function to clean a filename string so it is a valid filename
$reserved = preg_quote('\/:*?"<>|', '/');//characters that are illegal on any of the 3 major OS's
//replaces all characters up through space and all past ~ along with the above reserved characters
//(no /e modifier: the replacement is a plain string, not PHP code)
return preg_replace("/([\\x00-\\x40\\x7f-\\xff{$reserved}])/", "_", $filename);
}
11-Dec-2007 05:17
Actually I made a mistake in my previous post. In order to make the function more effective ... I broke it. The original which works (really) looks like this:
<?php
function repl_amp($text)
{
$text=preg_replace("/&(?!amp;)/i", "&amp;", $text);
$text=preg_replace("/&amp;#(\d+);/i", "&#$1;", $text); // For numeric entities
$text=preg_replace("/&amp;(\w+);/i", "&$1;", $text); // For literal entities
return $text;
}
?>
The RegEx Tester says that the first expression is OK, but when testing with various entities, some of them came out broken. I had tried to use only two preg_replace() calls instead of three by using an alternative branch in the pattern, which didn't come out well. Sorry for the previous error, and I still hope that someone can find a better alternative.
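One possible single-pass alternative is a negative lookahead that skips anything already shaped like an entity. This is only a sketch: the name escape_bare_amp is hypothetical, and the pattern checks the shape of an entity, not whether the entity name actually exists.

```php
<?php
// Hypothetical helper: escape only ampersands that do not start an entity.
function escape_bare_amp($text) {
    // The lookahead accepts &#123; (decimal), &#x1F; (hex) and &name; forms.
    return preg_replace(
        '/&(?!(?:#\d+|#x[0-9a-f]+|[a-z][a-z0-9]*);)/i',
        '&amp;',
        $text
    );
}

echo escape_bare_amp('Tom & Jerry &copy; &#169; &amp; R&D');
// Tom &amp; Jerry &copy; &#169; &amp; R&amp;D
```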
07-Dec-2007 07:28
Hi,
as I wasn't able to find another way to do this, I wrote a function converting any UTF-8 string into a correct NTFS filename (see http://en.wikipedia.org/wiki/Filename).
<?php
function strToNTFSFilename($string)
{
$reserved = preg_quote('\/:*?"<>', '/');
return preg_replace("/([\\x00-\\x1f{$reserved}])/", "_", $string);
}
?>
It converts all control characters and filename characters which are reserved by Windows ('\/:*?"<>') into an underscore.
This way you can safely create an NTFS filename out of any UTF-8 string.
30-Nov-2007 09:26
This approach is much easier to work with than preg_replace's separate pattern and replacement arrays. I borrowed some of it from someone else here, but to make it more explicit:
$relation['/pattern/'] = "replacement";
$text_out = preg_replace(array_keys($relation), array_values($relation), $text_in);
Fast, efficient, no guess work..
23-Oct-2007 08:35
@giel dot berkers
Use the 'PCRE_DOTALL' ('s') option so that the '.' covers newline characters:
$code = preg_replace('/\/\*.*\*\//ms', '', $code);
18-Oct-2007 05:49
Hi.
Not sure if this will be a great help to anyone out there, but I thought I'd post it just in case.
I was having an issue with a project that relied on $_SERVER['REQUEST_URI']. Obviously this wasn't working on IIS.
(I am using mod_rewrite in Apache to call up pages from a database, and IIS doesn't set REQUEST_URI.) So I knocked up this simple little preg_replace to use the query string set by IIS when redirecting to a PHP error page.
<?php
//My little IIS hack :)
if(!isset($_SERVER['REQUEST_URI'])){
$_SERVER['REQUEST_URI'] = preg_replace( '/404;([a-zA-Z]+:\/\/)(.*?)\//i', "/" , $_SERVER['QUERY_STRING'] );
}
?>
Hope this helps someone else out there trying to do the same thing :)
If anyone finds a better way, please let me know, I'm still learning ;)
28-Sep-2007 05:52
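Regarding the post below: a $ followed by a digit in the replacement string is read as a backreference, so a value such as '$5' vanishes from the result: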
<?php
$template = "Price: #price#";
$price = '$5';
print "Price: $price\n";
$res = preg_replace("/#price#/", $price, $template);
print "From template: -> $res\n";
?>
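One way around this is to escape backslashes and dollar signs in the replacement before handing it to preg_replace(). A minimal sketch, where the variable names $naive and $safe_price are made up for illustration:

```php
<?php
$template = 'Price: #price#';
$price = '$5';

// Unescaped: "$5" is taken as a reference to a fifth group that does not exist.
$naive = preg_replace('/#price#/', $price, $template);

// Escape "\" and "$" so both are treated as literal characters.
$safe_price = strtr($price, array('\\' => '\\\\', '$' => '\\$'));
$res = preg_replace('/#price#/', $safe_price, $template);

echo $res; // Price: $5
```

For a fixed search string like this one, str_replace() would avoid the problem entirely, since it does no backreference processing.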
20-Sep-2007 03:52
@ Santosh Patnaik
The Perl regular expression engine will handle this expression better and much faster if you use the word boundary escape \b, though that may not be obvious except to long-time Perl geeks such as I :)
so:
// Expect and get 'pa pa pa pa'
echo preg_replace('`\bma\b`', 'pa', 'ma ma ma ma');
Jeff
14-Sep-2007 08:11
Once a match is identified, the regular expression engine appears to set aside the matching segment of the target string. A second segment that you expect to match may therefore end up not getting matched:
// Expect 'pa pa pa pa' but get 'pa ma pa ma'
echo preg_replace('`(^|\s)ma(\s|$)`', '$1pa$2', 'ma ma ma ma');
Here the issue can be solved by using a 'lookahead':
// Expect and get 'pa pa pa pa'
echo preg_replace('`(^|\s)ma(?=\s|$)`', '$1pa', 'ma ma ma ma');
13-Sep-2007 04:37
I thought that someone could use this hyperlink function.
preg_replace is about 6 times faster than ereg_replace. I took the original example from the ereg_replace function page and modified so that it works perfect. I gave a comment of what it matches.
One thing is that I added a space at the beginning so that only links that don't have <a href="" around them or anything else touching will be replaced.
<?php
function hyperlink(&$text)
{
// match protocol://address/path/file.extension?some=variable&another=asf%
$text = preg_replace("/\s(([a-zA-Z]+:\/\/)([a-z][a-z0-9_\..-]*[a-z]{2,6})([a-zA-Z0-9\/*-?&%]*))\s/i", " <a href=\"$1\">$3</a> ", $text);
// match www.something.domain/path/file.extension?some=variable&another=asf%
$text = preg_replace("/\s(www\.([a-z][a-z0-9_\..-]*[a-z]{2,6})([a-zA-Z0-9\/*-?&%]*))\s/i", " <a href=\"http://$1\">$2</a> ", $text);
return $text;
}
?>
Play around with it and see how it works.
Courtesy of AmazingDiscoveries.org
God bless, Iasmin Balaj
24-Aug-2007 12:10
From what I can see, the problem is that if you simply substitute all 'A's with 'T's, you can't tell afterwards which 'T's to substitute with 'A's. This can be solved by first replacing all 'A's with a placeholder character (for instance '_' or whatever you like), then replacing all 'T's with 'A's, and finally replacing all '_'s (or whatever character you chose) with 'T's:
$dna = "AGTCTGCCCTAG";
echo str_replace(array("A","G","C","T","_","-"), array("_","-","G","A","T","C"), $dna); //output will be TCAGACGGGATC
Although I don't know how transliteration in Perl works (though I remember that it is similar to the UNIX command "tr"), I would suggest the following function for "switching" single chars:
function switch_chars($subject,$switch_table,$unused_char="_") {
foreach ( $switch_table as $_1 => $_2 ) {
$subject = str_replace($_1,$unused_char,$subject);
$subject = str_replace($_2,$_1,$subject);
$subject = str_replace($unused_char,$_2,$subject);
}
return $subject;
}
echo switch_chars("AGTCTGCCCTAG", array("A"=>"T","G"=>"C")); //output will be TCAGACGGGATC
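PHP's built-in strtr() behaves much like "tr" when given two strings of equal length: every character is translated in a single pass, so no placeholder character is needed. A short sketch on the same DNA example:

```php
<?php
$dna = 'AGTCTGCCCTAG';

// Each character of the second argument maps to the character at the
// same position in the third: A->T, G->C, C->G, T->A.
$complement = strtr($dna, 'AGCT', 'TCGA');

echo $complement; // TCAGACGGGATC
```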
21-Aug-2007 10:48
Also worth noting is that you can use array_keys()/array_values() with preg_replace like:
$subs = array(
'/\[b\](.+)\[\/b\]/Ui' => '<strong>$1</strong>',
'/_(.+)_/Ui' => '<em>$1</em>'
...
...
);
$raw_text = '[b]this is bold[/b] and this is _italic!_';
$bb_text = preg_replace(array_keys($subs), array_values($subs), $raw_text);
25-Jul-2007 10:15
I had a problem echoing text that contains double quotes into a text field, because the quotes confuse the value attribute. I use the function below to match each pair of double quotes and replace it with smart quotes. Any remaining unpaired quote is replaced by a hyphen (-).
It works for me.
function smart_quotes($text) {
$pattern = '/"((.)*?)"/i';
$text = preg_replace($pattern,"“\\1”",stripslashes($text));
$text = str_replace("\"","-",$text);
$text = addslashes($text);
return $text;
}
17-Jul-2007 01:37
Based on the previous comment, I suggest the following
(this function already exists in PHP 6):
function unicode_decode($str){
return preg_replace(
'#\\\u([0-9a-f]{4})#e',
"unicode_value('\\1')",
$str);
}
function unicode_value($code) {
$value=hexdec($code);
if($value<0x0080)
return chr($value);
elseif($value<0x0800)
return chr((($value&0x07c0)>>6)|0xc0)
.chr(($value&0x3f)|0x80);
else
return chr((($value&0xf000)>>12)|0xe0)
.chr((($value&0x0fc0)>>6)|0x80)
.chr(($value&0x3f)|0x80);
}
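The decoder can also be written without the "e" modifier. The sketch below reuses unicode_value() as posted and wraps it in preg_replace_callback(); the name unicode_decode_cb is hypothetical, the closure assumes PHP 5.3 or later, and surrogate pairs are not handled:

```php
<?php
// Converts a 4-digit hex code point (BMP only) to its UTF-8 bytes.
function unicode_value($code) {
    $value = hexdec($code);
    if ($value < 0x0080)
        return chr($value);
    elseif ($value < 0x0800)
        return chr((($value & 0x07c0) >> 6) | 0xc0)
            . chr(($value & 0x3f) | 0x80);
    else
        return chr((($value & 0xf000) >> 12) | 0xe0)
            . chr((($value & 0x0fc0) >> 6) | 0x80)
            . chr(($value & 0x3f) | 0x80);
}

// Hypothetical callback-based variant of unicode_decode().
function unicode_decode_cb($str) {
    return preg_replace_callback(
        '/\\\\u([0-9a-fA-F]{4})/',
        function ($m) { return unicode_value($m[1]); },
        $str
    );
}

// Single quotes keep the backslashes literal in the test input.
$decoded = unicode_decode_cb('\u003cb\u003e caf\u00e9');
```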
21-Mar-2007 06:47
Be aware that when using the "/u" modifier, if your input text contains any bad UTF-8 code sequences, then preg_replace will return an empty string, regardless of whether there were any matches.
This is due to the PCRE library returning an error code if the string contains bad UTF-8.
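In PHP versions that expose PCRE errors, the failed call returns NULL (which prints as an empty string), and preg_last_error() tells you why. A small sketch, with a deliberately invalid sample string:

```php
<?php
// "\xff" can never appear in valid UTF-8, so this subject is malformed.
$bad = "abc\xff";

$result = preg_replace('/b/u', 'x', $bad);

// Distinguish "bad UTF-8" from a legitimate result.
$failed = ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR);

if ($failed) {
    echo "Input is not valid UTF-8\n";
}
```

Checking with a strict === null comparison matters, because an empty string can also be a valid result of a replacement.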
07-Mar-2007 05:30
I could not find a function to unescape javascript unicode escapes anywhere (e.g., "\u003c"=>"<").
<?php
function js_uni_decode($s) {
return preg_replace('/\\\u([0-9a-f]{4})/ie', "chr(hexdec('\\1'))", $s);
}
echo js_uni_decode("\u003c");
?>
07-Feb-2007 08:09
Note that it is in most cases much more efficient to use preg_replace_callback(), with a named function or an anonymous function created with create_function(), instead of the /e modifier. When preg_replace() is called with the /e modifier, the interpreter must parse the replacement string into PHP code once for every replacement made, while preg_replace_callback() uses a function that only needs to be parsed once.
07-Sep-2006 11:21
Wasted several hours because of this:
$str='It&#039;s a string with HTML entities';
preg_replace('~&#(\d+);~e', 'code2utf($1)', $str);
This code is supposed to convert numeric HTML entities to UTF-8, and it does, with one exception: it mishandles codes starting with &#0.
The reason is that code2utf will be called with the leading zero, exactly as the pattern matched it: code2utf(039).
And it does matter! PHP treats 039 as an octal number.
Try print(011);
Solution:
preg_replace('~&#0*(\d+);~e', 'code2utf($1)', $str);
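With a callback, the matched digits arrive as a string and can be cast explicitly, so the octal trap never comes up. In this sketch html_entity_decode() stands in for code2utf(), which is not shown in the note, and the closure assumes PHP 5.3 or later:

```php
<?php
$str = 'It&#039;s a string with HTML entities';

$decoded = preg_replace_callback(
    '~&#(\d+);~',
    function ($m) {
        // (int) parses the digits as decimal: "039" becomes 39, never octal.
        $code = (int) $m[1];
        return html_entity_decode('&#' . $code . ';', ENT_QUOTES, 'UTF-8');
    },
    $str
);

echo $decoded; // It's a string with HTML entities
```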
21-Apr-2006 02:15
For those of you who have ever had the problem where clients paste text from MS Word into a CMS, and Word has placed all those fancy quotes throughout the text, breaking the XHTML validator: I have created a regular expression that replaces ALL high UTF-8 characters with HTML entities, such as &#8217;.
Note that most user examples on php.net I have read only replace selected characters, such as single and double quotes. This replaces all high characters, including Greek characters, Arabic characters, smilies, whatever.
It took me ages to get it down to just two regular expressions, but it handles all two-byte and three-byte characters properly.
$text = preg_replace('/([\xc0-\xdf].)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 192) * 64 + (ord(substr('$1', 1, 1)) - 128)) . ';'", $text);
$text = preg_replace('/([\xe0-\xef]..)/se', "'&#' . ((ord(substr('$1', 0, 1)) - 224) * 4096 + (ord(substr('$1', 1, 1)) - 128) * 64 + (ord(substr('$1', 2, 1)) - 128)) . ';'", $text);
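The same arithmetic can be expressed in one pass with preg_replace_callback(). This is a sketch under a few assumptions: the name utf8_to_entities is made up, the closure needs PHP 5.3 or later, and like the two expressions above it covers only two-byte and three-byte sequences:

```php
<?php
// Hypothetical helper: convert 2- and 3-byte UTF-8 sequences to numeric entities.
function utf8_to_entities($text) {
    return preg_replace_callback(
        '/[\xc0-\xdf][\x80-\xbf]|[\xe0-\xef][\x80-\xbf]{2}/',
        function ($m) {
            // Byte values of the matched sequence.
            $b = array_map('ord', str_split($m[0]));
            $cp = count($b) === 2
                ? ($b[0] - 192) * 64 + ($b[1] - 128)
                : ($b[0] - 224) * 4096 + ($b[1] - 128) * 64 + ($b[2] - 128);
            return '&#' . $cp . ';';
        },
        $text
    );
}

// "\xe2\x80\x99" is the curly apostrophe that Word inserts.
echo utf8_to_entities("It\xe2\x80\x99s caf\xc3\xa9"); // It&#8217;s caf&#233;
```

Requiring continuation bytes in the \x80-\xbf range, instead of matching any byte with a dot, keeps malformed input from being turned into bogus entities.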
18-Oct-2004 10:39
It is useful to note that the 'limit' parameter, when used with 'pattern' and 'replace' which are arrays, applies to each individual pattern in the patterns array, and not the entire array.
<?php
$pattern = array('/one/', '/two/');
$replace = array('uno', 'dos');
$subject = "test one, one two, one two three";
echo preg_replace($pattern, $replace, $subject, 1);
?>
If limit were applied to the whole array (which it isn't), it would return:
test uno, one two, one two three
However, in reality this will actually return:
test uno, one dos, one two three
08-Feb-2004 06:45
People using the /e modifier with preg_replace should be aware of the following weird behaviour. It is not a bug per se, but can cause bugs if you don't know it's there.
The example in the docs for /e suffers from this mistake in fact.
With /e, the replacement string is a PHP expression. So when you use a backreference in the replacement expression, you need to put the backreference inside quotes, or otherwise it would be interpreted as PHP code. Like the example from the manual for preg_replace:
preg_replace("/(<\/?)(\w+)([^>]*>)/e",
"'\\1'.strtoupper('\\2').'\\3'",
$html_body);
To make this easier, the data in a backreference with /e is run through addslashes() before being inserted in your replacement expression. So if you have the string
He said: "You're here"
It would become:
He said: \"You\'re here\"
...and be inserted into the expression.
However, if you put this inside a set of single quotes, PHP will not strip away all the slashes correctly! Try this:
print ' He said: \"You\'re here\" ';
Output: He said: \"You're here\"
This is because the sequence \" inside single quotes is not recognized as anything special, and it is output literally.
Using double-quotes to surround the string/backreference will not help either, because inside double-quotes, the sequence \' is not recognized and also output literally. And in fact, if you have any dollar signs in your data, they would be interpreted as PHP variables. So double-quotes are not an option.
The 'solution' is to manually fix it in your expression. It is easiest to use a separate processing function, and do the replacing there (i.e. use "my_processing_function('\\1')" or something similar as replacement expression, and do the fixing in that function).
If you surrounded your backreference by single-quotes, the double-quotes are corrupt:
$text = str_replace('\"', '"', $text);
People using preg_replace with /e should at least be aware of this.
I'm not sure how it would be best fixed in preg_replace. Because double-quotes are a really bad idea anyway (due to the variable expansion), I would suggest that preg_replace's auto-escaping is modified to suit the placement of backreferences inside single-quotes (which seemed to be the intention from the start, but was incorrectly applied).
