You want to improve script performance by optimizing string-matching operations.
Replace unnecessary regular expression calls with faster string and character type function alternatives.
A common source of unnecessary computation is using regular expression functions where they aren’t needed. Suppose, for example, that you’re validating a form submission and want to make sure the username contains only alphanumeric characters.
The usual approach to this problem is a regular expression:
```php
if (!preg_match('/^[a-z0-9]+$/i', $username)) {
    echo 'please enter a valid username.';
}
```
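The same test can be performed much faster with the ctype_alnum() function, which returns true only if every character in a non-empty string is a letter or a digit (in the default C locale, exactly A-Z, a-z, and 0-9).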
```php
$username = 'foo411';

// Time the regular expression check (anchored at both ends, as above)
$start = microtime(true);
if (!preg_match('/^[a-z0-9]+$/i', $username)) {
    echo 'please enter a valid username';
}
$regextime = microtime(true) - $start;

// Time the equivalent ctype check
$start = microtime(true);
if (!ctype_alnum($username)) {
    echo 'please enter a valid username';
}
$ctypetime = microtime(true) - $start;

echo "preg_match took: $regextime seconds\n";
echo "ctype_alnum took: $ctypetime seconds\n";
```
This produces output similar to the following (exact timings vary by machine and from run to run):
```
preg_match took: 0.000163078308105 seconds
ctype_alnum took: 9.05990600586E-06 seconds
```
ctype_alnum() is considerably faster: 9.05990600586E-06 is scientific notation for about 0.00000906 seconds, roughly 18 times less than the 0.000163 seconds the preg_match() regular expression took in this run, and the check produces the same result.
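“The same result” holds for 'foo411', but the two checks are not identical for every input. The sketch below compares them over a few hand-picked strings; the helper names are hypothetical. In PCRE, `$` without the `D` modifier also matches just before a final newline, so '/^[a-z0-9]+$/i' accepts "foo411\n" while ctype_alnum() rejects it; anchoring with `\z` closes that gap. The sketch assumes the default C locale, since ctype_alnum() is locale-dependent.

```php
<?php
// Hypothetical helpers comparing the username checks.
function is_valid_username_regex(string $username): bool
{
    // '$' (without the D modifier) also matches before a final "\n".
    return preg_match('/^[a-z0-9]+$/i', $username) === 1;
}

function is_valid_username_strict_regex(string $username): bool
{
    // '\z' anchors at the true end of the string.
    return preg_match('/^[a-z0-9]+\z/i', $username) === 1;
}

function is_valid_username_ctype(string $username): bool
{
    // Locale-dependent; in the default "C" locale this means [A-Za-z0-9]+.
    return ctype_alnum($username);
}

$inputs = ['foo411', 'FOO411', '', 'foo_411', 'foo 411', "foo411\n"];

foreach ($inputs as $input) {
    printf(
        "%-12s regex=%d strict=%d ctype=%d\n",
        json_encode($input),
        is_valid_username_regex($input),
        is_valid_username_strict_regex($input),
        is_valid_username_ctype($input)
    );
}
```

If you keep a regular expression for this kind of validation, `\z` (or the `D` modifier) is the safer anchor.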
In a complex application, replacing unnecessary regular expressions with equivalent string and ctype functions can add up to a significant performance gain.
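Several other checks that are often written as regular expressions have direct string or ctype equivalents. The sketch below evaluates each check both ways so the results can be compared; the helper name `compare_checks()` and the sample strings are hypothetical, and it assumes ASCII input, the default C locale, and PHP 8.0 or later for str_starts_with() and str_ends_with().

```php
<?php
// Hypothetical helper: evaluates each "brief sentence" check two ways and
// returns [regex result, string/ctype result] per check.
function compare_checks(string $value): array
{
    return [
        // "contains only digits"
        'digits' => [preg_match('/^[0-9]+\z/', $value) === 1, ctype_digit($value)],
        // "contains only lowercase letters"
        'lower'  => [preg_match('/^[a-z]+\z/', $value) === 1, ctype_lower($value)],
        // "starts with 'user'"
        'prefix' => [preg_match('/^user/', $value) === 1, str_starts_with($value, 'user')],
        // "ends with '.txt'" (the dot must be escaped in the regex)
        'suffix' => [preg_match('/\.txt\z/', $value) === 1, str_ends_with($value, '.txt')],
    ];
}

foreach (['12345', 'username', 'user42.txt', ''] as $value) {
    foreach (compare_checks($value) as $check => [$regex, $string]) {
        printf("%-12s %-6s regex=%d string=%d\n", json_encode($value), $check, $regex, $string);
    }
}
```

Note that the string functions need no escaping: str_ends_with($value, '.txt') treats the dot literally, while the regex version has to escape it.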
A good litmus test for whether you need a regular expression is whether the match you’re performing can be described in a brief sentence; if it can, a string or ctype function will often do the job.
Granted, some matches, such as “string is a valid email address,” are easy to state but cannot be adequately verified without a complex regular expression.
By contrast, “check whether string A contains string B” can be tested in several ways, but it is ultimately a very simple test that does not require a regular expression:
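```php
$haystack = 'The quick brown fox jumps over the lazy dog';
$needle = 'lazy dog';

// slowest (deprecated in PHP 5.3, removed in PHP 7.0; shown for reference only)
// if (ereg($needle, $haystack)) echo 'match!';

// slow: builds and runs a regex; preg_quote() escapes special characters in $needle
if (preg_match('/' . preg_quote($needle, '/') . '/', $haystack)) echo 'match!';

// fast: strstr() returns the rest of the haystack from the match, or false
if (strstr($haystack, $needle) !== false) echo 'match!';

// fastest: strpos() returns the match offset (possibly 0), so compare strictly with false
if (strpos($haystack, $needle) !== false) echo 'match!';
```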
It pays to check whether a ctype or string function will do the job before committing to a regular expression, particularly if you’re working on a section of code that will loop repeatedly.
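A single call, as in the timing example earlier, is dominated by noise and one-time costs: PHP caches compiled patterns, so the first preg_match() call with a given pattern also pays to compile it. Repeating each check many times inside a loop gives a steadier comparison. The sketch below uses hrtime(true) (PHP 7.3+, nanoseconds) and arrow functions (PHP 7.4+); the helper name `time_loop()` and the iteration count are arbitrary choices, and results vary by machine and PHP version, so no figures are shown.

```php
<?php
// Hypothetical helper: runs $check $iterations times and returns elapsed seconds.
function time_loop(callable $check, string $input, int $iterations): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $check($input);
    }
    return (hrtime(true) - $start) / 1e9;
}

$iterations = 100000; // arbitrary; raise it if the timings are too small to compare
$username = 'foo411';

// Both checks are wrapped in closures, so both pay the same call overhead.
$checks = [
    'preg_match'  => fn (string $s): bool => preg_match('/^[a-z0-9]+\z/i', $s) === 1,
    'ctype_alnum' => fn (string $s): bool => ctype_alnum($s),
];

foreach ($checks as $name => $check) {
    $seconds = time_loop($check, $username, $iterations);
    printf("%-12s %.6f seconds for %d calls\n", $name, $seconds, $iterations);
}
```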