bug in preg_match_all? – PHP


Q(Question):

I’m reluctant to post a bug report on php.net because they’re so
vicious with bogus reports, so I thought I’d check here to make sure
I’m not going crazy:

<?php
$string = "Hello[";
preg_match("/[^A-z0-9_]/", $string, $matches);
print_r($matches);
?>

Expected result:
Array(

A(Answer):

On May 7, 10:30 am, tommybi…@gmail.com wrote:

Ugh, firefox…

Expected result:
Array(
[0] => [
)

Actual result:
Array(
)

PHP 5.2.4

A(Answer):

to********@gmail.com wrote:

> $string = "Hello[";
> preg_match("/[^A-z0-9_]/", $string, $matches);

Note you have ‘A-z’, not ‘A-Z’ or ‘A-z’. It just happens that ‘[’ is
between uppercase and lowercase letters.

http://alvaro.es - Álvaro G. Vicario - Burgos, Spain
-- My site about web programming: http://bits.demogracia.com
-- My humor site "al baño María": http://www.demogracia.com
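
The note above implies the fix: spell out the two letter ranges separately. The following is only a minimal sketch of that idea, not code from the thread; the helper name firstDisallowedChar is hypothetical, and it assumes the goal is to find the first character that is not an ASCII letter, digit, or underscore.

```php
<?php
// Hypothetical helper (not from the thread): returns the first character
// that is not an ASCII letter, digit, or underscore, or null if none.
function firstDisallowedChar($input)
{
    // Two explicit ranges, A-Z and a-z, so the punctuation that sits
    // between 'Z' and 'a' in ASCII is no longer accepted.
    if (preg_match('/[^A-Za-z0-9_]/', $input, $matches) === 1) {
        return $matches[0];
    }
    return null;
}

var_dump(firstDisallowedChar("Hello["));   // string(1) "["
var_dump(firstDisallowedChar("Hello_42")); // NULL
```

With the ranges split, "Hello[" yields the match the original poster expected.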

A(Answer):

On 7 May, 15:44, "Álvaro G. Vicario"
<alvaroNOSPAMTHA…@demogracia.com> wrote:

> tommybi…@gmail.com wrote:
>
>> $string = "Hello[";
>> preg_match("/[^A-z0-9_]/", $string, $matches);
>
> Note you have ‘A-z’, not ‘A-Z’ or ‘A-z’. It just happens that ‘[’ is
> between uppercase and lowercase letters.



I think you meant to say "Note you have ‘A-z’, not ‘A-Z’ or ‘a-z’."

A(Answer):

Captain Paralytic wrote:

>>> $string = "Hello[";
>>> preg_match("/[^A-z0-9_]/", $string, $matches);
>>
>> Note you have ‘A-z’, not ‘A-Z’ or ‘A-z’. It just happens that ‘[’ is
>> between uppercase and lowercase letters.
>
> I think you meant to say "Note you have ‘A-z’, not ‘A-Z’ or ‘a-z’."

Yep, sorry. I sometimes think I need a newsreader with syntax
highlighting and integrated debugger…



A(Answer):

On May 7, 10:44 am, "Álvaro G. Vicario"
<alvaroNOSPAMTHA…@demogracia.com> wrote:

> tommybi…@gmail.com wrote:
>
>> $string = "Hello[";
>> preg_match("/[^A-z0-9_]/", $string, $matches);
>
> Note you have ‘A-z’, not ‘A-Z’ or ‘A-z’. It just happens that ‘[’ is
> between uppercase and lowercase letters.



I tried to find a good reference for you as to why this is the case,
but found nothing very good. Basically, Perl regular expressions are
case sensitive and use ASCII character mapping, so [A-Z], [a-z], and
[A-z] are all different. [A-Z] will allow one character from A-Z
(caps), ASCII codes 65-90; [a-z] will allow one character from a-z
(lowercase), ASCII codes 97-122; [A-z] covers ASCII codes 65-122.
That obviously includes both sets of letters, but a little less obvious
is what sits between 90 and 97: the six characters with codes 91-96,
which are [ \ ] ^ _ and ` (see http://en.wikipedia.org/wiki/ISO_8859-1).
By using [A-z], you are allowing all of those additional characters
through.

Regards,

Steve
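
The range behaviour described in the post above can be checked directly. This is a minimal sketch rather than anything posted in the thread; the helper name charsBetween is hypothetical, and it assumes single-byte ASCII input.

```php
<?php
// Hypothetical helper (not from the thread): lists the ASCII characters
// that fall strictly between two characters in code order.
function charsBetween($low, $high)
{
    $chars = array();
    for ($code = ord($low) + 1; $code < ord($high); $code++) {
        $chars[] = chr($code);
    }
    return $chars;
}

// Characters that [A-z] accepts in addition to the letters.
$extras = charsBetween('Z', 'a');
echo implode(' ', $extras), "\n"; // [ \ ] ^ _ `

// Each of them slips past the original negated class, so nothing matches.
foreach ($extras as $char) {
    var_dump(preg_match('/[^A-z0-9_]/', "Hello" . $char)); // int(0) every time
}
```

Because all six characters fall inside the A-z range, the negated class never finds anything to match in "Hello[", which is why the original script printed an empty array.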

A(Answer):

On May 7, 11:04 am, ELINTPimp <smsi…@gmail.com> wrote:

> On May 7, 10:44 am, "Álvaro G. Vicario"
> <alvaroNOSPAMTHA…@demogracia.com> wrote:
>
>> tommybi…@gmail.com wrote:
>>
>>> $string = "Hello[";
>>> preg_match("/[^A-z0-9_]/", $string, $matches);
>>
>> Note you have ‘A-z’, not ‘A-Z’ or ‘A-z’. It just happens that ‘[’ is
>> between uppercase and lowercase letters.



> I tried to find a good reference for you as to why this is the case,
> but nothing very good. Basically, Perl regular expressions are case
> sensitive and use ASCII character mapping, therefore [A-Z], [a-z],
> and [A-z] are different. [A-Z] will allow one character from A-Z
> (caps), ASCII map numbers 65-90; [a-z] will allow one character a-z
> (lowercase), ASCII map 97-122; [A-z] goes from ASCII map 65-122
> (obviously) but a little less obvious is what is between 90 and 97
> (see http://en.wikipedia.org/wiki/ISO_8859-1). By using [A-z], you
> are allowing all those additional characters through.
>
> Regards,
>
> Steve

Wow, silly mistake on my part. I guess I never thought about the "A-Z"
shorthand being nothing more than an ASCII character range… you
see it so frequently, and see things like [^\*-6] so rarely.

Thanks for the heads up.
