PHP – using mail() and Unicode text – text gets disturbed

Q(Question):

I have the following problem. On a website there’s a (simple) feedback
form. It is also used by Polish visitors, who (of course) type Polish
text using special characters.

However, when I receive the text in my mailbox, all special characters
have been turned into a mess...

For example: "współprace" is turned into "wspóÅ‚prace".

It seems PHP is handling the UTF-8 strings quite well (when I
‘echo’ the strings on the site, I see the text correctly), up to the
point where they are sent using mail().

Is this a server configuration issue? Or something else?

How can I get my text to remain in Unicode?

I have this problem both on my test server (Apache 1.3.28, PHP 4.3.2 on
Windows XP) and on my provider’s server (Apache under Linux).
I hope somebody can help.

Many thanks,
Edo.

A(Answer):

For example: "współprace" is turned into "wspóÅ‚prace".

It seems PHP is handling the UTF-8 strings quite well

Are you setting up the headers of the email to state the character set, with something such as the following?

Content-Type: text/html;charset=iso-8859-15
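
The garbling in the example is typical of UTF-8 bytes being decoded with a single-byte charset, which is why the charset declared in the header matters. Here is a minimal sketch of the effect. It assumes the iconv extension is available, and it takes Windows-1252 as the charset the mail client fell back to; the thread does not say which one it actually was.

```php
<?php
// "współprace" as UTF-8 bytes: ó is C3 B3 and ł is C5 82.
$utf8 = "wsp\xC3\xB3\xC5\x82prace";

// Simulate a client that wrongly decodes those bytes as Windows-1252
// (an assumed fallback charset). Each byte becomes one character, so
// every two-byte Polish letter shows up as two unrelated characters.
$misread = iconv('CP1252', 'UTF-8', $utf8);

echo $misread, "\n";
echo strlen($utf8), " bytes in the UTF-8 string\n";
```

Each non-ASCII letter comes out as a pair of characters, the same kind of damage described in the question.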

A(Answer):

It’s an encoding issue. One way to deal with this is to escape the UTF-8
text using imap_8bit(), which converts it to quoted-printable, and set the
charset in the email header to UTF-8.
Many email clients don’t handle this correctly, though. I would recommend
sending multipart mails. In the plaintext part, remove the accent marks
(solidarność -> solidarnosc). In the HTML part, encode the special
characters as HTML entities (dokąd => dok&#261;d). This will ensure that
everyone sees something that’s readable. The same strategy is used by Outlook
Express. It’ll be helpful if you send yourself a test email and look at the
source.

Here are a couple of functions that do what I suggested:

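// UTF-8 byte sequences of the lowercase Polish letters, mapped to plain
// ASCII letters. Uppercase letters are not covered.
$pl_markless_tr = array(
    "\xC3\xB3" => "o", // ó
    "\xC4\x85" => "a", // ą
    "\xC4\x87" => "c", // ć
    "\xC4\x99" => "e", // ę
    "\xC5\x82" => "l", // ł
    "\xC5\x84" => "n", // ń
    "\xC5\x9b" => "s", // ś
    "\xC5\xba" => "z", // ź
    "\xC5\xbc" => "z"  // ż
);

// The same byte sequences, mapped to numeric HTML entities.
$pl_uni_entities_tr = array(
    "\xC3\xB3" => "&#243;", // ó
    "\xC4\x85" => "&#261;", // ą
    "\xC4\x87" => "&#263;", // ć
    "\xC4\x99" => "&#281;", // ę
    "\xC5\x82" => "&#322;", // ł
    "\xC5\x84" => "&#324;", // ń
    "\xC5\x9b" => "&#347;", // ś
    "\xC5\xba" => "&#378;", // ź
    "\xC5\xbc" => "&#380;"  // ż
);

// Strip the accent marks, for the plain-text part of the mail.
function remove_polish_marks($s) {
    global $pl_markless_tr;
    return strtr($s, $pl_markless_tr);
}

// Replace the Polish letters with HTML entities, for the HTML part.
function escape_polish_marks($s) {
    global $pl_uni_entities_tr;
    return strtr($s, $pl_uni_entities_tr);
}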
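
The multipart suggestion above could be put together roughly as follows. This is only a sketch under some assumptions: build_multipart_alternative() and the boundary scheme are hypothetical names made up for illustration, the two small maps cover just two letters, and quoted_printable_encode() (PHP 5.3 or later) stands in for imap_8bit() so that the IMAP extension is not needed.

```php
<?php
// Illustrative maps for two letters only (ą and ł); a real form would use
// complete tables such as the ones in the post above.
$markless = ["\xC4\x85" => 'a', "\xC5\x82" => 'l'];
$entities = ["\xC4\x85" => '&#261;', "\xC5\x82" => '&#322;'];

// Hypothetical helper: returns [headers, body] for a multipart/alternative
// message with a plain-text part and an HTML part, both quoted-printable.
function build_multipart_alternative(string $plain, string $html, string $boundary): array
{
    $headers = "MIME-Version: 1.0\r\n"
             . "Content-Type: multipart/alternative; boundary=\"$boundary\"";

    $parts = [
        ['text/plain; charset=UTF-8', $plain],
        ['text/html; charset=UTF-8', $html],
    ];

    $body = '';
    foreach ($parts as [$type, $content]) {
        $body .= "--$boundary\r\n"
               . "Content-Type: $type\r\n"
               . "Content-Transfer-Encoding: quoted-printable\r\n\r\n"
               . quoted_printable_encode($content) . "\r\n";
    }
    $body .= "--$boundary--\r\n"; // closing boundary

    return [$headers, $body];
}

$text = "dok\xC4\x85d"; // "dokąd" as UTF-8 bytes

[$headers, $body] = build_multipart_alternative(
    strtr($text, $markless),                                     // accent-free text
    '<p>' . strtr(htmlspecialchars($text), $entities) . '</p>',  // entity-escaped HTML
    'b_' . md5(uniqid('', true))                                 // assumed boundary scheme
);

// To send it (addresses are placeholders):
// mail('owner@example.com', 'Feedback form', $body, $headers);
```

As the post suggests, sending a test message to yourself and inspecting the raw source is the easiest way to check that both parts arrive as intended.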
User "Edo van der Zouwen"
<ez*****@dithiervoorisdomainenhetisbijdemonkenners wetenwattedoen.nl> wrote
in message news:jm********************************@4ax.com…

A(Answer):

On Sun, 1 Feb 2004 15:33:30 -0000, "Filth" <p.*********@blueyonder.co.uk>
wrote:

For example: "współprace" is turned into "wspóÅ‚prace".

It seems PHP is handling the UTF-8 strings quite well

Are you setting up the headers of the email to state the character set, with something such as the following?

Content-Type: text/html;charset=iso-8859-15

Content-Type: text/plain;charset=utf-8

… sounds like the more appropriate header to send in this case.
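
For reference, here is a minimal sketch of how that header might be passed to mail(). The helper names utf8_mail_headers() and utf8_subject() and the addresses are hypothetical, and the subject encoding is an extra step not mentioned in the thread, included on the assumption that the subject line may also contain Polish text.

```php
<?php
// Hypothetical helper: extra headers for a UTF-8 plain-text message.
function utf8_mail_headers(string $from): string
{
    return implode("\r\n", [
        "From: $from",
        'MIME-Version: 1.0',
        'Content-Type: text/plain; charset=UTF-8',
        'Content-Transfer-Encoding: 8bit',
    ]);
}

// Hypothetical helper: wraps a UTF-8 subject in an RFC 2047 encoded-word
// (base64 variant). Long subjects would need to be split into several
// encoded-words, which this sketch does not do.
function utf8_subject(string $subject): string
{
    return '=?UTF-8?B?' . base64_encode($subject) . '?=';
}

$headers = utf8_mail_headers('webform@example.com');
$subject = utf8_subject("Wsp\xC3\xB3\xC5\x82praca"); // "Współpraca" as UTF-8 bytes
$message = "wsp\xC3\xB3\xC5\x82prace";               // form text, already UTF-8

// To send it (the recipient address is a placeholder):
// mail('owner@example.com', $subject, $message, $headers);
```

The message body is passed through unchanged; only the headers tell the receiving client how to decode it.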


Andy Hassall <an**@andyh.co.uk> / Space: disk usage analysis tool
<http://www.andyh.co.uk> / <http://www.andyhsoftware.co.uk/space>

A(Answer):

On Sun, 1 Feb 2004 15:33:30 -0000, "Filth"
<p.*********@blueyonder.co.uk> wrote:

For example: "współprace" is turned into "wspóÅ‚prace".

It seems PHP is handling the UTF-8 strings quite well

Are you setting up the headers of the email to state the character set, with something such as the following?

Content-Type: text/html;charset=iso-8859-15

Thanks, this did the trick, except the header should contain:

"Content-Type: text/html; charset=UNICODE-1-1-UTF-8"

Cheers,
Edo.

A(Answer):

On Sun, 1 Feb 2004 12:06:26 -0500, "Chung Leong"
<ch***********@hotmail.com> wrote:

It’s an encoding issue. One way to deal with this is to escape the UTF-8
text using imap_8bit(), which converts it to quoted-printable, and set the
charset in the email header to UTF-8.
Many email clients don’t handle this correctly, though. I would recommend
sending multipart mails. In the plaintext part, remove the accent marks
(solidarność -> solidarnosc). In the HTML part, encode the special
characters as HTML entities (dokąd => dok&#261;d). This will ensure that
everyone sees something that’s readable. The same strategy is used by Outlook
Express. It’ll be helpful if you send yourself a test email and look at the
source.

Thanks, very interesting method. For the time being, the email client
used by the receiver of the web forms is capable of handling
Unicode text, so I’ll stick to just using a header which enables
Unicode text.

However, I’ll definitely save and check your method; it might be very
useful in the future.

Dziękuję i do widzenia (thank you and goodbye) 🙂
Edo.

A(Answer):

On Sun, 01 Feb 2004 18:20:19 +0000, Andy Hassall <an**@andyh.co.uk>
wrote:

Content-Type: text/plain;charset=utf-8

… sounds like the more appropriate header to send in this case.

Thx, found that out myself, but appreciate your input.

Edo.
