Email Envelope and UTF-8

Support of National Charsets

International Characters in Email Messages

Most modern email clients support national characters in the email body. The encoding is specified in the message header using the Content-Type field. For instance, the following header line tells the email client to treat the email body as HTML encoded in the UTF-8 character set.

Content-Type: text/html; charset=utf-8

But what should we do with international characters in the message envelope, that is, in header fields such as the email subject or the "From" address? Depending on your environment, there are a few options.

Transfer Unicode characters in UTF-8

Although the original SMTP protocol did not support it, sending UTF-8 in message envelopes became a de facto standard and was eventually documented in RFCs that extend SMTP (RFC 6531 and RFC 6532). In most environments, and with most modern email clients, it is safe to assume that UTF-8 content can be transferred as-is, and this is the default behavior of the Email Notification package. You still need to specify the character set using the Content-Type header as described above.

Transcoding Unicode characters

In the rare cases when an email client or email server cannot handle UTF-8 encoded characters in the message envelope, the workaround documented in RFC 2047 can be used. The following example is the body of a user-defined function that encodes a UTF-8 string with Base64 and wraps it in the "encoded-word" format specified in the RFC.

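my $string = shift;

require utf8;
require MIME::Base64;

# Base64 works on bytes. If Perl holds the value as a character string,
# convert it to UTF-8 octets first; otherwise assume it already is UTF-8.
utf8::encode($string) if utf8::is_utf8($string);

# The empty second argument suppresses the line breaks and the trailing
# newline that encode_base64 adds by default and that would corrupt the header.
$string = MIME::Base64::encode_base64($string, '');

return "=?UTF-8?B?$string?=";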

When a udb_property record is submitted, this user-defined function can be used to safely encode envelope fields that may contain Unicode characters, for instance the message subject:

RT_Encode_Envelope("Defect $ID - $headline")
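RFC 2047 also limits each encoded-word to 75 characters, so a long subject may need to be split into several encoded-words separated by whitespace. The following is a minimal sketch of one way to do that, not part of the Email Notification package. The name encode_envelope_words and the 45-byte budget are illustrative assumptions, and the input is assumed to be a Perl character string rather than raw UTF-8 octets. It splits on character boundaries so that no multi-byte character is divided between two encoded-words.

```perl
use strict;
use warnings;
use MIME::Base64 ();

# Hypothetical helper (name chosen for illustration only).
# Expects a decoded Perl character string, not raw UTF-8 octets.
sub encode_envelope_words {
    my ($string) = @_;

    # "=?UTF-8?B?" plus "?=" take 12 characters, leaving 63 of the 75 allowed.
    # Base64 output grows in 4-character groups, so at most 60 characters fit,
    # which corresponds to 45 bytes of UTF-8 input per encoded-word.
    my $max_bytes = 45;

    my @chunks = ('');
    for my $char (split //, $string) {
        my $bytes = $chunks[-1] . $char;
        utf8::encode($bytes);             # measure the size in UTF-8 octets
        if (length($bytes) > $max_bytes) {
            push @chunks, $char;          # start a new encoded-word
        } else {
            $chunks[-1] .= $char;         # still fits in the current one
        }
    }

    my @words;
    for my $chunk (grep { length } @chunks) {
        utf8::encode($chunk);             # characters -> UTF-8 octets
        push @words,
            '=?UTF-8?B?' . MIME::Base64::encode_base64($chunk, '') . '?=';
    }
    return @words;
}

# Usage: join the encoded-words with whitespace to form the field value.
my $subject = join ' ',
    encode_envelope_words("Defect 42 - \x{43e}\x{448}\x{438}\x{431}\x{43a}\x{430}");
```

Whitespace between adjacent encoded-words is dropped when the field is displayed, so the recipient sees the original text as one string. Whether the field must additionally be folded across several lines depends on the mail system that writes the header.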