6. Unicode


Chapter 6. Unicode and Non-ASCII Support

6.1 Quoted-Printable Format
6.2 Non-ASCII Characters in Headers
6.3 Unicode and UTF-8
6.4 UTF-8 Support in AspEmail
6.5 Valid CharSet Values

6.1 Quoted-Printable Format

AspEmail can send messages in alphabets other than US-ASCII by supporting the "Quoted-Printable" format, which is described in RFC 2045. The idea of the format is that characters with codes less than 33 or greater than 126, as well as the "=" character itself, are represented by an "=" followed by a two-digit hexadecimal representation of the character's value (spaces and tabs may remain literal unless they end a line). For example, the decimal value 12 (US-ASCII form feed) is represented as =0C, and the decimal value 61 (US-ASCII "=") is represented as =3D.
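The encoding rule can be checked outside of AspEmail. The following is a minimal sketch in Python (not part of AspEmail, and not necessarily byte-for-byte what AspEmail emits) that uses the standard quopri module to reproduce the two examples above and to encode a short Russian string. The variable names are illustrative assumptions.

```python
import quopri

# The two examples from the text: form feed (12) and "=" (61).
form_feed = quopri.encodestring(b"\x0c")
equals_sign = quopri.encodestring(b"=")
print(form_feed, equals_sign)

# A Russian string must first be turned into bytes in some character set
# (Windows-1251 is assumed here); Quoted-Printable then escapes each
# non-ASCII byte as "=" plus two hexadecimal digits.
text = "Сообщение по-русски."
encoded = quopri.encodestring(text.encode("windows-1251"))
print(encoded.decode("ascii"))

# Decoding reverses both steps and restores the original string.
round_trip = quopri.decodestring(encoded).decode("windows-1251")
print(round_trip == text)
```

Note that the result of Quoted-Printable encoding is plain ASCII, which is why it can travel safely through 7-bit mail transports.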

AspEmail encodes the message body in the Quoted-Printable format automatically if the ContentTransferEncoding property is set to the string "Quoted-Printable" (letter case is immaterial). You should also set the CharSet property to the character set of the message text. The following code snippet sends a message in Russian:

<% @codepage=1251 %>

<%
...
Mail.Charset = "Windows-1251"
Mail.Body = "Сообщение по-русски."
Mail.ContentTransferEncoding = "Quoted-Printable"
%>

The directive <% @codepage=1251 %> instructs the ASP interpreter to treat the hard-coded characters in the script as Cyrillic characters (1251 is the Windows Cyrillic code page). As a result, the Body property receives the Russian text as a Unicode string.

6.2 Non-ASCII Characters in Headers

If you wish to send a message in which mail headers such as Subject:, To: or From: contain non-US-ASCII characters, use the Mail.EncodeHeader method to encode the character string according to RFC 1522 (since superseded by RFC 2047). The method takes one required parameter, the header string, and one optional parameter, the character set, which is "ISO-8859-1" by default. For example:

<% @codepage=1251 %>

<%
Mail.Subject = Mail.EncodeHeader("Тема По-русски", "Windows-1251")
Mail.FromName = Mail.EncodeHeader("Иван", "Windows-1251")
Mail.AddAddress "stein@somecompany.no", Mail.EncodeHeader("Штейн", "Windows-1251")
%>
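To show what an encoded header looks like on the wire, here is a minimal Python sketch of the RFC 1522 "encoded-word" syntax, =?charset?encoding?text?=. The helper encode_header_b is hypothetical and is not AspEmail's EncodeHeader; it always uses the base64 ("B") variant, whereas a real implementation may choose the "Q" variant and must split long values.

```python
import base64
from email.header import decode_header


def encode_header_b(text, charset="ISO-8859-1"):
    """Hypothetical helper: wrap text in a single 'B' (base64) encoded-word.

    It does not split long values, although one encoded-word may not
    exceed 75 characters.
    """
    raw = text.encode(charset)
    payload = base64.b64encode(raw).decode("ascii")
    return "=?{}?B?{}?=".format(charset, payload)


subject = encode_header_b("Тема По-русски", "Windows-1251")
print(subject)

# The standard library can decode the encoded-word again.
raw_bytes, charset_name = decode_header(subject)[0]
decoded = raw_bytes.decode(charset_name)
print(decoded)
```

The character set named inside the encoded-word tells the receiving mail client how to turn the bytes back into text, which is why it has to match the character set the text was actually encoded in.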

6.3 Unicode and UTF-8

From MSDN: "Unicode is a 16-bit, fixed-width character encoding standard that encompasses virtually all of the characters commonly used on computers today. This includes most of the world's written languages, plus publishing characters, mathematical and technical symbols, and punctuation marks."

From Unicode.org: "Computers ... store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters... Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."

For example, the basic Latin letter "A" has the code Hex 0041 (65), the Russian letter "Ж" has the code Hex 0416 (1046), and the Chinese character "㊥" has the code Hex 32A5 (12965).

UTF-8 (Unicode Transformation Format, 8-bit encoding form) is the recommended format for sending Unicode-based data across networks, in particular the Internet. UTF-8 represents a Unicode value as a sequence of 1 to 4 bytes; the 16-bit values discussed here need 1, 2, or 3 bytes.

Unicode characters in the range Hex 0000 to 007F are encoded simply as bytes 00 to 7F. This means that files and strings which contain only 7-bit ASCII characters have the same encoding under both ASCII and UTF-8. Therefore, the Unicode 0041 ("A") in UTF-8 is Hex 41.

Unicode characters in the range Hex 0080 to 07FF are encoded as a sequence of two bytes. For example, the Unicode 0416 (Ж) is encoded as Hex D0 96. Unicode characters in the range Hex 0800 to FFFF are encoded as a sequence of three bytes. For example, the Unicode 32A5 (㊥) is encoded as Hex E3 8A A5.
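The three ranges can be reproduced by hand. The sketch below is an illustrative Python function (utf8_encode_bmp is a hypothetical name) that applies the bit layout for values 0000 to FFFF and compares the result with Python's built-in UTF-8 encoder. It ignores the surrogate range and values above FFFF.

```python
def utf8_encode_bmp(code_point):
    """Encode a value in the range 0000-FFFF as UTF-8 by hand (illustrative)."""
    if code_point <= 0x7F:
        # One byte: 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


for value in (0x0041, 0x0416, 0x32A5):
    by_hand = utf8_encode_bmp(value)
    built_in = chr(value).encode("utf-8")
    hex_bytes = " ".join("{:02X}".format(b) for b in by_hand)
    print("{:04X} -> {} (matches built-in: {})".format(
        value, hex_bytes, by_hand == built_in))
```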

6.4 UTF-8 Support in AspEmail

AspEmail 5.0 offers full UTF-8 support in both the message body and headers. To send a UTF-8 encoded message, set the CharSet property to the string "UTF-8" (case is immaterial) and ContentTransferEncoding to "Quoted-Printable". You should also pass "UTF-8" as the second argument to EncodeHeader.

The following code sample demonstrates the UTF-8 usage:

<%
' change to address of your own SMTP server
strHost = "smtp.broadviewnet.net"

' Enable UTF-8 -> Unicode translation for form items
Session.CodePage = 65001 ' UTF-8 code

If Request("Send") <> "" Then
   Set Mail = Server.CreateObject("Persits.MailSender")
   ' enter valid SMTP host
   Mail.Host = strHost

   Mail.From = "info@aspemail.com" ' From address
   Mail.FromName = Mail.EncodeHeader(Request("FromName"), "utf-8")
   Mail.AddAddress Request("To")

   ' message subject
   Mail.Subject = Mail.EncodeHeader( Request("Subject"), "utf-8")

   ' message body
   Mail.Body = Request("Body")

   ' UTF-8 parameters
   Mail.CharSet = "UTF-8"
   Mail.ContentTransferEncoding = "Quoted-Printable"
   Mail.Send ' send message
   Response.Write "Message sent to " & Server.HTMLEncode(Request("To"))
End If
%>

<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">
<TITLE>AspEmail: Unicode.asp</TITLE>
</HEAD>
<BODY>

<FORM METHOD="POST" ACTION="Unicode.asp">
<TABLE CELLSPACING=0 CELLPADDING=0>
<TR><TD>Enter email:</TD><TD><INPUT TYPE="TEXT" NAME="To"></TD></TR>
<TR><TD>Enter your name:</TD><TD><INPUT TYPE="TEXT" NAME="FromName"></TD></TR>
<TR><TD>Enter Subject:</TD><TD><INPUT TYPE="TEXT" NAME="Subject"></TD></TR>
<TR><TD>Enter Body:</TD><TD><TEXTAREA cols="50" rows="10" NAME="Body"></TEXTAREA></TD></TR>
<TR><TD COLSPAN=2><INPUT TYPE=SUBMIT NAME="Send" VALUE="Send"></TD></TR>
</TABLE>
</FORM>
</BODY>
</HTML>

This code sample has several important elements you must not overlook:

<META HTTP-EQUIV="Content-Type" content="text/html; charset=utf-8">

This META tag specifies the character set for this page to be UTF-8. This, among other things, instructs the browser to UTF8-encode all form items when the form is submitted.

Session.CodePage = 65001

This line instructs our ASP script to convert UTF8-encoded form items (returned by the Request.Form collection) back to regular Unicode strings. The number 65001 is the UTF-8 code page.

Mail.Subject = Mail.EncodeHeader( Request("Subject"), "utf-8")

The optional second argument is set to "UTF-8" for proper encoding of the header.

Mail.CharSet = "UTF-8"
Mail.ContentTransferEncoding = "Quoted-Printable"

These two lines ensure proper UTF-8 encoding of the message body.
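The effect of these settings can be imitated outside ASP. The following Python sketch is an illustration only (it does not call AspEmail or ASP): it takes the bytes a browser would submit for one Cyrillic letter on a UTF-8 page, decodes them with the right and the wrong code page, and then produces the UTF-8 plus Quoted-Printable form of the body.

```python
import quopri

# Bytes a browser submits for the letter U+0416 when the page is UTF-8.
submitted = "\u0416".encode("utf-8")  # D0 96

# Decoding with the matching code page restores the character. This is the
# role played by Session.CodePage = 65001 in the ASP script.
as_utf8 = submitted.decode("utf-8")

# Decoding the same bytes with a single-byte code page (Windows-1252 is
# assumed here) yields two unrelated characters instead of one letter.
as_cp1252 = submitted.decode("cp1252")

# Body encoding: the Unicode string becomes UTF-8 bytes, which are then
# wrapped in Quoted-Printable.
body = quopri.encodestring(as_utf8.encode("utf-8"))

print(repr(as_utf8), repr(as_cp1252), body)
```

If the page character set and the code page used for decoding disagree, the garbled form is what ends up in the message, no matter how the message itself is encoded afterwards.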

Click the links below to run this code sample:

http://localhost/aspemail/NonAscii/Unicode.asp
http://localhost/aspemail/NonAscii/Unicode.aspx (.NET version)

6.5 Valid CharSet Values

You may specify the following string values for the CharSet property, as well as for the optional second argument of the EncodeHeader method:

Value                                          Meaning
"UTF-8"                                        UTF-8
"UTF-7"                                        UTF-7
"Windows-1250", "cp1250"                       ANSI - Central Europe
"Windows-1251", "cp1251"                       ANSI - Cyrillic
"Windows-1252", "cp1252", "ascii", "us-ascii"  Latin I
"Windows-1253", "cp1253"                       ANSI - Greek
"Windows-1254", "cp1254"                       ANSI - Turkish
"Windows-1255", "cp1255"                       ANSI - Hebrew
"Windows-1256", "cp1256"                       ANSI - Arabic
"Windows-1257", "cp1257"                       ANSI - Baltic
"Windows-1258", "cp1258"                       ANSI - Vietnamese
"ISO-8859-1"                                   Latin I (default value)
"ISO-8859-2"                                   Central Europe
"ISO-8859-3"                                   Latin 3
"ISO-8859-4"                                   Baltic
"ISO-8859-5"                                   Cyrillic
"ISO-8859-6"                                   Arabic
"ISO-8859-7"                                   Greek
"ISO-8859-8"                                   Hebrew
"ISO-8859-9"                                   Latin 5
"ISO-8859-15"                                  Latin 9
"cp866"                                        Russian DOS
"koi8-r"                                       Russian
"koi8-u"                                       Ukrainian
"shift_jis"                                    Japanese Windows
"ks_c_5601-1987", "korean"                     Korean
"EUC-KR", "korean"                             EUC - Korean
"BIG5"                                         Traditional Chinese Windows
"GB2312", "chinese"                            Simplified Chinese
"HZ-GB-2312"                                   Simplified Chinese HZ
"EUC-JP"                                       EUC - Japanese
"X-EUC-TW"                                     EUC - Traditional Chinese

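The choice of character set determines which bytes represent a given character, and whether the character can be represented at all. The Python sketch below illustrates this for the letter U+0416 using a few of the names listed above. It relies on Python's own codec tables, which happen to accept these spellings; it says nothing about how AspEmail resolves the names internally, and can_encode is a hypothetical helper.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


letter = "\u0416"  # Cyrillic capital letter Zhe
charsets = ("Windows-1251", "ISO-8859-5", "cp866", "koi8-r", "UTF-8")

# The same letter maps to a different byte sequence in each character set.
encoded = {name: letter.encode(name) for name in charsets}
for name, raw in encoded.items():
    print("{:<13} {}".format(name, raw.hex().upper()))

# The default value, ISO-8859-1 (Latin I), has no Cyrillic letters at all.
print("ISO-8859-1 can represent it:", can_encode(letter, "ISO-8859-1"))
```

This is why the CharSet value has to match the language of the text: a Cyrillic message needs one of the Cyrillic character sets or UTF-8.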
AspEmail.com Home Page Copyright © 1998 - 2009 Persits Software, Inc.
All Rights Reserved
AspEmail™ is a trademark of Persits Software, Inc.