DaemonForums  

#1
6th May 2008
stukov (Jean-Michel Philippon-Nadeau)
Proper display of accents in mails sent by PERL

Hello Perl gurus,

I am creating a Perl script that sends e-mails to users. Some of them are French speakers and I need to send them e-mail in French. However, the accents get all screwed up:
Quote:
Pour des raisons de sécurité, nous demandons à tous nos usagers...
I use MIME::Lite to send e-mails. Here is what I found in the docs to set the proper charset for my e-mail:
Code:
$msg->attr("content-type.charset" => "ISO8859-1");
I also tried UTF-8.

Any ideas?

Thanks!
__________________
"Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction."
#2
6th May 2008
lvlamb (Louis V. Lambrecht)

There are several items to check.
First, ISO8859-15 should be your locale for accented characters (and the Euro sign).

Second, UTF-8, although somewhat compatible, will have to be told that you use an fr_FR* locale.

Third, your system locale can be superseded by your applications' setups: for instance, I can use a system en_US.8859-1 locale, but have my xterm set to UTF-8 (uxterm), or emacs use whatever locale, or gnome set to use UTF-8 by default.
I can be on en_US on my terminal, but see Nautilus save my files as UTF-8.

Fourth: Perl is going UTF-8 by default. Currently there is a utf8 pragma (man utf8) that enables/disables UTF-8 in the script source.

Fifth: when you launch a UTF-8 capable application, that application might call a subshell which will run by default in your system's locale, which can be (as in my case) en_US. This subshell could end up borking out with an error message such as "malformed UTF-8 character string".
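To make the fourth point a bit more concrete, here is a minimal sketch of what the utf8 pragma changes. It assumes the script file itself is saved as UTF-8; the variable names are just for illustration.

```perl
use strict;
use warnings;

my ($chars, $bytes);
{
    use utf8;    # literals are decoded: each "é" counts as one character
    $chars = length("sécurité");
}
{
    no utf8;     # literals stay raw bytes: each "é" counts as two bytes
    $bytes = length("sécurité");
}
print "with utf8: $chars, without: $bytes\n";
```

The pragma is lexically scoped, so it only affects how Perl reads the literals in the enclosing block. It does not by itself change what gets written to a file, a terminal, or a mail body.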

Most of the errors are spat out by badly written applications, when those applications GNUishly presume your only possible setup is UTF-8 (if not UNICODE, which is yet another non-standard standard).

My 2 cents: check that both your applications and your DM (gnome, KDE) properly output a common encoding and locale.
Check the proper encoding choice when saving files.

Portable mails always should use 8859-1 text format.
You can read 8859-1 mails or webpages correctly, including diacritics, from your most visited links by setting UTF-8 in your local browser settings. Local means that people using other locales will not read those pages the same way you do.

Fwiw, when switching from one code page to the other, include a setenv or export ENV or just change LC_* in your routine.
Remember that when you re-read the output, the output screen should also be set on the proper environment/locale. For instance, re-read your UTF-8 formatted mail with Thunderbird set to use UTF-8 also.
Note that UTF-8 will not be read correctly by 8859-15 set applications.
sécurité, is what you get.
__________________
da more I know I know I know nuttin'
#3
6th May 2008
stukov (Jean-Michel Philippon-Nadeau)

Thanks for the reply lvlamb. It is much clearer now.

What confuses me is that I sent a message from me to me via Thunderbird. A look at the full headers shows me that the encoding is ISO-8859-1 (my accents were displayed properly):
Code:
text/plain; charset=ISO-8859-1; format=flowed
I also have similar headers with the message sent via my Perl script. Sounds like MIME::Lite set them properly. However, the mail is still encoded in UTF-8 (it worked when I changed the encoding in Thunderbird from ISO-8859-1 to UTF-8):
Code:
multipart/mixed; boundary="_----------=_1210097604302960"; charset="ISO-8859-15"
My first guess would be that I am missing something and Perl still sends the message in UTF-8 even if I change the headers of the e-mail for 8859-1. Am I right?
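For what it's worth, a charset header only labels the bytes; as far as I know it does not convert them. So one possible fix is to transcode the body yourself before handing it to the mailer. Below is a rough sketch using the core Encode module. The sample body, the addresses and the subject are placeholders, and the MIME::Lite part only runs if that module is installed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical body: the UTF-8 bytes a script gets from a UTF-8 source file
# or template (escapes keep this example ASCII-only).
my $utf8_bytes = "Pour des raisons de s\xC3\xA9curit\xC3\xA9";

# Decode to characters, then encode into the charset the header declares.
my $charset = 'ISO-8859-15';
my $body    = encode($charset, decode('UTF-8', $utf8_bytes));

# Each é is now the single byte E9, as an ISO-8859-15 reader expects.
printf "%d bytes in, %d bytes out\n", length($utf8_bytes), length($body);

# Only attempted if MIME::Lite is available; nothing is actually sent.
if (eval { require MIME::Lite; 1 }) {
    my $msg = MIME::Lite->new(
        From    => 'sender@example.com',    # placeholder address
        To      => 'user@example.com',      # placeholder address
        Subject => 'Avis',
        Type    => 'TEXT',
        Data    => $body,
    );
    $msg->attr('content-type.charset' => $charset);
    print $msg->header_as_string;
}
```

The alternative is to leave the body in UTF-8 and declare UTF-8 in the header instead. Either way, the declared charset and the actual bytes have to agree.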
#4
6th May 2008
lvlamb (Louis V. Lambrecht)

Assumption is the mother of all f*ck-ups. So I'll be cautious.
I have not checked every version or revision of the installed Perl on every OS, but I am pretty sure that every maintainer has faced encoding problems and made attempts to correct them. Basically, check the resulting code for the encodings and don't google for answers.

In theory, Perl works in UTF-8 unless told not to do so (unless defaulted otherwise at compile time). Perl will default to UTF-8 in a nearby revision. Unless too many users complain.

So, I assume Perl is UTF-8 default.

Note that the defaults are UTF-8 on WinXP (NTFS) for one, on Linux since April last year, and on OpenSolaris, which was late to the party.

I assume *BSD users are smart enough to modify their application to display UTF-8 encodings when needed. I would use UTF-8 defaults and only translate to 8859-1 for mails sent to mailing lists, as you will always be called names when not writing pure 7-bit ASCII and in English.

Part of the base xorg, you now have luit (man luit) to play with filters.

Fwiw, here is a Sun doc that gives some tips on locale/UTF-8 conversions.
http://docs.sun.com/app/docs/doc/819...3aglffe?a=view

This does not answer your question.
IMVHO, using UTF-8 throughout will be correctly read by most applications (hence users).
It is up to the application to correctly translate the encodings, i.e. use MIME flags.
Most files don't have MIME flags. That, IMHO, is the error. Adding them is much simpler to implement than making the whole OS UTF-8 compliant, which is trading bad for worse. UTF-8 is not a universal encoding either.
#5
15th May 2008
replaced (Adam Hoka)

Quote:
Originally Posted by lvlamb View Post
I assume *BSD users are smart enough to modify their application to display UTF-8 encodings when needed.
Yes, we are.
#6
15th May 2008
stukov (Jean-Michel Philippon-Nadeau)

Quote:
Originally Posted by replaced View Post
Yes, we are.
I wish it would also be true for every other OS...

Tags
perl, utf-8

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
dwm status bar won't display apm output asemisldkfj General software and network 6 16th August 2009 11:07 PM
can't open display error gosha OpenBSD General 12 28th May 2009 05:49 AM
Odd font display TerryP Feedback and Suggestions 4 2nd November 2008 11:22 AM
Terminal display behavior 18Googol2 FreeBSD General 8 26th September 2008 02:05 PM
backup mails on NAS directory milo974 OpenBSD General 3 8th August 2008 07:39 AM


All times are GMT. The time now is 08:59 PM.


Powered by vBulletin® Version 3.8.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Content copyright © 2007-2010, the authors
Daemon image copyright ©1988, Marshall Kirk McKusick