Parsing emails with 'awk' and 'perl'
An 'awk' skeleton script that parses mails in mbox format and decides which lines are email header lines and which lines make up the body of the mail.
Code:
    # awk skeleton to parse mails in mbox format
    # empty line separates header from body

    /^From/, /^$/ {
        printf "\nhead : %s", $0
        next
    }

    /^$/,/^From/ {
        if ($1 ~ /^From/) next
        printf "\nbody : %s", $0
    }

Code:
    $ awk -f awk-parse-mails mail-j65

    head : From MAILER-DAEMON Thu Feb 24 01:50:56 2011
    head : Date: 24 Feb 2011 01:50:56 +0100
    head : From: Mail System Internal Data <MAILER-DAEMON@hercules.utp.xnet>
    head : Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
    head : Message-ID: <1298508656@hercules.utp.xnet>
    head : X-IMAP: 1275177528 0000000491
    head : Status: RO
    head :
    body :
    body : This text is part of the internal format of your mail folder, and is not
    body : a real message. It is created automatically by the mail system software.
    body : If deleted, important folder data will be lost, and it will be re-created
    body : with the data reset to initial values.
    body :
    body :
    head : From j65nko@hercules.utp.xnet Thu Feb 24 03:03:11 2011
    head : Received: from hercules.utp.xnet (localhost [127.0.0.1])
    head : by hercules.utp.xnet (8.14.3/8.14.3) with ESMTP id p1O23Bmk005438
    head : for <j65nko@hercules.utp.xnet>; Thu, 24 Feb 2011 03:03:11 +0100 (CET)
    head : Received: (from j65nko@localhost)
    head : by hercules.utp.xnet (8.14.3/8.14.3/Submit) id p1O23B1a025655
    head : for j65nko; Thu, 24 Feb 2011 03:03:11 +0100 (CET)
    head : Date: Thu, 24 Feb 2011 03:03:11 +0100 (CET)
    head : From: j65nko@hercules.utp.xnet
    head : Message-Id: <201102240203.p1O23B1a025655@hercules.utp.xnet>
    head : To: j65nko@hercules.utp.xnet
    head : Subject: apples
    head :
    body :
    body : I like to eat apples
    body :
    head : From j65nko@hercules.utp.xnet Thu Feb 24 03:03:11 2011
    head : Received: from hercules.utp.xnet (localhost [127.0.0.1])
    head : by hercules.utp.xnet (8.14.3/8.14.3) with ESMTP id p1O23B5W023497
    head : for <j65nko@hercules.utp.xnet>; Thu, 24 Feb 2011 03:03:11 +0100 (CET)
    head : Received: (from j65nko@localhost)
    head : by hercules.utp.xnet (8.14.3/8.14.3/Submit) id p1O23BHm007707
    head : for j65nko; Thu, 24 Feb 2011 03:03:11 +0100 (CET)
    head : Date: Thu, 24 Feb 2011 03:03:11 +0100 (CET)
    head : From: j65nko@hercules.utp.xnet
    head : Message-Id: <201102240203.p1O23BHm007707@hercules.utp.xnet>
    head : To: j65nko@hercules.utp.xnet
    head : Subject: oranges
    head :
    body :
    body : I like to eat oranges
    body :
    head : From j65nko@hercules.utp.xnet Thu Feb 24 03:03:11 2011
    head : Received: from hercules.utp.xnet (localhost [127.0.0.1])
    head : by hercules.utp.xnet (8.14.3/8.14.3) with ESMTP id p1O23BXo026743
    head : for <j65nko@hercules.utp.xnet>; Thu, 24 Feb 2011 03:03:11 +0100 (CET)
    [snip]

Code:
    #!/usr/bin/perl

    use strict ;
    use warnings ;

    while (<>) {
        chomp ;
        if (/^From/../^$/) {
            print "\nhead : $_" ;
            next ;
        }
        if (/^$/.. /^From/) {
            if (/^From/) { next } ;
            print "\nbody : $_" ;
        }
    }

Code:
    $ perl-parse-mails mail-j65 >results.perl
    $ awk -f awk-parse-mails mail-j65 >results.awk
    $ diff results.awk results.perl
    $ cat -n results.awk | head -5
         1
         2  head : From MAILER-DAEMON Thu Feb 24 01:50:56 2011
         3  head : Date: 24 Feb 2011 01:50:56 +0100
         4  head : From: Mail System Internal Data <MAILER-DAEMON@hercules.utp.xnet>
         5  head : Subject: DON'T DELETE THIS MESSAGE -- FOLDER INTERNAL DATA
    $
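One thing to keep in mind with both skeletons: `/^From/` also matches a `From:` header and any body line that happens to begin with "From". In mbox files the separator line starts with "From" followed by a space, so a slightly stricter variant could look like the sketch below. This is only a sketch under my own assumptions, not a replacement for the scripts above: the function name `parse_mbox` and the flag `in_header` are made up for the example, it uses a state flag instead of range patterns, and it prints the newline after each line instead of before it.

```shell
#!/bin/sh
# Sketch: classify mbox lines as header or body with an explicit state flag.
# Assumption: a message starts at a line beginning with "From " (trailing
# space), so a "From:" line inside a body does not start a new message.
# parse_mbox and in_header are hypothetical names for this example.
parse_mbox() {
    awk '
        /^From / { in_header = 1 }        # separator line opens a header block
        in_header {
            print "head : " $0
            if ($0 == "") in_header = 0   # first empty line closes the header
            next
        }
        { print "body : " $0 }            # everything else is body text
    ' "$@"
}

# Small demo on an inline message; the body contains a "From:" line.
printf 'From a@b Thu Feb 24 03:03:11 2011\nSubject: apples\n\nFrom: not a header\n' |
    parse_mbox
```

Called with a file name, as in `parse_mbox mail-j65`, it reads that file instead of standard input. A body line that really starts with "From " would still be taken for a separator; mail software normally escapes such lines as ">From " when writing the mbox.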
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump
The script and test file for downloading
BTW the emails in the test file were generated with: Code:
for X in apples oranges kiwis ; do echo "I like to eat $X" | mail -s "$X" j65nko ; done
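If no local mail delivery is set up, a similar test file can be written by hand. The sketch below is an assumption on my part, not how the test file above was made: the function name `make_test_mbox`, the sender address and the fixed date are placeholders, and real delivered mail carries more headers (Received, Message-Id and so on).

```shell
#!/bin/sh
# Sketch: write mbox-style test messages to stdout, no mail system needed.
# make_test_mbox, the address and the date are hypothetical placeholders.
make_test_mbox() {
    for fruit in "$@"; do
        printf 'From tester@example.invalid Thu Feb 24 03:03:11 2011\n'  # separator line
        printf 'From: tester@example.invalid\n'
        printf 'To: tester@example.invalid\n'
        printf 'Subject: %s\n' "$fruit"
        printf '\n'                                   # empty line ends the header
        printf 'I like to eat %s\n' "$fruit"
        printf '\n'                                   # empty line ends the message
    done
}

# Count the separator lines of three generated messages.
make_test_mbox apples oranges kiwis | grep -c '^From '   # prints 3
```

Redirecting the output to a file, for example `make_test_mbox apples oranges kiwis > mail-test`, gives something the awk and perl scripts can be run against.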
Last edited by J65nko; 24th February 2011 at 03:49 AM.
Tags: awk, mbox format, parsing mail, perl