View Single Post
  #1   (View Single Post)  
Old 12th December 2011
J65nko J65nko is offline
Administrator
 
Join Date: May 2008
Location: Budel - the Netherlands
Posts: 4,128
Default Tools to assist in database import

In a database import of over 200,000 records, I noticed that a length of 30 chars was not long enough to hold all email addresses.

With assistance of a short awk script I could quickly determine the optimal length:

Code:
$ awk -f field_length.awk import_file.csv

111628 065031051@rrrrrr.com
322626 0594500515@wwwwww.co.za
[snip]
361682 huilin19
78@wrwrwrwr.c
;om
xxxxxxxxx@hotmail.com
146
This unbelievable longest mail address probably was caused by either an UTF-16 issue or data corruption.

The awk script:
Code:
BEGIN {
SIZE = 0 ;
FS = "," ;
}

# field 1 is the primary key
# field 2 is the email address
{
if (length($2) > SIZE ) {
   print $1, $2 ;
   SIZE = length($2) ;
   }
}

END {
print $2 ;
print SIZE ;
}
BTW sed(1) confirmed this:
Code:
$ sed -ne '/361682/p' masterlist_comma.csv 
361682 huilin19
78@wrwrwrwr.c
;om
After editing out the offending line the result was:
Code:
 $ awk -f field_length.awk import_file2.csv
[snip]
58
So a field width of 60 seemed safe
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump

Last edited by J65nko; 12th December 2011 at 06:17 AM.
Reply With Quote