In a database import of over 200,000 records, I noticed that a length of 30 chars was not long enough to hold all email addresses.
With assistance of a short awk script I could quickly determine the optimal length:
Code:
$ awk -f field_length.awk import_file.csv
111628 065031051@rrrrrr.com
322626 0594500515@wwwwww.co.za
[snip]
361682 huilin19
78@wrwrwrwr.c
;om
xxxxxxxxx@hotmail.com
146
This unbelievable longest mail address probably was caused by either an UTF-16 issue or data corruption.
The awk script:
Code:
BEGIN {
SIZE = 0 ;
FS = "," ;
}
# field 1 is the primary key
# field 2 is the email address
{
if (length($2) > SIZE ) {
print $1, $2 ;
SIZE = length($2) ;
}
}
END {
print $2 ;
print SIZE ;
}
BTW
sed(1) confirmed this:
Code:
$ sed -ne '/361682/p' masterlist_comma.csv
361682 huilin19
78@wrwrwrwr.c
;om
After editing out the offending line the result was:
Code:
$ awk -f field_length.awk import_file2.csv
[snip]
58
So a field width of 60 seemed safe