|
|||
Tools to assist in database import
In a database import of over 200,000 records, I noticed that a length of 30 chars was not long enough to hold all email addresses.
With assistance of a short awk script I could quickly determine the optimal length: Code:
$ awk -f field_length.awk import_file.csv 111628 065031051@rrrrrr.com 322626 0594500515@wwwwww.co.za [snip] 361682 huilin19 78@wrwrwrwr.c ;om xxxxxxxxx@hotmail.com 146 The awk script: Code:
BEGIN { SIZE = 0 ; FS = "," ; } # field 1 is the primary key # field 2 is the email address { if (length($2) > SIZE ) { print $1, $2 ; SIZE = length($2) ; } } END { print $2 ; print SIZE ; } Code:
$ sed -ne '/361682/p' masterlist_comma.csv 361682 huilin19 78@wrwrwrwr.c ;om Code:
$ awk -f field_length.awk import_file2.csv [snip] 58
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump Last edited by J65nko; 12th December 2011 at 06:17 AM. |
|
|||
In tomorrows episode a Perl script to count the number of comma's, and thus the fields, in an .csv file.
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump |
|
|||
Consider the following (assuming comma are not found in the data...):
Code:
#!/bin/env perl use strict; use warnings; # array of test strings my @a = (',', ',,', ',,,'); my $max = 0; foreach my $line (@a) { my @matches = ($line =~ /(\,)/g); $max = @matches if @matches > $max; } print "max #occurrences = $max\n"; Last edited by ocicat; 12th December 2011 at 07:01 AM. |
|
|||
The issue I had on hand was, that the import file was over 200,000 records. So I needed to know exactly which record/line did not have 56 fields.
I took advantage of the split function which in scalar context returns the number of fields resulting from the split: Code:
#!/usr/bin/perl use warnings ; use strict ; my @temp ; my $nr ; while (<>) { chomp ; $nr = (@temp = split /,/) ; print "$nr $temp[0]\n" ; } Code:
$ head -10 masterlist_comma.csv | ./split.pl 56 id 56 100625 56 100626 56 100627 56 100628 56 100629 56 100630 56 100631 56 100632 56 100633 I checked these culprits by first redirecting to file fields_count.txt and then use grep(1) Code:
$ grep -v '^56 ' fields_count.txt $ PS why grep -v? Code:
-v Selected lines are those not matching any of the specified patterns.
__________________
You don't need to be a genius to debug a pf.conf firewall ruleset, you just need the guts to run tcpdump |
|
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
MySQL database import not helpful | J65nko | General software and network | 2 | 10th December 2011 02:45 AM |
When is a database not so relational? | J65nko | News | 1 | 23rd May 2011 05:38 AM |
zfs import | terryd | FreeBSD General | 1 | 23rd November 2008 09:26 PM |
import physical freeBSD into VMWARE (ESX) server as a vServer | ccc | FreeBSD General | 6 | 3rd October 2008 07:04 AM |
PHP database interfaces | TerryP | Programming | 6 | 11th September 2008 01:03 PM |