DaemonForums - View Single Post

J65nko · #4 · 12th December 2011

The issue at hand was that the import file had over 200,000 records, so I needed to know exactly which records/lines did not have 56 fields.

I took advantage of the fact that a list assignment in scalar context returns the number of elements on its right-hand side, here the fields produced by split:

Code:

#!/usr/bin/perl

use warnings ;
use strict ;

my @temp ;
my $nr  ;

while (<>)  {
  chomp ;
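  # A list assignment in scalar context yields the number of fields.
  # The limit -1 keeps trailing empty fields, which split drops by default.
  $nr = (@temp = split /,/, $_, -1) ;
  print "$nr $temp[0]\n" ;  # field count, then the 'id' field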
}

A sample run:

Code:

$ head -10 masterlist_comma.csv | ./split.pl

56 id
56 100625
56 100626
56 100627
56 100628
56 100629
56 100630
56 100631
56 100632
56 100633

The first field, 'id', is the primary index (unique), so I chose to print it too, which lets me locate any culprits.

I checked for culprits by first redirecting the output to the file fields_count.txt and then using grep(1):

Code:

$ grep -v '^56 ' fields_count.txt
$

No culprits in the first 10 records.

Nor were there any in the remaining 200,000.
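
The count-then-filter steps above could also be folded into a single pass that prints only the offending records together with their line numbers. This is a rough sketch using awk(1) rather than Perl, not what I actually ran. The function name check_fields and the three-field sample data are made up for illustration, and it assumes no quoted fields contain commas.

```shell
# check_fields WANT [FILE...]
# Print line number, field count and first field of every record
# whose number of comma-separated fields differs from WANT.
check_fields() {
  want=$1; shift
  awk -F, -v want="$want" 'NF != want { print NR ": " NF " fields, id=" $1 }' "$@"
}

# Hypothetical sample: 3 fields expected. The second record is short;
# the third ends in an empty field, which awk still counts.
printf '%s\n' 'id,name,city' '100625,foo' '100626,bar,' | check_fields 3
```

For the real file this would be called as: check_fields 56 masterlist_comma.csv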

PS: why grep -v? From the grep(1) man page:

Code:

    -v      Selected lines are those not matching any of the specified
             patterns.
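
To see the effect of -v on a small scale, here is a sketch with made-up counts (not taken from the real file) in which one record has only 55 fields:

```shell
# Hypothetical output of the field-count script.
sample='56 id
56 100625
55 100626'

# Without -v: the lines that start with "56 ".
printf '%s\n' "$sample" | grep '^56 '

# With -v: everything else, i.e. only the culprit.
printf '%s\n' "$sample" | grep -v '^56 '
```

The second command prints only the line for id 100626, which is exactly the kind of record I was looking for.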