17th December 2012
J65nko
Deleting whitespace from otherwise blank lines

Recently I had to convert several text documents to XML.
To make sure that no seemingly empty lines contained just spaces and/or tabs, I wrote the following small Perl script called 'xlblanks'.

Code:
#!/usr/bin/perl

use warnings;
use strict;
use diagnostics;

# --- delete spaces and tabs from otherwise empty lines

my $total = 0;
my $line_nr;
my @nrs;

while (<>) {
    ++$line_nr;
    if (
        s/
        ^           # at beginning of line
        [\t\ ]+     # one or more tabs or blanks
        $           # followed by end of line
        //x         # replace with nothing
    ) {
        ++$total;
        push @nrs, $line_nr;
    }
    print;
}

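print STDERR "\n$0: Number of lines found with only tabs or blanks: $total\n";
# $, is printed between every item of the list, including the label and the
# final "\n\n", hence the leading and trailing '-' in the reported line numbers
$, = '-' ;
print STDERR "$0: The line numbers: ", @nrs , "\n\n";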
A small sample file shows no visible blanks or tabs on otherwise empty lines:
Code:
FreeBSD
 
DragonFlyBSD
 	   
NetBSD  
	
OpenBSD
Running the script:
Code:
$ ./xlblanks blanklines.txt

FreeBSD

DragonFlyBSD

NetBSD  

OpenBSD


./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-
Displaying the file with 'cat -net' (number the lines, mark each line end with '$', show tabs as '^I') confirmed these results:
Code:
$ cat -net blanklines.txt                                                         
     1  $
     2  FreeBSD$
     3   $
     4  DragonFlyBSD$
     5   ^I   $
     6  NetBSD  $
     7  ^I$
     8  OpenBSD$
     9  $
    10  $
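To experiment with this yourself, the sample file can be recreated from the 'cat -net' listing above. This is only a sketch: the exact whitespace is read off that listing (one space on line 3, a space, a tab and three spaces on line 5, one tab on line 7), and it writes 'blanklines.txt' into the current directory.

```shell
#!/bin/sh
# Recreate the 10-line sample file; \t in the printf format is a literal tab.
# Lines 3, 5 and 7 hold only spaces and/or tabs, line 6 has trailing spaces.
printf '\nFreeBSD\n \nDragonFlyBSD\n \t   \nNetBSD  \n\t\nOpenBSD\n\n\n' > blanklines.txt

# Count the lines that consist solely of blanks (spaces or tabs)
grep -cE '^[[:blank:]]+$' blanklines.txt
```

The count printed by 'grep -c' should agree with the total reported by the script.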
The two lines reporting the results are sent to STDERR, so a 'clean' version can be created by redirecting the standard output to a file:

Code:
$ ./xlblanks blanklines.txt >clean.txt 

./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-

$ cat -net clean.txt
     1  $
     2  FreeBSD$
     3  $
     4  DragonFlyBSD$
     5  $
     6  NetBSD  $
     7  $
     8  OpenBSD$
     9  $
    10  $
The line number report, which is sent to 'stderr' (file descriptor 2), can be redirected to a file with:
Code:
$ ./xlblanks blanklines.txt >clean.txt 2> culprits.txt  
$ cat culprits.txt

./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-
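For those less familiar with the two output streams, here is a small sketch of the redirections involved. The function 'demo' and all file names are hypothetical stand-ins, not part of xlblanks; 'demo' simply writes one line to stdout and one to stderr.

```shell
#!/bin/sh
# Hypothetical stand-in for xlblanks: data on stdout (fd 1), report on stderr (fd 2)
demo() {
    echo "cleaned text"
    echo "report: 3 lines" >&2
}

dir=$(mktemp -d)    # scratch directory for the demo files

demo > "$dir/clean.txt" 2> "$dir/culprits.txt"   # split the two streams
demo > "$dir/both.txt" 2>&1                      # merge stderr into stdout
demo 2> /dev/null > "$dir/quiet.txt"             # keep the data, drop the report

ls "$dir"           # remove the directory with: rm -r "$dir"
```

Note that in the 'both.txt' case the order matters: '2>&1' has to come after the '>' redirection, otherwise stderr still goes to the terminal.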
In case you wonder why the line numbers needed to be reported:
The original master files are maintained in MS Word format, so knowing the line numbers made it easy to eliminate those irritating, useless blanks and tabs at the source.

An equivalent 'sed' command, without the line number report:

Code:
$ sed -Ee 's/^[[:blank:]]+$//g' blanklines.txt | cat -net
     1  $
     2  FreeBSD$
     3  $
     4  DragonFlyBSD$
     5  $
     6  NetBSD  $
     7  $
     8  OpenBSD$
     9  $
    10  $
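The line number report can be had without Perl as well. What follows is a sketch rather than a drop-in replacement: the function name 'blank_only_lines' is made up for this example, and it prints the numbers comma-separated instead of in the '-3-5-7-' form of the Perl script.

```shell
#!/bin/sh
# Print the numbers of the lines that hold only spaces and/or tabs,
# comma-separated on a single line.
blank_only_lines() {
    grep -nE '^[[:blank:]]+$' "$1" | cut -d: -f1 | paste -sd, -
}

# Demo on a throwaway copy of the sample data (lines 3, 5 and 7 are whitespace-only)
tmp=$(mktemp)
printf '\nFreeBSD\n \nDragonFlyBSD\n \t   \nNetBSD  \n\t\nOpenBSD\n\n\n' > "$tmp"

blank_only_lines "$tmp" >&2           # report on stderr, as xlblanks does
sed -E 's/^[[:blank:]]+$//' "$tmp"    # cleaned text on stdout
rm -f "$tmp"
```

The 'g' flag of the sed command above is left out here: the pattern is anchored at both ends, so it can match at most once per line.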