DaemonForums  

#1 J65nko (Administrator), 17th December 2012
Deleting whitespace from otherwise blank lines

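Recently I had to convert several text documents to XML.
To make sure that there were no otherwise empty lines containing just spaces and/or tabs, I wrote the following small Perl script, called 'xlblanks'. It empties such lines, writes everything to standard output and reports the offending line numbers on standard error.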

Code:
#!/usr/bin/perl

use warnings;
use strict;
use diagnostics;

# --- delete spaces and tabs from otherwise empty lines

my $total = 0;
my $line_nr;
my @nrs;

while (<>) {
    ++$line_nr; 
    if (
	s/
	^	# at begin of line
	[\t\ ]+	# one or more tabs or blanks
	$	# followed by END OF LINE
	//x	# by nothing
	) {
	++$total;
        push @nrs, $line_nr; 
    }
    print;
}

print STDERR "\n$0: Number of lines found with only tabs or blanks: $total\n";
$, = '-' ;
print STDERR "$0: The line numbers: ", @nrs , "\n\n";
A small sample file shows no visible blanks or tabs on otherwise empty lines:
Code:
FreeBSD
 
DragonFlyBSD
 	   
NetBSD  
	
OpenBSD
Running the script:
Code:
$ ./xlblanks blanklines.txt

FreeBSD

DragonFlyBSD

NetBSD  

OpenBSD


./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-
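Displaying the original file with 'cat -net' confirms these results (-n numbers the lines, -e marks the end of each line with '$' and -t shows tabs as '^I'):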
Code:
$ cat -net blanklines.txt                                                         
     1  $
     2  FreeBSD$
     3   $
     4  DragonFlyBSD$
     5   ^I   $
     6  NetBSD  $
     7  ^I$
     8  OpenBSD$
     9  $
    10  $
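The blanks and tabs of the sample file do not survive a copy and paste from a web page. Assuming the 'cat -net' listing above is exact, this printf command should recreate blanklines.txt byte for byte; printf itself turns the \t and \n escapes in its format string into tabs and newlines:

```shell
# Recreate the 10-line sample shown by 'cat -net' above:
# line 1 empty, 3 = one space, 5 = space, tab, three spaces,
# 6 = "NetBSD" plus two trailing spaces, 7 = a single tab, 9 and 10 empty
printf '\nFreeBSD\n \nDragonFlyBSD\n \t   \nNetBSD  \n\t\nOpenBSD\n\n\n' > blanklines.txt

# Should list the same ten lines as the listing above
cat -net blanklines.txt
```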
The two lines reporting the results are sent to STDERR, so a 'clean' version can be created by redirecting standard output to a file:

Code:
$ ./xlblanks blanklines.txt >clean.txt 

./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-

$ cat -net clean.txt
     1  $
     2  FreeBSD$
     3  $
     4  DragonFlyBSD$
     5  $
     6  NetBSD  $
     7  $
     8  OpenBSD$
     9  $
    10  $
The report sent to STDERR (file descriptor 2) can be redirected to a file as well:
Code:
$ ./xlblanks blanklines.txt >clean.txt 2> culprits.txt  
$ cat culprits.txt

./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-
In case you wonder why the line numbers needed to be reported:
The original master files are maintained in MS Word format, so knowing the line numbers made it easy to eliminate those irritating, useless blanks and tabs there as well.

An equivalent 'sed' command, without the reporting:

Code:
$ sed -Ee 's/^[[:blank:]]+$//g' blanklines.txt | cat -net
     1  $
     2  FreeBSD$
     3  $
     4  DragonFlyBSD$
     5  $
     6  NetBSD  $
     7  $
     8  OpenBSD$
     9  $
    10  $
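For comparison, here is a sketch of the whole 'xlblanks' behaviour, including the report on STDERR, written as a shell function around awk. The name xlblanks_awk is made up for this example, and unlike the Perl script the report does not start with the program name ($0). Like the Perl version, the line numbers keep counting across several input files.

```shell
# Hypothetical awk version of 'xlblanks': empties lines that contain only
# blanks and/or tabs, writes the result to stdout and the report to stderr.
xlblanks_awk() {
    awk '
    /^[ \t]+$/ {              # only spaces and/or tabs on this line
        $0 = ""               # empty it; print still adds the newline
        total++
        nrs = nrs "-" NR      # build "-3-5-7" like the Perl script
    }
    { print }
    END {
        printf "\nNumber of lines found with only tabs or blanks: %d\n", total > "/dev/stderr"
        printf "The line numbers: %s-\n\n", nrs > "/dev/stderr"
    }' "$@"
}
```

It is used just like the Perl script, e.g. `xlblanks_awk blanklines.txt >clean.txt 2>culprits.txt`.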
#2 Skinny (Port Guard), 5th January 2013

Or this (note that -i edits the file in place, without keeping a backup):

Code:
perl -pi -e 's/^\s+$/\n/g' blanklines.txt
#3 Mike-Sanders (Fdisk Soldier), 6th January 2013

Yet another, using the [:space:] POSIX character class:

Code:
sed 's/^[[:space:]]*$//g' file.in > file.out
[:space:] (whitespace) = [ \t\r\n\v\f]
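A practical consequence of that definition: [:space:] includes the carriage return, while the [:blank:] class used in the first post covers only space and tab. In a file with DOS (CRLF) line endings, an otherwise empty line still ends in '\r', so the [:blank:] pattern leaves it alone while the [:space:] pattern empties it, dropping the '\r' as well. A small demonstration with a made-up file, crlf.txt:

```shell
# Three lines with DOS line endings; line 2 is "space, tab, carriage return"
printf 'FreeBSD\r\n \t\r\nOpenBSD\r\n' > crlf.txt

# [:blank:] is only space and tab: the "\r" prevents a match, line 2 survives
sed 's/^[[:blank:]]*$//' crlf.txt | sed -n 2p | od -c

# [:space:] also matches "\r": line 2 becomes empty
sed 's/^[[:space:]]*$//' crlf.txt | sed -n 2p | od -c
```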

Last edited by Mike-Sanders; 7th January 2013 at 03:58 AM. Reason: fixed really bad typo... (palm/face)
#4 thomasw_ (real name: thomas; Port Guard), 17th June 2013

Handy solutions here, and being a fan of the simplicity of substitution in sed, I am especially fond of this last example by Mike.

Some folks, though, might want to insert a "_" or a "." where spaces occur, especially in filenames. I wrote a simple script that recursively replaces the spaces in the names of files and directories below the current directory with periods. I find it useful for renaming my audio files.

I'll paste it below for anyone who wants to use or modify it.

Code:
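rmspaces.sh

#!/bin/ksh
# recursively replace spaces in file and directory names with periods

# -depth lists a directory's contents before the directory itself, so a
# path is never invalidated by renaming one of its parents first
find . -depth -name '* *' | while IFS= read -r file
do
    dir=${file%/*}      # parent path, left alone (renamed on its own turn)
    base=${file##*/}    # last path component, the only part renamed here
    target="$dir/$(printf '%s\n' "$base" | sed 's/ /./g')"
    if [ -e "$target" ]; then
        echo "Skipping '$file': '$target' already exists" >&2
        continue
    fi
    echo "Renaming '$file' to '$target'"
    mv -- "$file" "$target"
done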
#5 s0xxx (Package Pilot), 18th June 2013

This might be the right occasion to exploit awk's built-in variable "NF", which holds the number of fields in the current input record. With the default field separator, fields are separated by runs of spaces and/or tabs, and leading and trailing blanks are ignored. So NF is 0 (false) for an empty or whitespace-only line, and non-zero (true) if there are fields in the record:

Code:
$ cat -net test 
     1  FreeBSD$
     2  ^I ^I$
     3  DragonFlyBSD $
     4   $
     5  NetBSD$
     6  ^I  $
     7  OpenBSD$
     8    ^I  ^I$
     9  MirOS^I $

$ awk 'NF {print $0 "\n"}' test | cat -net
     1  FreeBSD$
     2  $
     3  DragonFlyBSD $
     4  $
     5  NetBSD$
     6  $
     7  OpenBSD$
     8  $
     9  MirOS^I $
    10  $
$
It says: "if there are fields in the record, print the record followed by an extra newline; ignore all other lines".
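To see which value NF takes on each kind of line, print it in front of the line itself (the square brackets only make leading and trailing blanks visible). With the made-up input below, the whitespace-only lines should report NF=0, and the trailing blank after DragonFlyBSD does not add a field:

```shell
# Made-up input: a word, tab-space-tab, a word with a trailing space, one space
printf 'FreeBSD\n\t \t\nDragonFlyBSD \n \n' |
    awk '{ printf "NF=%d [%s]\n", NF, $0 }'
```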
To continue exploiting awk's other built-in variables, we might have written it as...
Code:
$ awk 'BEGIN{ORS="\n\n"} NF' test | cat -net -

       or

$ awk 'BEGIN{ORS=RS RS} NF' test | cat -net -
...or a bit more cryptic:
Code:
$ awk 'ORS=NF?RS RS:""' test | cat -net -
... which all do the same.
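The last variant works because an assignment is also an expression: concatenation binds tighter than ?:, which binds tighter than =, so it reads ORS = (NF ? (RS RS) : ""). Its value, "\n\n" or the empty string, decides whether the default action (print $0) runs, and ORS is already set by the time the record is printed. A quick check, with a made-up input file, that all four commands really produce the same bytes:

```shell
# Made-up sample: text lines separated by whitespace-only lines
printf 'FreeBSD\n\t \t\nNetBSD\n \nOpenBSD\n' > nf-test.txt

awk 'NF {print $0 "\n"}'   nf-test.txt > out1.txt
awk 'BEGIN{ORS="\n\n"} NF' nf-test.txt > out2.txt
awk 'BEGIN{ORS=RS RS} NF'  nf-test.txt > out3.txt
awk 'ORS=NF?RS RS:""'      nf-test.txt > out4.txt

# cmp prints nothing and succeeds when two files are byte-for-byte identical
cmp out1.txt out2.txt && cmp out1.txt out3.txt && cmp out1.txt out4.txt &&
    echo "all four outputs are identical"
```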

This will not work when you want to retain the format of the file: several blank lines in a row (containing spaces, tabs or both) are collapsed, so the awk commands above output only a single newline for them, and consecutive non-blank lines get an empty line inserted between them.
The attentive reader will notice an extra newline at the end; that is left as an exercise.
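For anyone who wants to compare notes on the exercise, here is one possible approach, a sketch rather than the author's answer: instead of appending the extra newline after every text line, print it before every text line except the first, so nothing extra follows the last one. The counter n is an arbitrary name chosen for this example:

```shell
# Made-up input: text lines with whitespace-only lines in between
printf 'FreeBSD\n\t \t\nDragonFlyBSD \n \nNetBSD\n' > sample.txt

# n is 0 (false) only for the first non-blank record, so the empty
# separator line is printed between records, never after the last one
awk 'NF { if (n++) print ""; print }' sample.txt | cat -net
```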

If you don't care about the format of the file and just want to drop every empty or whitespace-only line, then it's just:
Code:
$ awk NF test | cat -net
     1  FreeBSD$
     2  DragonFlyBSD $
     3  NetBSD$
     4  OpenBSD$
     5  MirOS^I $
$
Simple heh?
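For comparison, the same filtering can be done with the other tools from this thread: sed's d command deletes matching lines, and grep -v prints only the lines that do not match. A sketch with a made-up input file; for input whose only whitespace is spaces and tabs, all three should give identical results:

```shell
# Made-up input: text lines with whitespace-only lines in between
printf 'FreeBSD\n\t \t\nDragonFlyBSD \n \nNetBSD\n' > sample.txt

awk NF sample.txt > by-awk.txt
# sed: delete ('d') lines that are empty or hold only spaces/tabs
sed '/^[[:blank:]]*$/d' sample.txt > by-sed.txt
# grep -v: print only the lines that do NOT match that pattern
grep -v '^[[:blank:]]*$' sample.txt > by-grep.txt

cmp by-awk.txt by-sed.txt && cmp by-awk.txt by-grep.txt &&
    echo "same result"
```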


P.S. I apologize for hijacking your thread, @J65nko; I saw the others did and had a bit of an inspirational moment...
Tags
perl, sed, text formatting

All times are GMT.


Content copyright © 2007-2010, the authors
Daemon image copyright ©1988, Marshall Kirk McKusick