Deleting whitespace from otherwise blank lines
Recently I had to convert several text documents to XML.
To make sure that no otherwise empty line contained just spaces and/or tabs, I wrote the following small Perl script called 'xlblanks'.
Code:
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;

# --- delete spaces and tabs from otherwise empty lines

my $total = 0;
my $line_nr;
my @nrs;

while (<>) {
    ++$line_nr;
    if ( s/
          ^         # at begin of line
          [\t\ ]+   # one or more tabs or blanks
          $         # followed by END OF LINE
         //x        # by nothing
       ) {
        ++$total;
        push @nrs, $line_nr;
    }
    print;
}

print STDERR "\n$0: Number of lines found with only tabs or blanks: $total\n";
$, = '-' ;
print STDERR "$0: The line numbers: ", @nrs , "\n\n";
Code:

FreeBSD

DragonFlyBSD

NetBSD

OpenBSD


Code:
$ ./xlblanks blanklines.txt

FreeBSD

DragonFlyBSD

NetBSD

OpenBSD



./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-

Code:
$ cat -net blanklines.txt
     1  $
     2  FreeBSD$
     3   $
     4  DragonFlyBSD$
     5  ^I $
     6  NetBSD $
     7  ^I$
     8  OpenBSD$
     9  $
    10  $
Code:
$ ./xlblanks blanklines.txt >clean.txt

./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-

$ cat -net clean.txt
     1  $
     2  FreeBSD$
     3  $
     4  DragonFlyBSD$
     5  $
     6  NetBSD $
     7  $
     8  OpenBSD$
     9  $
    10  $
Code:
$ ./xlblanks blanklines.txt >clean.txt 2> culprits.txt
$ cat culprits.txt

./xlblanks: Number of lines found with only tabs or blanks: 3
./xlblanks: The line numbers: -3-5-7-

The original master files are maintained in MS Word format, so knowing the line numbers made it easy to eliminate those irritating, useless blanks and tabs.
An equivalent 'sed' script, without the line-number reporting:
Code:
$ sed -Ee 's/^[[:blank:]]+$//g' blanklines.txt | cat -net
     1  $
     2  FreeBSD$
     3  $
     4  DragonFlyBSD$
     5  $
     6  NetBSD $
     7  $
     8  OpenBSD$
     9  $
    10  $
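The sed version drops the line-number report that the Perl script gives. As a rough sketch of how that report could be had from the shell, the hypothetical function `report_blank_lines` below (the name and the sample file are assumptions of mine, not part of the original post) uses `grep -n` with the same `[[:blank:]]` class. It also assumes `mktemp` is available.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers of lines that consist
# solely of spaces and/or tabs, one number per line.
report_blank_lines() {
    # grep -n prefixes every matching line with "LINENO:";
    # cut keeps only the number in front of the first colon.
    grep -nE '^[[:blank:]]+$' "$1" | cut -d: -f1
}

# Build a sample file resembling blanklines.txt; printf turns \t into a tab.
sample=$(mktemp)
printf '\nFreeBSD\n \nDragonFlyBSD\n\t \nNetBSD \n\t\nOpenBSD\n\n\n' > "$sample"

report_blank_lines "$sample"    # prints 3, 5 and 7, one per line
rm -f "$sample"
```

Truly empty lines (1, 9 and 10 in the sample) are not reported, because the `+` requires at least one blank, just like the Perl regular expression.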
Or this:
Code:
perl -pi -e 's/^\s+$/\n/g' blanklines.txt
Yet another, using the [:space:] POSIX character class:
Code:
sed 's/^[[:space:]]*$//g' file.in > file.out
Last edited by Mike-Sanders; 7th January 2013 at 03:58 AM. Reason: fixed really bad typo... (palm/face)
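The Perl one-liner edits the file in place, while the sed one-liner writes to a second file. A minimal sketch of getting the in-place behaviour with sed alone, without relying on a non-portable `-i` flag, could look like this. The function name `clean_in_place` is made up for illustration, and `mktemp` is assumed to be available.

```shell
#!/bin/sh
# Hypothetical helper: strip whitespace from otherwise blank lines in
# the file named by $1, overwriting it only if sed succeeded.
clean_in_place() {
    tmp=$(mktemp) || return 1
    if sed 's/^[[:space:]]*$//' "$1" > "$tmp"; then
        # Copying the contents back (instead of mv) keeps the
        # original file's permissions and ownership.
        cat "$tmp" > "$1"
    fi
    rm -f "$tmp"
}

# Usage: clean_in_place blanklines.txt
```

Unlike `perl -pi.bak`, this keeps no backup copy, so try it on a scratch file first.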
Handy solutions here, and being a fan of the simplicity of substitution in sed, I am especially fond of this last example by Mike.
Some folks, though, might want to insert a "_" or a "." where spaces occur, especially in filenames. I wrote a simple script that recursively replaces the spaces in the names of a directory's files and subdirectories with periods. I find it useful for renaming my audio files. I'll paste it below for anyone to use or modify.
Code:
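#!/bin/ksh
# rmspaces.sh
# recursive script to replace spaces in filenames with periods
# -depth lists the contents of a directory before the directory itself,
# so a parent is renamed only after everything inside it has been handled
find . -depth -name '* *' | while IFS= read -r file; do
    dir=$(dirname "$file")
    base=$(basename "$file")
    # change only the last path component; the parents get their own turn
    target="$dir/$(echo "$base" | sed 's/ /./g')"
    echo "Renaming '$file' to '$target'"
    mv "$file" "$target"
done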
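For those who prefer "_" over ".", here is a sketch of the same idea with the replacement passed as an argument. The function name `rename_spaces`, its arguments and the demo directory names are all hypothetical. It assumes the replacement string contains no characters that are special to sed ("/", "&" or a backslash) and that `mktemp -d` is available.

```shell
#!/bin/sh
# Hypothetical helper: below directory $1, replace every space in file
# and directory names with the string $2 (default "_").
# Assumption: $2 contains no "/", "&" or backslash.
rename_spaces() {
    root=$1
    repl=${2:-_}
    # -depth visits children before their parent directory, so the
    # paths read from find are still valid when mv runs.
    find "$root" -depth -name '* *' | while IFS= read -r path; do
        dir=$(dirname "$path")
        base=$(basename "$path")
        new=$(printf '%s\n' "$base" | sed "s/ /$repl/g")
        mv -- "$path" "$dir/$new"
    done
}

# Demo in a throw-away directory.
demo=$(mktemp -d)
mkdir -p "$demo/my music/live set"
touch "$demo/my music/live set/track 01.ogg"
rename_spaces "$demo" .
find "$demo" -type f    # the path now ends in my.music/live.set/track.01.ogg
rm -rf "$demo"
```

File names containing newlines are not handled, since the loop reads one path per line.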
This might be the right occasion to exploit awk's built-in variable "NF", which holds the number of fields in the current input record. By default fields are separated by whitespace, and whitespace in awk means any string of one or more spaces and/or tabs. So NF is nonzero, and therefore true, only if there are fields in the record:
Code:
$ cat -net test
     1  FreeBSD$
     2  ^I ^I$
     3  DragonFlyBSD $
     4  $
     5  NetBSD$
     6  ^I $
     7  OpenBSD$
     8  ^I ^I$
     9  MirOS^I $
$ awk 'NF {print $0 "\n"}' test | cat -net
     1  FreeBSD$
     2  $
     3  DragonFlyBSD $
     4  $
     5  NetBSD$
     6  $
     7  OpenBSD$
     8  $
     9  MirOS^I $
    10  $
$
To continue exploiting awk's other built-in variables, we might have written it as...
Code:
$ awk 'BEGIN{ORS="\n\n"} NF' test | cat -net -
or
$ awk 'BEGIN{ORS=RS RS} NF' test | cat -net -
Code:
$ awk 'ORS=NF?RS RS:""' test | cat -net -
The only time this will not work is when the file has several consecutive blank lines (containing spaces, tabs or both) and you want to retain them, i.e. keep the format of the file. In that case, the awk commands above will output only a single empty line in their place. The casual reader will notice an extra newline at the end; removing it is left as an exercise.
If one doesn't care about the format of the file and just wants to kill all blank and whitespace-only lines, then it's just:
Code:
$ awk NF test | cat -net
     1  FreeBSD$
     2  DragonFlyBSD $
     3  NetBSD$
     4  OpenBSD$
     5  MirOS^I $
$
P.S. I apologize for hijacking your thread @J65nko, I saw the others did and I had a bit of an inspirational moment...
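For the case mentioned above, where runs of blank lines have to be retained, a possible sketch is to empty the whitespace-only records instead of skipping them. The function name `blank_out` is hypothetical; the awk program relies only on NF and the usual always-true `1` pattern.

```shell
#!/bin/sh
# Hypothetical helper: empty out whitespace-only lines without dropping
# or merging them, so the number of lines stays the same.
blank_out() {
    # NF is 0 when a record has no fields; assigning "" to $0 empties it.
    # The trailing 1 is an always-true pattern, so every record is printed.
    awk '!NF { $0 = "" } 1' "$@"
}

printf 'FreeBSD\n\t \t\n \nNetBSD \n' | blank_out | cat -e
# prints:
# FreeBSD$
# $
# $
# NetBSD $
```

Trailing whitespace on lines that do contain text, like "NetBSD " here, is left alone, which matches what the Perl and sed versions earlier in the thread do.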