The command file is heavily commented, so no further explanation is given here.
Code:
$ cat titlecollect
# -- sed command file to collect all title tags from an XML document
#
# example lines:
# <title>Automating patch generation and application for configuration files</title>
# <section><title>Introduction</title>
# <section><title>Automating patch creation</title>
#
/<title>..*<\/title>/ {
# replace <section> and </section> tags with 'nothing'
s/<section>//
s/<\/section>//
# insert four blanks as indentation at the beginning of the line
s/^/    /
# append a "\n" to 'hold space', then append 'pattern space'
# to 'hold space'
# in other words, collect all these lines in 'hold space'
H
}
# stuff to do at last line
$ {
# insert XML header and 'toc' root element
i\
<?xml version="1.0" ?>\
<?xml-stylesheet href="XSL" type="text/xsl" ?>\
\
<toc>
# get all collected stuff from hold space to pattern space
g
# append a closing 'toc' XML element and a comment
a\
</toc>\
\
<!-- end of table of contents -->
# print 'pattern space'
p
}
# -- end of sed command file
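The collecting step (H on every title line, g and p on the last line) can be tried in isolation. The following is only a minimal sketch, assuming a POSIX-style sed; the three input lines and the variable names `input` and `result` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input: two lines carry a complete <title> element, one does not.
input='<section><title>Introduction</title>
<para>Some text</para>
<section><title>Automating patch creation</title>'

# H appends "\n" plus the pattern space to the hold space for each title line.
# On the last line, g copies the hold space back to the pattern space and
# p prints it (sed -n suppresses the automatic printing).
result=$(printf '%s\n' "$input" | sed -n '
/<title>..*<\/title>/ {
s/<section>//
H
}
$ {
g
p
}')

printf '%s\n' "$result"
```

Because H always puts a newline in front of what it appends, the collected text starts with an empty line, followed by the two title lines.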
An example run:
Code:
$ sed -nf titlecollect Patchcreate.xml
<?xml version="1.0" ?>
<?xml-stylesheet href="XSL" type="text/xsl" ?>
<toc>
<title>Automating patch generation and application for configuration files</title>
<title>Introduction</title>
<title>Patching <file>/etc/mail/aliases</file>manually</title>
<title>Automating patch creation</title>
<title>Using <file>patchcreate</file> to create a <file>sshd_config</file> patch</title>
<title>The <file>patchcreate</file> script</title>
<title>Comparison of <prog>sshd</prog> before and after patching</title>
</toc>
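In the command file, a\ comes before p, yet the closing </toc> shows up after the titles. A small sketch of why, assuming a POSIX-style sed and made-up input: text from i\ is written immediately, while text from a\ is queued and only written at the end of the cycle, after p has already printed the pattern space.

```shell
#!/bin/sh
# Hypothetical two-line input; only the last line triggers the block.
result=$(printf 'one\ntwo\n' | sed -n '
$ {
i\
<toc>
s/.*/  body/
a\
</toc>
p
}')

printf '%s\n' "$result"
```

This prints <toc>, then the substituted pattern space, then </toc>, although a\ appears before p in the script.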
The pure sed command file, without comments:
Code:
$ grep -v '^#' titlecollect
/<title>..*<\/title>/ {
s/<section>//
s/<\/section>//
s/^/    /
H
}
$ {
i\
<?xml version="1.0" ?>\
<?xml-stylesheet href="XSL" type="text/xsl" ?>\
\
<toc>
g
a\
</toc>\
\
<!-- end of table of contents -->
p
}
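The comment lines can also be stripped with sed itself instead of grep -v. A minimal sketch, using a made-up command file held in the shell variable `script` (a hypothetical name) rather than titlecollect:

```shell
#!/bin/sh
# Hypothetical commented sed command file, kept in a variable for the demo.
script='# print lines containing x
/x/ {
# the actual work
p
}'

# Delete every line that starts with "#"; all other lines pass through.
result=$(printf '%s\n' "$script" | sed '/^#/d')

printf '%s\n' "$result"
```

Like grep -v '^#', this only removes lines whose first character is #; an indented comment would be left in place.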