PyHat

Stephen Ferg      revised: 2008-06-10

Click HERE to use the online/CGI version of PyHat to add a table of contents to one of your HTML files.

1 What PyHat does

PyHat is a Python program that adds a table of contents to HTML files. The table of contents is generated from the file's heading (<H#>) elements. By default, H2, H3 and H4 elements generate table of contents entries.
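
The idea of building a numbered table of contents from heading elements can be illustrated with a minimal sketch. This is not PyHat's source code: the names HeadingCollector, numbered_entries and table_of_contents are hypothetical, and the sketch is written for Python 3's html.parser module rather than the Python 2 versions PyHat targets. It only collects entry text and does not insert any markup into the file.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for heading elements H2..lowest_level."""

    def __init__(self, lowest_level=4):
        super().__init__()
        self.lowest_level = lowest_level
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading currently open, if any
        self._text = []      # text fragments of the open heading

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1].isdigit():
            level = int(tag[1])
            if 2 <= level <= self.lowest_level:
                self._level = level
                self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == "h%d" % self._level:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def numbered_entries(headings):
    """Turn (level, text) pairs into strings such as '1.2 Title'."""
    counters = []
    entries = []
    for level, text in headings:
        depth = level - 1            # H2 -> depth 1, H3 -> depth 2, ...
        counters = counters[:depth]  # forget counters of deeper levels
        while len(counters) < depth:
            counters.append(0)
        counters[-1] += 1
        entries.append(".".join(str(n) for n in counters) + " " + text)
    return entries


def table_of_contents(html, lowest_level=4):
    """Return the numbered entries for the headings found in html."""
    collector = HeadingCollector(lowest_level)
    collector.feed(html)
    collector.close()
    return numbered_entries(collector.headings)
```

Calling table_of_contents(text) on a page's HTML returns one string per H2, H3 and H4 element; passing lowest_level=3 restricts the entries to H2 and H3, in the spirit of the -h3 option.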

1.1 How PyHat works

Processing is done in two passes.

If the -x option is specified, only the first pass is executed, and all table of contents markup is removed, except for the <div class="table_of_contents">...</div> tags (see below).

1.2 How PyHat's markup works

2 Tips on how to use (and not to use) PyHat

2.1 The standard disclaimer, and some specific warnings

Use at your own risk.

In this case, that means: take appropriate precautions so that you do not lose your input HTML file if PyHat messes up somehow. It is recommended that you copy your input file to some backup location before processing it with PyHat, so that you can roll back to it if necessary.

There are two reasons for this warning. The usual reason is that the program may contain bugs.

Of greater concern is the fact that your input file may not contain valid HTML, and that will cause PyHat to produce unpredictable results. HTML (in contrast to XML) is notoriously lax, and Web browsers are notoriously accommodating to ill-formed HTML. This means that your HTML file may display perfectly well in a Web browser, yet still be ill-formed. And if it is ill-formed, PyHat may mangle it. I have put checks for several kinds of ill-formedness into PyHat, but the program can't catch everything — it doesn't try to do HTML validation.

There are also some concerns with inline <style> sections. There have been a few problems with this in the past, which I think are the results of limitations in the Python HTML parser. So take care.

The bottom line is that you should exercise normal precautions. Never use PyHat without (1) verifying that its output is OK, and (2) being able to roll back to the original file if the output is incorrect.
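
One way to follow the backup advice above is a small helper that copies the input file aside before PyHat runs and restores it if the output turns out wrong. This is only a sketch under stated assumptions: the function names and the backup directory are hypothetical, and nothing here is part of PyHat itself.

```python
import shutil
from pathlib import Path


def backup_before_processing(input_path, backup_dir):
    """Copy input_path into backup_dir and return the path of the copy."""
    source = Path(input_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target


def restore_from_backup(backup_path, original_path):
    """Overwrite original_path with the backup copy."""
    shutil.copy2(backup_path, original_path)
```

Keeping the backup in a directory other than PyHat's output directory avoids any chance of the copy itself being overwritten.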

2.2 Some tips

If you are publishing a document on a web site, probably the safest and easiest way to use PyHat is to keep two copies of the file. The original, master copy is the copy that you edit and maintain. It will not contain a table of contents. The publication copy is created by running the master copy through PyHat with the -d (destination directory) option pointing to the appropriate directory on your website where you will publish the document.

+--------------+           _____________           +-------------+
|              |          /             \          |             |
|    master    |-------->(     PyHat     )-------->| publication |
|     copy     |          \_____________/          |    copy     |
|              |                                   |             |
+--------------+                                   +-------------+
        

If you don't have such a setup — if you edit the document infrequently — the "-x" option is useful. You can use the -x option to remove the table of contents markup, making the HTML less cluttered and easier to edit. Then, when you've finished editing the file, you can re-create the table of contents.
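
Both workflows above come down to running PyHat with either the -d or the -x option. The following sketch builds the corresponding command line as a list of arguments. The helper name pyhat_command and its parameters are hypothetical; the script name pyhat.py, the "python" interpreter name and the option spellings (with the destination directory attached directly to -d) are taken from this document.

```python
def pyhat_command(filenames, destdir=None, remove_toc=False,
                  interpreter="python", script="pyhat.py"):
    """Build the argument list for one PyHat run."""
    command = [interpreter, script]
    if remove_toc:
        command.append("-x")            # strip table of contents markup
    if destdir is not None:
        command.append("-d" + destdir)  # the value is attached to the flag
    command.extend(filenames)
    return command
```

The returned list can be handed to subprocess.run(command, check=True). For the master/publication setup, destdir would be the website directory; before an editing session, remove_toc=True produces the uncluttered copy.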

2.3 Requirements for the input file

The input file should have one, and only one, <H1> tag. It should occur before any other head tags in the file, and should contain the title of the page. Note that the contents of the <H1> tag will not be put into the table of contents.

A head tag cannot occur within the scope of another head tag. That is, the following code is invalid.

<h2> ....
   <h3> ... </h3>
</h2>

Head tags should be used in proper, unbroken sequence. That is, the following code is invalid, because it skips the <h3> level.

<h2> ... </h2>
<h4> ... </h4>

The input file should contain a div tag with the class name of "table_of_contents", like this:

<div class="table_of_contents">...</div>

These tags tell PyHat where to put the table of contents. If the input file does not contain such tags (for example, on the first time PyHat is run on a file), then the table of contents markup will be placed immediately before the first H2 element. Later, if you don't like the placement of these tags, you can move them.
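
The requirements in this section can be checked before running PyHat. The sketch below is an independent illustration, not PyHat's own checking code: the names HeadingChecker and check_headings are hypothetical, and it uses Python 3's html.parser. It reports a missing or repeated <h1>, head tags nested inside other head tags, and skipped levels, and it notes whether a table_of_contents div is present (its absence is not an error, since PyHat then chooses a default location).

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingChecker(HTMLParser):
    """Record violations of the input-file requirements."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.h1_count = 0
        self.has_toc_div = False
        self._open = None    # head tag currently open, if any
        self._previous = 0   # level of the most recent head tag (0 = none yet)

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "table_of_contents" in classes:
                self.has_toc_div = True
        if tag not in HEADING_TAGS:
            return
        level = int(tag[1])
        if self._open is not None:
            self.problems.append("<%s> is nested inside <%s>" % (tag, self._open))
        if level == 1:
            self.h1_count += 1
            if self._previous:
                self.problems.append("<h1> occurs after another head tag")
        elif self._previous == 0:
            self.problems.append("<%s> occurs before any <h1>" % tag)
        elif level > self._previous + 1:
            self.problems.append(
                "<%s> skips a level after <h%d>" % (tag, self._previous))
        self._open = tag
        self._previous = level

    def handle_endtag(self, tag):
        if tag == self._open:
            self._open = None


def check_headings(html):
    """Return (list of problem messages, whether the TOC div is present)."""
    checker = HeadingChecker()
    checker.feed(html)
    checker.close()
    problems = list(checker.problems)
    if checker.h1_count != 1:
        problems.append(
            "expected exactly one <h1>, found %d" % checker.h1_count)
    return problems, checker.has_toc_div
```

An empty problem list means the file meets the heading requirements listed here; it does not mean the file is valid HTML.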

3 How to download and install PyHat

Click HERE to download file pyhat.zip.

PyHat is a Python application. It requires Python 2.3 or greater.

Assuming that you are using Python 2.5, you can install PyHat by unzipping pyhat.zip into your Python25/Lib/site-packages directory.

4 How to run PyHat

You can invoke PyHat from the command line this way:

python pyhat.py [options] filename1 [filename2 filename3 filename4 ...]

The simplest possible way to invoke PyHat is this way:

python pyhat.py filename

The following options are recognized:

-h<headnumber>
    The lowest level of header to be included in the table of contents. May be a number from 2 to 6. Defaults to "4".
-d<destdir>
    Destination directory: puts the modified HTML files into the <destdir> directory. If <destdir> does not exist, it is created. Defaults to "pyhat_out".
-x
    Removes all table of contents information, except the table of contents location tags: <div class="table_of_contents">...</div>.
-v
    Verbose mode: shows extremely verbose progress messages during processing. May be helpful in debugging invalid conditions in your input file.
-q
    Quiet mode: suppresses the (very brief) normal progress messages.
-w
    Removes the first word of every head tag. This option supports conversion from situations in which table of contents numbers were hard-coded as the first word of the header tags. Unless the -q option is used, this option will print messages to the console showing the words that have been removed. This option should be used very carefully! If you use it, review the progress messages and the output HTML file very carefully!

Here are some examples:

python pyhat.py -v c:\HtmlFiles\myfile.html
    Uses the verbose option for debugging.
    Note that the output file myfile.html is written to the default output directory, the pyhat_out subdirectory of the current directory.

python pyhat.py -x c:\HtmlFiles\myfile.html
    Same as the previous example, but uses the -x option (instead of -v) to remove all table of contents information from the file.

python pyhat.py -h3 -dc:/www/mypapers myfile.html
    Uses only H2 and H3 elements in the table of contents.
    Puts the output in the c:/www/mypapers directory.

Note that it is possible to use the "-d" option to tell PyHat to replace its input file, in place. (If you do this, take appropriate steps to ensure that the master copy of your input file will not be lost if PyHat trashes its input file.) The trick is simply to specify an output directory that is the same as the directory where the input file is located. Here are a couple of examples:

python  pyhat.py   -d.               myfile.html
python  pyhat.py   -d/www/mypapers   /www/mypapers/myfile.html
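
When the command line is assembled by a script, the matching -d value can be derived from the input path. A minimal sketch, in which the helper name in_place_option is hypothetical:

```python
import os.path


def in_place_option(filename):
    """Return the -d option that names the input file's own directory."""
    # A bare file name has no directory part, so fall back to ".".
    directory = os.path.dirname(filename) or "."
    return "-d" + directory
```

For the two command lines above, this yields "-d." and "-d/www/mypapers" respectively.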

5 Contact Information

To report problems, send e-mail to Stephen Ferg. But note that PyHat is an old project and I am no longer actively maintaining it.

6 Usage Conditions

This work is licensed under the Creative Commons Attribution 2.0 License. You are free to copy, distribute, and display the work, to make derivative works, and to make commercial use of the work. If you do, you must give the original author credit.

7 Revision History

2004-12-06
Version 1 released into the public domain.

8 Demonstration

8.1 Test of H3

8.2 Test of H3

8.2.1 Test of H4

Test of H5
Test of H6