PyHat is a Python program that adds a table of contents to HTML files. The table of contents is generated from the file's head (<H#>) elements. By default, H2 through H4 elements are used to generate table of contents entries.
1.1 How PyHat works
Processing is done in two passes.
- In the first pass, the input file is read into memory, and then closed. Working on the in-memory copy, PyHat removes any existing table of contents markup from the file.
- In the second pass, working on the in-memory copy, a new table of contents is created: hierarchical numbers (e.g. "1.2.1") are added to the header elements, enclosed in <span class="contents_item">..</span> tags, then the numbers and header elements are used to generate a table of contents.
- At the conclusion of processing, the in-memory copy of the file is written to disk.
If the -x option is specified, only the first pass is executed, and all table of contents markup is removed, except for the <div class="table_of_contents">...</div> tags (see below).
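The two passes can be illustrated with a minimal sketch. This is not PyHat's actual code: the function names and the regular expressions are assumptions made for illustration, the table of contents is written as plain text lines rather than links, and headings with attributes or nested <div> tags are not handled.

```python
import re

HEADING_RE = re.compile(r"<h([2-6])>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
NUMBER_SPAN_RE = re.compile(r'<span class="contents_item">.*?</span>\s*', re.DOTALL)
TOC_DIV_RE = re.compile(r'(<div class="table_of_contents">).*?(</div>)', re.DOTALL)


def strip_toc_markup(html):
    """Pass 1: remove the number spans and empty the table-of-contents div."""
    html = NUMBER_SPAN_RE.sub("", html)
    return TOC_DIV_RE.sub(r"\1\2", html)


def add_toc(html, lowest_level=4):
    """Pass 2: number the headings, then fill the table-of-contents div."""
    counters = [0] * 7  # counters[n] counts the headings seen at level n
    entries = []

    def number_heading(match):
        level = int(match.group(1))
        if level > lowest_level:
            return match.group(0)  # too deep: leave the heading untouched
        counters[level] += 1
        for deeper in range(level + 1, 7):
            counters[deeper] = 0  # a new section restarts its subsections
        number = ".".join(str(counters[n]) for n in range(2, level + 1))
        title = match.group(2).strip()
        entries.append("%s %s" % (number, title))
        return '<h%d><span class="contents_item">%s</span> %s</h%d>' % (
            level, number, title, level)

    html = HEADING_RE.sub(number_heading, html)
    toc = "\n".join(entries)
    return TOC_DIV_RE.sub(lambda m: m.group(1) + "\n" + toc + "\n" + m.group(2), html)
```

Calling add_toc(strip_toc_markup(text)) mirrors a normal run, and calling strip_toc_markup alone mirrors the -x behaviour: the numbers and the contents of the div are removed, but the div tags themselves stay in place.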
1.2 How PyHat's markup works
- In the table of contents:
- Each section number is a link that jumps to the top of the table of contents.
- Each section name is a link that jumps to the corresponding section.
- In the main part of the page:
- Each section number is a reverse link that jumps back to the corresponding line in the table of contents.
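The linking scheme described above could be generated along the following lines. This is only a sketch: the anchor names (toc_top, toc_<number>, sec_<number>) and both function names are hypothetical, and PyHat's real markup may differ.

```python
def toc_entry(number, title):
    """One table-of-contents line.

    The number links to the top of the table of contents; the title links
    to the section. All anchor names here are hypothetical.
    """
    return ('<a name="toc_%s" href="#toc_top">%s</a> '
            '<a href="#sec_%s">%s</a>' % (number, number, number, title))


def section_heading(level, number, title):
    """One numbered heading in the main part of the page.

    The number is a reverse link back to the matching table-of-contents line.
    """
    return ('<h%d><a name="sec_%s"></a>'
            '<span class="contents_item"><a href="#toc_%s">%s</a></span> %s</h%d>'
            % (level, number, number, number, title, level))
```

For example, toc_entry("1.2", "How PyHat's markup works") yields a line whose number jumps to #toc_top and whose title jumps to #sec_1.2, while section_heading(3, "1.2", ...) yields a heading whose number jumps back to #toc_1.2.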
2.1 The standard disclaimer, and some specific warnings
Use at your own risk.
In this case, that means: take appropriate precautions so that you do not lose your input HTML file if PyHat messes up somehow. It is recommended that you copy your input file to some backup location before processing it with PyHat, so that you can roll back to it if necessary.
There are two reasons for this warning. The usual reason is that the program may contain bugs.
Of greater concern is the fact that your input file may not contain valid HTML, and that will cause PyHat to produce unpredictable results. HTML (in contrast to XML) is notoriously lax, and Web browsers are notoriously accommodating to ill-formed HTML. This means that your HTML file may display perfectly well in a Web browser, yet still be ill-formed. And if it is ill-formed, PyHat may mangle it. I have put checks for several kinds of ill-formedness into PyHat, but the program can't catch everything — it doesn't try to do HTML validation.
There are also some concerns with inline <style> sections. There have been a few problems with this in the past, which I think are the results of limitations in the Python HTML parser. So take care.
The bottom line is that you should exercise normal precautions. You should never use PyHat without (1) verifying that its output is OK, and (2) being able to recover back to the original file if the output fails to be correct.
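One way to take the backup recommended above is a small helper like the following. It is a sketch, not part of PyHat; the function name back_up and the default directory name pyhat_backup are made up for illustration.

```python
import shutil
from pathlib import Path


def back_up(html_path, backup_dir="pyhat_backup"):
    """Copy html_path into backup_dir and return the path of the copy.

    backup_dir is a hypothetical name; any location outside PyHat's
    output directory would do.
    """
    source = Path(html_path)
    target_dir = Path(backup_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy2 also preserves timestamps
    return target
```

If the output turns out to be wrong, copying the file back from the backup directory restores the original.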
2.2 Some tips
If you are publishing a document on a web site, probably the safest and easiest way to use PyHat is to keep two copies of the file. The original, master copy is the copy that you edit and maintain. It will not contain a table of contents. The publication copy is created by running the master copy through PyHat with the -d (destination directory) option pointing to the appropriate directory on your website where you will publish the document.
+--------------+           _____________           +-------------+
|              |          /             \          |             |
|    master    |-------->(     PyHat     )-------->| publication |
|     copy     |          \_____________/          |    copy     |
|              |                                   |             |
+--------------+                                   +-------------+
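The master-to-publication step can be scripted. The sketch below only builds and runs the command line documented in this file (python pyhat.py -d<destdir> filename); the function names are hypothetical, and it assumes that "python" on your path is an interpreter that PyHat supports and that pyhat.py is in the current directory.

```python
import subprocess


def pyhat_command(master_file, dest_dir, python="python", script="pyhat.py"):
    """Build the command line for one publication run.

    The destination directory is attached directly to -d, as in the
    examples in this document.
    """
    return [python, script, "-d" + dest_dir, master_file]


def publish(master_file, dest_dir):
    """Run PyHat on the master copy, writing the result into dest_dir."""
    # check=True raises CalledProcessError on a non-zero exit status.
    # Whether PyHat reports failures that way is an assumption.
    subprocess.run(pyhat_command(master_file, dest_dir), check=True)
```

Because the master copy is only read, never written, a bad run costs nothing more than deleting the publication copy and trying again.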
If you don't have such a setup, for example because you edit the document infrequently, the -x option is useful. You can use it to remove the table of contents markup, making the HTML less cluttered and easier to edit. Then, when you've finished editing the file, you can re-create the table of contents.
2.3 Requirements for the input file
The input file should have one, and only one, <H1> tag. It should occur before any other head tags in the file, and should contain the title of the page. Note that the contents of the <H1> tag will not be put into the table of contents.
A head tag cannot occur within the scope of another head tag. That is, the following code is invalid:
<h2> .... <h3> ... </h3> </h2>
Head tags should be used in proper, unbroken sequence. That is, the following code is invalid, because it skips the <h3> level:
<h2> ... </h2> <h4> ... </h4>
The input file should contain a div tag with the class name of "table_of_contents", like this:
<div class="table_of_contents">...</div>
These tags tell PyHat where to put the table of contents. If the input file does not contain such tags (for example, on the first time PyHat is run on a file), then the table of contents markup will be placed immediately before the first H2 element. Later, if you don't like the placement of these tags, you can move them.
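Because PyHat does not try to validate HTML, it may be worth checking the head-tag requirements above before running it. The following is a minimal sketch using the standard html.parser module; the class and function names are made up for illustration, and it checks only the three head-tag rules, not the rest of the document.

```python
from html.parser import HTMLParser


class HeadingChecker(HTMLParser):
    """Collect violations of the head-tag rules described above."""

    def __init__(self):
        super().__init__()
        self.problems = []
        self.open_level = None  # level of the head tag we are inside, if any
        self.last_level = 0     # level of the most recent head tag (0 = none yet)
        self.h1_count = 0

    @staticmethod
    def _level(tag):
        # HTMLParser passes tag names in lower case.
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            return int(tag[1])
        return None

    def handle_starttag(self, tag, attrs):
        level = self._level(tag)
        if level is None:
            return
        if self.open_level is not None:
            self.problems.append("h%d nested inside h%d" % (level, self.open_level))
        if level == 1:
            self.h1_count += 1
            if self.last_level:
                self.problems.append("h1 is not the first head tag")
        elif self.last_level == 0:
            self.problems.append("first head tag is h%d, not h1" % level)
        elif level > self.last_level + 1:
            self.problems.append(
                "h%d follows h%d, skipping a level" % (level, self.last_level))
        self.open_level = level
        self.last_level = level

    def handle_endtag(self, tag):
        if self._level(tag) is not None:
            self.open_level = None


def check_headings(html):
    """Return a list of problem descriptions; an empty list means no problem found."""
    checker = HeadingChecker()
    checker.feed(html)
    checker.close()
    if checker.h1_count != 1:
        checker.problems.append("expected exactly one h1, found %d" % checker.h1_count)
    return checker.problems
```

An empty result does not prove the file is well-formed; it only means these particular rules were not broken.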
HERE to download file
PyHat is a Python application. It requires Python 2.3 or greater.
Assuming that you are using Python 2.5, you can install PyHat by unzipping
You can invoke PyHat from the command line this way:
python pyhat.py [options] filename1 [filename2 filename3 filename4 ...]
The simplest possible way to invoke PyHat is this way:
python pyhat.py filename
The following options are recognized:
- -h<level>: The lowest level of header to be included in the table of contents. <level> may be a number from 2 to 6. Defaults to "4".
- -d<destdir>: Destination directory. Puts the modified HTML files into the <destdir> directory. If <destdir> does not exist, it is created. Defaults to "pyhat_out".
- -x: Removes all table of contents information, except the table of contents location tags: <div class="table_of_contents">...</div>
- -v: Verbose mode. Shows extremely verbose progress messages during processing. May be helpful in debugging invalid conditions in your input file.
- -q: Quiet mode. Suppresses the (very brief) normal progress messages.
- Removes the first word of every head tag. This option supports conversion from situations in which table of contents numbers were hard-coded as the first word of the header tags. Unless the -q option is used, this option will print messages to the console showing the words that have been removed. This option should be used very carefully! If you use it, review both the progress messages and the output HTML file closely.
Here are some examples:
- python pyhat.py -v c:\HtmlFiles\myfile.html
- Uses the verbose option for debugging.
Note that the output file myfile.html is written to the default output directory, the pyhat_out subdirectory of the current directory.
- python pyhat.py -x c:\HtmlFiles\myfile.html
- Same as the previous example, but uses the -x option to remove all table of contents information from the file.
- python pyhat.py -h3 -dc:/www/mypapers myfile.html
- Uses only H2 and H3 elements in the table of contents.
Puts the output in the c:/www/mypapers directory.
Note that it is possible to use the "-d" option to tell PyHat to replace its input file, in place. (If you do this, take appropriate steps to ensure that the master copy of your input file will not be lost if PyHat trashes its input file.) The trick is simply to specify the directory that contains the input file as the output directory. Here are a couple of examples:
python pyhat.py -d. myfile.html
python pyhat.py -d/www/mypapers /www/mypapers/myfile.html
To report problems, send e-mail to Stephen Ferg. But note that pyhat is an old project and I am no longer actively maintaining it.
This work is licensed under the Creative Commons Attribution 2.0 License. You are free to copy, distribute, and display the work, to make derivative works, and to make commercial use of the work. If you do, you must give the original author credit.
- Version 1 released into the public domain.