Scratchy - The Apache Log Parser and HTML Report Generator for Python

Parse

The Scratchy parser, parse.py, parses an Apache log file. Run it with:

$ python parse.py

This assumes that you have a config file named "config". If your config file has a different name, or you keep several config files, use the -c or --config command line option to tell the parser which one to use:

$ python parse.py --config=config_search

-or-

$ python parse.py -c config_search

Either of the previous commands will invoke the parser using the file named config_search as the config file.
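The option handling above can be sketched with Python's standard argparse module. This is a minimal sketch, not Scratchy's actual code: it assumes the config file consists of simple KEY=VALUE lines like the DATA_DIR and DATA_NAME settings shown later in this document, and the helper names build_arg_parser, parse_config_lines and read_config are hypothetical.

```python
import argparse


def build_arg_parser():
    """Accept -c/--config, defaulting to a file named "config"."""
    parser = argparse.ArgumentParser(prog="parse.py")
    parser.add_argument("-c", "--config", default="config",
                        help="config file to use (default: config)")
    return parser


def parse_config_lines(lines):
    """Turn KEY=VALUE lines into a dict (the file format is an assumption)."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        # Ignore blank lines, '#' comments and anything without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def read_config(path):
    """Read the config file named on the command line (or "config")."""
    with open(path) as handle:
        return parse_config_lines(handle)
```

A caller would run read_config(build_arg_parser().parse_args().config), so that "python parse.py -c config_search" reads config_search and a bare "python parse.py" reads config.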

If everything went well, the Apache log named in the config file has been parsed and the corresponding reports have been produced. To parse a different log file than the one named in the config file, specify it on the command line with the -f or --file option:

$ python parse.py --file=/usr/local/apache/log/access_log.4

-or-

$ python parse.py -f /usr/local/apache/log/access_log.4

Either of these commands attempts to parse /usr/local/apache/log/access_log.4 instead of the file named in the config file. Naming the log in the config file is convenient when you usually parse the same log; the command line option is useful when you want to parse a different (perhaps historical) access log.
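The precedence between the two sources can be sketched as follows: a log named with -f or --file wins, otherwise the log named in the config file is used. The config key LOG_FILE is hypothetical (this section does not say what the config file calls its log setting), and choose_log_file is an illustrative name, not part of Scratchy.

```python
import argparse


def build_arg_parser():
    parser = argparse.ArgumentParser(prog="parse.py")
    parser.add_argument("-c", "--config", default="config")
    # -f/--file names a log to parse instead of the one in the config file.
    parser.add_argument("-f", "--file", default=None)
    return parser


def choose_log_file(args, settings):
    """Prefer the command-line log, else the config's LOG_FILE (hypothetical key)."""
    if args.file:
        return args.file
    if "LOG_FILE" in settings:
        return settings["LOG_FILE"]
    raise SystemExit("no log file given on the command line or in the config file")
```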

The parser keeps track of parsed files and offsets, so it is safe to parse access_log repeatedly. Consider this scenario where log rotation is used:

State/Action                               Result
-----------------------------------------  ---------------------------------------------
Your current access_log has 500 lines.
You parse access_log with parse.py.        All 500 lines are parsed and the data is
                                           added to the data repository.
The log file grows to 700 lines.
You parse access_log with parse.py.        The parser recognizes the first line of the
                                           file and determines that it has already been
                                           parsed. Only the last 200 lines are parsed
                                           and their data is stored.
The log is rotated and a new access_log
is created.
You parse access_log with parse.py.        The entire new log is parsed and its data is
                                           collected.

The parser uses a file called filetracker to record every log file it has ever parsed, together with the last position in that file that was parsed. If a log file grows, the previously parsed data is skipped. Because a log file is identified by its first line, which never changes, the parser keeps working correctly even when the file is renamed (by log rotation, for instance).
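The bookkeeping described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of Scratchy's filetracker: here the tracker is a JSON object that maps each log's first line to the byte offset where parsing stopped, and the function names load_tracker, save_tracker and read_new_lines are hypothetical.

```python
import json
import os


def load_tracker(path):
    """Load {first line of log: byte offset already parsed}; empty if absent."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


def save_tracker(path, tracker):
    with open(path, "w") as handle:
        json.dump(tracker, handle)


def read_new_lines(log_path, tracker):
    """Return the lines of log_path that have not been parsed yet.

    The log is identified by its first line, not its file name, so a
    rotated (renamed) log is still recognized. Only complete lines are
    consumed; a partially written last line is left for the next run.
    """
    with open(log_path, "rb") as handle:
        first_line = handle.readline()
        if not first_line.endswith(b"\n"):
            return []  # empty file, or first line still being written
        key = first_line.decode("utf-8", "replace")
        offset = tracker.get(key, 0)
        handle.seek(offset)
        data = handle.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available
    tracker[key] = offset + end
    return data[:end].decode("utf-8", "replace").splitlines()
```

Replaying the scenario in the table, read_new_lines returns 500 lines on the first run, 200 after the log grows, nothing for the rotated copy, and every line of the new access_log. Stopping at the last newline is a choice made in this sketch: it avoids recording an offset in the middle of a line that Apache is still writing.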

The parser collects data from the chosen Apache web server log file and organizes it so that the reporter can easily retrieve and manipulate it. For each month of data found, the data is written to a file and the reporter is invoked. For example, if an Apache log contains data spanning three months (Jan, Feb and Mar), a file is created for Jan and the reporter is invoked, and then the process repeats for Feb and Mar.

The name of each data file depends on DATA_DIR and DATA_NAME (both set in your config file) and on the month and year of the data being parsed. Suppose we are parsing data for November 2002 and DATA_DIR and DATA_NAME are defined as follows:

DATA_DIR=/home/phil/scratchy_data
DATA_NAME=phil

then the file that will be created is:

/home/phil/scratchy_data/phil/112002
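The per-month split and the data file name can be sketched as follows. This sketch assumes Common Log Format timestamps such as [10/Nov/2002:13:55:36 -0700], and it zero-pads the month, which the November example cannot confirm for single-digit months. The function names month_key, group_by_month and data_file_path are hypothetical; writing each group and invoking the reporter are left to the caller.

```python
import os
import re
from collections import defaultdict
from datetime import datetime

# Matches the date part of a Common Log Format timestamp (format assumed).
TIMESTAMP_RE = re.compile(r"\[(\d{2}/\w{3}/\d{4}):")


def month_key(line):
    """Return (year, month) for a log line, or None if no timestamp is found."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    # %b expects English month names, as in the default C locale.
    day = datetime.strptime(match.group(1), "%d/%b/%Y")
    return day.year, day.month


def group_by_month(lines):
    """Group lines by (year, month), oldest month first."""
    groups = defaultdict(list)
    for line in lines:
        key = month_key(line)
        if key is not None:
            groups[key].append(line)
    return dict(sorted(groups.items()))


def data_file_path(data_dir, data_name, year, month):
    """Build DATA_DIR/DATA_NAME/MMYYYY, e.g. November 2002 -> 112002."""
    return os.path.join(data_dir, data_name, f"{month:02d}{year}")
```

A caller would loop over group_by_month(lines).items() in order, write each month's lines to data_file_path(DATA_DIR, DATA_NAME, year, month), and then run the reporter on that file, mirroring the Jan, Feb, Mar sequence described above.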

After viewing the report, you may wish to modify your config file and re-run the report manually.