Parsing

class apachelogs.LogParser(format, encoding='iso-8859-1', errors=None)

A class for parsing Apache access log entries in a given log format. Instantiate with a log format string, and then use the parse() and/or parse_lines() methods to parse log entries in that format.

Parameters:
  • format (str) – an Apache log format

  • encoding (str) – The encoding to use for decoding certain strings in log entries (see Supported Directives); defaults to 'iso-8859-1'. Set to 'bytes' to cause the strings to be returned as bytes values instead of str.

  • errors (str) – the error handling scheme to use when decoding; defaults to 'strict' (the behavior selected when errors is left as None)

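Raises:
  • InvalidDirectiveError – if an invalid directive occurs in format

  • UnknownDirectiveError – if an unknown directive occurs in format
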
parse(entry)

Parse an access log entry according to the log format and return a LogEntry object.

Parameters:

entry (str) – an access log entry to parse

Return type:

LogEntry

Raises:

InvalidEntryError – if entry does not match the log format

parse_lines(entries, ignore_invalid=False)

Parse the elements in an iterable of access log entries (e.g., an open text file handle) and return a generator of LogEntry objects. If ignore_invalid is True, any entries that do not match the log format are silently discarded; otherwise, the first such entry causes an InvalidEntryError to be raised.

Parameters:
  • entries – an iterable of str

  • ignore_invalid (bool) – whether to silently discard entries that do not match the log format

Return type:

LogEntry generator

Raises:

InvalidEntryError – if an element of entries does not match the log format and ignore_invalid is False

class apachelogs.LogEntry

A parsed Apache access log entry. The value associated with each directive in the log format is stored as an attribute on the LogEntry object; for example, if the log format contains a %s directive, the LogEntry for a parsed entry will have a status attribute containing the status value from the entry as an int. See Supported Directives for the attribute names & types of each directive supported by this library.

If the log format contains two or more directives that are stored in the same attribute (e.g., %D and %{us}T), the given attribute will contain the first non-None directive value.

The values of date & time directives are stored in a request_time_fields: dict attribute. If this dict contains enough information to assemble a complete (possibly naïve) datetime.datetime, then the LogEntry will have a request_time attribute equal to that datetime.datetime.

directives

New in version 0.3.0.

A dict mapping individual log format directives (e.g., "%h" or "%<s") to their corresponding values from the log entry. %{*}t directives with multiple subdirectives (e.g., %{%Y-%m-%d}t) are broken up into one entry per subdirective; for %{%Y-%m-%d}t, this yields the three keys "%{%Y}t", "%{%m}t", and "%{%d}t". This attribute provides an alternative to the named attributes for looking up directive values.

entry

The original logfile entry, with trailing newlines removed.

format

The log format string that the entry was parsed with.

apachelogs.parse(format, entry, encoding='iso-8859-1', errors=None)

A convenience function for parsing a single logfile entry without having to directly create a LogParser object.

encoding and errors have the same meaning as for LogParser.

apachelogs.parse_lines(format, entries, encoding='iso-8859-1', errors=None, ignore_invalid=False)

A convenience function for parsing an iterable of logfile entries without having to directly create a LogParser object.

encoding and errors have the same meaning as for LogParser. ignore_invalid has the same meaning as for LogParser.parse_lines().