apachelogs — Parse Apache access logs
Parsing
- class apachelogs.LogParser(format, encoding='iso-8859-1', errors=None)[source]
A class for parsing Apache access log entries in a given log format. Instantiate with a log format string, and then use the parse() and/or parse_lines() methods to parse log entries in that format.
- Parameters:
format (str) – an Apache log format
encoding (str) – the encoding to use for decoding certain strings in log entries (see Supported Directives); defaults to 'iso-8859-1'. Set to 'bytes' to cause the strings to be returned as bytes values instead of str.
errors (str) – the error handling scheme to use when decoding; defaults to 'strict'
- Raises:
InvalidDirectiveError – if an invalid directive occurs in format
UnknownDirectiveError – if an unknown directive occurs in format
- parse(entry)[source]
Parse an access log entry according to the log format and return a LogEntry object.
- Parameters:
entry (str) – an access log entry to parse
- Return type:
LogEntry
- Raises:
InvalidEntryError – if entry does not match the log format
- parse_lines(entries, ignore_invalid=False)[source]
Parse the elements in an iterable of access log entries (e.g., an open text file handle) and return a generator of LogEntry objects. If ignore_invalid is True, any entries that do not match the log format will be silently discarded; otherwise, such an entry will cause an InvalidEntryError to be raised.
- Parameters:
entries – an iterable of access log entries (str) to parse
ignore_invalid (bool) – whether to silently discard entries that do not match the log format
- Return type:
LogEntry generator
- Raises:
InvalidEntryError – if an element of entries does not match the log format and ignore_invalid is False
- class apachelogs.LogEntry[source]
A parsed Apache access log entry. The value associated with each directive in the log format is stored as an attribute on the LogEntry object; for example, if the log format contains a %s directive, the LogEntry for a parsed entry will have a status attribute containing the status value from the entry as an int. See Supported Directives for the attribute names & types of each directive supported by this library.
If the log format contains two or more directives that are stored in the same attribute (e.g., %D and %{us}T), the given attribute will contain the first non-None directive value.
The values of date & time directives are stored in a request_time_fields: dict attribute. If this dict contains enough information to assemble a complete (possibly naïve) datetime.datetime, then the LogEntry will have a request_time attribute equal to that datetime.datetime.
- directives
Added in version 0.3.0.
A dict mapping individual log format directives (e.g., "%h" or "%<s") to their corresponding values from the log entry. %{*}t directives with multiple subdirectives (e.g., %{%Y-%m-%d}t) are broken up into one entry per subdirective (for %{%Y-%m-%d}t, this would become the three keys "%{%Y}t", "%{%m}t", and "%{%d}t"). This attribute provides an alternative means of looking up directive values besides using the named attributes.
- entry
The original logfile entry with trailing newlines removed
- format
The entry’s log format string
- apachelogs.parse(format, entry, encoding='iso-8859-1', errors=None)[source]
A convenience function for parsing a single logfile entry without having to directly create a LogParser object. encoding and errors have the same meaning as for LogParser.
- apachelogs.parse_lines(format, entries, encoding='iso-8859-1', errors=None, ignore_invalid=False)[source]
A convenience function for parsing an iterable of logfile entries without having to directly create a LogParser object. encoding and errors have the same meaning as for LogParser. ignore_invalid has the same meaning as for LogParser.parse_lines().
Utilities
- apachelogs.parse_apache_timestamp(s)[source]
Parse an Apache timestamp into a datetime.datetime object. The month name in the timestamp is expected to be an abbreviated English name regardless of the current locale.
>>> parse_apache_timestamp('[01/Nov/2017:07:28:29 +0000]')
datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
- Parameters:
s (str) – a string of the form DD/Mon/YYYY:HH:MM:SS +HHMM (optionally enclosed in square brackets)
- Returns:
an aware datetime.datetime
- Raises:
ValueError – if s is not in the expected format
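To make the locale-independence point concrete, here is a standard-library sketch of how a timestamp in this form could be parsed without relying on strptime's locale-dependent month names. The function name sketch_parse_timestamp and its internals are hypothetical; this is not the library's implementation.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed English abbreviations, so the result does not depend on the locale.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}

TIMESTAMP_RGX = re.compile(
    r"(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})"
)


def sketch_parse_timestamp(s: str) -> datetime:
    """Hypothetical stand-in for illustration; not the library's code."""
    if s.startswith("[") and s.endswith("]"):
        s = s[1:-1]  # the enclosing square brackets are optional
    m = TIMESTAMP_RGX.fullmatch(s)
    if m is None or m.group(2) not in MONTHS:
        raise ValueError(f"not an Apache timestamp: {s!r}")
    day, mon, year, hour, minute, second, sign, off_h, off_m = m.groups()
    offset = timedelta(hours=int(off_h), minutes=int(off_m))
    if sign == "-":
        offset = -offset
    # Attaching the UTC offset makes the result an aware datetime.
    return datetime(
        int(year), MONTHS[mon], int(day),
        int(hour), int(minute), int(second),
        tzinfo=timezone(offset),
    )


print(sketch_parse_timestamp("[01/Nov/2017:07:28:29 +0000]"))
# prints: 2017-11-01 07:28:29+00:00
```

Looking the month up in a fixed table, rather than using the %b directive of strptime, is what keeps the sketch independent of the current locale.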
Log Format Constants
The following standard log formats are available as string constants in this package so that you don’t have to keep typing out the full log format strings:
- apachelogs.COMBINED = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"'
NCSA extended/combined log format
- apachelogs.COMBINED_DEBIAN = '%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"'
Like COMBINED, but with %O (total bytes sent including headers) in place of %b (size of response excluding headers)
- apachelogs.COMMON = '%h %l %u %t "%r" %>s %b'
Common log format (CLF)
- apachelogs.COMMON_DEBIAN = '%h %l %u %t "%r" %>s %O'
Like COMMON, but with %O (total bytes sent including headers) in place of %b (size of response excluding headers)
- apachelogs.VHOST_COMBINED = '%v:%p %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"'
COMBINED_DEBIAN with virtual host & port prepended
Exceptions
- exception apachelogs.Error[source]
Bases: Exception
The base class for all custom exceptions raised by apachelogs
- exception apachelogs.InvalidDirectiveError[source]
Bases: Error, ValueError
Raised by the LogParser constructor when given a log format containing an invalid or malformed directive
- format
The log format string containing the invalid directive
- pos
The position in the log format string at which the invalid directive occurs
Supported Directives
The following table lists the log format directives supported by this library along with the names & types of the attributes at which their parsed values are stored on a LogEntry. The attribute names for the directives are based off of the names used internally by the Apache source code.
A directive with the < modifier (e.g., %<s) will be stored at entry.original_attribute_name, and a directive with the > modifier will be stored at entry.final_attribute_name.
A type of str marked with an asterisk (*) means that the directive’s values are decoded according to the encoding option to LogParser.
Any directive may evaluate to None when it is modified by a set of status codes (e.g., %400,501T or %!200T).
See the Apache documentation for information on the meaning of each directive.
Directive | LogEntry Attribute | Type
---|---|---
[table rows not preserved]
Supported strftime Directives
The following table lists the strftime directives supported for use in the parameter of a %{*}t directive along with the keys & types at which they are stored in the dict entry.request_time_fields. See any C documentation for information on the meaning of each directive.
A %{*}t directive with the begin: modifier (e.g., %{begin:%Y-%m-%d}t) will have its subdirectives stored in entry.begin_request_time_fields (in turn used to set entry.begin_request_time), and likewise for the end: modifier.
Directive | request_time_fields key | Type
---|---|---
[table rows not preserved]
Changelog
v0.7.0 (in development)
Support Python 3.9, 3.10, 3.11, and 3.12
Drop support for Python 3.5, 3.6, and 3.7
LogEntry’s __eq__ method now returns NotImplemented instead of False when comparing against non-LogEntry values
Migrated from setuptools to hatch
v0.6.0 (2020-10-13)
Support Python 3.8
%s now matches any sequence of exactly three digits. Previously, it matched either ‘0’ or any sequence of digits not beginning with ‘0’. Thanks to @chosak for the patch.
v0.5.0 (2019-05-21)
Improved the routine for assembling request_time from request_time_fields:
    If the month is only available as a full or abbreviated name and the name is not in English, try looking it up in the current locale
    If the year is only available in abbreviated form (the %y directive) without a century (%C), treat years less than 69 as part of the twenty-first century and other years as part of the twentieth
    When necessary, use the values of the %G, %g, %u, %V, %U, %W, and %w time directives to derive the date
    If %Z equals "GMT", "UTC", or one of the names in time.tzname, produce an aware datetime
%{%n}t and %{%t}t now match any amount of any whitespace, in order to match strptime(3)’s behavior
Breaking: Renamed the request_time_fields keys for %{%G}t and %{%g}t from "week_year" and "abbrev_week_year" to "iso_year" and "abbrev_iso_year", respectively
%{%p}t can now match the empty string (its value in certain locales)
%{%Z}t can now match the empty string
v0.4.0 (2019-05-19)
Support the %{c}h log directive
%f and %R can now be None
Bugfix: %u can now match the string "" (two double quotes)
Support mod_ssl’s %{*}c and %{*}x directives
Support the %{hextid}P directive (as a hexadecimal integer)
Support the %L and %{c}L directives
Parameters to %{*}p, %{*}P, and %{*}T are now treated case-insensitively in order to mirror Apache’s behavior
Refined some directives to better match only the values emitted by Apache:
    %l and %m no longer accept whitespace
    %s and %{tid}P now only match unsigned integers
    %{*}C no longer accepts semicolons or leading or trailing spaces
    %q no longer accepts whitespace or pound/hash signs
v0.3.0 (2019-05-12)
Gave LogEntry a directives attribute for looking up directive values by the corresponding log format directives
v0.2.0 (2019-05-09)
v0.1.0 (2019-05-06)
Initial release
apachelogs parses Apache access log files. Pass it a log format string and get back a parser for logfile entries in that format. apachelogs even takes care of decoding escape sequences and converting things like timestamps, integers, and bare hyphens to datetime values, ints, and Nones.
Installation
apachelogs requires Python 3.8 or higher. Just use pip for Python 3 (You have pip, right?) to install apachelogs and its dependencies:
python3 -m pip install apachelogs
Examples
Parse a single log entry:
>>> from apachelogs import LogParser
>>> parser = LogParser("%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"")
>>> # The above log format is also available as the constant `apachelogs.COMBINED`.
>>> entry = parser.parse('209.126.136.4 - - [01/Nov/2017:07:28:29 +0000] "GET / HTTP/1.1" 301 521 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"\n')
>>> entry.remote_host
'209.126.136.4'
>>> entry.request_time
datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
>>> entry.request_line
'GET / HTTP/1.1'
>>> entry.final_status
301
>>> entry.bytes_sent
521
>>> entry.headers_in["Referer"] is None
True
>>> entry.headers_in["User-Agent"]
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
>>> # Log entry components can also be looked up by directive:
>>> entry.directives["%r"]
'GET / HTTP/1.1'
>>> entry.directives["%>s"]
301
>>> entry.directives["%t"]
datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
Parse a file full of log entries:
>>> with open('/var/log/apache2/access.log') as fp:
...     for entry in parser.parse_lines(fp):
...         print(str(entry.request_time), entry.request_line)
...
2019-01-01 12:34:56-05:00 GET / HTTP/1.1
2019-01-01 12:34:57-05:00 GET /favicon.ico HTTP/1.1
2019-01-01 12:34:57-05:00 GET /styles.css HTTP/1.1
# etc.