apachelogs – Parse Apache access logs
Parsing

class apachelogs.LogParser(format, encoding='iso-8859-1', errors=None)

A class for parsing Apache access log entries in a given log format. Instantiate with a log format string, and then use the parse() and/or parse_lines() methods to parse log entries in that format.

Parameters:
- format (str) – an Apache log format
- encoding (str) – the encoding to use for decoding certain strings in log entries (see Supported Directives); defaults to 'iso-8859-1'. Set to 'bytes' to cause the strings to be returned as bytes values instead of str.
- errors (str) – the error handling scheme to use when decoding; defaults to 'strict'

Raises:
- InvalidDirectiveError – if an invalid directive occurs in format
- UnknownDirectiveError – if an unknown directive occurs in format
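One way to picture what a LogParser does with its format string is to translate each directive into a regular-expression fragment once, up front, and then match every entry against the combined pattern. The sketch below assumes only seven directives with deliberately loose patterns; `toy_compile` and `TOY_DIRECTIVES` are hypothetical names, not part of apachelogs, whose own matching rules are stricter and cover far more directives.

```python
import re

# Hypothetical, deliberately loose patterns for a few directives only.
# Group names follow attribute names shown in the Examples where possible.
TOY_DIRECTIVES = {
    "%h": r"(?P<remote_host>\S+)",
    "%l": r"(?P<remote_logname>\S+)",
    "%u": r"(?P<remote_user>\S+)",
    "%t": r"(?P<request_time>\[[^\]]+\])",
    "%r": r'(?P<request_line>[^"]*)',
    "%>s": r"(?P<final_status>\d+)",
    "%b": r"(?P<bytes_sent>\d+|-)",
}


def toy_compile(log_format):
    """Translate a log format string into one compiled regex."""
    parts = []
    i = 0
    while i < len(log_format):
        for directive, fragment in TOY_DIRECTIVES.items():
            if log_format.startswith(directive, i):
                parts.append(fragment)
                i += len(directive)
                break
        else:
            if log_format[i] == "%":
                # Stand-in for UnknownDirectiveError in this sketch.
                raise ValueError(f"unsupported directive at position {i}")
            parts.append(re.escape(log_format[i]))  # literal character
            i += 1
    return re.compile("".join(parts))


pattern = toy_compile('%h %l %u %t "%r" %>s %b')
m = pattern.fullmatch(
    '209.126.136.4 - - [01/Nov/2017:07:28:29 +0000] "GET / HTTP/1.1" 301 521'
)
print(m["request_line"], m["final_status"])  # GET / HTTP/1.1 301
```

In this toy every value stays a string; turning "301" into an int or a bare "-" into None would be a separate conversion step after matching.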
LogParser.parse(entry)

Parse an access log entry according to the log format and return a LogEntry object.

Parameters:
- entry (str) – an access log entry to parse

Return type: LogEntry

Raises:
- InvalidEntryError – if entry does not match the log format
LogParser.parse_lines(entries, ignore_invalid=False)

Parse the elements in an iterable of access log entries (e.g., an open text file handle) and return a generator of LogEntry objects. If ignore_invalid is True, any entries that do not match the log format will be silently discarded; otherwise, such an entry will cause an InvalidEntryError to be raised.

Parameters:
- entries (iterable of str) – the access log entries to parse
- ignore_invalid (bool) – whether to silently discard entries that do not match the log format; defaults to False

Return type: generator of LogEntry

Raises:
- InvalidEntryError – if an element of entries does not match the log format and ignore_invalid is False
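The ignore_invalid behavior can be modeled with a small generator. This is a sketch, not the library's code: `toy_parse_lines`, `ToyInvalidEntryError`, and the two-field `TOY_PATTERN` are hypothetical stand-ins for parse_lines(), InvalidEntryError, and a real log format.

```python
import re


class ToyInvalidEntryError(ValueError):
    """Hypothetical stand-in for apachelogs.InvalidEntryError."""


# Hypothetical two-field format: "<status> <bytes or ->".
TOY_PATTERN = re.compile(r"(?P<status>\d{3}) (?P<size>\d+|-)")


def toy_parse_lines(entries, ignore_invalid=False):
    """Yield a dict per matching line; skip or raise on the others."""
    for line in entries:
        m = TOY_PATTERN.fullmatch(line.rstrip("\r\n"))
        if m is None:
            if ignore_invalid:
                continue  # silently discard the non-matching line
            raise ToyInvalidEntryError(line)
        yield m.groupdict()


lines = ["200 512\n", "not a log line\n", "404 -\n"]
print(list(toy_parse_lines(lines, ignore_invalid=True)))
# [{'status': '200', 'size': '512'}, {'status': '404', 'size': '-'}]
```

Because the function is a generator, an invalid line raises only when iteration reaches it; any entries before it have already been yielded to the caller.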
class apachelogs.LogEntry

A parsed Apache access log entry. The value associated with each directive in the log format is stored as an attribute on the LogEntry object; for example, if the log format contains a %s directive, the LogEntry for a parsed entry will have a status attribute containing the status value from the entry as an int. See Supported Directives for the attribute names & types of each directive supported by this library.

If the log format contains two or more directives that are stored in the same attribute (e.g., %D and %{us}T), the given attribute will contain the first non-None directive value.

The values of date & time directives are stored in a dict attribute named request_time_fields. If this dict contains enough information to assemble a complete (possibly naïve) datetime.datetime, then the LogEntry will have a request_time attribute equal to that datetime.datetime.
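The "first non-None value wins" rule for directives that share an attribute reduces to a one-line helper. This sketch assumes the values have already been parsed and are listed in log-format order; `first_non_none` is a hypothetical name.

```python
def first_non_none(values):
    """Return the first value that is not None, or None if there is none."""
    return next((v for v in values if v is not None), None)


# Hypothetical parsed values, in log-format order: %D was logged as "-"
# (parsed to None) and %{us}T carried the duration.
parsed = [("%D", None), ("%{us}T", 1234)]
duration = first_non_none(value for _, value in parsed)
print(duration)  # 1234
```

Testing `is not None` rather than truthiness matters here: a legitimate value of 0 must still win over a later directive.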
LogEntry.directives
New in version 0.3.0.

A dict mapping individual log format directives (e.g., "%h" or "%<s") to their corresponding values from the log entry. %{*}t directives with multiple subdirectives (e.g., %{%Y-%m-%d}t) are broken up into one entry per subdirective (for %{%Y-%m-%d}t, this would become the three keys "%{%Y}t", "%{%m}t", and "%{%d}t"). This attribute provides an alternative means of looking up directive values besides using the named attributes.
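Splitting a multi-subdirective %{*}t directive into per-subdirective keys, as described for the directives attribute, can be sketched with two regular expressions. `split_time_directive` is hypothetical and handles only plain %X subdirectives: literal text is dropped, %% is skipped, and modifiers such as begin: are not recognized.

```python
import re


def split_time_directive(directive):
    """Split e.g. "%{%Y-%m-%d}t" into ["%{%Y}t", "%{%m}t", "%{%d}t"]."""
    m = re.fullmatch(r"%\{(.*)\}t", directive)
    if m is None:
        raise ValueError(f"not a %{{*}}t directive: {directive!r}")
    # "%%" is matched as a unit so that "%%Y" is not misread as "%Y".
    subdirectives = re.findall(r"%%|%[^%]", m.group(1))
    return ["%{" + s + "}t" for s in subdirectives if s != "%%"]


print(split_time_directive("%{%Y-%m-%d}t"))  # ['%{%Y}t', '%{%m}t', '%{%d}t']
```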
LogEntry.entry
The original logfile entry with trailing newlines removed.

LogEntry.format
The entry’s log format string.
apachelogs.parse(format, entry, encoding='iso-8859-1', errors=None)

A convenience function for parsing a single logfile entry without having to directly create a LogParser object. encoding and errors have the same meaning as for LogParser.

apachelogs.parse_lines(format, entries, encoding='iso-8859-1', errors=None, ignore_invalid=False)

A convenience function for parsing an iterable of logfile entries without having to directly create a LogParser object. encoding and errors have the same meaning as for LogParser, and ignore_invalid has the same meaning as for LogParser.parse_lines().
Utilities

apachelogs.parse_apache_timestamp(s)

Parse an Apache timestamp into a datetime.datetime object. The month name in the timestamp is expected to be an abbreviated English name regardless of the current locale.

>>> parse_apache_timestamp('[01/Nov/2017:07:28:29 +0000]')
datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)

Parameters:
- s (str) – a string of the form DD/Mon/YYYY:HH:MM:SS +HHMM (optionally enclosed in square brackets)

Returns: an aware datetime.datetime

Raises:
- ValueError – if s is not in the expected format
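Because the %b directive of Python's strptime() matches month names in the current locale, a locale-independent parser needs its own month table. The following is a sketch under that assumption, not the library's implementation; `toy_parse_timestamp` is a hypothetical name, and unlike a strict parser it also accepts a bracket on only one side.

```python
import re
from datetime import datetime, timedelta, timezone

# English month abbreviations, independent of the current locale.
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

TIMESTAMP_RE = re.compile(
    r"\[?(\d{2})/([A-Za-z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})"
    r" ([+-])(\d{2})(\d{2})\]?"
)


def toy_parse_timestamp(s):
    """Parse 'DD/Mon/YYYY:HH:MM:SS +HHMM' into an aware datetime."""
    m = TIMESTAMP_RE.fullmatch(s)
    if m is None or m.group(2) not in MONTHS:
        raise ValueError(f"not an Apache timestamp: {s!r}")
    day, mon, year, hour, minute, second, sign, tzh, tzm = m.groups()
    offset = timedelta(hours=int(tzh), minutes=int(tzm))
    if sign == "-":
        offset = -offset
    return datetime(int(year), MONTHS[mon], int(day),
                    int(hour), int(minute), int(second),
                    tzinfo=timezone(offset))


print(toy_parse_timestamp("[01/Nov/2017:07:28:29 +0000]"))
# 2017-11-01 07:28:29+00:00
```

Out-of-range fields such as 31/Feb are rejected by the datetime constructor itself, which also raises ValueError, so callers only need to handle one exception type.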
Log Format Constants

The following standard log formats are available as string constants in this package so that you don’t have to keep typing out the full log format strings:

apachelogs.COMBINED = '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"'
NCSA extended/combined log format

apachelogs.COMBINED_DEBIAN = '%h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"'
Like COMBINED, but with %O (total bytes sent including headers) in place of %b (size of response excluding headers)

apachelogs.COMMON = '%h %l %u %t "%r" %>s %b'
Common log format (CLF)

apachelogs.COMMON_DEBIAN = '%h %l %u %t "%r" %>s %O'
Like COMMON, but with %O (total bytes sent including headers) in place of %b (size of response excluding headers)

apachelogs.VHOST_COMBINED = '%v:%p %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"'
COMBINED_DEBIAN with virtual host & port prepended
Exceptions

exception apachelogs.Error
Bases: Exception

The base class for all custom exceptions raised by apachelogs.

exception apachelogs.InvalidDirectiveError
Bases: apachelogs.errors.Error, ValueError

Raised by the LogParser constructor when given a log format containing an invalid or malformed directive.

InvalidDirectiveError.format
The log format string containing the invalid directive

InvalidDirectiveError.pos
The position in the log format string at which the invalid directive occurs
exception apachelogs.InvalidEntryError
Bases: apachelogs.errors.Error, ValueError

Raised when attempting to parse a log entry that does not match the given log format.

InvalidEntryError.entry
The invalid log entry

InvalidEntryError.format
The log format string the entry failed to match against

exception apachelogs.UnknownDirectiveError
Bases: apachelogs.errors.Error, ValueError

Raised by the LogParser constructor when given a log format containing an unknown or unsupported directive.

UnknownDirectiveError.directive
The unknown or unsupported directive
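Listing both Error and ValueError as bases lets callers catch either the package-specific base class or the built-in ValueError. The sketch below mirrors the bases and attributes documented above; the class bodies are assumptions for illustration, not apachelogs' source.

```python
class Error(Exception):
    """Base class for this sketch's custom exceptions."""


class InvalidEntryError(Error, ValueError):
    """Raised when an entry does not match the log format."""

    def __init__(self, entry, log_format):
        super().__init__(entry, log_format)
        self.entry = entry        # the invalid log entry
        self.format = log_format  # the format it failed to match


try:
    raise InvalidEntryError("garbage", "%h %l %u")
except ValueError as exc:  # a plain ValueError handler still catches it
    caught = exc

print(isinstance(caught, Error), caught.entry, caught.format)
# True garbage %h %l %u
```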
Supported Directives

The following table lists the log format directives supported by this library along with the names & types of the attributes at which their parsed values are stored on a LogEntry. The attribute names for the directives are based on the names used internally by the Apache source code.

A directive with the < modifier (e.g., %<s) will be stored at entry.original_attribute_name, and a directive with the > modifier will be stored at entry.final_attribute_name.
A type of str marked with an asterisk (*) means that the directive’s values are decoded according to the encoding option to LogParser.

Any directive may evaluate to None when it is modified by a set of status codes (e.g., %400,501T or %!200T).
See the Apache documentation for information on the meaning of each directive.
[Table: Directive | Attribute | Type]
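The < / > naming rule and the status-code conditions described in the paragraphs before the directives table can be sketched as a small directive parser. It is a toy under stated assumptions: it accepts modifiers in one fixed order only (< or >, then an optional ! with comma-separated codes, then an optional {parameter}), the caller supplies the base attribute name, and `parse_modifiers` and `is_logged` are hypothetical names.

```python
import re

DIRECTIVE_RE = re.compile(
    r"%(?P<redirect>[<>]?)"
    r"(?:(?P<negate>!)?(?P<codes>\d{3}(?:,\d{3})*))?"
    r"(?P<param>\{[^}]*\})?"
    r"(?P<letter>[A-Za-z])"
)


def parse_modifiers(directive, base_attribute):
    """Return (attribute name, negate flag, set of status codes or None)."""
    m = DIRECTIVE_RE.fullmatch(directive)
    if m is None:
        raise ValueError(f"cannot parse directive {directive!r}")
    prefix = {"<": "original_", ">": "final_", "": ""}[m["redirect"]]
    codes = {int(c) for c in m["codes"].split(",")} if m["codes"] else None
    return prefix + base_attribute, m["negate"] is not None, codes


def is_logged(negate, codes, status):
    """Whether Apache writes the value (rather than "-") for this status."""
    if codes is None:
        return True
    return (status not in codes) if negate else (status in codes)


print(parse_modifiers("%>s", "status"))  # ('final_status', False, None)
_, negate, codes = parse_modifiers("%!200T", "duration")
print(is_logged(negate, codes, 404))  # True
```

When is_logged() would return False, Apache writes a bare "-" in that position, which is why such a directive parses to None.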
Supported strftime Directives

The following table lists the strftime directives supported for use in the parameter of a %{*}t directive along with the keys & types at which they are stored in the dict entry.request_time_fields. See the documentation for the C strftime() function for information on the meaning of each directive.

A %{*}t directive with the begin: modifier (e.g., %{begin:%Y-%m-%d}t) will have its subdirectives stored in entry.begin_request_time_fields (in turn used to set entry.begin_request_time), and likewise for the end: modifier.
[Table: Directive | Key | Type]
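The begin:/end: routing described above amounts to choosing a target attribute from the modifier and keeping the rest of the braces as the strftime parameter. The sketch below illustrates that reading only; `time_fields_target` is a hypothetical name, and it does not validate the subdirectives.

```python
import re


def time_fields_target(directive):
    """Map a %{*}t directive to (fields attribute name, strftime parameter)."""
    m = re.fullmatch(r"%\{(?:(begin|end):)?(.*)\}t", directive)
    if m is None:
        raise ValueError(f"not a %{{*}}t directive: {directive!r}")
    modifier, parameter = m.groups()  # modifier is None when absent
    prefix = f"{modifier}_" if modifier else ""
    return prefix + "request_time_fields", parameter


print(time_fields_target("%{begin:%Y-%m-%d}t"))
# ('begin_request_time_fields', '%Y-%m-%d')
```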
Footnotes

[1] The cookies, cryptography, env_vars, headers_in, headers_out, notes, trailers_in, trailers_out, and variables attributes are case-insensitive dicts.

[2] Apache renders %{hextid}P as either a decimal integer or a hexadecimal integer depending on the APR version available. apachelogs expects %{hextid}P to always be in hexadecimal; if your Apache produces decimal integers, use %{tid}P instead in the log format passed to apachelogs.
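Case-insensitive lookup, as the first footnote above describes for attributes such as headers_in, can be sketched by keying storage on the lowercased name while remembering the most recent spelling. `ToyCaseInsensitiveDict` is a hypothetical class for illustration; the dict type apachelogs actually uses may differ in details such as how keys are displayed.

```python
from collections.abc import MutableMapping


class ToyCaseInsensitiveDict(MutableMapping):
    """Dict whose str keys compare case-insensitively (a toy sketch)."""

    def __init__(self, data=None):
        self._store = {}  # lowercased key -> (spelling as set, value)
        if data:
            self.update(data)

    def __getitem__(self, key):
        return self._store[key.lower()][1]

    def __setitem__(self, key, value):
        self._store[key.lower()] = (key, value)

    def __delitem__(self, key):
        del self._store[key.lower()]

    def __iter__(self):
        return (spelling for spelling, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers_in = ToyCaseInsensitiveDict({"User-Agent": "curl/7.64.0"})
print(headers_in["user-agent"])  # curl/7.64.0
```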
Changelog

v0.4.0 (2019-05-19)
- Support the %{c}h log directive
- %f and %R can now be None
- Bugfix: %u can now match the string "" (two double quotes)
- Support mod_ssl’s %{*}c and %{*}x directives
- Support the %{hextid}P directive (as a hexadecimal integer)
- Support the %L and %{c}L directives
- Parameters to %{*}p, %{*}P, and %{*}T are now treated case-insensitively in order to mirror Apache’s behavior
- Refined some directives to better match only the values emitted by Apache:
  - %l and %m no longer accept whitespace
  - %s and %{tid}P now only match unsigned integers
  - %{*}C no longer accepts semicolons or leading or trailing spaces
  - %q no longer accepts whitespace or pound/hash signs

v0.3.0 (2019-05-12)
- Gave LogEntry a directives attribute for looking up directive values by the corresponding log format directives

v0.2.0 (2019-05-09)

v0.1.0 (2019-05-06)
- Initial release
apachelogs parses Apache access log files. Pass it a log format string and get back a parser for logfile entries in that format. apachelogs even takes care of decoding escape sequences and converting things like timestamps, integers, and bare hyphens to datetime values, ints, and Nones.
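Per the Apache mod_log_config documentation, non-printable and other special bytes in %r, %i, and %o are written as \xhh escapes, \" and \\ are backslash-escaped, and whitespace uses C-style notation such as \n and \t. A sketch of undoing that and then applying an encoding option like LogParser's (including 'bytes') might look like this; `toy_unescape` is hypothetical, is not apachelogs' implementation, and assumes the field text contains only code points below 256.

```python
import re

# Escapes this sketch understands: \xhh plus a few backslash sequences.
_ESCAPE_RE = re.compile(rb'\\(x[0-9A-Fa-f]{2}|[bnrtv"\\])')
_SIMPLE = {b"b": b"\b", b"n": b"\n", b"r": b"\r", b"t": b"\t", b"v": b"\v",
           b'"': b'"', b"\\": b"\\"}


def _unescape_match(m):
    code = m.group(1)
    if code.startswith(b"x"):
        return bytes([int(code[1:], 16)])  # \xhh -> the raw byte
    return _SIMPLE[code]


def toy_unescape(field, encoding="iso-8859-1", errors="strict"):
    """Turn Apache-escaped text into raw bytes, then optionally decode."""
    raw = _ESCAPE_RE.sub(_unescape_match, field.encode("iso-8859-1"))
    if encoding == "bytes":
        return raw
    return raw.decode(encoding, errors)


print(toy_unescape(r"caf\xc3\xa9", encoding="utf-8"))  # café
print(toy_unescape(r'say \"hi\"\n', encoding="bytes"))  # b'say "hi"\n'
```

With the default 'iso-8859-1', every byte maps to some character, so decoding never fails; UTF-8 input then shows up as mojibake (for example 'cafÃ©') rather than raising.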
Installation

apachelogs requires Python 3.5 or higher. Just use pip for Python 3 (you have pip, right?) to install apachelogs and its dependencies:
python3 -m pip install apachelogs
Examples
Parse a single log entry:
>>> from apachelogs import LogParser
>>> parser = LogParser("%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"")
>>> # The above log format is also available as the constant `apachelogs.COMBINED`.
>>> entry = parser.parse('209.126.136.4 - - [01/Nov/2017:07:28:29 +0000] "GET / HTTP/1.1" 301 521 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"\n')
>>> entry.remote_host
'209.126.136.4'
>>> entry.request_time
datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
>>> entry.request_line
'GET / HTTP/1.1'
>>> entry.final_status
301
>>> entry.bytes_sent
521
>>> entry.headers_in["Referer"] is None
True
>>> entry.headers_in["User-Agent"]
'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'
>>> # Log entry components can also be looked up by directive:
>>> entry.directives["%r"]
'GET / HTTP/1.1'
>>> entry.directives["%>s"]
301
>>> entry.directives["%t"]
datetime.datetime(2017, 11, 1, 7, 28, 29, tzinfo=datetime.timezone.utc)
Parse a file full of log entries:
>>> with open('/var/log/apache2/access.log') as fp:
... for entry in parser.parse_lines(fp):
... print(str(entry.request_time), entry.request_line)
...
2019-01-01 12:34:56-05:00 GET / HTTP/1.1
2019-01-01 12:34:57-05:00 GET /favicon.ico HTTP/1.1
2019-01-01 12:34:57-05:00 GET /styles.css HTTP/1.1
# etc.