Merge ~tcuthbert/container-log-archive-charm:master into container-log-archive-charm:master

Proposed by Thomas Cuthbert
Status: Merged
Approved by: Thomas Cuthbert
Approved revision: a460969380d9726e1bdd205c13a496ca42753532
Merged at revision: c7cf260897b0e3b3441e812797420e4b2523111a
Proposed branch: ~tcuthbert/container-log-archive-charm:master
Merge into: container-log-archive-charm:master
Diff against target: 27 lines (+7/-1)
1 file modified
files/anonymize_web_log_ips.py (+7/-1)
Reviewer Review Type Date Requested Status
Joel Sing (community) +1 Approve
Stuart Bishop (community) Needs Fixing
Review via email: mp+376610@code.launchpad.net

Commit message

Work around nginx general access/error logs that emit the date as the first column

To post a comment you must log in.
Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

This merge proposal is being monitored by mergebot. Change the status to Approved to merge.

Revision history for this message
Stuart Bishop (stub) wrote :

This looks like a library bug to me. Surely this case should be an netaddr.AddrFormatError rather than a generic ValueError? Anyway, given fixing upstream right now and Python stdlib doesn't offer usable IP address validation...

Sniffing the error message is rather fragile, and a last resort. How about:

ip = splits[0].decode('utf-8')
try:
    ip = netaddr.IPAddress(ip)
except (netaddr.AddrFormatError, ValueError) as e:
    [...]

Alternatively, explicitly check if splits[0] is a date before attempting to convert things. This would be easy using a regexp or strptime if the date format is stable.

review: Needs Fixing
Revision history for this message
Joel Sing (jsing) wrote :

LGTM, see nit inline.

review: Approve (+1)
Revision history for this message
🤖 Canonical IS Merge Bot (canonical-is-mergebot) wrote :

Change successfully merged at revision c7cf260897b0e3b3441e812797420e4b2523111a

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1diff --git a/files/anonymize_web_log_ips.py b/files/anonymize_web_log_ips.py
2index fe26128..fa37efc 100755
3--- a/files/anonymize_web_log_ips.py
4+++ b/files/anonymize_web_log_ips.py
5@@ -4,6 +4,7 @@
6 """
7 import argparse
8 import netaddr
9+import re
10 import sys
11
12
13@@ -21,8 +22,13 @@ def anonymize(new_file, log_file, skip_ips=None, skip_private=False):
14 for line in log_file:
15 total += 1
16 splits = line.split()
17+ ip_raw = splits[0].decode('utf-8')
18+ # Work around nginx general access/error logs that emit the date instead of IP as the first column.
19+ if re.match('^\d+/\d+/\d+', ip_raw):
20+ skip += 1
21+ continue
22 try:
23- ip = netaddr.IPAddress(splits[0].decode('utf-8'))
24+ ip = netaddr.IPAddress(ip_raw)
25 except netaddr.AddrFormatError:
26 skip += 1
27 continue

Subscribers

People subscribed via source and target branches