Merge lp:~allenap/maas/make-web-access-logs-prettier into lp:~maas-committers/maas/trunk

Proposed by Gavin Panella
Status: Merged
Approved by: Gavin Panella
Approved revision: no longer in the source branch.
Merged at revision: 5897
Proposed branch: lp:~allenap/maas/make-web-access-logs-prettier
Merge into: lp:~maas-committers/maas/trunk
Diff against target: 366 lines (+207/-6)
6 files modified
src/maasserver/tests/test_webapp.py (+5/-2)
src/maasserver/webapp.py (+5/-2)
src/provisioningserver/rackdservices/image.py (+2/-1)
src/provisioningserver/tests/test_plugin.py (+4/-0)
src/provisioningserver/utils/tests/test_twisted.py (+124/-1)
src/provisioningserver/utils/twisted.py (+67/-0)
To merge this branch: bzr merge lp:~allenap/maas/make-web-access-logs-prettier
Reviewer Review Type Date Requested Status
Mike Pontillo (community) Approve
Review via email: mp+321495@code.launchpad.net

Commit message

Make web access log lines prettier and more readable.

Description of the change

Transforms:

  2017-03-30 23:01:26 regiond: [info] ::ffff:192.168.1.3 - - [30/Mar/2017:22:01:26 +0000] "GET /MAAS/rpc/ HTTP/1.0" 200 208 "-" "provisioningserver.rpc.clusterservice.ClusterClientService"

into:

  2017-03-30 23:01:27 regiond: [info] 192.168.1.3 GET /MAAS/rpc/ HTTP/1.0 --> 200 OK (referrer: -; agent: provisioningserver.rpc.clusterservice.ClusterClientService)

i.e. gets rid of the duplicated timestamp, reorders stuff a bit, and un-IPv6-ifies IPv4 addresses.

To post a comment you must log in.
Revision history for this message
Mike Pontillo (mpontillo) wrote :

I'm on the fence about this one. It looks like our previous logging was /almost/ in the "standard" HTTP log format[1], which has some utility: our current-format might make it easy to analyze the logs with any standard HTTP log analyzer. (ok, granted, maybe after a little grep/sed trickery.)

The best part of this change, to me, is the removal of the IPv4-mapped IPv6 address. That alone makes the log so much better already.

Thoughts?

[1]:
https://en.wikipedia.org/wiki/Common_Log_Format

review: Needs Information
Revision history for this message
Gavin Panella (allenap) wrote :

There is utility in the combined log format, which is what Twisted claimed it was producing, but that's reduced because we're publishing it to a "regular" log, where a namespace and level are added, and an additional timestamp. Tools designed for combined logs can't operate on it without pre-processing. In 5 years we haven't shown an interest in analysing this stuff, and there's not much HTTP traffic anyway: the UI is mostly WebSocket traffic and that is not logged. Then, most HTTP traffic is to /MAAS/rpc/ by racks.

To keep access logs for analysis we should pipe them to their own file. t.w.server.Site would prefer to do this: pushing access logs out via the "normal" loggers is a getting-started kind of mode.

But I avoided this because it goes close to the existing can of logging worms (i.e. logging both to regiond.log and journald), because it would need fiddly packaging work (inc. permissions, rotation), and because no one is yet asking for it.

Or, we could override Site.log to publish structured logs from which we can extract, for example, request errors, and produce some kind of in-app report.

In short, here I've tried to optimise for human consumption. We're not shutting the door to other, e.g. machine readable, options later on if we find a need.

Revision history for this message
Mike Pontillo (mpontillo) wrote :

Sounds good. +1 from me.

review: Approve
Revision history for this message
Gavin Panella (allenap) wrote :

Thanks!

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== modified file 'src/maasserver/tests/test_webapp.py'
2--- src/maasserver/tests/test_webapp.py 2017-03-03 17:20:13 +0000
3+++ src/maasserver/tests/test_webapp.py 2017-03-30 22:13:21 +0000
4@@ -23,6 +23,7 @@
5 from maastesting.matchers import MockCalledOnceWith
6 from maastesting.testcase import MAASTestCase
7 from maastesting.twisted import TwistedLoggerFixture
8+from provisioningserver.utils.twisted import reducedWebLogFormatter
9 from testtools.matchers import (
10 Equals,
11 Is,
12@@ -205,8 +206,10 @@
13 def test__init_creates_site(self):
14 service = self.make_webapp()
15 self.assertThat(service.site, IsInstance(Site))
16- self.assertThat(
17- service.site.requestFactory, Is(webapp.CleanPathRequest))
18+ self.assertThat(service.site, MatchesStructure(
19+ requestFactory=Is(webapp.CleanPathRequest),
20+ _logFormatter=Is(reducedWebLogFormatter),
21+ ))
22 self.assertThat(service.websocket, IsInstance(WebSocketFactory))
23
24 def test__default_site_renders_starting_page(self):
25
26=== modified file 'src/maasserver/webapp.py'
27--- src/maasserver/webapp.py 2017-03-03 17:07:03 +0000
28+++ src/maasserver/webapp.py 2017-03-30 22:13:21 +0000
29@@ -25,6 +25,7 @@
30 from provisioningserver.logger import LegacyLogger
31 from provisioningserver.utils.twisted import (
32 asynchronous,
33+ reducedWebLogFormatter,
34 ThreadPoolLimiter,
35 )
36 from twisted.application.internet import StreamServerEndpointService
37@@ -184,7 +185,8 @@
38 """
39
40 def __init__(self, endpoint, listener, status_worker):
41- self.site = OverlaySite(StartPage())
42+ self.site = OverlaySite(
43+ StartPage(), logFormatter=reducedWebLogFormatter)
44 self.site.requestFactory = CleanPathRequest
45 super(WebApplicationService, self).__init__(endpoint, self.site)
46 self.websocket = WebSocketFactory(listener)
47@@ -235,7 +237,8 @@
48 WSGIResource(reactor, self.threadpool, application))
49 underlay_root = Resource()
50 underlay_root.putChild(b'MAAS', underlay_maas)
51- underlay_site = Site(underlay_root)
52+ underlay_site = Site(
53+ underlay_root, logFormatter=reducedWebLogFormatter)
54 underlay_site.requestFactory = CleanPathRequest
55
56 # Setup the main resource as the twisted handler and the underlay
57
58=== modified file 'src/provisioningserver/rackdservices/image.py'
59--- src/provisioningserver/rackdservices/image.py 2015-12-16 14:24:50 +0000
60+++ src/provisioningserver/rackdservices/image.py 2017-03-30 22:13:21 +0000
61@@ -7,6 +7,7 @@
62 "BootImageEndpointService",
63 ]
64
65+from provisioningserver.utils.twisted import reducedWebLogFormatter
66 from twisted.application.internet import StreamServerEndpointService
67 from twisted.web.resource import Resource
68 from twisted.web.server import Site
69@@ -28,5 +29,5 @@
70 """
71 resource = Resource()
72 resource.putChild(b'images', File(resource_root))
73- self.site = Site(resource)
74+ self.site = Site(resource, logFormatter=reducedWebLogFormatter)
75 super(BootImageEndpointService, self).__init__(endpoint, self.site)
76
77=== modified file 'src/provisioningserver/tests/test_plugin.py'
78--- src/provisioningserver/tests/test_plugin.py 2017-01-28 00:51:47 +0000
79+++ src/provisioningserver/tests/test_plugin.py 2017-03-30 22:13:21 +0000
80@@ -46,10 +46,12 @@
81 )
82 from provisioningserver.rackdservices.tftp_offload import TFTPOffloadService
83 from provisioningserver.testing.config import ClusterConfigurationFixture
84+from provisioningserver.utils.twisted import reducedWebLogFormatter
85 from testtools.matchers import (
86 AfterPreprocessing,
87 Contains,
88 Equals,
89+ Is,
90 IsInstance,
91 KeysEqual,
92 MatchesAll,
93@@ -206,6 +208,8 @@
94 image_service = service.getServiceNamed("image_service")
95 self.assertIsInstance(image_service, BootImageEndpointService)
96 self.assertIsInstance(image_service.site, Site)
97+ self.assertThat(image_service.site, MatchesStructure(
98+ _logFormatter=Is(reducedWebLogFormatter)))
99 resource = image_service.site.resource
100 root = resource.getChildWithDefault(b"images", request=None)
101 self.assertThat(root, IsInstance(FilePath))
102
103=== modified file 'src/provisioningserver/utils/tests/test_twisted.py'
104--- src/provisioningserver/utils/tests/test_twisted.py 2017-03-24 15:28:07 +0000
105+++ src/provisioningserver/utils/tests/test_twisted.py 2017-03-30 22:13:21 +0000
106@@ -63,6 +63,7 @@
107 LONGTIME,
108 makeDeferredWithProcessProtocol,
109 pause,
110+ reducedWebLogFormatter,
111 retries,
112 RPCFetcher,
113 synchronous,
114@@ -71,6 +72,7 @@
115 ThreadPoolLimiter,
116 ThreadUnpool,
117 )
118+from testscenarios import multiply_scenarios
119 from testtools.content import content_from_stream
120 from testtools.deferredruntest import assert_fails_with
121 from testtools.matchers import (
122@@ -88,7 +90,10 @@
123 Raises,
124 )
125 from testtools.testcase import ExpectedException
126-from twisted.internet import reactor
127+from twisted.internet import (
128+ address,
129+ reactor,
130+)
131 from twisted.internet.defer import (
132 AlreadyCalledError,
133 CancelledError,
134@@ -112,6 +117,7 @@
135 threadable,
136 )
137 from twisted.python.failure import Failure
138+from twisted.web.test import requesthelper
139
140
141 def return_args(*args, **kwargs):
142@@ -1883,3 +1889,120 @@
143 ))
144 self.assertThat(
145 twisted_module._os_killpg, MockNotCalled())
146+
147+
148+class TestReducedWebLogFormatter(MAASTestCase):
149+ """Tests for `reducedWebLogFormatter`."""
150+
151+ ipv4_address = factory.make_ipv4_address()
152+ ipv6_address = factory.make_ipv6_address()
153+ simple_http_url = factory.make_simple_http_url()
154+ simple_uri = factory.make_absolute_path()
155+ agent_name = factory.make_name("agent")
156+ status = factory.make_status_code()
157+
158+ scenarios_methods = (
159+ ("no-method", {"method": None, "method_expected": "???"}),
160+ ("method", {"method": "GET", "method_expected": "GET"}),
161+ )
162+
163+ scenarios_clients = (
164+ ("no-client", {
165+ "client": None,
166+ "client_expected": "-",
167+ }),
168+ ("ipv4-client", {
169+ "client": ipv4_address,
170+ "client_expected": ipv4_address,
171+ }),
172+ ("ipv6-client", {
173+ "client": ipv6_address,
174+ "client_expected": ipv6_address,
175+ }),
176+ ("ipv4-mapped-client", {
177+ "client": "::ffff:" + ipv4_address,
178+ "client_expected": ipv4_address,
179+ }),
180+ )
181+
182+ scenarios_referrers = (
183+ ("no-referrer", {
184+ "referrer": None,
185+ "referrer_expected": "-",
186+ }),
187+ ("referrer", {
188+ "referrer": simple_http_url,
189+ "referrer_expected": simple_http_url,
190+ }),
191+ )
192+
193+ scenarios_agents = (
194+ ("no-agent", {
195+ "agent": None,
196+ "agent_expected": "-",
197+ }),
198+ ("agent", {
199+ "agent": agent_name,
200+ "agent_expected": agent_name,
201+ }),
202+ )
203+
204+ scenarios_uris = (
205+ ("no-uri", {
206+ "uri": None,
207+ "uri_expected": "-",
208+ }),
209+ ("uri", {
210+ "uri": simple_uri,
211+ "uri_expected": simple_uri,
212+ }),
213+ )
214+
215+ scenarios_statuses = (
216+ ("no-status", {
217+ "status": None,
218+ "status_expected": "???",
219+ }),
220+ ("status", {
221+ "status": status.value,
222+ "status_expected": "%d %s" % (status.value, status.name),
223+ }),
224+ ("status-unknown", {
225+ "status": 678,
226+ "status_expected": "678",
227+ }),
228+ )
229+
230+ scenarios_types = (
231+ ("plain", {"prep": lambda string: string}),
232+ ("bytes", {"prep": lambda string: (
233+ None if string is None else string.encode("ascii"))}),
234+ )
235+
236+ scenarios = multiply_scenarios(
237+ scenarios_methods, scenarios_clients, scenarios_referrers,
238+ scenarios_agents, scenarios_uris, scenarios_statuses, scenarios_types,
239+ )
240+
241+ def test__renders_full_request(self):
242+ request = requesthelper.DummyRequest("foo/bar")
243+ request.method = self.prep(self.method)
244+ request.client = address.IPv4Address(
245+ "TCP", self.prep(self.client), 12345)
246+ request.requestHeaders.addRawHeader(
247+ "referer", self.prep(self.referrer))
248+ request.requestHeaders.addRawHeader(
249+ "user-agent", self.prep(self.agent))
250+ request.uri = self.prep(self.uri)
251+ request.code = self.status
252+
253+ self.assertThat(
254+ reducedWebLogFormatter(sentinel.timestamp, request),
255+ Equals(
256+ "%s %s %s HTTP/1.0 --> %s (referrer: %s; agent: %s)" % (
257+ self.client_expected, self.method_expected,
258+ self.uri_expected, self.status_expected,
259+ self.referrer_expected, self.agent_expected,
260+ )
261+ ),
262+ )
263
264=== modified file 'src/provisioningserver/utils/twisted.py'
265--- src/provisioningserver/utils/twisted.py 2017-03-24 15:28:07 +0000
266+++ src/provisioningserver/utils/twisted.py 2017-03-30 22:13:21 +0000
267@@ -20,6 +20,7 @@
268 'LONGTIME',
269 'makeDeferredWithProcessProtocol',
270 'pause',
271+ 'reducedWebLogFormatter',
272 'retries',
273 'synchronous',
274 'ThreadPool',
275@@ -35,6 +36,7 @@
276 partial,
277 wraps,
278 )
279+from http import HTTPStatus
280 from itertools import (
281 chain,
282 repeat,
283@@ -50,6 +52,10 @@
284 import threading
285
286 from crochet import run_in_reactor
287+from netaddr import (
288+ AddrFormatError,
289+ IPAddress,
290+)
291 from provisioningserver.logger import LegacyLogger
292 from twisted.internet import reactor
293 from twisted.internet.defer import (
294@@ -70,7 +76,9 @@
295 )
296 from twisted.python.failure import Failure
297 from twisted.python.threadable import isInIOThread
298+from twisted.web.iweb import IAccessLogFormatter
299 from zope import interface
300+from zope.interface import provider
301
302
303 log = LegacyLogger()
304@@ -1043,3 +1051,62 @@
305 killer.cancel()
306
307 done.addBoth(callOut, ended)
308+
309+
310+@provider(IAccessLogFormatter)
311+def reducedWebLogFormatter(timestamp, request):
312+ """Return a reduced formatted log line for the given request.
313+
314+ The `timestamp` argument is ignored. The line returned is expected to be
315+ sent out by a logger which will add its own timestamp, so this one is
316+ superfluous.
317+
318+ :see: `IAccessLogFormatter`
319+ :see: `combinedLogFormatter`
320+ """
321+ template = (
322+ "{origin} {method} {uri} {proto} --> {status} "
323+ "(referrer: {referrer}; agent: {agent})"
324+ )
325+
326+ def field(value, default):
327+ if value is None or len(value) == 0 or value.isspace():
328+ return default
329+ elif isinstance(value, bytes):
330+ return value.decode("ascii", "replace")
331+ else:
332+ return value
333+
334+ def normaliseAddress(address):
335+ """Normalise an IP address."""
336+ try:
337+ address = IPAddress(address)
338+ except AddrFormatError:
339+ return address # Hostname?
340+ else:
341+ if address.is_ipv4_mapped():
342+ return address.ipv4()
343+ else:
344+ return address
345+
346+ def describeHttpStatus(code):
347+ try:
348+ code = HTTPStatus(code)
349+ except ValueError:
350+ if isinstance(code, int):
351+ return str(code)
352+ else:
353+ return "???"
354+ else:
355+ return "{code.value:d} {code.name}".format(code=code)
356+
357+ origin = field(request.getClientIP(), None)
358+ origin = "-" if origin is None else normaliseAddress(origin)
359+
360+ return template.format(
361+ referrer=field(request.getHeader(b"referer"), "-"),
362+ agent=field(request.getHeader(b"user-agent"), "-"),
363+ status=describeHttpStatus(request.code), origin=origin,
364+ method=field(request.method, "???"), uri=field(request.uri, "-"),
365+ proto=field(request.clientproto, "-"),
366+ )