Merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs into lp:percona-toolkit/2.1

Proposed by Baron Schwartz
Status: Merged
Approved by: Daniel Nichter
Approved revision: 226
Merged at revision: 226
Proposed branch: lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Merge into: lp:percona-toolkit/2.1
Diff against target: 464 lines (+107/-204)
1 file modified
bin/pt-table-usage (+107/-204)
To merge this branch: bzr merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Reviewer Review Type Date Requested Status
Daniel Nichter Approve
Review via email: mp+100264@code.launchpad.net
Revision history for this message
Daniel Nichter (daniel-nichter) :
review: Approve

Preview Diff

1=== modified file 'bin/pt-table-usage'
2--- bin/pt-table-usage 2012-03-30 22:45:10 +0000
3+++ bin/pt-table-usage 2012-03-31 00:51:21 +0000
4@@ -5864,12 +5864,12 @@
5 # ###########################################################################
6
7 # ###########################################################################
8-# MysqldumpParser package 7500
9+# MysqldumpParser package
10 # This package is a copy without comments from the original. The original
11-# with comments and its test file can be found in the SVN repository at,
12-# trunk/common/MysqldumpParser.pm
13-# trunk/common/t/MysqldumpParser.t
14-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
15+# with comments and its test file can be found in the Bazaar repository at,
16+# lib/MysqldumpParser.pm
17+# t/lib/MysqldumpParser.t
18+# See https://launchpad.net/percona-toolkit for more information.
19 # ###########################################################################
20 package MysqldumpParser;
21
22@@ -5968,12 +5968,12 @@
23 # ###########################################################################
24
25 # ###########################################################################
26-# SchemaQualifier package 7499
27+# SchemaQualifier package
28 # This package is a copy without comments from the original. The original
29 # with comments and its test file can be found in the SVN repository at,
30-# trunk/common/SchemaQualifier.pm
31-# trunk/common/t/SchemaQualifier.t
32-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
33+# lib/SchemaQualifier.pm
34+# t/lib/SchemaQualifier.t
35+# See https://launchpad.net/percona-toolkit for more information.
36 # ###########################################################################
37 package SchemaQualifier;
38
39@@ -6680,24 +6680,22 @@
40
41 =head1 NAME
42
43-pt-table-usage - Read queries from a log and analyze how they use tables.
44+pt-table-usage - Analyze how queries use tables.
45
46 =head1 SYNOPSIS
47
48 Usage: pt-table-usage [OPTIONS] [FILES]
49
50-pt-table-usage reads queries from slow query logs and analyzes how they use
51-tables. If no FILE is specified, STDIN is read. Table usage for every query
52-is printed to STDOUT.
53+pt-table-usage reads queries from a log and analyzes how they use tables. If no
54+FILE is specified, it reads STDIN. It prints a report for each query.
55
56 =head1 RISKS
57
58-pt-table-use is very low risk because it only reads and examines queries from
59-a log and executes C<EXPLAIN EXTENDED> if the L<"--explain-extended"> option
60-is specified.
61+pt-table-usage is very low risk. By default, it simply reads queries from a log.
62+It executes C<EXPLAIN EXTENDED> if you specify the L<"--explain-extended">
63+option.
64
65-At the time of this release, there are no known bugs that could cause serious
66-harm to users.
67+At the time of this release, we know of no bugs that could harm users.
68
69 The authoritative source for updated information is always the online issue
70 tracking system. Issues that affect this tool will be marked as such. You can
71@@ -6708,40 +6706,32 @@
72
73 =head1 DESCRIPTION
74
75-pt-table-usage reads queries from slow query logs and analyzes how they use
76-tables. Table usage indicates more than just which tables are read from or
77-written to by the query, it also indicates data flow: data in and data out.
78-Data flow is determined by the contexts in which tables are used by the query.
79-A single table can be used in several different contexts in the same query.
80-The reported table usage for each query lists every context for every table.
81-This CONTEXT-TABLE list tells how and where data flows, i.e. the query's table
82-usage. The L<"OUTPUT"> section lists the possible contexts and describes how
83-to read a table usage report.
84-
85-Since this tool analyzes table usage, it's important that queries use
86-table-qualified columns. If a query uses only one table, then all columns
87-must be from that table and there's no problem. But if a query uses
88-multiple tables and the columns are not table-qualified, then that creates a
89-problem that can only be solved by knowing the query's database and specifying
90-L<"--explain-extended">. If the slow log does not specify the database
91-used by the query, then you can specify a default database with L<"--database">.
92-There is no other way to know or guess the database, so the query will be
93-skipped. Secondly, if the database is known, then specifying
94-L<"--explain-extended"> causes pt-table-usage to do C<EXPLAIN EXTENDED ...>
95-C<SHOW WARNINGS> to get the fully qualified query as reported by MySQL
96-(i.e. all identifiers are fully database- and/or table-qualified). For
97-best results, you should specify L<"--explain-extended"> and
98-L<"--database"> if you know that all queries use the same database.
99-
100-Each query is identified in the output by either an MD5 hex checksum
101-of the query's fingerprint or the query's value for the specified
102-L<"--id-attribute">. The query ID is for parsing and storing the table
103-usage reports in a table that is keyed on the query ID. See L<"OUTPUT">
104-for more information.
105+pt-table-usage reads queries from a log and analyzes how they use tables. The
106+log should be in MySQL's slow query log format.
107+
108+Table usage is more than simply an indication of which tables the query reads or
109+writes. It also indicates data flow: data in and data out. The tool determines
110+the data flow by the contexts in which tables appear. A single query can use a
111+table in several different contexts simultaneously. The tool's output lists
112+every context for every table. This CONTEXT-TABLE list indicates how data flows
113+between tables. The L<"OUTPUT"> section lists the possible contexts and
114+describes how to read a table usage report.
115+
116+The tool analyzes data flow down to the level of individual columns, so it is
117+helpful if columns are identified unambiguously in the query. If a query uses
118+only one table, then all columns must be from that table, and there's no
119+difficulty. But if a query uses multiple tables and the column names are not
120+table-qualified, then it is necessary to use C<EXPLAIN EXTENDED>, followed by
121+C<SHOW WARNINGS>, to determine to which tables the columns belong.
122+
123+If the tool does not know the query's default database, which can occur when the
124+database is not printed in the log, then C<EXPLAIN EXTENDED> can fail. In this
125+case, you can specify a default database with L<"--database">. You can also use
126+the L<"--create-table-definitions"> option to help resolve ambiguities.
127
128 =head1 OUTPUT
129
130-The table usage report that is printed for each query looks similar to the
131+The tool prints a usage report for each table in every query, similar to the
132 following:
133
134 Query_id: 0x1CD27577D202A339.1
135@@ -6758,45 +6748,43 @@
136 JOIN t2
137 WHERE t1
138
139-Usage reports are separated by blank lines. The first line is always the
140-query ID: a unique ID that can be used to parse the output and store the
141-usage reports in a table keyed on this ID. The query ID has two parts
142-separated by a period: the query ID and the target table number.
143-
144-If L<"--id-attribute"> is not specified, then query IDs are automatically
145-created by making an MD5 hex checksum of the query's fingerprint
146-(as shown above, e.g. C<0x1CD27577D202A339>); otherwise, the query ID is the
147-query's value for the given attribute.
148-
149-The target table number starts at 1 and increments by 1 for each table that
150-the query affects. Only multi-table UPDATE queries can affect
151-multiple tables with a single query, so this number is 1 for all other types
152-of queries. (Multi-table DELETE queries are not supported.)
153-The example output above is from this query:
154+The first line contains the query ID, which by default is the same as those
155+shown in pt-query-digest reports. It is an MD5 checksum of the query's
156+"fingerprint," which is what remains after removing literals, collapsing white
157+space, and a variety of other transformations. The query ID has two parts
158+separated by a period: the query ID and the table number. If you wish to use a
159+different value to identify the query, you can specify the L<"--id-attribute">
160+option.
161+
162+The previous example shows two paragraphs for a single query, not two queries.
163+Note that the query ID is identical for the two, but the table number differs.
164+The table number increments by 1 for each table that the query updates. Only
165+multi-table UPDATE queries can update multiple tables with a single query, so
166+the table number is 1 for all other types of queries. (The tool does not
167+support multi-table DELETE queries.) The example output above is from this
168+query:
169
170 UPDATE t1 AS a JOIN t2 AS b USING (id)
171 SET a.foo="bar", b.foo="bat"
172 WHERE a.id=1;
173
174-The C<SET> clause indicates that two tables are updated: C<a> aliased as C<t1>,
175-and C<b> aliased as C<t2>. So two usage reports are printed, one for each
176-table, and this is indicated in the output by their common query ID but
177-incrementing target table number.
178+The C<SET> clause indicates that the query updates two tables: C<a> aliased as
179+C<t1>, and C<b> aliased as C<t2>.
180
181-After the first line is a variable number of CONTEXT-TABLE lines. Possible
182-contexts are:
183+After the first line, the tool prints a variable number of CONTEXT-TABLE lines.
184+Possible contexts are as follows:
185
186 =over
187
188 =item * SELECT
189
190-SELECT means that data is taken out of the table for one of two reasons:
191-to be returned to the user as part of a result set, or to be put into another
192-table as part of an INSERT or UPDATE. In the first case, since only SELECT
193-queries return result sets, a SELECT context is always listed for SELECT
194-queries. In the second case, data from one table is used to insert or
195-update rows in another table. For example, the UPDATE query in the example
196-above has the usage:
197+SELECT means that the query retrieves data from the table for one of two
198+reasons. The first is to be returned to the user as part of a result set. Only
199+SELECT queries return result sets, so the report always shows a SELECT context
200+for SELECT queries.
201+
202+The second case is when data flows to another table as part of an INSERT or
203+UPDATE. For example, the UPDATE query in the example above has the usage:
204
205 SELECT DUAL
206
207@@ -6804,9 +6792,9 @@
208
209 SET a.foo="bar", b.foo="bat"
210
211-DUAL is used for any values that does not originate in a table, in this case the
212-literal values "bar" and "bat". If that C<SET> clause were C<SET a.foo=b.foo>
213-instead, then the complete usage would be:
214+The tool uses DUAL for any values that do not originate in a table, in this case
215+the literal values "bar" and "bat". If that C<SET> clause were C<SET
216+a.foo=b.foo> instead, then the complete usage would be:
217
218 Query_id: 0x1CD27577D202A339.1
219 UPDATE t1
220@@ -6820,20 +6808,15 @@
221 immediately above reflects an UPDATE query that updates rows in table C<t1>
222 with data from table C<t2>.
223
224-=item * Any other query type
225-
226-Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a context.
227-All these types indicate that the table is written or altered in some way.
228-If a SELECT context follows one of these types, then data is read from the
229-SELECT table and written to this table. This happens, for example, with
230-INSERT..SELECT or UPDATE queries that set column values using values from
231-tables instead of constant values.
232-
233-These query types are not supported:
234-
235- SET
236- LOAD
237- multi-table DELETE
238+=item * Any other verb
239+
240+Any other verb, such as INSERT, UPDATE, DELETE, etc. may be a context. These
241+verbs indicate that the query modifies data in some way. If a SELECT context
242+follows one of these verbs, then the query reads data from the SELECT table and
243+writes it to this table. This happens, for example, with INSERT..SELECT or
244+UPDATE queries that use values from tables instead of constant values.
245+
246+These query types are not supported: SET, LOAD, and multi-table DELETE.
247
248 =item * JOIN
249
250@@ -6853,14 +6836,14 @@
251 WHERE t1
252 WHERE t2
253
254-Only unique tables are listed; that is why table C<t1> is listed only once.
255+The tool lists only distinct tables; that is why table C<t1> is listed only
256+once.
257
258 =item * TLIST
259
260-The TLIST context lists tables that are accessed by the query but do not
261-appear in any other context. These tables are usually an implicit
262-full cartesian join, so they should be avoided. For example, the query
263-C<SELECT * FROM t1, t2> results in:
264+The TLIST context lists tables that the query accesses, but which do not appear
265+in any other context. These tables are usually an implicit cartesian join. For
266+example, the query C<SELECT * FROM t1, t2> results in:
267
268 Query_id: 0xBDDEB6EDA41897A8.1
269 SELECT t1
270@@ -6871,7 +6854,7 @@
271 First of all, there are two SELECT contexts, because C<SELECT *> selects
272 rows from all tables; C<t1> and C<t2> in this case. Secondly, the tables
273 are implicitly joined, but without any kind of join condition, which results
274-in a full cartesian join as indicated by the TLIST context for each.
275+in a cartesian join as indicated by the TLIST context for each.
276
277 =back
278
279@@ -6911,24 +6894,23 @@
280
281 type: string; default: DUAL
282
283-Value to print for constant data. Constant data means all data not
284-from tables (or subqueries since subqueries are not supported). For example,
285-real constant values like strings ("foo") and numbers (42), and data from
286-functions like C<NOW()>. For example, in the query
287-C<INSERT INTO t (c) VALUES ('a')>, the string 'a' is constant data, so the
288-table usage report is:
289+Table to print as the source for constant data (literals). This is any data not
290+retrieved from tables (or subqueries, because subqueries are not supported).
291+This includes literal values such as strings ("foo") and numbers (42), or
292+functions such as C<NOW()>. For example, in the query C<INSERT INTO t (c)
293+VALUES ('a')>, the string 'a' is constant data, so the table usage report is:
294
295 INSERT t
296 SELECT DUAL
297
298-The first line indicates that data is inserted into table C<t> and the second
299-line indicates that that data comes from some constant value.
300+The first line indicates that the query inserts data into table C<t>, and the
301+second line indicates that the inserted data comes from some constant value.
302
303 =item --[no]continue-on-error
304
305 default: yes
306
307-Continue parsing even if there is an error.
308+Continue to work even if there is an error.
309
310 =item --create-table-definitions
311
312@@ -6939,9 +6921,9 @@
313 names, you can save the output of C<mysqldump --no-data> to one or more files
314 and specify those files with this option. The tool will parse all
315 C<CREATE TABLE> definitions from the files and use this information to
316-qualify table and column names. If a column name is used in multiple tables,
317-or table name is used in multiple databases, these duplicates cannot be
318-qualified.
319+qualify table and column names. If a column name appears in multiple tables,
320+or a table name appears in multiple databases, the ambiguities cannot be
321+resolved.
322
323 =item --daemonize
324
325@@ -6964,7 +6946,8 @@
326
327 type: DSN
328
329-EXPLAIN EXTENDED queries on this host to fully qualify table and column names.
330+A server to execute EXPLAIN EXTENDED queries. This may be necessary to resolve
331+ambiguous (unqualified) column and table names.
332
333 =item --filter
334
335@@ -6972,89 +6955,13 @@
336
337 Discard events for which this Perl code doesn't return true.
338
339-This option is a string of Perl code or a file containing Perl code that gets
340-compiled into a subroutine with one argument: $event. This is a hashref.
341-If the given value is a readable file, then pt-table-usage reads the entire
342-file and uses its contents as the code. The file should not contain
343-a shebang (#!/usr/bin/perl) line.
344-
345-If the code returns true, the chain of callbacks continues; otherwise it ends.
346-The code is the last statement in the subroutine other than C<return $event>.
347-The subroutine template is:
348-
349- sub { $event = shift; filter && return $event; }
350-
351-Filters given on the command line are wrapped inside parentheses like like
352-C<( filter )>. For complex, multi-line filters, you must put the code inside
353-a file so it will not be wrapped inside parentheses. Either way, the filter
354-must produce syntactically valid code given the template. For example, an
355-if-else branch given on the command line would not be valid:
356-
357- --filter 'if () { } else { }' # WRONG
358-
359-Since it's given on the command line, the if-else branch would be wrapped inside
360-parentheses which is not syntactically valid. So to accomplish something more
361-complex like this would require putting the code in a file, for example
362-filter.txt:
363-
364- my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
365-
366-Then specify C<--filter filter.txt> to read the code from filter.txt.
367-
368-If the filter code won't compile, pt-table-usage will die with an error.
369-If the filter code does compile, an error may still occur at runtime if the
370-code tries to do something wrong (like pattern match an undefined value).
371-pt-table-usage does not provide any safeguards so code carefully!
372-
373-An example filter that discards everything but SELECT statements:
374-
375- --filter '$event->{arg} =~ m/^select/i'
376-
377-This is compiled into a subroutine like the following:
378-
379- sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
380-
381-It is permissible for the code to have side effects (to alter C<$event>).
382-
383-You can find an explanation of the structure of $event at
384-L<http://code.google.com/p/maatkit/wiki/EventAttributes>.
385-
386-Here are more examples of filter code:
387-
388-=over
389-
390-=item Host/IP matches domain.com
391-
392---filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/'
393-
394-Sometimes MySQL logs the host where the IP is expected. Therefore, we
395-check both.
396-
397-=item User matches john
398-
399---filter '($event->{user} || "") =~ m/john/'
400-
401-=item More than 1 warning
402-
403---filter '($event->{Warning_count} || 0) > 1'
404-
405-=item Query does full table scan or full join
406-
407---filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")'
408-
409-=item Query was not served from query cache
410-
411---filter '($event->{QC_Hit} || "") eq "No"'
412-
413-=item Query is 1 MB or larger
414-
415---filter '$event->{bytes} >= 1_048_576'
416-
417-=back
418-
419-Since L<"--filter"> allows you to alter C<$event>, you can use it to do other
420-things, like create new attributes.
421-
422+This option is a string of Perl code or a file containing Perl code that is
423+compiled into a subroutine with one argument: $event. If the given value is a
424+readable file, then pt-table-usage reads the entire file and uses its contents
425+as the code.
426+
427+Filters are implemented in the same fashion as in the pt-query-digest tool, so
428+please refer to its documentation for more information.
429
430 =item --help
431
432@@ -7070,9 +6977,8 @@
433
434 type: string
435
436-Identify each event using this attribute. If not ID attribute is given, then
437-events are identified with the query's checksum: an MD5 hex checksum of the
438-query's fingerprint.
439+Identify each event using this attribute. The default is to use a query ID,
440+which is an MD5 checksum of the query's fingerprint.
441
442 =item --log
443
444@@ -7115,10 +7021,7 @@
445
446 type: string
447
448-Analyze only this given query. If you want to analyze the table usage of
449-one simple query by providing on the command line instead of reading it
450-from a slow log file, then specify that query with this option. The default
451-L<"--id-attribute"> will be used which is the query's checksum.
452+Analyze the specified query instead of reading a log file.
453
454 =item --read-timeout
455
456@@ -7127,7 +7030,7 @@
457 Wait this long for an event from the input; 0 to wait forever.
458
459 This option sets the maximum time to wait for an event from the input. If an
460-event is not received after the specified time, the script stops reading the
461+event is not received after the specified time, the tool stops reading the
462 input and prints its reports.
463
464 This option requires the Perl POSIX module.
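The C<--filter> mechanism described in the diff above compiles a snippet of Perl into a subroutine that receives an C<$event> hashref and returns it only when the filter expression is true. The following is a minimal standalone sketch of that wrapping, not code from the tool itself; the sample C<$event> hashref and the C<m/^select/i> filter expression are illustrative assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A command-line filter such as --filter '$event->{arg} =~ m/^select/i'
# is wrapped in parentheses and compiled into a subroutine of this shape.
my $filter = sub {
    my ($event) = @_;
    ( $event->{arg} =~ m/^select/i ) && return $event;
    return;    # filter was false: the event is discarded
};

# Sample event hashref, resembling what a parsed slow-log entry provides.
my $event = { arg => 'SELECT * FROM t1', user => 'john' };

if ( my $kept = $filter->($event) ) {
    print "kept: $kept->{arg}\n";
}
```

Because the filter returns the event itself on success, the chain of callbacks continues only for events that pass; anything falsy ends processing for that event.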
