Merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs into lp:percona-toolkit/2.1

Proposed by: Baron Schwartz
Status: Merged
Approved by: Daniel Nichter
Approved revision: 226
Merged at revision: 226
Proposed branch: lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Merge into: lp:percona-toolkit/2.1
Diff against target: 464 lines (+107/-204), 1 file modified (bin/pt-table-usage)
To merge this branch: bzr merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Related bugs: (none)
Related blueprints: Add pt-table-usage (Medium)
Reviewer: Daniel Nichter (Approve)
Commit message
Description of the change
Revision history for this message

Daniel Nichter (daniel-nichter) reviewed: Approve
Preview Diff
1 | === modified file 'bin/pt-table-usage' |
2 | --- bin/pt-table-usage 2012-03-30 22:45:10 +0000 |
3 | +++ bin/pt-table-usage 2012-03-31 00:51:21 +0000 |
4 | @@ -5864,12 +5864,12 @@ |
5 | # ########################################################################### |
6 | |
7 | # ########################################################################### |
8 | -# MysqldumpParser package 7500 |
9 | +# MysqldumpParser package |
10 | # This package is a copy without comments from the original. The original |
11 | -# with comments and its test file can be found in the SVN repository at, |
12 | -# trunk/common/MysqldumpParser.pm |
13 | -# trunk/common/t/MysqldumpParser.t |
14 | -# See http://code.google.com/p/maatkit/wiki/Developers for more information. |
15 | +# with comments and its test file can be found in the Bazaar repository at, |
16 | +# lib/MysqldumpParser.pm |
17 | +# t/lib/MysqldumpParser.t |
18 | +# See https://launchpad.net/percona-toolkit for more information. |
19 | # ########################################################################### |
20 | package MysqldumpParser; |
21 | |
22 | @@ -5968,12 +5968,12 @@ |
23 | # ########################################################################### |
24 | |
25 | # ########################################################################### |
26 | -# SchemaQualifier package 7499 |
27 | +# SchemaQualifier package |
28 | # This package is a copy without comments from the original. The original |
29 | # with comments and its test file can be found in the SVN repository at, |
30 | -# trunk/common/SchemaQualifier.pm |
31 | -# trunk/common/t/SchemaQualifier.t |
32 | -# See http://code.google.com/p/maatkit/wiki/Developers for more information. |
33 | +# lib/SchemaQualifier.pm |
34 | +# t/lib/SchemaQualifier.t |
35 | +# See https://launchpad.net/percona-toolkit for more information. |
36 | # ########################################################################### |
37 | package SchemaQualifier; |
38 | |
39 | @@ -6680,24 +6680,22 @@ |
40 | |
41 | =head1 NAME |
42 | |
43 | -pt-table-usage - Read queries from a log and analyze how they use tables. |
44 | +pt-table-usage - Analyze how queries use tables. |
45 | |
46 | =head1 SYNOPSIS |
47 | |
48 | Usage: pt-table-usage [OPTIONS] [FILES] |
49 | |
50 | -pt-table-usage reads queries from slow query logs and analyzes how they use |
51 | -tables. If no FILE is specified, STDIN is read. Table usage for every query |
52 | -is printed to STDOUT. |
53 | +pt-table-usage reads queries from a log and analyzes how they use tables. If no |
54 | +FILE is specified, it reads STDIN. It prints a report for each query. |
55 | |
56 | =head1 RISKS |
57 | |
58 | -pt-table-use is very low risk because it only reads and examines queries from |
59 | -a log and executes C<EXPLAIN EXTENDED> if the L<"--explain-extended"> option |
60 | -is specified. |
61 | +pt-table-use is very low risk. By default, it simply reads queries from a log. |
62 | +It executes C<EXPLAIN EXTENDED> if you specify the L<"--explain-extended"> |
63 | +option. |
64 | |
65 | -At the time of this release, there are no known bugs that could cause serious |
66 | -harm to users. |
67 | +At the time of this release, we know of no bugs that could harm users. |
68 | |
69 | The authoritative source for updated information is always the online issue |
70 | tracking system. Issues that affect this tool will be marked as such. You can |
71 | @@ -6708,40 +6706,32 @@ |
72 | |
73 | =head1 DESCRIPTION |
74 | |
75 | -pt-table-usage reads queries from slow query logs and analyzes how they use |
76 | -tables. Table usage indicates more than just which tables are read from or |
77 | -written to by the query, it also indicates data flow: data in and data out. |
78 | -Data flow is determined by the contexts in which tables are used by the query. |
79 | -A single table can be used in several different contexts in the same query. |
80 | -The reported table usage for each query lists every context for every table. |
81 | -This CONTEXT-TABLE list tells how and where data flows, i.e. the query's table |
82 | -usage. The L<"OUTPUT"> section lists the possible contexts and describes how |
83 | -to read a table usage report. |
84 | - |
85 | -Since this tool analyzes table usage, it's important that queries use |
86 | -table-qualified columns. If a query uses only one table, then all columns |
87 | -must be from that table and there's no problem. But if a query uses |
88 | -multiple tables and the columns are not table-qualified, then that creates a |
89 | -problem that can only be solved by knowing the query's database and specifying |
90 | -L<"--explain-extended">. If the slow log does not specify the database |
91 | -used by the query, then you can specify a default database with L<"--database">. |
92 | -There is no other way to know or guess the database, so the query will be |
93 | -skipped. Secondly, if the database is known, then specifying |
94 | -L<"--explain-extended"> causes pt-table-usage to do C<EXPLAIN EXTENDED ...> |
95 | -C<SHOW WARNINGS> to get the fully qualified query as reported by MySQL |
96 | -(i.e. all identifiers are fully database- and/or table-qualified). For |
97 | -best results, you should specify L<"--explain-extended"> and |
98 | -L<"--database"> if you know that all queries use the same database. |
99 | - |
100 | -Each query is identified in the output by either an MD5 hex checksum |
101 | -of the query's fingerprint or the query's value for the specified |
102 | -L<"--id-attribute">. The query ID is for parsing and storing the table |
103 | -usage reports in a table that is keyed on the query ID. See L<"OUTPUT"> |
104 | -for more information. |
105 | +pt-table-usage reads queries from a log and analyzes how they use tables. The |
106 | +log should be in MySQL's slow query log format. |
107 | + |
108 | +Table usage is more than simply an indication of which tables the query reads or |
109 | +writes. It also indicates data flow: data in and data out. The tool determines |
110 | +the data flow by the contexts in which tables appear. A single query can use a |
111 | +table in several different contexts simultaneously. The tool's output lists |
112 | +every context for every table. This CONTEXT-TABLE list indicates how data flows |
113 | +between tables. The L<"OUTPUT"> section lists the possible contexts and |
114 | +describes how to read a table usage report. |
115 | + |
116 | +The tool analyzes data flow down to the level of individual columns, so it is |
117 | +helpful if columns are identified unambiguously in the query. If a query uses |
118 | +only one table, then all columns must be from that table, and there's no |
119 | +difficulty. But if a query uses multiple tables and the column names are not |
120 | +table-qualified, then it is necessary to use C<EXPLAIN EXTENDED>, followed by |
121 | +C<SHOW WARNINGS>, to determine to which tables the columns belong. |
122 | + |
123 | +If the tool does not know the query's default database, which can occur when the |
124 | +database is not printed in the log, then C<EXPLAIN EXTENDED> can fail. In this |
125 | +case, you can specify a default database with L<"--database">. You can also use |
126 | +the L<"--create-table-definitions"> option to help resolve ambiguities. |
127 | |
128 | =head1 OUTPUT |
129 | |
130 | -The table usage report that is printed for each query looks similar to the |
131 | +The tool prints a usage report for each table in every query, similar to the |
132 | following: |
133 | |
134 | Query_id: 0x1CD27577D202A339.1 |
135 | @@ -6758,45 +6748,43 @@ |
136 | JOIN t2 |
137 | WHERE t1 |
138 | |
139 | -Usage reports are separated by blank lines. The first line is always the |
140 | -query ID: a unique ID that can be used to parse the output and store the |
141 | -usage reports in a table keyed on this ID. The query ID has two parts |
142 | -separated by a period: the query ID and the target table number. |
143 | - |
144 | -If L<"--id-attribute"> is not specified, then query IDs are automatically |
145 | -created by making an MD5 hex checksum of the query's fingerprint |
146 | -(as shown above, e.g. C<0x1CD27577D202A339>); otherwise, the query ID is the |
147 | -query's value for the given attribute. |
148 | - |
149 | -The target table number starts at 1 and increments by 1 for each table that |
150 | -the query affects. Only multi-table UPDATE queries can affect |
151 | -multiple tables with a single query, so this number is 1 for all other types |
152 | -of queries. (Multi-table DELETE queries are not supported.) |
153 | -The example output above is from this query: |
154 | +The first line contains the query ID, which by default is the same as those |
155 | +shown in pt-query-digest reports. It is an MD5 checksum of the query's |
156 | +"fingerprint," which is what remains after removing literals, collapsing white |
157 | +space, and a variety of other transformations. The query ID has two parts |
158 | +separated by a period: the query ID and the table number. If you wish to use a |
159 | +different value to identify the query, you can specify the L<"--id-attribute"> |
160 | +option. |
161 | + |
162 | +The previous example shows two paragraphs for a single query, not two queries. |
163 | +Note that the query ID is identical for the two, but the table number differs. |
164 | +The table number increments by 1 for each table that the query updates. Only |
165 | +multi-table UPDATE queries can update multiple tables with a single query, so |
166 | +the table number is 1 for all other types of queries. (The tool does not |
167 | +support multi-table DELETE queries.) The example output above is from this |
168 | +query: |
169 | |
170 | UPDATE t1 AS a JOIN t2 AS b USING (id) |
171 | SET a.foo="bar", b.foo="bat" |
172 | WHERE a.id=1; |
173 | |
174 | -The C<SET> clause indicates that two tables are updated: C<a> aliased as C<t1>, |
175 | -and C<b> aliased as C<t2>. So two usage reports are printed, one for each |
176 | -table, and this is indicated in the output by their common query ID but |
177 | -incrementing target table number. |
178 | +The C<SET> clause indicates that the query updates two tables: C<a> aliased as |
179 | +C<t1>, and C<b> aliased as C<t2>. |
180 | |
181 | -After the first line is a variable number of CONTEXT-TABLE lines. Possible |
182 | -contexts are: |
183 | +After the first line, the tool prints a variable number of CONTEXT-TABLE lines. |
184 | +Possible contexts are as follows: |
185 | |
186 | =over |
187 | |
188 | =item * SELECT |
189 | |
190 | -SELECT means that data is taken out of the table for one of two reasons: |
191 | -to be returned to the user as part of a result set, or to be put into another |
192 | -table as part of an INSERT or UPDATE. In the first case, since only SELECT |
193 | -queries return result sets, a SELECT context is always listed for SELECT |
194 | -queries. In the second case, data from one table is used to insert or |
195 | -update rows in another table. For example, the UPDATE query in the example |
196 | -above has the usage: |
197 | +SELECT means that the query retrieves data from the table for one of two |
198 | +reasons. The first is to be returned to the user as part of a result set. Only |
199 | +SELECT queries return result sets, so the report always shows a SELECT context |
200 | +for SELECT queries. |
201 | + |
202 | +The second case is when data flows to another table as part of an INSERT or |
203 | +UPDATE. For example, the UPDATE query in the example above has the usage: |
204 | |
205 | SELECT DUAL |
206 | |
207 | @@ -6804,9 +6792,9 @@ |
208 | |
209 | SET a.foo="bar", b.foo="bat" |
210 | |
211 | -DUAL is used for any values that does not originate in a table, in this case the |
212 | -literal values "bar" and "bat". If that C<SET> clause were C<SET a.foo=b.foo> |
213 | -instead, then the complete usage would be: |
214 | +The tool uses DUAL for any values that do not originate in a table, in this case |
215 | +the literal values "bar" and "bat". If that C<SET> clause were C<SET |
216 | +a.foo=b.foo> instead, then the complete usage would be: |
217 | |
218 | Query_id: 0x1CD27577D202A339.1 |
219 | UPDATE t1 |
220 | @@ -6820,20 +6808,15 @@ |
221 | immediately above reflects an UPDATE query that updates rows in table C<t1> |
222 | with data from table C<t2>. |
223 | |
224 | -=item * Any other query type |
225 | - |
226 | -Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a context. |
227 | -All these types indicate that the table is written or altered in some way. |
228 | -If a SELECT context follows one of these types, then data is read from the |
229 | -SELECT table and written to this table. This happens, for example, with |
230 | -INSERT..SELECT or UPDATE queries that set column values using values from |
231 | -tables instead of constant values. |
232 | - |
233 | -These query types are not supported: |
234 | - |
235 | - SET |
236 | - LOAD |
237 | - multi-table DELETE |
238 | +=item * Any other verb |
239 | + |
240 | +Any other verb, such as INSERT, UPDATE, DELETE, etc. may be a context. These |
241 | +verbs indicate that the query modifies data in some way. If a SELECT context |
242 | +follows one of these verbs, then the query reads data from the SELECT table and |
243 | +writes it to this table. This happens, for example, with INSERT..SELECT or |
244 | +UPDATE queries that use values from tables instead of constant values. |
245 | + |
246 | +These query types are not supported: SET, LOAD, and multi-table DELETE. |
247 | |
248 | =item * JOIN |
249 | |
250 | @@ -6853,14 +6836,14 @@ |
251 | WHERE t1 |
252 | WHERE t2 |
253 | |
254 | -Only unique tables are listed; that is why table C<t1> is listed only once. |
255 | +The tool lists only distinct tables; that is why table C<t1> is listed only |
256 | +once. |
257 | |
258 | =item * TLIST |
259 | |
260 | -The TLIST context lists tables that are accessed by the query but do not |
261 | -appear in any other context. These tables are usually an implicit |
262 | -full cartesian join, so they should be avoided. For example, the query |
263 | -C<SELECT * FROM t1, t2> results in: |
264 | +The TLIST context lists tables that the query accesses, but which do not appear |
265 | +in any other context. These tables are usually an implicit cartesian join. For |
266 | +example, the query C<SELECT * FROM t1, t2> results in: |
267 | |
268 | Query_id: 0xBDDEB6EDA41897A8.1 |
269 | SELECT t1 |
270 | @@ -6871,7 +6854,7 @@ |
271 | First of all, there are two SELECT contexts, because C<SELECT *> selects |
272 | rows from all tables; C<t1> and C<t2> in this case. Secondly, the tables |
273 | are implicitly joined, but without any kind of join condition, which results |
274 | -in a full cartesian join as indicated by the TLIST context for each. |
275 | +in a cartesian join as indicated by the TLIST context for each. |
276 | |
277 | =back |
278 | |
279 | @@ -6911,24 +6894,23 @@ |
280 | |
281 | type: string; default: DUAL |
282 | |
283 | -Value to print for constant data. Constant data means all data not |
284 | -from tables (or subqueries since subqueries are not supported). For example, |
285 | -real constant values like strings ("foo") and numbers (42), and data from |
286 | -functions like C<NOW()>. For example, in the query |
287 | -C<INSERT INTO t (c) VALUES ('a')>, the string 'a' is constant data, so the |
288 | -table usage report is: |
289 | +Table to print as the source for constant data (literals). This is any data not |
290 | +retrieved from tables (or subqueries, because subqueries are not supported). |
291 | +This includes literal values such as strings ("foo") and numbers (42), or |
292 | +functions such as C<NOW()>. For example, in the query C<INSERT INTO t (c) |
293 | +VALUES ('a')>, the string 'a' is constant data, so the table usage report is: |
294 | |
295 | INSERT t |
296 | SELECT DUAL |
297 | |
298 | -The first line indicates that data is inserted into table C<t> and the second |
299 | -line indicates that that data comes from some constant value. |
300 | +The first line indicates that the query inserts data into table C<t>, and the |
301 | +second line indicates that the inserted data comes from some constant value. |
302 | |
303 | =item --[no]continue-on-error |
304 | |
305 | default: yes |
306 | |
307 | -Continue parsing even if there is an error. |
308 | +Continue to work even if there is an error. |
309 | |
310 | =item --create-table-definitions |
311 | |
312 | @@ -6939,9 +6921,9 @@ |
313 | names, you can save the output of C<mysqldump --no-data> to one or more files |
314 | and specify those files with this option. The tool will parse all |
315 | C<CREATE TABLE> definitions from the files and use this information to |
316 | -qualify table and column names. If a column name is used in multiple tables, |
317 | -or table name is used in multiple databases, these duplicates cannot be |
318 | -qualified. |
319 | +qualify table and column names. If a column name appears in multiple tables, |
320 | +or a table name appears in multiple databases, the ambiguities cannot be |
321 | +resolved. |
322 | |
323 | =item --daemonize |
324 | |
325 | @@ -6964,7 +6946,8 @@ |
326 | |
327 | type: DSN |
328 | |
329 | -EXPLAIN EXTENDED queries on this host to fully qualify table and column names. |
330 | +A server to execute EXPLAIN EXTENDED queries. This may be necessary to resolve |
331 | +ambiguous (unqualified) column and table names. |
332 | |
333 | =item --filter |
334 | |
335 | @@ -6972,89 +6955,13 @@ |
336 | |
337 | Discard events for which this Perl code doesn't return true. |
338 | |
339 | -This option is a string of Perl code or a file containing Perl code that gets |
340 | -compiled into a subroutine with one argument: $event. This is a hashref. |
341 | -If the given value is a readable file, then pt-table-usage reads the entire |
342 | -file and uses its contents as the code. The file should not contain |
343 | -a shebang (#!/usr/bin/perl) line. |
344 | - |
345 | -If the code returns true, the chain of callbacks continues; otherwise it ends. |
346 | -The code is the last statement in the subroutine other than C<return $event>. |
347 | -The subroutine template is: |
348 | - |
349 | - sub { $event = shift; filter && return $event; } |
350 | - |
351 | -Filters given on the command line are wrapped inside parentheses like like |
352 | -C<( filter )>. For complex, multi-line filters, you must put the code inside |
353 | -a file so it will not be wrapped inside parentheses. Either way, the filter |
354 | -must produce syntactically valid code given the template. For example, an |
355 | -if-else branch given on the command line would not be valid: |
356 | - |
357 | - --filter 'if () { } else { }' # WRONG |
358 | - |
359 | -Since it's given on the command line, the if-else branch would be wrapped inside |
360 | -parentheses which is not syntactically valid. So to accomplish something more |
361 | -complex like this would require putting the code in a file, for example |
362 | -filter.txt: |
363 | - |
364 | - my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok |
365 | - |
366 | -Then specify C<--filter filter.txt> to read the code from filter.txt. |
367 | - |
368 | -If the filter code won't compile, pt-table-usage will die with an error. |
369 | -If the filter code does compile, an error may still occur at runtime if the |
370 | -code tries to do something wrong (like pattern match an undefined value). |
371 | -pt-table-usage does not provide any safeguards so code carefully! |
372 | - |
373 | -An example filter that discards everything but SELECT statements: |
374 | - |
375 | - --filter '$event->{arg} =~ m/^select/i' |
376 | - |
377 | -This is compiled into a subroutine like the following: |
378 | - |
379 | - sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; } |
380 | - |
381 | -It is permissible for the code to have side effects (to alter C<$event>). |
382 | - |
383 | -You can find an explanation of the structure of $event at |
384 | -L<http://code.google.com/p/maatkit/wiki/EventAttributes>. |
385 | - |
386 | -Here are more examples of filter code: |
387 | - |
388 | -=over |
389 | - |
390 | -=item Host/IP matches domain.com |
391 | - |
392 | ---filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/' |
393 | - |
394 | -Sometimes MySQL logs the host where the IP is expected. Therefore, we |
395 | -check both. |
396 | - |
397 | -=item User matches john |
398 | - |
399 | ---filter '($event->{user} || "") =~ m/john/' |
400 | - |
401 | -=item More than 1 warning |
402 | - |
403 | ---filter '($event->{Warning_count} || 0) > 1' |
404 | - |
405 | -=item Query does full table scan or full join |
406 | - |
407 | ---filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")' |
408 | - |
409 | -=item Query was not served from query cache |
410 | - |
411 | ---filter '($event->{QC_Hit} || "") eq "No"' |
412 | - |
413 | -=item Query is 1 MB or larger |
414 | - |
415 | ---filter '$event->{bytes} >= 1_048_576' |
416 | - |
417 | -=back |
418 | - |
419 | -Since L<"--filter"> allows you to alter C<$event>, you can use it to do other |
420 | -things, like create new attributes. |
421 | - |
422 | +This option is a string of Perl code or a file containing Perl code that is |
423 | +compiled into a subroutine with one argument: $event. If the given value is a |
424 | +readable file, then pt-table-usage reads the entire file and uses its contents |
425 | +as the code. |
426 | + |
427 | +Filters are implemented in the same fashion as in the pt-query-digest tool, so |
428 | +please refer to its documentation for more information. |
429 | |
430 | =item --help |
431 | |
432 | @@ -7070,9 +6977,8 @@ |
433 | |
434 | type: string |
435 | |
436 | -Identify each event using this attribute. If not ID attribute is given, then |
437 | -events are identified with the query's checksum: an MD5 hex checksum of the |
438 | -query's fingerprint. |
439 | +Identify each event using this attribute. The default is to use a query ID, |
440 | +which is an MD5 checksum of the query's fingerprint. |
441 | |
442 | =item --log |
443 | |
444 | @@ -7115,10 +7021,7 @@ |
445 | |
446 | type: string |
447 | |
448 | -Analyze only this given query. If you want to analyze the table usage of |
449 | -one simple query by providing on the command line instead of reading it |
450 | -from a slow log file, then specify that query with this option. The default |
451 | -L<"--id-attribute"> will be used which is the query's checksum. |
452 | +Analyze the specified query instead of reading a log file. |
453 | |
454 | =item --read-timeout |
455 | |
456 | @@ -7127,7 +7030,7 @@ |
457 | Wait this long for an event from the input; 0 to wait forever. |
458 | |
459 | This option sets the maximum time to wait for an event from the input. If an |
460 | -event is not received after the specified time, the script stops reading the |
461 | +event is not received after the specified time, the tool stops reading the |
462 | input and prints its reports. |
463 | |
464 | This option requires the Perl POSIX module. |
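The documentation above says that each query ID is an MD5 checksum of the query's "fingerprint," which is what remains after removing literals, collapsing whitespace, and other transformations, and that the ID is shown in the `0x...` style used by pt-query-digest. The following is a minimal Python sketch of that idea, for illustration only: the real fingerprinting in the (Perl) toolkit handles many more cases, and the `fingerprint` and `query_id` names here are hypothetical, not part of the tool.

```python
import hashlib
import re

def fingerprint(query):
    # Greatly simplified fingerprint: lowercase, replace string and
    # numeric literals with '?', collapse whitespace. The toolkit's
    # real algorithm performs many more normalizations.
    f = query.lower()
    f = re.sub(r"'[^']*'", "?", f)      # single-quoted string literals
    f = re.sub(r'"[^"]*"', "?", f)      # double-quoted string literals
    f = re.sub(r"\b\d+\b", "?", f)      # bare numeric literals
    f = re.sub(r"\s+", " ", f).strip()  # collapse whitespace
    return f

def query_id(query):
    # 16 hex digits of the MD5 of the fingerprint, in the 0x... style
    # that appears in the usage reports above.
    digest = hashlib.md5(fingerprint(query).encode()).hexdigest()
    return "0x" + digest[:16].upper()

# Two queries that differ only in a literal share a fingerprint,
# and therefore a query ID.
q1 = "SELECT * FROM t1 WHERE id=1"
q2 = "select *  from t1 where id=42"
```

Under these simplifications, `query_id(q1) == query_id(q2)`, which is why a usage report keyed on the ID aggregates all variants of the same query shape.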
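The OUTPUT documentation notes that the reports are designed to be parsed and stored in a table keyed on the query ID: each report is a blank-line-separated paragraph whose first line is `Query_id:` and whose remaining lines are CONTEXT-TABLE pairs. A small Python sketch of such a parser follows; the `parse_usage_reports` helper is hypothetical, not part of the toolkit, and assumes the exact layout shown in the examples above.

```python
def parse_usage_reports(text):
    # Split the tool's output into blank-line-separated paragraphs and
    # return a list of (query_id, [(context, table), ...]) pairs.
    reports = []
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        lines = para.splitlines()
        # First line: "Query_id: 0x<checksum>.<table number>"
        qid = lines[0].split(":", 1)[1].strip()
        # Remaining lines: "CONTEXT table", e.g. "SELECT DUAL"
        usage = [tuple(line.split(None, 1)) for line in lines[1:]]
        reports.append((qid, usage))
    return reports

# The two-paragraph multi-table UPDATE example from the OUTPUT section:
# same checksum, table numbers .1 and .2.
sample = """Query_id: 0x1CD27577D202A339.1
UPDATE t1
SELECT DUAL
JOIN t1
JOIN t2
WHERE t1

Query_id: 0x1CD27577D202A339.2
UPDATE t2
SELECT DUAL
JOIN t1
JOIN t2
WHERE t1"""
```

Calling `parse_usage_reports(sample)` yields two entries sharing the checksum `0x1CD27577D202A339` but with table numbers 1 and 2, matching the multi-table UPDATE behavior described above.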