Merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs into lp:percona-toolkit/2.1

Proposed by Baron Schwartz
Status: Merged
Approved by: Daniel Nichter
Approved revision: 226
Merged at revision: 226
Proposed branch: lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Merge into: lp:percona-toolkit/2.1
Diff against target: 464 lines (+107/-204)
1 file modified
bin/pt-table-usage (+107/-204)
To merge this branch: bzr merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Reviewer Review Type Date Requested Status
Daniel Nichter Approve
Review via email: mp+100264@code.launchpad.net
To post a comment you must log in.
Revision history for this message
Daniel Nichter (daniel-nichter) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== modified file 'bin/pt-table-usage'
--- bin/pt-table-usage 2012-03-30 22:45:10 +0000
+++ bin/pt-table-usage 2012-03-31 00:51:21 +0000
@@ -5864,12 +5864,12 @@
5864# ###########################################################################5864# ###########################################################################
58655865
5866# ###########################################################################5866# ###########################################################################
5867# MysqldumpParser package 75005867# MysqldumpParser package
5868# This package is a copy without comments from the original. The original5868# This package is a copy without comments from the original. The original
5869# with comments and its test file can be found in the SVN repository at,5869# with comments and its test file can be found in the Bazaar repository at,
5870# trunk/common/MysqldumpParser.pm5870# lib/MysqldumpParser.pm
5871# trunk/common/t/MysqldumpParser.t5871# t/lib/MysqldumpParser.t
5872# See http://code.google.com/p/maatkit/wiki/Developers for more information.5872# See https://launchpad.net/percona-toolkit for more information.
5873# ###########################################################################5873# ###########################################################################
5874package MysqldumpParser;5874package MysqldumpParser;
58755875
@@ -5968,12 +5968,12 @@
5968# ###########################################################################5968# ###########################################################################
59695969
5970# ###########################################################################5970# ###########################################################################
5971# SchemaQualifier package 74995971# SchemaQualifier package
5972# This package is a copy without comments from the original. The original5972# This package is a copy without comments from the original. The original
5973# with comments and its test file can be found in the SVN repository at,5973# with comments and its test file can be found in the SVN repository at,
5974# trunk/common/SchemaQualifier.pm5974# lib/SchemaQualifier.pm
5975# trunk/common/t/SchemaQualifier.t5975# t/lib/SchemaQualifier.t
5976# See http://code.google.com/p/maatkit/wiki/Developers for more information.5976# See https://launchpad.net/percona-toolkit for more information.
5977# ###########################################################################5977# ###########################################################################
5978package SchemaQualifier;5978package SchemaQualifier;
59795979
@@ -6680,24 +6680,22 @@
66806680
6681=head1 NAME6681=head1 NAME
66826682
6683pt-table-usage - Read queries from a log and analyze how they use tables.6683pt-table-usage - Analyze how queries use tables.
66846684
6685=head1 SYNOPSIS6685=head1 SYNOPSIS
66866686
6687Usage: pt-table-usage [OPTIONS] [FILES]6687Usage: pt-table-usage [OPTIONS] [FILES]
66886688
6689pt-table-usage reads queries from slow query logs and analyzes how they use6689pt-table-usage reads queries from a log and analyzes how they use tables. If no
6690tables. If no FILE is specified, STDIN is read. Table usage for every query6690FILE is specified, it reads STDIN. It prints a report for each query.
6691is printed to STDOUT.
66926691
6693=head1 RISKS6692=head1 RISKS
66946693
6695pt-table-use is very low risk because it only reads and examines queries from6694pt-table-use is very low risk. By default, it simply reads queries from a log.
6696a log and executes C<EXPLAIN EXTENDED> if the L<"--explain-extended"> option6695It executes C<EXPLAIN EXTENDED> if you specify the L<"--explain-extended">
6697is specified.6696option.
66986697
6699At the time of this release, there are no known bugs that could cause serious6698At the time of this release, we know of no bugs that could harm users.
6700harm to users.
67016699
6702The authoritative source for updated information is always the online issue6700The authoritative source for updated information is always the online issue
6703tracking system. Issues that affect this tool will be marked as such. You can6701tracking system. Issues that affect this tool will be marked as such. You can
@@ -6708,40 +6706,32 @@
67086706
6709=head1 DESCRIPTION6707=head1 DESCRIPTION
67106708
6711pt-table-usage reads queries from slow query logs and analyzes how they use6709pt-table-usage reads queries from a log and analyzes how they use tables. The
6712tables. Table usage indicates more than just which tables are read from or6710log should be in MySQL's slow query log format.
6713written to by the query, it also indicates data flow: data in and data out.6711
6714Data flow is determined by the contexts in which tables are used by the query.6712Table usage is more than simply an indication of which tables the query reads or
6715A single table can be used in several different contexts in the same query.6713writes. It also indicates data flow: data in and data out. The tool determines
6716The reported table usage for each query lists every context for every table.6714the data flow by the contexts in which tables appear. A single query can use a
6717This CONTEXT-TABLE list tells how and where data flows, i.e. the query's table6715table in several different contexts simultaneously. The tool's output lists
6718usage. The L<"OUTPUT"> section lists the possible contexts and describes how6716every context for every table. This CONTEXT-TABLE list indicates how data flows
6719to read a table usage report.6717between tables. The L<"OUTPUT"> section lists the possible contexts and
67206718describes how to read a table usage report.
6721Since this tool analyzes table usage, it's important that queries use6719
6722table-qualified columns. If a query uses only one table, then all columns6720The tool analyzes data flow down to the level of individual columns, so it is
6723must be from that table and there's no problem. But if a query uses6721helpful if columns are identified unambiguously in the query. If a query uses
6724multiple tables and the columns are not table-qualified, then that creates a6722only one table, then all columns must be from that table, and there's no
6725problem that can only be solved by knowing the query's database and specifying6723difficulty. But if a query uses multiple tables and the column names are not
6726L<"--explain-extended">. If the slow log does not specify the database6724table-qualified, then it is necessary to use C<EXPLAIN EXTENDED>, followed by
6727used by the query, then you can specify a default database with L<"--database">.6725C<SHOW WARNINGS>, to determine to which tables the columns belong.
6728There is no other way to know or guess the database, so the query will be6726
6729skipped. Secondly, if the database is known, then specifying6727If the tool does not know the query's default database, which can occur when the
6730L<"--explain-extended"> causes pt-table-usage to do C<EXPLAIN EXTENDED ...>6728database is not printed in the log, then C<EXPLAIN EXTENDED> can fail. In this
6731C<SHOW WARNINGS> to get the fully qualified query as reported by MySQL6729case, you can specify a default database with L<"--database">. You can also use
6732(i.e. all identifiers are fully database- and/or table-qualified). For6730the L<"--create-table-definitions"> option to help resolve ambiguities.
6733best results, you should specify L<"--explain-extended"> and
6734L<"--database"> if you know that all queries use the same database.
6735
6736Each query is identified in the output by either an MD5 hex checksum
6737of the query's fingerprint or the query's value for the specified
6738L<"--id-attribute">. The query ID is for parsing and storing the table
6739usage reports in a table that is keyed on the query ID. See L<"OUTPUT">
6740for more information.
67416731
6742=head1 OUTPUT6732=head1 OUTPUT
67436733
6744The table usage report that is printed for each query looks similar to the6734The tool prints a usage report for each table in every query, similar to the
6745following:6735following:
67466736
6747 Query_id: 0x1CD27577D202A339.16737 Query_id: 0x1CD27577D202A339.1
@@ -6758,45 +6748,43 @@
6758 JOIN t26748 JOIN t2
6759 WHERE t16749 WHERE t1
67606750
6761Usage reports are separated by blank lines. The first line is always the6751The first line contains the query ID, which by default is the same as those
6762query ID: a unique ID that can be used to parse the output and store the6752shown in pt-query-digest reports. It is an MD5 checksum of the query's
6763usage reports in a table keyed on this ID. The query ID has two parts6753"fingerprint," which is what remains after removing literals, collapsing white
6764separated by a period: the query ID and the target table number.6754space, and a variety of other transformations. The query ID has two parts
67656755separated by a period: the query ID and the table number. If you wish to use a
6766If L<"--id-attribute"> is not specified, then query IDs are automatically6756different value to identify the query, you can specify the L<"--id-attribute">
6767created by making an MD5 hex checksum of the query's fingerprint6757option.
6768(as shown above, e.g. C<0x1CD27577D202A339>); otherwise, the query ID is the6758
6769query's value for the given attribute.6759The previous example shows two paragraphs for a single query, not two queries.
67706760Note that the query ID is identical for the two, but the table number differs.
6771The target table number starts at 1 and increments by 1 for each table that6761The table number increments by 1 for each table that the query updates. Only
6772the query affects. Only multi-table UPDATE queries can affect6762multi-table UPDATE queries can update multiple tables with a single query, so
6773multiple tables with a single query, so this number is 1 for all other types6763the table number is 1 for all other types of queries. (The tool does not
6774of queries. (Multi-table DELETE queries are not supported.)6764support multi-table DELETE queries.) The example output above is from this
6775The example output above is from this query:6765query:
67766766
6777 UPDATE t1 AS a JOIN t2 AS b USING (id)6767 UPDATE t1 AS a JOIN t2 AS b USING (id)
6778 SET a.foo="bar", b.foo="bat"6768 SET a.foo="bar", b.foo="bat"
6779 WHERE a.id=1;6769 WHERE a.id=1;
67806770
6781The C<SET> clause indicates that two tables are updated: C<a> aliased as C<t1>,6771The C<SET> clause indicates that the query updates two tables: C<a> aliased as
6782and C<b> aliased as C<t2>. So two usage reports are printed, one for each6772C<t1>, and C<b> aliased as C<t2>.
6783table, and this is indicated in the output by their common query ID but
6784incrementing target table number.
67856773
6786After the first line is a variable number of CONTEXT-TABLE lines. Possible6774After the first line, the tool prints a variable number of CONTEXT-TABLE lines.
6787contexts are:6775Possible contexts are as follows:
67886776
6789=over6777=over
67906778
6791=item * SELECT6779=item * SELECT
67926780
6793SELECT means that data is taken out of the table for one of two reasons:6781SELECT means that the query retrieves data from the table for one of two
6794to be returned to the user as part of a result set, or to be put into another6782reasons. The first is to be returned to the user as part of a result set. Only
6795table as part of an INSERT or UPDATE. In the first case, since only SELECT6783SELECT queries return result sets, so the report always shows a SELECT context
6796queries return result sets, a SELECT context is always listed for SELECT6784for SELECT queries.
6797queries. In the second case, data from one table is used to insert or6785
6798update rows in another table. For example, the UPDATE query in the example6786The second case is when data flows to another table as part of an INSERT or
6799above has the usage:6787UPDATE. For example, the UPDATE query in the example above has the usage:
68006788
6801 SELECT DUAL6789 SELECT DUAL
68026790
@@ -6804,9 +6792,9 @@
68046792
6805 SET a.foo="bar", b.foo="bat"6793 SET a.foo="bar", b.foo="bat"
68066794
6807DUAL is used for any values that does not originate in a table, in this case the6795The tool uses DUAL for any values that do not originate in a table, in this case
6808literal values "bar" and "bat". If that C<SET> clause were C<SET a.foo=b.foo>6796the literal values "bar" and "bat". If that C<SET> clause were C<SET
6809instead, then the complete usage would be:6797a.foo=b.foo> instead, then the complete usage would be:
68106798
6811 Query_id: 0x1CD27577D202A339.16799 Query_id: 0x1CD27577D202A339.1
6812 UPDATE t16800 UPDATE t1
@@ -6820,20 +6808,15 @@
6820immediately above reflects an UPDATE query that updates rows in table C<t1>6808immediately above reflects an UPDATE query that updates rows in table C<t1>
6821with data from table C<t2>.6809with data from table C<t2>.
68226810
6823=item * Any other query type6811=item * Any other verb
68246812
6825Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a context.6813Any other verb, such as INSERT, UPDATE, DELETE, etc. may be a context. These
6826All these types indicate that the table is written or altered in some way.6814verbs indicate that the query modifies data in some way. If a SELECT context
6827If a SELECT context follows one of these types, then data is read from the6815follows one of these verbs, then the query reads data from the SELECT table and
6828SELECT table and written to this table. This happens, for example, with6816writes it to this table. This happens, for example, with INSERT..SELECT or
6829INSERT..SELECT or UPDATE queries that set column values using values from6817UPDATE queries that use values from tables instead of constant values.
6830tables instead of constant values.6818
68316819These query types are not supported: SET, LOAD, and multi-table DELETE.
6832These query types are not supported:
6833
6834 SET
6835 LOAD
6836 multi-table DELETE
68376820
6838=item * JOIN6821=item * JOIN
68396822
@@ -6853,14 +6836,14 @@
6853 WHERE t16836 WHERE t1
6854 WHERE t26837 WHERE t2
68556838
6856Only unique tables are listed; that is why table C<t1> is listed only once.6839The tool lists only distinct tables; that is why table C<t1> is listed only
6840once.
68576841
6858=item * TLIST6842=item * TLIST
68596843
6860The TLIST context lists tables that are accessed by the query but do not6844The TLIST context lists tables that the query accesses, but which do not appear
6861appear in any other context. These tables are usually an implicit6845in any other context. These tables are usually an implicit cartesian join. For
6862full cartesian join, so they should be avoided. For example, the query6846example, the query C<SELECT * FROM t1, t2> results in:
6863C<SELECT * FROM t1, t2> results in:
68646847
6865 Query_id: 0xBDDEB6EDA41897A8.16848 Query_id: 0xBDDEB6EDA41897A8.1
6866 SELECT t16849 SELECT t1
@@ -6871,7 +6854,7 @@
6871First of all, there are two SELECT contexts, because C<SELECT *> selects6854First of all, there are two SELECT contexts, because C<SELECT *> selects
6872rows from all tables; C<t1> and C<t2> in this case. Secondly, the tables6855rows from all tables; C<t1> and C<t2> in this case. Secondly, the tables
6873are implicitly joined, but without any kind of join condition, which results6856are implicitly joined, but without any kind of join condition, which results
6874in a full cartesian join as indicated by the TLIST context for each.6857in a cartesian join as indicated by the TLIST context for each.
68756858
6876=back6859=back
68776860
@@ -6911,24 +6894,23 @@
69116894
6912type: string; default: DUAL6895type: string; default: DUAL
69136896
6914Value to print for constant data. Constant data means all data not6897Table to print as the source for constant data (literals). This is any data not
6915from tables (or subqueries since subqueries are not supported). For example,6898retrieved from tables (or subqueries, because subqueries are not supported).
6916real constant values like strings ("foo") and numbers (42), and data from6899This includes literal values such as strings ("foo") and numbers (42), or
6917functions like C<NOW()>. For example, in the query6900functions such as C<NOW()>. For example, in the query C<INSERT INTO t (c)
6918C<INSERT INTO t (c) VALUES ('a')>, the string 'a' is constant data, so the6901VALUES ('a')>, the string 'a' is constant data, so the table usage report is:
6919table usage report is:
69206902
6921 INSERT t6903 INSERT t
6922 SELECT DUAL6904 SELECT DUAL
69236905
6924The first line indicates that data is inserted into table C<t> and the second6906The first line indicates that the query inserts data into table C<t>, and the
6925line indicates that that data comes from some constant value.6907second line indicates that the inserted data comes from some constant value.
69266908
6927=item --[no]continue-on-error6909=item --[no]continue-on-error
69286910
6929default: yes6911default: yes
69306912
6931Continue parsing even if there is an error.6913Continue to work even if there is an error.
69326914
6933=item --create-table-definitions6915=item --create-table-definitions
69346916
@@ -6939,9 +6921,9 @@
6939names, you can save the output of C<mysqldump --no-data> to one or more files6921names, you can save the output of C<mysqldump --no-data> to one or more files
6940and specify those files with this option. The tool will parse all6922and specify those files with this option. The tool will parse all
6941C<CREATE TABLE> definitions from the files and use this information to6923C<CREATE TABLE> definitions from the files and use this information to
6942qualify table and column names. If a column name is used in multiple tables,6924qualify table and column names. If a column name appears in multiple tables,
6943or table name is used in multiple databases, these duplicates cannot be6925or a table name appears in multiple databases, the ambiguities cannot be
6944qualified.6926resolved.
69456927
6946=item --daemonize6928=item --daemonize
69476929
@@ -6964,7 +6946,8 @@
69646946
6965type: DSN6947type: DSN
69666948
6967EXPLAIN EXTENDED queries on this host to fully qualify table and column names.6949A server to execute EXPLAIN EXTENDED queries. This may be necessary to resolve
6950ambiguous (unqualified) column and table names.
69686951
6969=item --filter6952=item --filter
69706953
@@ -6972,89 +6955,13 @@
69726955
6973Discard events for which this Perl code doesn't return true.6956Discard events for which this Perl code doesn't return true.
69746957
6975This option is a string of Perl code or a file containing Perl code that gets6958This option is a string of Perl code or a file containing Perl code that is
6976compiled into a subroutine with one argument: $event. This is a hashref.6959compiled into a subroutine with one argument: $event. If the given value is a
6977If the given value is a readable file, then pt-table-usage reads the entire6960readable file, then pt-table-usage reads the entire file and uses its contents
6978file and uses its contents as the code. The file should not contain6961as the code.
6979a shebang (#!/usr/bin/perl) line.6962
69806963Filters are implemented in the same fashion as in the pt-query-digest tool, so
6981If the code returns true, the chain of callbacks continues; otherwise it ends.6964please refer to its documentation for more information.
6982The code is the last statement in the subroutine other than C<return $event>.
6983The subroutine template is:
6984
6985 sub { $event = shift; filter && return $event; }
6986
6987Filters given on the command line are wrapped inside parentheses like like
6988C<( filter )>. For complex, multi-line filters, you must put the code inside
6989a file so it will not be wrapped inside parentheses. Either way, the filter
6990must produce syntactically valid code given the template. For example, an
6991if-else branch given on the command line would not be valid:
6992
6993 --filter 'if () { } else { }' # WRONG
6994
6995Since it's given on the command line, the if-else branch would be wrapped inside
6996parentheses which is not syntactically valid. So to accomplish something more
6997complex like this would require putting the code in a file, for example
6998filter.txt:
6999
7000 my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
7001
7002Then specify C<--filter filter.txt> to read the code from filter.txt.
7003
7004If the filter code won't compile, pt-table-usage will die with an error.
7005If the filter code does compile, an error may still occur at runtime if the
7006code tries to do something wrong (like pattern match an undefined value).
7007pt-table-usage does not provide any safeguards so code carefully!
7008
7009An example filter that discards everything but SELECT statements:
7010
7011 --filter '$event->{arg} =~ m/^select/i'
7012
7013This is compiled into a subroutine like the following:
7014
7015 sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
7016
7017It is permissible for the code to have side effects (to alter C<$event>).
7018
7019You can find an explanation of the structure of $event at
7020L<http://code.google.com/p/maatkit/wiki/EventAttributes>.
7021
7022Here are more examples of filter code:
7023
7024=over
7025
7026=item Host/IP matches domain.com
7027
7028--filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/'
7029
7030Sometimes MySQL logs the host where the IP is expected. Therefore, we
7031check both.
7032
7033=item User matches john
7034
7035--filter '($event->{user} || "") =~ m/john/'
7036
7037=item More than 1 warning
7038
7039--filter '($event->{Warning_count} || 0) > 1'
7040
7041=item Query does full table scan or full join
7042
7043--filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")'
7044
7045=item Query was not served from query cache
7046
7047--filter '($event->{QC_Hit} || "") eq "No"'
7048
7049=item Query is 1 MB or larger
7050
7051--filter '$event->{bytes} >= 1_048_576'
7052
7053=back
7054
7055Since L<"--filter"> allows you to alter C<$event>, you can use it to do other
7056things, like create new attributes.
7057
70586965
7059=item --help6966=item --help
70606967
@@ -7070,9 +6977,8 @@
70706977
7071type: string6978type: string
70726979
7073Identify each event using this attribute. If not ID attribute is given, then6980Identify each event using this attribute. The default is to use a query ID,
7074events are identified with the query's checksum: an MD5 hex checksum of the6981which is an MD5 checksum of the query's fingerprint.
7075query's fingerprint.
70766982
7077=item --log6983=item --log
70786984
@@ -7115,10 +7021,7 @@
71157021
7116type: string7022type: string
71177023
7118Analyze only this given query. If you want to analyze the table usage of7024Analyze the specified query instead of reading a log file.
7119one simple query by providing on the command line instead of reading it
7120from a slow log file, then specify that query with this option. The default
7121L<"--id-attribute"> will be used which is the query's checksum.
71227025
7123=item --read-timeout7026=item --read-timeout
71247027
@@ -7127,7 +7030,7 @@
7127Wait this long for an event from the input; 0 to wait forever.7030Wait this long for an event from the input; 0 to wait forever.
71287031
7129This option sets the maximum time to wait for an event from the input. If an7032This option sets the maximum time to wait for an event from the input. If an
7130event is not received after the specified time, the script stops reading the7033event is not received after the specified time, the tool stops reading the
7131input and prints its reports.7034input and prints its reports.
71327035
7133This option requires the Perl POSIX module.7036This option requires the Perl POSIX module.

Subscribers

People subscribed via source and target branches