Merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs into lp:percona-toolkit/2.1

Proposed by Baron Schwartz
Status: Merged
Approved by: Daniel Nichter
Approved revision: 226
Merged at revision: 226
Proposed branch: lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Merge into: lp:percona-toolkit/2.1
Diff against target: 464 lines (+107/-204)
1 file modified
bin/pt-table-usage (+107/-204)
To merge this branch: bzr merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Reviewer Review Type Date Requested Status
Daniel Nichter Approve
Review via email: mp+100264@code.launchpad.net
Revision history for this message
Daniel Nichter (daniel-nichter) :
review: Approve

Preview Diff

1=== modified file 'bin/pt-table-usage'
2--- bin/pt-table-usage 2012-03-30 22:45:10 +0000
3+++ bin/pt-table-usage 2012-03-31 00:51:21 +0000
4@@ -5864,12 +5864,12 @@
5 # ###########################################################################
6
7 # ###########################################################################
8-# MysqldumpParser package 7500
9+# MysqldumpParser package
10 # This package is a copy without comments from the original. The original
11-# with comments and its test file can be found in the SVN repository at,
12-# trunk/common/MysqldumpParser.pm
13-# trunk/common/t/MysqldumpParser.t
14-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
15+# with comments and its test file can be found in the Bazaar repository at,
16+# lib/MysqldumpParser.pm
17+# t/lib/MysqldumpParser.t
18+# See https://launchpad.net/percona-toolkit for more information.
19 # ###########################################################################
20 package MysqldumpParser;
21
22@@ -5968,12 +5968,12 @@
23 # ###########################################################################
24
25 # ###########################################################################
26-# SchemaQualifier package 7499
27+# SchemaQualifier package
28 # This package is a copy without comments from the original. The original
29 # with comments and its test file can be found in the SVN repository at,
30-# trunk/common/SchemaQualifier.pm
31-# trunk/common/t/SchemaQualifier.t
32-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
33+# lib/SchemaQualifier.pm
34+# t/lib/SchemaQualifier.t
35+# See https://launchpad.net/percona-toolkit for more information.
36 # ###########################################################################
37 package SchemaQualifier;
38
39@@ -6680,24 +6680,22 @@
40
41 =head1 NAME
42
43-pt-table-usage - Read queries from a log and analyze how they use tables.
44+pt-table-usage - Analyze how queries use tables.
45
46 =head1 SYNOPSIS
47
48 Usage: pt-table-usage [OPTIONS] [FILES]
49
50-pt-table-usage reads queries from slow query logs and analyzes how they use
51-tables. If no FILE is specified, STDIN is read. Table usage for every query
52-is printed to STDOUT.
53+pt-table-usage reads queries from a log and analyzes how they use tables. If no
54+FILE is specified, it reads STDIN. It prints a report for each query.
55
56 =head1 RISKS
57
58-pt-table-use is very low risk because it only reads and examines queries from
59-a log and executes C<EXPLAIN EXTENDED> if the L<"--explain-extended"> option
60-is specified.
61+pt-table-usage is very low risk. By default, it simply reads queries from a log.
62+It executes C<EXPLAIN EXTENDED> if you specify the L<"--explain-extended">
63+option.
64
65-At the time of this release, there are no known bugs that could cause serious
66-harm to users.
67+At the time of this release, we know of no bugs that could harm users.
68
69 The authoritative source for updated information is always the online issue
70 tracking system. Issues that affect this tool will be marked as such. You can
71@@ -6708,40 +6706,32 @@
72
73 =head1 DESCRIPTION
74
75-pt-table-usage reads queries from slow query logs and analyzes how they use
76-tables. Table usage indicates more than just which tables are read from or
77-written to by the query, it also indicates data flow: data in and data out.
78-Data flow is determined by the contexts in which tables are used by the query.
79-A single table can be used in several different contexts in the same query.
80-The reported table usage for each query lists every context for every table.
81-This CONTEXT-TABLE list tells how and where data flows, i.e. the query's table
82-usage. The L<"OUTPUT"> section lists the possible contexts and describes how
83-to read a table usage report.
84-
85-Since this tool analyzes table usage, it's important that queries use
86-table-qualified columns. If a query uses only one table, then all columns
87-must be from that table and there's no problem. But if a query uses
88-multiple tables and the columns are not table-qualified, then that creates a
89-problem that can only be solved by knowing the query's database and specifying
90-L<"--explain-extended">. If the slow log does not specify the database
91-used by the query, then you can specify a default database with L<"--database">.
92-There is no other way to know or guess the database, so the query will be
93-skipped. Secondly, if the database is known, then specifying
94-L<"--explain-extended"> causes pt-table-usage to do C<EXPLAIN EXTENDED ...>
95-C<SHOW WARNINGS> to get the fully qualified query as reported by MySQL
96-(i.e. all identifiers are fully database- and/or table-qualified). For
97-best results, you should specify L<"--explain-extended"> and
98-L<"--database"> if you know that all queries use the same database.
99-
100-Each query is identified in the output by either an MD5 hex checksum
101-of the query's fingerprint or the query's value for the specified
102-L<"--id-attribute">. The query ID is for parsing and storing the table
103-usage reports in a table that is keyed on the query ID. See L<"OUTPUT">
104-for more information.
105+pt-table-usage reads queries from a log and analyzes how they use tables. The
106+log should be in MySQL's slow query log format.
107+
108+Table usage is more than simply an indication of which tables the query reads or
109+writes. It also indicates data flow: data in and data out. The tool determines
110+the data flow by the contexts in which tables appear. A single query can use a
111+table in several different contexts simultaneously. The tool's output lists
112+every context for every table. This CONTEXT-TABLE list indicates how data flows
113+between tables. The L<"OUTPUT"> section lists the possible contexts and
114+describes how to read a table usage report.
115+
116+The tool analyzes data flow down to the level of individual columns, so it is
117+helpful if columns are identified unambiguously in the query. If a query uses
118+only one table, then all columns must be from that table, and there's no
119+difficulty. But if a query uses multiple tables and the column names are not
120+table-qualified, then it is necessary to use C<EXPLAIN EXTENDED>, followed by
121+C<SHOW WARNINGS>, to determine to which tables the columns belong.
122+
123+If the tool does not know the query's default database, which can occur when the
124+database is not printed in the log, then C<EXPLAIN EXTENDED> can fail. In this
125+case, you can specify a default database with L<"--database">. You can also use
126+the L<"--create-table-definitions"> option to help resolve ambiguities.
127
128 =head1 OUTPUT
129
130-The table usage report that is printed for each query looks similar to the
131+The tool prints a usage report for each table in every query, similar to the
132 following:
133
134 Query_id: 0x1CD27577D202A339.1
135@@ -6758,45 +6748,43 @@
136 JOIN t2
137 WHERE t1
138
139-Usage reports are separated by blank lines. The first line is always the
140-query ID: a unique ID that can be used to parse the output and store the
141-usage reports in a table keyed on this ID. The query ID has two parts
142-separated by a period: the query ID and the target table number.
143-
144-If L<"--id-attribute"> is not specified, then query IDs are automatically
145-created by making an MD5 hex checksum of the query's fingerprint
146-(as shown above, e.g. C<0x1CD27577D202A339>); otherwise, the query ID is the
147-query's value for the given attribute.
148-
149-The target table number starts at 1 and increments by 1 for each table that
150-the query affects. Only multi-table UPDATE queries can affect
151-multiple tables with a single query, so this number is 1 for all other types
152-of queries. (Multi-table DELETE queries are not supported.)
153-The example output above is from this query:
154+The first line contains the query ID, which by default is the same as those
155+shown in pt-query-digest reports. It is an MD5 checksum of the query's
156+"fingerprint," which is what remains after removing literals, collapsing white
157+space, and a variety of other transformations. The query ID has two parts
158+separated by a period: the query ID and the table number. If you wish to use a
159+different value to identify the query, you can specify the L<"--id-attribute">
160+option.
161+
162+The previous example shows two paragraphs for a single query, not two queries.
163+Note that the query ID is identical for the two, but the table number differs.
164+The table number increments by 1 for each table that the query updates. Only
165+multi-table UPDATE queries can update multiple tables with a single query, so
166+the table number is 1 for all other types of queries. (The tool does not
167+support multi-table DELETE queries.) The example output above is from this
168+query:
169
170 UPDATE t1 AS a JOIN t2 AS b USING (id)
171 SET a.foo="bar", b.foo="bat"
172 WHERE a.id=1;
173
174-The C<SET> clause indicates that two tables are updated: C<a> aliased as C<t1>,
175-and C<b> aliased as C<t2>. So two usage reports are printed, one for each
176-table, and this is indicated in the output by their common query ID but
177-incrementing target table number.
178+The C<SET> clause indicates that the query updates two tables: C<a> aliased as
179+C<t1>, and C<b> aliased as C<t2>.
180
181-After the first line is a variable number of CONTEXT-TABLE lines. Possible
182-contexts are:
183+After the first line, the tool prints a variable number of CONTEXT-TABLE lines.
184+Possible contexts are as follows:
185
186 =over
187
188 =item * SELECT
189
190-SELECT means that data is taken out of the table for one of two reasons:
191-to be returned to the user as part of a result set, or to be put into another
192-table as part of an INSERT or UPDATE. In the first case, since only SELECT
193-queries return result sets, a SELECT context is always listed for SELECT
194-queries. In the second case, data from one table is used to insert or
195-update rows in another table. For example, the UPDATE query in the example
196-above has the usage:
197+SELECT means that the query retrieves data from the table for one of two
198+reasons. The first is to be returned to the user as part of a result set. Only
199+SELECT queries return result sets, so the report always shows a SELECT context
200+for SELECT queries.
201+
202+The second case is when data flows to another table as part of an INSERT or
203+UPDATE. For example, the UPDATE query in the example above has the usage:
204
205 SELECT DUAL
206
207@@ -6804,9 +6792,9 @@
208
209 SET a.foo="bar", b.foo="bat"
210
211-DUAL is used for any values that does not originate in a table, in this case the
212-literal values "bar" and "bat". If that C<SET> clause were C<SET a.foo=b.foo>
213-instead, then the complete usage would be:
214+The tool uses DUAL for any values that do not originate in a table, in this case
215+the literal values "bar" and "bat". If that C<SET> clause were C<SET
216+a.foo=b.foo> instead, then the complete usage would be:
217
218 Query_id: 0x1CD27577D202A339.1
219 UPDATE t1
220@@ -6820,20 +6808,15 @@
221 immediately above reflects an UPDATE query that updates rows in table C<t1>
222 with data from table C<t2>.
223
224-=item * Any other query type
225-
226-Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a context.
227-All these types indicate that the table is written or altered in some way.
228-If a SELECT context follows one of these types, then data is read from the
229-SELECT table and written to this table. This happens, for example, with
230-INSERT..SELECT or UPDATE queries that set column values using values from
231-tables instead of constant values.
232-
233-These query types are not supported:
234-
235- SET
236- LOAD
237- multi-table DELETE
238+=item * Any other verb
239+
240+Any other verb, such as INSERT, UPDATE, DELETE, etc. may be a context. These
241+verbs indicate that the query modifies data in some way. If a SELECT context
242+follows one of these verbs, then the query reads data from the SELECT table and
243+writes it to this table. This happens, for example, with INSERT..SELECT or
244+UPDATE queries that use values from tables instead of constant values.
245+
246+These query types are not supported: SET, LOAD, and multi-table DELETE.
247
248 =item * JOIN
249
250@@ -6853,14 +6836,14 @@
251 WHERE t1
252 WHERE t2
253
254-Only unique tables are listed; that is why table C<t1> is listed only once.
255+The tool lists only distinct tables; that is why table C<t1> is listed only
256+once.
257
258 =item * TLIST
259
260-The TLIST context lists tables that are accessed by the query but do not
261-appear in any other context. These tables are usually an implicit
262-full cartesian join, so they should be avoided. For example, the query
263-C<SELECT * FROM t1, t2> results in:
264+The TLIST context lists tables that the query accesses, but which do not appear
265+in any other context. These tables are usually an implicit cartesian join. For
266+example, the query C<SELECT * FROM t1, t2> results in:
267
268 Query_id: 0xBDDEB6EDA41897A8.1
269 SELECT t1
270@@ -6871,7 +6854,7 @@
271 First of all, there are two SELECT contexts, because C<SELECT *> selects
272 rows from all tables; C<t1> and C<t2> in this case. Secondly, the tables
273 are implicitly joined, but without any kind of join condition, which results
274-in a full cartesian join as indicated by the TLIST context for each.
275+in a cartesian join as indicated by the TLIST context for each.
276
277 =back
278
279@@ -6911,24 +6894,23 @@
280
281 type: string; default: DUAL
282
283-Value to print for constant data. Constant data means all data not
284-from tables (or subqueries since subqueries are not supported). For example,
285-real constant values like strings ("foo") and numbers (42), and data from
286-functions like C<NOW()>. For example, in the query
287-C<INSERT INTO t (c) VALUES ('a')>, the string 'a' is constant data, so the
288-table usage report is:
289+Table to print as the source for constant data (literals). This is any data not
290+retrieved from tables (or subqueries, because subqueries are not supported).
291+This includes literal values such as strings ("foo") and numbers (42), or
292+functions such as C<NOW()>. For example, in the query C<INSERT INTO t (c)
293+VALUES ('a')>, the string 'a' is constant data, so the table usage report is:
294
295 INSERT t
296 SELECT DUAL
297
298-The first line indicates that data is inserted into table C<t> and the second
299-line indicates that that data comes from some constant value.
300+The first line indicates that the query inserts data into table C<t>, and the
301+second line indicates that the inserted data comes from some constant value.
302
303 =item --[no]continue-on-error
304
305 default: yes
306
307-Continue parsing even if there is an error.
308+Continue to work even if there is an error.
309
310 =item --create-table-definitions
311
312@@ -6939,9 +6921,9 @@
313 names, you can save the output of C<mysqldump --no-data> to one or more files
314 and specify those files with this option. The tool will parse all
315 C<CREATE TABLE> definitions from the files and use this information to
316-qualify table and column names. If a column name is used in multiple tables,
317-or table name is used in multiple databases, these duplicates cannot be
318-qualified.
319+qualify table and column names. If a column name appears in multiple tables,
320+or a table name appears in multiple databases, the ambiguities cannot be
321+resolved.
322
323 =item --daemonize
324
325@@ -6964,7 +6946,8 @@
326
327 type: DSN
328
329-EXPLAIN EXTENDED queries on this host to fully qualify table and column names.
330+A server to execute EXPLAIN EXTENDED queries. This may be necessary to resolve
331+ambiguous (unqualified) column and table names.
332
333 =item --filter
334
335@@ -6972,89 +6955,13 @@
336
337 Discard events for which this Perl code doesn't return true.
338
339-This option is a string of Perl code or a file containing Perl code that gets
340-compiled into a subroutine with one argument: $event. This is a hashref.
341-If the given value is a readable file, then pt-table-usage reads the entire
342-file and uses its contents as the code. The file should not contain
343-a shebang (#!/usr/bin/perl) line.
344-
345-If the code returns true, the chain of callbacks continues; otherwise it ends.
346-The code is the last statement in the subroutine other than C<return $event>.
347-The subroutine template is:
348-
349- sub { $event = shift; filter && return $event; }
350-
351-Filters given on the command line are wrapped inside parentheses like like
352-C<( filter )>. For complex, multi-line filters, you must put the code inside
353-a file so it will not be wrapped inside parentheses. Either way, the filter
354-must produce syntactically valid code given the template. For example, an
355-if-else branch given on the command line would not be valid:
356-
357- --filter 'if () { } else { }' # WRONG
358-
359-Since it's given on the command line, the if-else branch would be wrapped inside
360-parentheses which is not syntactically valid. So to accomplish something more
361-complex like this would require putting the code in a file, for example
362-filter.txt:
363-
364- my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
365-
366-Then specify C<--filter filter.txt> to read the code from filter.txt.
367-
368-If the filter code won't compile, pt-table-usage will die with an error.
369-If the filter code does compile, an error may still occur at runtime if the
370-code tries to do something wrong (like pattern match an undefined value).
371-pt-table-usage does not provide any safeguards so code carefully!
372-
373-An example filter that discards everything but SELECT statements:
374-
375- --filter '$event->{arg} =~ m/^select/i'
376-
377-This is compiled into a subroutine like the following:
378-
379- sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
380-
381-It is permissible for the code to have side effects (to alter C<$event>).
382-
383-You can find an explanation of the structure of $event at
384-L<http://code.google.com/p/maatkit/wiki/EventAttributes>.
385-
386-Here are more examples of filter code:
387-
388-=over
389-
390-=item Host/IP matches domain.com
391-
392---filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/'
393-
394-Sometimes MySQL logs the host where the IP is expected. Therefore, we
395-check both.
396-
397-=item User matches john
398-
399---filter '($event->{user} || "") =~ m/john/'
400-
401-=item More than 1 warning
402-
403---filter '($event->{Warning_count} || 0) > 1'
404-
405-=item Query does full table scan or full join
406-
407---filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")'
408-
409-=item Query was not served from query cache
410-
411---filter '($event->{QC_Hit} || "") eq "No"'
412-
413-=item Query is 1 MB or larger
414-
415---filter '$event->{bytes} >= 1_048_576'
416-
417-=back
418-
419-Since L<"--filter"> allows you to alter C<$event>, you can use it to do other
420-things, like create new attributes.
421-
422+This option is a string of Perl code or a file containing Perl code that is
423+compiled into a subroutine with one argument: $event. If the given value is a
424+readable file, then pt-table-usage reads the entire file and uses its contents
425+as the code.
426+
427+Filters are implemented in the same fashion as in the pt-query-digest tool, so
428+please refer to its documentation for more information.
429
430 =item --help
431
432@@ -7070,9 +6977,8 @@
433
434 type: string
435
436-Identify each event using this attribute. If not ID attribute is given, then
437-events are identified with the query's checksum: an MD5 hex checksum of the
438-query's fingerprint.
439+Identify each event using this attribute. The default is to use a query ID,
440+which is an MD5 checksum of the query's fingerprint.
441
442 =item --log
443
444@@ -7115,10 +7021,7 @@
445
446 type: string
447
448-Analyze only this given query. If you want to analyze the table usage of
449-one simple query by providing on the command line instead of reading it
450-from a slow log file, then specify that query with this option. The default
451-L<"--id-attribute"> will be used which is the query's checksum.
452+Analyze the specified query instead of reading a log file.
453
454 =item --read-timeout
455
456@@ -7127,7 +7030,7 @@
457 Wait this long for an event from the input; 0 to wait forever.
458
459 This option sets the maximum time to wait for an event from the input. If an
460-event is not received after the specified time, the script stops reading the
461+event is not received after the specified time, the tool stops reading the
462 input and prints its reports.
463
464 This option requires the Perl POSIX module.
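The C<--filter> mechanism described in the diff above compiles a snippet of Perl into a subroutine that receives an C<$event> hashref and returns it only when the filter expression is true. The following is a minimal standalone sketch of that wrapping, not code from the tool itself; the sample C<$event> hashref and the C<m/^select/i> filter expression are illustrative assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# A command-line filter such as --filter '$event->{arg} =~ m/^select/i'
# is wrapped in parentheses and compiled into a subroutine of this shape.
my $filter = sub {
    my ($event) = @_;
    ( $event->{arg} =~ m/^select/i ) && return $event;
    return;    # filter was false: the event is discarded
};

# Sample event hashref, resembling what a parsed slow-log entry provides.
my $event = { arg => 'SELECT * FROM t1', user => 'john' };

if ( my $kept = $filter->($event) ) {
    print "kept: $kept->{arg}\n";
}
```

Because the filter returns the event itself on success, the chain of callbacks continues only for events that pass; anything falsy ends processing for that event.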
