Merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs into lp:percona-toolkit/2.1
Proposed by: Baron Schwartz
Status: Merged
Approved by: Daniel Nichter
Approved revision: 226
Merged at revision: 226
Proposed branch: lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Merge into: lp:percona-toolkit/2.1
Diff against target: 464 lines (+107/-204), 1 file modified (bin/pt-table-usage)
To merge this branch: bzr merge lp:~percona-toolkit-dev/percona-toolkit/pt-table-usage-docs
Related bugs: (none)
Related blueprints: Add pt-table-usage (Medium)

Reviewer: Daniel Nichter (Approve)
Revision history for this message
Daniel Nichter (daniel-nichter):
review: Approve
Preview Diff
=== modified file 'bin/pt-table-usage'
--- bin/pt-table-usage 2012-03-30 22:45:10 +0000
+++ bin/pt-table-usage 2012-03-31 00:51:21 +0000
@@ -5864,12 +5864,12 @@
 # ###########################################################################
 
 # ###########################################################################
 # MysqldumpParser package
 # This package is a copy without comments from the original. The original
-# with comments and its test file can be found in the SVN repository at,
-#   trunk/common/MysqldumpParser.pm
-#   trunk/common/t/MysqldumpParser.t
-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
+# with comments and its test file can be found in the Bazaar repository at,
+#   lib/MysqldumpParser.pm
+#   t/lib/MysqldumpParser.t
+# See https://launchpad.net/percona-toolkit for more information.
 # ###########################################################################
 package MysqldumpParser;
 
@@ -5968,12 +5968,12 @@
 # ###########################################################################
 
 # ###########################################################################
 # SchemaQualifier package
 # This package is a copy without comments from the original. The original
 # with comments and its test file can be found in the SVN repository at,
-#   trunk/common/SchemaQualifier.pm
-#   trunk/common/t/SchemaQualifier.t
-# See http://code.google.com/p/maatkit/wiki/Developers for more information.
+#   lib/SchemaQualifier.pm
+#   t/lib/SchemaQualifier.t
+# See https://launchpad.net/percona-toolkit for more information.
 # ###########################################################################
 package SchemaQualifier;
 
@@ -6680,24 +6680,22 @@
 
 =head1 NAME
 
-pt-table-usage - Read queries from a log and analyze how they use tables.
+pt-table-usage - Analyze how queries use tables.
 
 =head1 SYNOPSIS
 
 Usage: pt-table-usage [OPTIONS] [FILES]
 
-pt-table-usage reads queries from slow query logs and analyzes how they use
-tables. If no FILE is specified, STDIN is read. Table usage for every query
-is printed to STDOUT.
+pt-table-usage reads queries from a log and analyzes how they use tables. If no
+FILE is specified, it reads STDIN. It prints a report for each query.
 
 =head1 RISKS
 
-pt-table-use is very low risk because it only reads and examines queries from
-a log and executes C<EXPLAIN EXTENDED> if the L<"--explain-extended"> option
-is specified.
+pt-table-use is very low risk. By default, it simply reads queries from a log.
+It executes C<EXPLAIN EXTENDED> if you specify the L<"--explain-extended">
+option.
 
-At the time of this release, there are no known bugs that could cause serious
-harm to users.
+At the time of this release, we know of no bugs that could harm users.
 
 The authoritative source for updated information is always the online issue
 tracking system. Issues that affect this tool will be marked as such. You can
@@ -6708,40 +6706,32 @@
 
 =head1 DESCRIPTION
 
-pt-table-usage reads queries from slow query logs and analyzes how they use
-tables. Table usage indicates more than just which tables are read from or
-written to by the query, it also indicates data flow: data in and data out.
-Data flow is determined by the contexts in which tables are used by the query.
-A single table can be used in several different contexts in the same query.
-The reported table usage for each query lists every context for every table.
-This CONTEXT-TABLE list tells how and where data flows, i.e. the query's table
-usage. The L<"OUTPUT"> section lists the possible contexts and describes how
-to read a table usage report.
-
-Since this tool analyzes table usage, it's important that queries use
-table-qualified columns. If a query uses only one table, then all columns
-must be from that table and there's no problem. But if a query uses
-multiple tables and the columns are not table-qualified, then that creates a
-problem that can only be solved by knowing the query's database and specifying
-L<"--explain-extended">. If the slow log does not specify the database
-used by the query, then you can specify a default database with L<"--database">.
-There is no other way to know or guess the database, so the query will be
-skipped. Secondly, if the database is known, then specifying
-L<"--explain-extended"> causes pt-table-usage to do C<EXPLAIN EXTENDED ...>
-C<SHOW WARNINGS> to get the fully qualified query as reported by MySQL
-(i.e. all identifiers are fully database- and/or table-qualified). For
-best results, you should specify L<"--explain-extended"> and
-L<"--database"> if you know that all queries use the same database.
-
-Each query is identified in the output by either an MD5 hex checksum
-of the query's fingerprint or the query's value for the specified
-L<"--id-attribute">. The query ID is for parsing and storing the table
-usage reports in a table that is keyed on the query ID. See L<"OUTPUT">
-for more information.
+pt-table-usage reads queries from a log and analyzes how they use tables. The
+log should be in MySQL's slow query log format.
+
+Table usage is more than simply an indication of which tables the query reads or
+writes. It also indicates data flow: data in and data out. The tool determines
+the data flow by the contexts in which tables appear. A single query can use a
+table in several different contexts simultaneously. The tool's output lists
+every context for every table. This CONTEXT-TABLE list indicates how data flows
+between tables. The L<"OUTPUT"> section lists the possible contexts and
+describes how to read a table usage report.
+
+The tool analyzes data flow down to the level of individual columns, so it is
+helpful if columns are identified unambiguously in the query. If a query uses
+only one table, then all columns must be from that table, and there's no
+difficulty. But if a query uses multiple tables and the column names are not
+table-qualified, then it is necessary to use C<EXPLAIN EXTENDED>, followed by
+C<SHOW WARNINGS>, to determine to which tables the columns belong.
+
+If the tool does not know the query's default database, which can occur when the
+database is not printed in the log, then C<EXPLAIN EXTENDED> can fail. In this
+case, you can specify a default database with L<"--database">. You can also use
+the L<"--create-table-definitions"> option to help resolve ambiguities.
 
 =head1 OUTPUT
 
-The table usage report that is printed for each query looks similar to the
+The tool prints a usage report for each table in every query, similar to the
 following:
 
   Query_id: 0x1CD27577D202A339.1
@@ -6758,45 +6748,43 @@
   JOIN t2
   WHERE t1
 
-Usage reports are separated by blank lines. The first line is always the
-query ID: a unique ID that can be used to parse the output and store the
-usage reports in a table keyed on this ID. The query ID has two parts
-separated by a period: the query ID and the target table number.
-
-If L<"--id-attribute"> is not specified, then query IDs are automatically
-created by making an MD5 hex checksum of the query's fingerprint
-(as shown above, e.g. C<0x1CD27577D202A339>); otherwise, the query ID is the
-query's value for the given attribute.
-
-The target table number starts at 1 and increments by 1 for each table that
-the query affects. Only multi-table UPDATE queries can affect
-multiple tables with a single query, so this number is 1 for all other types
-of queries. (Multi-table DELETE queries are not supported.)
-The example output above is from this query:
+The first line contains the query ID, which by default is the same as those
+shown in pt-query-digest reports. It is an MD5 checksum of the query's
+"fingerprint," which is what remains after removing literals, collapsing white
+space, and a variety of other transformations. The query ID has two parts
+separated by a period: the query ID and the table number. If you wish to use a
+different value to identify the query, you can specify the L<"--id-attribute">
+option.
+
+The previous example shows two paragraphs for a single query, not two queries.
+Note that the query ID is identical for the two, but the table number differs.
+The table number increments by 1 for each table that the query updates. Only
+multi-table UPDATE queries can update multiple tables with a single query, so
+the table number is 1 for all other types of queries. (The tool does not
+support multi-table DELETE queries.) The example output above is from this
+query:
 
   UPDATE t1 AS a JOIN t2 AS b USING (id)
   SET a.foo="bar", b.foo="bat"
   WHERE a.id=1;
 
-The C<SET> clause indicates that two tables are updated: C<a> aliased as C<t1>,
-and C<b> aliased as C<t2>. So two usage reports are printed, one for each
-table, and this is indicated in the output by their common query ID but
-incrementing target table number.
+The C<SET> clause indicates that the query updates two tables: C<a> aliased as
+C<t1>, and C<b> aliased as C<t2>.
 
-After the first line is a variable number of CONTEXT-TABLE lines. Possible
-contexts are:
+After the first line, the tool prints a variable number of CONTEXT-TABLE lines.
+Possible contexts are as follows:
 
 =over
 
 =item * SELECT
 
-SELECT means that data is taken out of the table for one of two reasons:
-to be returned to the user as part of a result set, or to be put into another
-table as part of an INSERT or UPDATE. In the first case, since only SELECT
-queries return result sets, a SELECT context is always listed for SELECT
-queries. In the second case, data from one table is used to insert or
-update rows in another table. For example, the UPDATE query in the example
-above has the usage:
+SELECT means that the query retrieves data from the table for one of two
+reasons. The first is to be returned to the user as part of a result set. Only
+SELECT queries return result sets, so the report always shows a SELECT context
+for SELECT queries.
+
+The second case is when data flows to another table as part of an INSERT or
+UPDATE. For example, the UPDATE query in the example above has the usage:
 
   SELECT DUAL
 
@@ -6804,9 +6792,9 @@
 
   SET a.foo="bar", b.foo="bat"
 
-DUAL is used for any values that does not originate in a table, in this case the
-literal values "bar" and "bat". If that C<SET> clause were C<SET a.foo=b.foo>
-instead, then the complete usage would be:
+The tool uses DUAL for any values that do not originate in a table, in this case
+the literal values "bar" and "bat". If that C<SET> clause were C<SET
+a.foo=b.foo> instead, then the complete usage would be:
 
   Query_id: 0x1CD27577D202A339.1
   UPDATE t1
@@ -6820,20 +6808,15 @@
 immediately above reflects an UPDATE query that updates rows in table C<t1>
 with data from table C<t2>.
 
-=item * Any other query type
+=item * Any other verb
 
-Any other query type, such as INSERT, UPDATE, DELETE, etc. may be a context.
-All these types indicate that the table is written or altered in some way.
-If a SELECT context follows one of these types, then data is read from the
-SELECT table and written to this table. This happens, for example, with
-INSERT..SELECT or UPDATE queries that set column values using values from
-tables instead of constant values.
-
-These query types are not supported:
-
-  SET
-  LOAD
-  multi-table DELETE
+Any other verb, such as INSERT, UPDATE, DELETE, etc. may be a context. These
+verbs indicate that the query modifies data in some way. If a SELECT context
+follows one of these verbs, then the query reads data from the SELECT table and
+writes it to this table. This happens, for example, with INSERT..SELECT or
+UPDATE queries that use values from tables instead of constant values.
+
+These query types are not supported: SET, LOAD, and multi-table DELETE.
 
 =item * JOIN
 
@@ -6853,14 +6836,14 @@
   WHERE t1
   WHERE t2
 
-Only unique tables are listed; that is why table C<t1> is listed only once.
+The tool lists only distinct tables; that is why table C<t1> is listed only
+once.
 
 =item * TLIST
 
-The TLIST context lists tables that are accessed by the query but do not
-appear in any other context. These tables are usually an implicit
-full cartesian join, so they should be avoided. For example, the query
-C<SELECT * FROM t1, t2> results in:
+The TLIST context lists tables that the query accesses, but which do not appear
+in any other context. These tables are usually an implicit cartesian join. For
+example, the query C<SELECT * FROM t1, t2> results in:
 
   Query_id: 0xBDDEB6EDA41897A8.1
   SELECT t1
@@ -6871,7 +6854,7 @@
 First of all, there are two SELECT contexts, because C<SELECT *> selects
 rows from all tables; C<t1> and C<t2> in this case. Secondly, the tables
 are implicitly joined, but without any kind of join condition, which results
-in a full cartesian join as indicated by the TLIST context for each.
+in a cartesian join as indicated by the TLIST context for each.
 
 =back
 
@@ -6911,24 +6894,23 @@
 
 type: string; default: DUAL
 
-Value to print for constant data. Constant data means all data not
-from tables (or subqueries since subqueries are not supported). For example,
-real constant values like strings ("foo") and numbers (42), and data from
-functions like C<NOW()>. For example, in the query
-C<INSERT INTO t (c) VALUES ('a')>, the string 'a' is constant data, so the
-table usage report is:
+Table to print as the source for constant data (literals). This is any data not
+retrieved from tables (or subqueries, because subqueries are not supported).
+This includes literal values such as strings ("foo") and numbers (42), or
+functions such as C<NOW()>. For example, in the query C<INSERT INTO t (c)
+VALUES ('a')>, the string 'a' is constant data, so the table usage report is:
 
   INSERT t
   SELECT DUAL
 
-The first line indicates that data is inserted into table C<t> and the second
-line indicates that that data comes from some constant value.
+The first line indicates that the query inserts data into table C<t>, and the
+second line indicates that the inserted data comes from some constant value.
 
 =item --[no]continue-on-error
 
 default: yes
 
-Continue parsing even if there is an error.
+Continue to work even if there is an error.
 
 =item --create-table-definitions
 
@@ -6939,9 +6921,9 @@
 names, you can save the output of C<mysqldump --no-data> to one or more files
 and specify those files with this option. The tool will parse all
 C<CREATE TABLE> definitions from the files and use this information to
-qualify table and column names. If a column name is used in multiple tables,
-or table name is used in multiple databases, these duplicates cannot be
-qualified.
+qualify table and column names. If a column name appears in multiple tables,
+or a table name appears in multiple databases, the ambiguities cannot be
+resolved.
 
 =item --daemonize
 
@@ -6964,7 +6946,8 @@
 
 type: DSN
 
-EXPLAIN EXTENDED queries on this host to fully qualify table and column names.
+A server to execute EXPLAIN EXTENDED queries. This may be necessary to resolve
+ambiguous (unqualified) column and table names.
 
 =item --filter
 
@@ -6972,89 +6955,13 @@
 
 Discard events for which this Perl code doesn't return true.
 
-This option is a string of Perl code or a file containing Perl code that gets
-compiled into a subroutine with one argument: $event. This is a hashref.
-If the given value is a readable file, then pt-table-usage reads the entire
-file and uses its contents as the code. The file should not contain
-a shebang (#!/usr/bin/perl) line.
-
-If the code returns true, the chain of callbacks continues; otherwise it ends.
-The code is the last statement in the subroutine other than C<return $event>.
-The subroutine template is:
-
-  sub { $event = shift; filter && return $event; }
-
-Filters given on the command line are wrapped inside parentheses like like
-C<( filter )>. For complex, multi-line filters, you must put the code inside
-a file so it will not be wrapped inside parentheses. Either way, the filter
-must produce syntactically valid code given the template. For example, an
-if-else branch given on the command line would not be valid:
-
-  --filter 'if () { } else { }' # WRONG
-
-Since it's given on the command line, the if-else branch would be wrapped inside
-parentheses which is not syntactically valid. So to accomplish something more
-complex like this would require putting the code in a file, for example
-filter.txt:
-
-  my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
-
-Then specify C<--filter filter.txt> to read the code from filter.txt.
-
-If the filter code won't compile, pt-table-usage will die with an error.
-If the filter code does compile, an error may still occur at runtime if the
-code tries to do something wrong (like pattern match an undefined value).
-pt-table-usage does not provide any safeguards so code carefully!
-
-An example filter that discards everything but SELECT statements:
-
-  --filter '$event->{arg} =~ m/^select/i'
-
-This is compiled into a subroutine like the following:
-
-  sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
-
-It is permissible for the code to have side effects (to alter C<$event>).
-
-You can find an explanation of the structure of $event at
-L<http://code.google.com/p/maatkit/wiki/EventAttributes>.
-
-Here are more examples of filter code:
-
-=over
-
-=item Host/IP matches domain.com
-
-  --filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/'
-
-Sometimes MySQL logs the host where the IP is expected. Therefore, we
-check both.
-
-=item User matches john
-
-  --filter '($event->{user} || "") =~ m/john/'
-
-=item More than 1 warning
-
-  --filter '($event->{Warning_count} || 0) > 1'
-
-=item Query does full table scan or full join
-
-  --filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")'
-
-=item Query was not served from query cache
-
-  --filter '($event->{QC_Hit} || "") eq "No"'
-
-=item Query is 1 MB or larger
-
-  --filter '$event->{bytes} >= 1_048_576'
-
-=back
-
-Since L<"--filter"> allows you to alter C<$event>, you can use it to do other
-things, like create new attributes.
-
+This option is a string of Perl code or a file containing Perl code that is
+compiled into a subroutine with one argument: $event. If the given value is a
+readable file, then pt-table-usage reads the entire file and uses its contents
+as the code.
+
+Filters are implemented in the same fashion as in the pt-query-digest tool, so
+please refer to its documentation for more information.
 
 =item --help
 
@@ -7070,9 +6977,8 @@
 
 type: string
 
-Identify each event using this attribute. If not ID attribute is given, then
-events are identified with the query's checksum: an MD5 hex checksum of the
-query's fingerprint.
+Identify each event using this attribute. The default is to use a query ID,
+which is an MD5 checksum of the query's fingerprint.
 
 =item --log
 
@@ -7115,10 +7021,7 @@
 
 type: string
 
-Analyze only this given query. If you want to analyze the table usage of
-one simple query by providing on the command line instead of reading it
-from a slow log file, then specify that query with this option. The default
-L<"--id-attribute"> will be used which is the query's checksum.
+Analyze the specified query instead of reading a log file.
 
 =item --read-timeout
 
@@ -7127,7 +7030,7 @@
 Wait this long for an event from the input; 0 to wait forever.
 
 This option sets the maximum time to wait for an event from the input. If an
-event is not received after the specified time, the script stops reading the
-input and prints its reports.
+event is not received after the specified time, the tool stops reading the
+input and prints its reports.
 
 This option requires the Perl POSIX module.
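The revised documentation in the diff describes the default query ID as an MD5 checksum of the query's "fingerprint" (literals removed, whitespace collapsed, and other transformations), with a table number after a period. As a rough illustration only, the idea can be sketched in Python as follows; this is a simplification, not the tool's actual algorithm, so the IDs it produces will not match pt-table-usage's output:

```python
import hashlib
import re

def fingerprint(query):
    """Crudely abstract a query: lowercase, replace literals with '?',
    collapse whitespace. (The real tool applies many more rules.)"""
    f = query.strip().lower()
    f = re.sub(r"'[^']*'", "?", f)   # single-quoted string literals
    f = re.sub(r'"[^"]*"', "?", f)   # double-quoted string literals
    f = re.sub(r"\b\d+\b", "?", f)   # bare numeric literals
    f = re.sub(r"\s+", " ", f)       # collapse runs of whitespace
    return f

def query_id(query, table_number=1):
    """Build an ID shaped like 0x1CD27577D202A339.1: the first 16 hex
    digits of the fingerprint's MD5, then the table number."""
    digest = hashlib.md5(fingerprint(query).encode("utf-8")).hexdigest()
    return "0x{}.{}".format(digest[:16].upper(), table_number)

# Two queries that differ only in literals share one fingerprint,
# and therefore one query ID.
q1 = "SELECT * FROM t1 WHERE id=1"
q2 = "SELECT  *  FROM t1 WHERE id=42"
print(fingerprint(q1))
print(query_id(q1))
```

This mirrors why the report's two paragraphs for a multi-table UPDATE share one checksum but differ in the trailing table number.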