pt-online-schema-change fails when table is empty

Bug #1020997 reported by Moritz Krinke
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
Medium
Daniel Nichter
2.0
Won't Fix
Undecided
Unassigned
2.1
Fix Released
Medium
Daniel Nichter

Bug Description

When issuing

pt-online-schema-change --alter "ENGINE=InnoDB" --recursion-method none --execute D=database,t=table

where the table does not contain a single row, pt-online-schema-change fails

Altering `database`.`table`...
Creating new table...
Created new table database._table_new OK.
Altering new table...
Altered `database`.`_table_new` OK.
Creating triggers...
Created triggers OK.
Copying approximately 4686227 rows...
Dropping triggers...
Dropped triggers OK.
Dropping new table...
Dropped new table OK.
`database`.`table` was not altered.
Error copying rows from `database`.`table` to `database`.`_table_new`: I need a vals argument at /usr/bin/pt-online-schema-change line 5239.

I think this is due to the fact that XtraDB stores an approximate number of rows per table somewhere but this information is not beeing updated live. I had this issue with a couple of tables, in all of them i deleted all rows prior to executing the above pt-online-schema-change command.

A temporary "fix" is to insert one row in those tables, running pt-online-schema-change, and deleted that row.
After doing this pt-online-schema-change runs without errors even on an empty table.

I'm using

mysqld Ver 5.1.59-rel13.0-log for debian-linux-gnu on x86_64 ((Percona Server (GPL), 13.0, Revision 325))
pt-online-schema-change 2.1.2

Related branches

Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I tested with pt-osc 2.1.2 and mysql 5.1.63-rel13.4 Percona Server (GPL), 13.4, Revision 443, but couldn't reproduce it it. (tested with full table, and then truncated and tested with both, even with innodb_stats_auto_update = OFF).

However, from what I checked in the code, you should be able to alleviate this --nocheck-plan for those tables.

Revision history for this message
Moritz Krinke (mad-krinke) wrote :

i havent deleted the table using truncate but using DELETE FROM, maybe this behaves a bit differently in that manner.

tags: added: crash empty-table pt-online-schema-change
Changed in percona-toolkit:
milestone: none → 2.1.3
status: New → Triaged
Changed in percona-toolkit:
importance: Undecided → Medium
Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
status: Triaged → In Progress
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

True, EXPLAIN row estimates can be off by a bit, but 4,686,227 vs. 0 is a pretty big difference. Maybe XtraDB allows a wider margin of error, but I can't reproduce this either with regular InnoDB. Perhaps something else is causing this problem.

Moritz, if you can reproduce the problem, can you run with PTDEBUG=1 and attach the output please?

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

I vaguely recall some DELETE performance optimizations that might be a factor, so I'll ask the XtraDB developers what they think about this.

Revision history for this message
Moritz Krinke (mad-krinke) wrote :

Daniel,
no problem. I have attached the output with PTDEBUG enabled. Hope it helps!

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Moritz, thanks for the debug. Vadim said that DELETE shouldn't cause a statistics problem like this, else it's a variable setting or a bug. From the debug, we see:

# rows => '6031531',

So MySQL is definitely reporting that many rows, which causes the tool to do:

SELECT /*!40001 SQL_NO_CACHE */ `post_id` FROM `database`.`big_test_table` FORCE INDEX(`PRIMARY`) ORDER BY `post_id` LIMIT 1 /*first lower boundary*/

But no rows are returned, which causes the crash because the tool expects rows at this point. I can make the tool handle when that SQL returns no rows (i.e. if there's no first row, then there's no rows at all), but I'd also like to find out why MySQL is so wrong about the row count.

What is the value of innodb_stats_auto_update on your system? And while we're at it, what's the row for the table in question from INFORMATION_SCHEMA.INNODB_TABLE_STATS?

Revision history for this message
Moritz Krinke (mad-krinke) wrote :

Daniel,
thanks for working on this.

The metrics you requested:

Variable_name: innodb_stats_auto_update
        Value: 1

        rows: 5024984
  clust_size: 200960
  other_size: 53779
    modified: 170620

(When i reproduced the problem yesterday i did that on a copy of that table, i deleted the copy therefore i can only provide you with stats from original. The original problem however occured on the original, i conclude it does not make any difference)

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Moritz,

Thanks for the values. They may explain why the server hasn't update the stats yet. According to http://www.percona.com/doc/percona-server/5.5/diagnostics/innodb_stats.html?id=percona-server:features:innodb_stats

"""
If the value of modified column exceeds “rows / 16” or 2000000000, the statistics recalculation is done when innodb_stats_auto_update == 1.
"""

5024984 / 16 = 314,061.5 which is < 170620 (modified), so no update. I'll have Vadim double check my reading of this. If correct, it at least explains why MySQL/XtraDB is behaving this way.

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Moritz,

Could you download the latest pt-osc from trunk (download link at http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/2.1/view/head:/bin/pt-online-schema-change) and try again please? It should say "<table> is empty, no rows to copy." and then proceed to work normally.

Since I can't reproduce this locally, I wasn't able to write a test case, but the new code is pretty simple and it doesn't break any of the existing tests.

Revision history for this message
Moritz Krinke (mad-krinke) wrote :

Daniel,

i verified your fix, for my specific test case it does work now.
Thanks again!

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Excellent, thanks Moritz.

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-550

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.