pt-online-schema-change gets stuck looking for its own _new table

Bug #1195628 reported by Elton M. Labajo
This bug affects 6 people
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: High
Assigned to: Daniel Nichter

Bug Description

Running this command:

PTDEBUG=1 pt-online-schema-change --max-load Threads_running=100 --critical-load Threads_running=450 --nocheck-replication-filters --execute --alter 'engine=innodb' t=orders_status_history,D=xxxx,h=x.x.x.x > pt-osc.log 2>&1

gets stuck, and the debug log shows these lines:

 tail -f pt-osc.log
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new'
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new'
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new'
# TableParser:3293 13801 Table does not exist
# TableParser:3279 13801 Checking `xxxx`.`_orders_status_history_new`
# TableParser:3283 13801 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new

However, the table does exist:

 SHOW TABLES FROM `xxxx` LIKE '\_orders\_status\_history\_new';
+-----------------------------------------------------+
| Tables_in_xxxx (\_orders\_status\_history\_new) |
+-----------------------------------------------------+
| _orders_status_history_new |
+-----------------------------------------------------+
1 row in set (0.01 sec)

See the attached log file pt-osc.log for further details.

Elton M. Labajo (elton-labajo) wrote :
description: updated
description: updated
Daniel Nichter (daniel-nichter) wrote :

Interesting, thanks for the report and log. We'll look into it.

tags: added: pt-online-schema-change
Changed in percona-toolkit:
status: New → Confirmed
Changed in percona-toolkit:
milestone: none → 2.2.5
summary: - running pt-online-schema-change gets stuck and the temp file created
- _table_name_new the size doesn't grow
+ pt-online-schema-change gets stuck looking for its own _table_new table
summary: - pt-online-schema-change gets stuck looking for its own _table_new table
+ pt-online-schema-change gets stuck looking for its own _new table
Changed in percona-toolkit:
importance: Undecided → Medium
Changed in percona-toolkit:
milestone: 2.2.5 → none
Changed in percona-toolkit:
milestone: none → 2.2.6
Changed in percona-toolkit:
importance: Medium → High
Daniel Nichter (daniel-nichter) wrote :

I think this is not a bug but poor feedback from the tool. After the tool creates the new table, it waits for the new table to appear on all slaves. So either 1) there's a slave that's really lagged, or 2) there are replication filters preventing the CREATE TABLE for the _new table from replicating to one or more slaves.

A workaround is: --recursion-method none. This prevents the tool from doing anything with slaves.

The fix here is making the tool report what it's doing so users aren't left wondering.
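
As an illustration of the workaround, here is a minimal sketch that reuses the redacted DSN and the --alter clause from the original report. It only prints the command for review rather than running it; the variable name PTOSC_ARGS is made up for this example, and the other options from the report (--max-load, --critical-load) are left out for brevity.

```shell
# Sketch only: the DSN values are the redacted placeholders from the report.
# --recursion-method none tells the tool not to look for slaves at all,
# so it does not wait for the _new table to appear on any of them.
PTOSC_ARGS="--recursion-method none --alter engine=innodb --execute t=orders_status_history,D=xxxx,h=x.x.x.x"

# Print the command for review; run it by hand once the placeholders are filled in.
echo "pt-online-schema-change $PTOSC_ARGS"
```

Since this stops the tool from doing anything with slaves, it is best suited to cases where waiting on slaves is known to be unnecessary.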

tags: added: percona-37252
Jacky Leung (jacky-5) wrote :

That doesn't sound right. I have tried a few times to run this tool on a DB with no slaves lagging behind, but the tool just got stuck, not doing or printing anything for about 2 hours (whereas running the ALTER TABLE manually locks the table for about 30 minutes).

A proper first step may be to add more logging around it, as I am not sure how to reproduce it.

Daniel Nichter (daniel-nichter) wrote :

Jacky, running with PTDEBUG=1 will confirm if the tool was waiting for a slave, as in the case provided by Elton.

Changed in percona-toolkit:
assignee: nobody → Daniel Nichter (daniel-nichter)
status: Confirmed → In Progress
Jacky Leung (jacky-5) wrote :

Daniel, is that an environment variable?

Daniel Nichter (daniel-nichter) wrote :

Jacky, it's PTDEBUG, so run the tool like:

PTDEBUG=1 pt-online-schema-change ... > dbg 2>&1

If it gets stuck, CTRL-C to kill it. Then dbg will contain a lot of debug output. All debug output is printed to STDERR.
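
Once the debug file exists, a quick way to see whether the tool was stuck in the same wait loop as in Elton's log is to count the repeated TableParser lines. The sketch below assumes the debug output looks like the excerpt in the bug description; the function names are made up for this example.

```shell
# Count how many times the debug log shows TableParser failing to find
# a table it was checking for ("Table does not exist").
count_missing_table_polls() {
  grep -c 'TableParser.*Table does not exist' "$1"
}

# List the tables the tool kept checking for, most frequent first.
polled_tables() {
  grep 'TableParser.*Checking' "$1" | sed 's/.*Checking //' | sort | uniq -c | sort -rn
}
```

For example, `count_missing_table_polls dbg` followed by `polled_tables dbg`. A large count for the _new table suggests the tool is waiting for that table to appear somewhere; the lines themselves do not name the server, so which host is being polled has to be read from the surrounding debug output.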

Changed in percona-toolkit:
status: In Progress → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Jacky Leung (jacky-5) wrote :

Daniel, thanks. Now I can see the debug log, and I found the problem.

I have a server that is somewhat multi-purpose. It runs a standalone MySQL for Solr (not a slave of the master), and that server also has an open replicator running to read the binlog data for incremental updates to Elasticsearch.

Now here is the problem: pt-online-schema-change mistook my standalone MySQL server for a slave of the master and didn't realise that it is in fact a Java open replicator doing the replicating. So that standalone server will never have the new table, and the tool just waits forever for that server to replicate it.

From our server setup point of view, we don't need a separate server, and adding a new server would add cost (of course), so we will not separate the servers (not to mention it would require us to change a lot of configuration).

For now I will bypass the slave check feature with --recursion-method none, but I think it would be better for pt-online-schema-change to check that the server is actually a slave.
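
One possible explanation, which this thread does not confirm, is that the tool discovered the extra server because the open replicator connects to the master the way a slave does and so shows up as a Binlog Dump thread in SHOW PROCESSLIST. Under that assumption, the sketch below lists which clients the master currently sees as binlog readers. The function name is made up for this example, and it expects the tab-separated output that the mysql client produces in batch mode.

```shell
# Reads tab-separated `SHOW PROCESSLIST` output on stdin
# (columns: Id, User, Host, db, Command, Time, State, Info)
# and prints user@host for every connection whose Command starts with "Binlog Dump".
binlog_dump_clients() {
  awk -F'\t' '$5 ~ /^Binlog Dump/ { print $2 "@" $3 }'
}
```

Usage would be along the lines of `mysql -h x.x.x.x -e 'SHOW PROCESSLIST' | binlog_dump_clients`, with the host being the redacted master from the report. If a host appears there that is not a real slave, --recursion-method none avoids the wait; the tool's documentation also describes a dsn=... recursion method for listing the real slaves explicitly.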

Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-361
