pt-archiver --sleep conflicts with bulk operations

Bug #979092 reported by Miguel Angel Nieto
Affects                                                        Status        Importance  Assigned to     Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT  Fix Released  High        Daniel Nichter
2.0                                                            Fix Released  Medium      Daniel Nichter
2.1                                                            Fix Released  High        Daniel Nichter

Bug Description

For example, if you use --limit 1000 and --sleep 60, instead of fetching 1000 rows and then sleeping 60 seconds, the tool sleeps 60 seconds after every row. As a result, purge jobs take a very long time:

pt-archiver \
--source h=localhost,D=test,t=t,u=root \
--purge \
--where 'i > 1' \
--limit 5 \
--sleep 5 \
--bulk-delete \
--pid=/tmp/mk-archiver-sessions.pid

# pt_archiver:4201 2597 Got another row in this chunk
# pt_archiver:4218 2597 Sleeping 5
# pt_archiver:4201 2597 Got another row in this chunk
# pt_archiver:4218 2597 Sleeping 5
# pt_archiver:4201 2597 Got another row in this chunk
# pt_archiver:4218 2597 Sleeping 5
# pt_archiver:4201 2597 Got another row in this chunk
# pt_archiver:4218 2597 Sleeping 5
# pt_archiver:4112 2597 No more rows in this chunk; doing bulk operations
# pt_archiver:4165 2597 Bulk deleted 5 rows

So it deletes data in groups of 5 rows, but it pauses after every fetched row. If I remove the --sleep option:

pt-archiver \
--source h=localhost,D=test,t=t,u=root \
--purge \
--where 'i > 1' \
--limit 5 \
--bulk-delete \
--pid=/tmp/mk-archiver-sessions.pid

# pt_archiver:4201 2602 Got another row in this chunk
# pt_archiver:4201 2602 Got another row in this chunk
# pt_archiver:4201 2602 Got another row in this chunk
# pt_archiver:4201 2602 Got another row in this chunk
# pt_archiver:4112 2602 No more rows in this chunk; doing bulk operations
# pt_archiver:4165 2602 Bulk deleted 5 rows
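
To put rough numbers on the difference, here is a back-of-the-envelope estimate of the time spent sleeping. It is only a sketch: it ignores query time, treats the reported behavior as one sleep per fetched row, and the function name is made up for illustration.

```python
def total_sleep_seconds(rows: int, limit: int, sleep: int, per_row: bool) -> int:
    """Estimate seconds spent sleeping while purging `rows` rows.

    per_row=True  models the reported behavior (one sleep per fetched row).
    per_row=False models one sleep per --limit chunk.
    """
    if per_row:
        return rows * sleep
    chunks = -(-rows // limit)  # ceiling division: number of chunks
    return chunks * sleep


# --limit 1000 --sleep 60, purging 1000 rows
print(total_sleep_seconds(1000, 1000, 60, per_row=True))   # 60000
print(total_sleep_seconds(1000, 1000, 60, per_row=False))  # 60
```

Under these assumptions a single 1000-row chunk costs about 16.7 hours of sleeping instead of one minute.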

To solve the problem I've moved the sleep section of the code to the end of the "This code is for the bulk archiving functionality" section. With that change, the tool sleeps once per chunk:

# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4112 2110 No more rows in this chunk; doing bulk operations
# pt_archiver:4165 2110 Bulk deleted 5 rows
# pt_archiver:4179 2110 Sleeping 5
# pt_archiver:4191 2110 Fetching rows in next chunk
# pt_archiver:4196 2110 Fetched 5 rows
# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4211 2110 Got another row in this chunk
# pt_archiver:4112 2110 No more rows in this chunk; doing bulk operations
# pt_archiver:4165 2110 Bulk deleted 5 rows
# pt_archiver:4179 2110 Sleeping 5

Miguel Angel Nieto (miguelangelnieto) wrote :
description: updated
Daniel Nichter (daniel-nichter) wrote :

The patch is only safe with a bulk operation like --bulk-delete. The problem is this: the tool has two modes of operation, per-row and bulk archiving. --sleep is meant for per-row mode, to make the tool do:

* SELECT row
* Archive that row
* --sleep

So this mode is slow, careful nibbling/archiving without overloading the server. The other mode does:

* SELECT row
* Repeat until last row
* Bulk archive first-last row

Then the tool should --sleep after the bulk archive, which is what the patch makes happen. Currently, however, the tool does --sleep between fetching each row, which it doesn't need to do: it isn't operating on those rows, it's just advancing to the last row so it can do the bulk archive. Only then should it sleep (between bulk operations).
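
The difference between the reported and the patched bulk behavior can be illustrated with a small simulation. This is a toy model, not pt-archiver's actual Perl code; the function name and event strings are invented for the example.

```python
def bulk_trace(total_rows, limit, sleep_after_bulk):
    """Return the ordered events for a bulk purge of `total_rows` rows."""
    events = []
    for start in range(0, total_rows, limit):
        chunk = range(start, min(start + limit, total_rows))
        for _ in chunk:
            events.append("fetch row")
            if not sleep_after_bulk:
                events.append("sleep")   # reported behavior: pause per row
        events.append("bulk delete")
        if sleep_after_bulk:
            events.append("sleep")       # patched behavior: pause per chunk
    return events


before = bulk_trace(10, 5, sleep_after_bulk=False)
after = bulk_trace(10, 5, sleep_after_bulk=True)
print(before.count("sleep"), after.count("sleep"))  # 10 2
```

In this model, 10 rows with a chunk size of 5 produce ten sleeps before the patch and two after it.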

summary: - pt-archiver stops --sleep X seconds after fetching one row, not every
- --limit X rows
+ pt-archiver --sleep conflicts with bulk operations
tags: added: pt-archiver sleep
Daniel Nichter (daniel-nichter) wrote :

Customer was using http://code.google.com/p/maatkit/source/browse/trunk/mk-archiver/mk-archiver?r=3701

My comment #2 is wrong. The tool should --sleep between SELECT --limit rows, not between each delete. So the sleep code simply needs to be moved back to where it used to be: just before SELECT --limit rows. The tool is supposed to delete rows one by one as fast as it can, or bulk delete rows; --sleep doesn't apply to the archiving of rows, only to the selecting of them.
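
As a rough sketch of that corrected placement, the loop below sleeps only between consecutive SELECT --limit fetches, whether rows are then deleted one by one or in bulk. The function and event names are hypothetical, and skipping the sleep before the very first SELECT is an assumption of this sketch, not something the report states.

```python
def archive_events(total_rows, limit, bulk):
    """Model the event order with --sleep tied to SELECTs, not deletes."""
    events = []
    for start in range(0, total_rows, limit):
        if start > 0:
            events.append("sleep")  # assumption: pause only between SELECTs
        n = min(limit, total_rows - start)
        events.append(f"select {n}")
        if bulk:
            events.append(f"bulk delete {n}")
        else:
            events.extend(["delete 1"] * n)  # per-row deletes, no pause between
    return events


print(archive_events(7, 5, bulk=True))
# ['select 5', 'bulk delete 5', 'sleep', 'select 2', 'bulk delete 2']
```

With bulk=False the same number of sleeps occurs; only the delete events change.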

tags: added: percona-22758
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-306
