pt-table-sync false-positive error "Cannot nibble table because MySQL chose no index instead of the PRIMARY index"

Bug #918056 reported by Miguel Angel Nieto
30
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Percona Toolkit moved to https://jira.percona.com/projects/PT
Fix Released
High
Daniel Nichter

Bug Description

If I try to use pt-table-sync on the employees database (https://launchpad.net/test-db/) to correct some mistakes I receive this error message:

# pt-table-sync --print --replicate percona.checksums u=root,p=msandbox,h=127.0.0.1,P=26165
Cannot nibble table `employees`.`dept_emp` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.dept_emp on SBslave1
Cannot nibble table `employees`.`salaries` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.salaries on SBslave1
Cannot nibble table `employees`.`dept_emp` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.dept_emp on SBslave2
Cannot nibble table `employees`.`salaries` because MySQL chose no index instead of the `PRIMARY` index at /usr/bin/pt-table-sync line 4748. while doing employees.salaries on SBslave2

slave1 [localhost] {msandbox} (employees) > show create table dept_emp\G
*************************** 1. row ***************************
       Table: dept_emp
Create Table: CREATE TABLE `dept_emp` (
  `emp_no` int(11) NOT NULL,
  `dept_no` char(4) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`dept_no`),
  KEY `emp_no` (`emp_no`),
  KEY `dept_no` (`dept_no`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

slave1 [localhost] {msandbox} (employees) > show create table salaries\G
*************************** 1. row ***************************
       Table: salaries
Create Table: CREATE TABLE `salaries` (
  `emp_no` int(11) NOT NULL,
  `salary` int(11) NOT NULL,
  `from_date` date NOT NULL,
  `to_date` date NOT NULL,
  PRIMARY KEY (`emp_no`,`from_date`),
  KEY `emp_no` (`emp_no`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Related branches

tags: added: chunking pt-table-sync
Changed in percona-toolkit:
status: New → Confirmed
Revision history for this message
Dave Juntgen (djuntgen) wrote :

Daniel - can you give a reason to why and maybe a quick work around?

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

I think we're going to need to make the code handle this by testing whether EXPLAIN shows not only no index chosen, but also a large number of rows (larger than expected). If for some reason it's ignoring the index but there aren't many rows, that should be OK.

Revision history for this message
Baron Schwartz (baron-xaprb) wrote :

Dave, a workaround is probably to comment out that test in the code if you're sure it's OK (as per my previous comment on this bug)

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Dave or Miguel: can you reproduce this problem? Re Baron's comment, if the problem can be reproduced, I'd like to see the full EXPLAIN output, which PTDEBUG should show.

Changed in percona-toolkit:
importance: Undecided → Medium
milestone: none → 2.1.4
Changed in percona-toolkit:
assignee: nobody → Brian Fraser (fraserbn)
Brian Fraser (fraserbn)
Changed in percona-toolkit:
status: Confirmed → Incomplete
Revision history for this message
Brian Fraser (fraserbn) wrote :

It appears that this bug only affected earlier releases of the toolkit, so I'm going to mark it as Invalid. If someone manages to reproduce it with newer versions of pt-table-sync, please either reply here or file a new bug.

Changed in percona-toolkit:
status: Incomplete → Invalid
Changed in percona-toolkit:
milestone: 2.1.4 → none
Revision history for this message
kedar (k-vaijanapurkar) wrote :

Using percona toolkit 2.1.7 I'm still getting the error:

perl percona-toolkit-2.1.7/bin/pt-table-sync --print --verbose --set-vars wait_timeout=30000 -t xxxxxxx --ignore-columns xxxxxxx --no-foreign-key-checks --sync-to-master --chunk-size=200 --chunk-column ID h=xxxxxxx,u=xxxxxxx,p=xxxxxxx >> 12282012_4.log
Cannot nibble table `xxxxxxx`.`xxxxxxx` because MySQL chose no index instead of the `PRIMARY` index at percona-toolkit-2.1.7/bin/pt-table-sync line 5407. while doing DB.TABLE on HOST.

Do we have any workaround?

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

There's no workaround because, afaik, this is the first reported case of this happening. It's generally not safe to workaround unless you understand why MySQL has chosen no index. If the table is small, maybe MySQL thinks it's faster to scan the whole table--and it could be correct.

I'm not sure yet if this a bug or the tool doing what it's supposed to. If you can easily reproduce this and provide PTDEBUG output, we could see if it's a bug. Else, the only workaround of sorts is to comment out the if block that starts like:

      if ( $self->{nibble} == 0 && !$self->{small_table} ) {
         my $explain_index = $self->__get_explain_index($sql);
         if ( lc($explain_index || '') ne lc($s->{index}) ) {
            die 'Cannot nibble table '.$q->quote($args{database}, $args{table})

I.e. prefix each line in that whole if { } block with #. At the moment, there's no easy way to implement an official workaround.

summary: - "Cannot nibble table" when running pt-table-sync
+ pt-table-sync can't nibble table because MySQL chose no index
Changed in percona-toolkit:
status: Invalid → Triaged
assignee: Brian Fraser (fraserbn) → nobody
importance: Medium → Undecided
Revision history for this message
Przemek (pmalkowski) wrote : Re: pt-table-sync can't nibble table because MySQL chose no index

I have fully reproducible case here.
To repeat the problem, load the history_master.sql table to master node, then history_slave.sql to slave node. Slave will have slightly more rows then master.
Then on master:
pt-table-checksum --databases=drupal
pt-table-sync --execute --print --replicate percona.checksums localhost

Also included full query log from pt-table-sync session as well as the tool's debug log.

Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

Thanks Przemek, your test case works:

Cannot nibble table `test`.`history` because MySQL chose no index instead of the `PRIMARY` index at bin/pt-table-sync line 5405. while doing test.history on 127.0.0.1

I'll look into and fix this.

Changed in percona-toolkit:
status: Triaged → In Progress
assignee: nobody → Daniel Nichter (daniel-nichter)
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

I found the problem. First, the differing chunk is what we can an "oob" (out-of-bounds) chunk:

master:
*************************** 3. row ***************************
            db: test
           tbl: history
         chunk: 3
    chunk_time: 0.000658
   chunk_index: PRIMARY
lower_boundary: NULL
upper_boundary: 21,21,1045
      this_crc: 0
      this_cnt: 0
    master_crc: 0
    master_cnt: 0
            ts: 2013-01-23 08:00:17

slave:
*************************** 3. row ***************************
            db: test
           tbl: history
         chunk: 3
    chunk_time: 0.000658
   chunk_index: PRIMARY
lower_boundary: NULL
upper_boundary: 21,21,1045
      this_crc: 0
      this_cnt: 49
    master_crc: 0
    master_cnt: 0
            ts: 2013-01-23 08:00:17

this_cnt differe, and notice that lower_boundary=NULL: this (or upper_boundary=NULL) signal an oob chunk. When lower_boundary=NULL it's pt-table-checksum's chunk WHERE index_cols < first_row. The reason for that is precisely this test case:

master has row 5,6,7,8
slave has rows 1,2,3,4,5,6,7,8

So doing WHERE rows < 5 on master will detect the out-of-sync rows 1,2,3,4 on the slave.

Problem is, pt-table-checksum and pt-table-sync use different chunking methods. We make them compatible by converting ptc boundaries to pts boundaries in sub diff_where(), but that sub is not currently handling oob chunks correctly, resulting in a chunk like:

SELECT /*nibble boundary 0*/ `uid`,`nid`,`timestamp` FROM `test`.`history` FORCE INDEX (`PRIMARY`) WHERE (((`uid` > NULL) OR (`uid` = NULL AND `nid` >= NULL)) AND ((`uid` < '21') OR (`uid` = '21' AND `nid` <= '1045'))) ORDER BY `uid`,`nid` LIMIT 999, 1

which is an impossible WHERE clause, hence MySQL chooses no index.

So the fix is making diff_where() handle oob chunks correctly.

summary: - pt-table-sync can't nibble table because MySQL chose no index
+ pt-table-sync false-positive error "Cannot nibble table because MySQL
+ chose no index instead of the PRIMARY index"
tags: added: false-positive-error
Changed in percona-toolkit:
importance: Undecided → High
milestone: none → 2.1.9
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

This has been tested and fixed, will be released in2.1.9.

Changed in percona-toolkit:
status: In Progress → Fix Committed
Changed in percona-toolkit:
status: Fix Committed → Fix Released
Revision history for this message
Michelle (sup7ort) wrote :

I'm using pt-table-sync 2.2.16 and the bug is still present. Doing a table sync I get:

Cannot nibble table `asterisk`.`kvstore` because MySQL chose no index instead of the `index3` index at /usr/bin/pt-table-sync line 5484. while doing asterisk.kvstore on localhost.

Table is:
+--------+-----------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+-----------+------+-----+---------+-------+
| module | char(64) | NO | MUL | NULL | |
| key | char(255) | NO | MUL | NULL | |
| val | longblob | YES | | NULL | |
| type | char(16) | YES | | NULL | |
| id | char(255) | YES | | NULL | |
+--------+-----------+------+-----+---------+-------+
5 rows in set (0.00 sec)

and indeces:
show index from kvstore;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| kvstore | 0 | index3 | 1 | module | A | NULL | NULL | NULL | | BTREE | | |
| kvstore | 0 | index3 | 2 | key | A | NULL | 50 | NULL | | BTREE | | |
| kvstore | 0 | index3 | 3 | id | A | NULL | 50 | NULL | YES | BTREE | | |
| kvstore | 1 | index1 | 1 | key | A | NULL | 50 | NULL | | BTREE | | |
| kvstore | 1 | index5 | 1 | module | A | NULL | NULL | NULL | | BTREE | | |
| kvstore | 1 | index5 | 2 | id | A | NULL | 50 | NULL | YES | BTREE | | |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
6 rows in set (0.00 sec)

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-289

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.