pt-table-checksum reports wrong number of DIFFS

Bug #1030031 reported by Daniel Nichter
This bug affects 1 person
Affects: Percona Toolkit (moved to https://jira.percona.com/projects/PT)
Status: Fix Released
Importance: High
Assigned to: Daniel Nichter

Bug Description

Due to this code:

            # Check each slave for checksum diffs.
            foreach my $slave ( @$slaves ) {
               eval {
                  my $diffs = $rc->find_replication_differences(
                     dbh => $slave->dbh(),
                     repl_table => $repl_table,
                     where => "db='$tbl->{db}' AND tbl='$tbl->{tbl}'",
                  );
                  PTDEBUG && _d(scalar @$diffs, 'checksum diffs on',
                     $slave->name());
                  if ( @$diffs ) {
                     # BUG: this overwrites the count stored for any
                     # previous slave, so only the last slave that has
                     # diffs is reflected in DIFFS.
                     $tbl->{checksum_results}->{diffs} = scalar @$diffs;
                  }
               };
               # (rest of the loop body not shown)
            }

As a result, only the number of diffs from the last slave that has any is reported for the table. For example, with master --> slave1 --> slave2, if slave1 has 2 diffs (in 2 different chunks) but slave2 has only 1, the tool reports 1 for DIFFS for the table. Proposed solution: report max(diffs) across all slaves for each table, i.e. 2 in this case.
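A minimal Perl sketch of the two counting strategies, using plain per-slave diff counts in place of the rows returned by find_replication_differences(). The subroutines last_slave_diffs() and max_slave_diffs() are hypothetical, for illustration only; they are not part of the tool.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Buggy behavior: every slave that has diffs overwrites the stored count,
# so the last slave with diffs wins (mirrors the "if ( @$diffs )" branch).
sub last_slave_diffs {
   my (@per_slave_counts) = @_;
   my $diffs = 0;
   foreach my $count ( @per_slave_counts ) {
      $diffs = $count if $count;
   }
   return $diffs;
}

# Proposed fix from the description: keep the largest count on any slave.
sub max_slave_diffs {
   my (@per_slave_counts) = @_;
   return max(0, @per_slave_counts);
}

# master --> slave1 (2 diffs) --> slave2 (1 diff)
print last_slave_diffs(2, 1), "\n";   # 1
print max_slave_diffs(2, 1), "\n";    # 2
```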

As a reminder, from the tool's docs:

"""
=item DIFFS

The number of chunks that differ from the master on one or more replicas.
"""

So 2 is the correct number to report: in total, 2 chunks differ, not just 1.

Daniel Nichter (daniel-nichter) wrote :

2nd problem:

"""
brian: If slave1 reported two chunks with diffs, and slave2 reported one chunk with a diff, it doesn't mean that the differing chunk in slave2 is one of the chunks differing in slave1
"""

So if chunks #5 and #7 differ on slave1 and chunk #10 differs on slave2, then 3 chunks actually differ, not just 2.
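The gap between taking the maximum and counting distinct chunks can be checked with the chunk numbers from this example. A short Perl sketch, assuming for illustration that each slave's differing chunks are given as a plain list of chunk numbers (the tool itself works with result rows):

```perl
use strict;
use warnings;
use List::Util qw(max);

my %chunks_by_slave = (
   slave1 => [5, 7],
   slave2 => [10],
);

# max(diffs): the largest per-slave count.
my $max_diffs = max map { scalar @$_ } values %chunks_by_slave;

# Distinct chunks: the union of chunk numbers across all slaves.
my %seen;
$seen{$_}++ for map { @$_ } values %chunks_by_slave;
my $distinct_diffs = scalar keys %seen;

print "max: $max_diffs, distinct: $distinct_diffs\n";   # max: 2, distinct: 3
```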

Changed in percona-toolkit:
status: In Progress → Fix Committed
Baron Schwartz (baron-xaprb) wrote :

My memory of the original spec for this was that we should count the number of distinct chunk IDs that are different. Looks like we aren't doing that for some reason.

Daniel Nichter (daniel-nichter) wrote :

We weren't, but now we are:

                  if ( scalar @$diffs ) {
                     # "chunk" is the chunk number. See the SELECT
                     # statement in RowChecksum::find_replication_differences()
                     # for the full list of columns.
                     $diff_chunks{ $_->{chunk} }++ for @$diffs;
                  }

Then scalar keys %diff_chunks is the number of distinct chunks that differ across all slaves.
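A self-contained Perl sketch of the same idea. The helper count_distinct_diff_chunks() is hypothetical, and each inner array stands in for the rows one slave's find_replication_differences() call would return; only the "chunk" column is used.

```perl
use strict;
use warnings;

# Hypothetical helper: count chunk numbers that differ on at least one slave.
sub count_distinct_diff_chunks {
   my ($diffs_per_slave) = @_;
   my %diff_chunks;
   foreach my $diffs ( @$diffs_per_slave ) {
      # A chunk reported by several slaves is still one hash key.
      $diff_chunks{ $_->{chunk} }++ for @$diffs;
   }
   return scalar keys %diff_chunks;
}

my @diffs_per_slave = (
   [ { chunk => 5 }, { chunk => 7 } ],   # slave1
   [ { chunk => 10 } ],                  # slave2
);
print count_distinct_diff_chunks(\@diffs_per_slave), "\n";   # 3
```

Keying the hash by chunk number means a chunk that differs on several slaves is counted once, which matches the DIFFS definition of "chunks that differ from the master on one or more replicas".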

Changed in percona-toolkit:
status: Fix Committed → Fix Released
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PT-322
