Percona Toolkit moved to https://jira.percona.com/projects/PT

Merge lp:~percona-toolkit-dev/percona-toolkit/fix-1127450-pt-archiver-bulk-insert-encoding into lp:percona-toolkit/2.2

Proposed by Brian Fraser on 2013-04-12

Status: Merged
Approved by: Daniel Nichter on 2013-04-12
Approved revision: 569
Merged at revision: 577
Proposed branch: lp:~percona-toolkit-dev/percona-toolkit/fix-1127450-pt-archiver-bulk-insert-encoding
Merge into: lp:percona-toolkit/2.2
Diff against target: 161 lines (+77/-19), 3 files modified
bin/pt-archiver (+29/-19)
t/pt-archiver/bulk_insert.t (+36/-0)
t/pt-archiver/samples/bug_1127450.sql (+12/-0)

To merge this branch:

bzr merge lp:~percona-toolkit-dev/percona-toolkit/fix-1127450-pt-archiver-bulk-insert-encoding

High

Fix Released

Reviewer	Review Type	Date Requested	Status
Daniel Nichter		2013-04-12	Approve on 2013-04-12
Review via email: mp+158686@code.launchpad.net

Daniel Nichter (daniel-nichter) wrote on 2013-04-12:

99 +use charnames ':full';

Will this work in 5.8?

8 + my $got_charset = $o->get('charset');
36 + my $charset = $got_charset || '';

Those seem redundant; just my $charset = $o->get('charset'); should work for both.

17 + . " INTO TABLE $dst->{db_tbl}"
18 + . ($got_charset ? "CHARACTER SET $got_charset" : "")
19 + . "("

I'm not aware of the "INSERT INTO TABLE foo CHARACTER SET bar" syntax, where is it documented? Is this tested?

review: Needs Fixing

Brian Fraser (fraserbn) wrote on 2013-04-12:

> 99 +use charnames ':full';
>
> Will this work in 5.8?
>

Yes. I actually added that line explicitly to be compatible with older Perls, since newer ones don't need it.

> 8 + my $got_charset = $o->get('charset');
> 36 + my $charset = $got_charset || '';
>
> Those seem redundant; just my $charset = $o->get('charset'); should work for
> both.

Technically yes, but further down pt-archiver reuses $charset to hold the character set name as Perl knows it, which can differ from the MySQL name. A simple example: with --charset latin1, $got_charset always stays 'latin1', but $charset later becomes 'iso-8859-1', which MySQL doesn't understand.
I want to refactor the charset handling code for 3.0, but figured that adding the new variable would be the least intrusive change for 2.2.
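
To make the distinction concrete, here is a minimal sketch of the two values side by side. The helper name perlio_layer_for is hypothetical (it is not part of pt-archiver); it only mirrors the branch logic visible in the diff, and it assumes the Encode module is installed.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not in pt-archiver): turn a --charset value into a
# PerlIO layer string, leaving the caller's original value untouched.
sub perlio_layer_for {
   my ($mysql_charset) = @_;
   return ''      unless $mysql_charset;
   return ':utf8' if $mysql_charset eq 'utf8';
   # resolve_alias maps an alias such as 'latin1' to Perl's canonical name.
   my $perl_name = Encode::resolve_alias($mysql_charset) || $mysql_charset;
   return ":encoding($perl_name)";
}

my $got_charset = 'latin1';                        # name MySQL understands
my $charset     = perlio_layer_for($got_charset);  # layer Perl understands
print "MySQL: $got_charset, Perl: $charset\n";
# prints "MySQL: latin1, Perl: :encoding(iso-8859-1)"
```

Keeping the two in separate variables means the SQL side never sees the Perl-specific spelling.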

>
> 17 + . " INTO TABLE $dst->{db_tbl}"
> 18 + . ($got_charset ? "CHARACTER SET $got_charset" : "")
> 19 + . "("
>
> I'm not aware of the "INSERT INTO TABLE foo CHARACTER SET bar" syntax, where
> is it documented? Is this tested?

The diff probably cut off the important bit of that query: it's not INSERT INTO, it's "LOAD DATA LOCAL INFILE INTO TABLE foo CHARACTER SET doof".
It's documented in MySQL's LOAD DATA documentation from 5.0 onwards.
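
As a rough illustration of the statement being discussed, the sketch below assembles a similar string without touching a database. build_load_data_sql and its argument names are hypothetical, the leading 'LOAD DATA' keyword is assumed because the hunk does not show that line, and the space before CHARACTER SET is a choice made here for readability rather than a copy of the patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not in pt-archiver): builds a LOAD DATA statement
# from names that the caller has already quoted.
sub build_load_data_sql {
   my (%args) = @_;
   my $charset = $args{charset};   # MySQL charset name, e.g. 'utf8'
   return 'LOAD DATA'
        . ' LOCAL INFILE ?'
        . ($args{replace} ? ' REPLACE' : '')
        . ($args{ignore}  ? ' IGNORE'  : '')
        . " INTO TABLE $args{db_tbl}"
        . ($charset ? " CHARACTER SET $charset" : '')
        . ' (' . join(',', @{ $args{cols} }) . ')';
}

my $sql = build_load_data_sql(
   db_tbl  => '`bug_1127450`.`copy`',
   charset => 'utf8',
   cols    => [ '`id`', '`t`' ],
);
print "$sql\n";
```

When no charset is passed, the clause is simply omitted and the statement has the same shape as before the patch.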

Daniel Nichter (daniel-nichter) wrote on 2013-04-12:

Ok, sounds good.

review: Approve

Preview Diff

=== modified file 'bin/pt-archiver'
--- bin/pt-archiver	2013-03-20 17:53:36 +0000
+++ bin/pt-archiver	2013-04-12 17:46:29 +0000
@@ -5450,6 +5450,7 @@
    my $archive_file = $o->get('file');
    my $txnsize      = $o->get('txn-size');
    my $quiet        = $o->get('quiet');
+   my $got_charset  = $o->get('charset');
    # First things first: if --stop was given, create the sentinel file.
    if ( $o->get('stop') ) {
@@ -5833,7 +5834,9 @@
                   . ' LOCAL INFILE ?'
                   . ($o->get('replace')    ? ' REPLACE'      : '')
                   . ($o->get('ignore')     ? ' IGNORE'       : '')
-                  . " INTO TABLE $dst->{db_tbl}("
+                  . " INTO TABLE $dst->{db_tbl}"
+                  . ($got_charset ? "CHARACTER SET $got_charset" : "")
+                  . "("
                   . join(",", map { $q->quote($_) } @{$ins_stmt->{cols}} )
                   . ")";
       }
@@ -5942,28 +5945,29 @@
       return 0;
    }
-   # Open the file and print the header to it.
-   if ( $archive_file ) {
-      my $need_hdr = $o->get('header') && !-f $archive_file;
-      my $charset  = $o->get('charset') || '';
-      if ($charset eq 'utf8') {
-         $charset = ":$charset";
-      }
-      elsif ($charset) {
-         eval { require Encode }
+   my $charset  = $got_charset || '';
+   if ($charset eq 'utf8') {
+      $charset = ":$charset";
+   }
+   elsif ($charset) {
+      eval { require Encode }
             or (PTDEBUG &&
                _d("Couldn't load Encode: ", $EVAL_ERROR,
                   "Going to try using the charset ",
                   "passed in without checking it."));
-         # No need to punish a user if they did their
-         # homework and passed in an official charset,
-         # rather than an alias.
-         $charset = ":encoding("
-                  . (defined &Encode::resolve_alias
-                     ? Encode::resolve_alias($charset) || $charset
-                     : $charset)
-                  . ")";
-      }
+      # No need to punish a user if they did their
+      # homework and passed in an official charset,
+      # rather than an alias.
+      $charset = ":encoding("
+               . (defined &Encode::resolve_alias
+                  ? Encode::resolve_alias($charset) || $charset
+                  : $charset)
+               . ")";
+   }
+
+   # Open the file and print the header to it.
+   if ( $archive_file ) {
+      my $need_hdr = $o->get('header') && !-f $archive_file;
       $archive_fh = IO::File->new($archive_file, ">>$charset")
          or die "Cannot open $charset $archive_file: $OS_ERROR\n";
       $archive_fh->autoflush(1) unless $o->get('buffer');
@@ -5979,6 +5983,9 @@
       require File::Temp;
       $bulkins_file = File::Temp->new( SUFFIX => 'pt-archiver' )
          or die "Cannot open temp file: $OS_ERROR\n";
+      binmode($bulkins_file, $charset)
+         or die "Cannot set $charset as an encoding for the bulk-insert "
+              . "file: $OS_ERROR";
    }
    # This row is the first row fetched from each 'chunk'.
@@ -6205,6 +6212,9 @@
          if ( $o->get('bulk-insert') ) {
             $bulkins_file = File::Temp->new( SUFFIX => 'pt-archiver' )
                or die "Cannot open temp file: $OS_ERROR\n";
+            binmode($bulkins_file, $charset)
+               or die "Cannot set $charset as an encoding for the bulk-insert "
+                    . "file: $OS_ERROR";
          }
       }  # no next row (do bulk operations)
       else {
=== modified file 't/pt-archiver/bulk_insert.t'
--- t/pt-archiver/bulk_insert.t	2012-11-21 16:21:56 +0000
+++ t/pt-archiver/bulk_insert.t	2013-04-12 17:46:29 +0000
@@ -11,6 +11,8 @@
 use English qw(-no_match_vars);
 use Test::More;
+use charnames ':full';
+
 use PerconaTest;
 use Sandbox;
 require "$trunk/bin/pt-archiver";
@@ -85,6 +87,40 @@
 );
 # #############################################################################
+# pt-archiver wide character errors / corrupted data with UTF-8 + bulk-insert
+# https://bugs.launchpad.net/percona-toolkit/+bug/1127450
+# #############################################################################
+{
+my $utf8_dbh = $sb->get_dbh_for('master', { mysql_enable_utf8 => 1, AutoCommit => 1 });
+
+$sb->load_file('master', 't/pt-archiver/samples/bug_1127450.sql');
+my $sql = qq{INSERT INTO `bug_1127450`.`original` VALUES (1, "\N{KATAKANA LETTER NI}")};
+$utf8_dbh->do($sql);
+
+$output = output(
+   sub { pt_archiver::main(qw(--no-ascend --limit 50 --bulk-insert),
+      qw(--bulk-delete --where 1=1 --statistics --charset utf8),
+      '--source', "L=1,D=bug_1127450,t=original,F=$cnf",
+      '--dest',   "t=copy") }, stderr => 1
+);
+
+my (undef, $val) = $utf8_dbh->selectrow_array('select * from bug_1127450.copy');
+
+ok(
+   utf8::is_utf8($val),
+   "--bulk-insert preserves UTF8ness"
+);
+
+is(
+   $val,
+   "\N{KATAKANA LETTER NI}",
+   "--bulk-insert can handle utf8 characters"
+);
+
+unlike($output, qr/Wide character/, "no wide character warnings")
+
+}
+# #############################################################################
 # Done.
 # #############################################################################
 $sb->wipe_clean($dbh);
=== added file 't/pt-archiver/samples/bug_1127450.sql'
--- t/pt-archiver/samples/bug_1127450.sql	1970-01-01 00:00:00 +0000
+++ t/pt-archiver/samples/bug_1127450.sql	2013-04-12 17:46:29 +0000
@@ -0,0 +1,12 @@
+DROP DATABASE IF EXISTS `bug_1127450`;
+CREATE DATABASE `bug_1127450`;
+CREATE TABLE `bug_1127450`.`original` (
+   id int,
+   t text CHARACTER SET utf8,
+   PRIMARY KEY(id)
+) engine=InnoDB DEFAULT CHARSET=utf8;
+CREATE TABLE `bug_1127450`.`copy` (
+   id int,
+   t text CHARACTER SET utf8,
+   PRIMARY KEY(id)
+) engine=InnoDB DEFAULT CHARSET=utf8;
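
The symptom the new test guards against can be reproduced without MySQL. The following is a minimal standalone sketch, not code from pt-archiver: it writes the same KATAKANA character to two File::Temp handles, one left without an encoding layer and one with binmode applied, and collects any warnings. The SUFFIX value and variable names are illustrative assumptions.

```perl
use strict;
use warnings;
use charnames ':full';   # needed for \N{...} on older Perls
use File::Temp;

my $text = "\N{KATAKANA LETTER NI}";

# Collect warnings instead of printing them to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No encoding layer: Perl warns because the character is above 0xFF.
my $raw = File::Temp->new( SUFFIX => '-sketch' )
   or die "Cannot open temp file: $!\n";
print {$raw} $text;
$raw->flush;

# With an encoding layer, as the patch does for the bulk-insert file.
my $encoded = File::Temp->new( SUFFIX => '-sketch' )
   or die "Cannot open temp file: $!\n";
binmode($encoded, ':utf8')
   or die "Cannot set :utf8 on the temp file: $!\n";
print {$encoded} $text;
$encoded->flush;

print scalar(@warnings), " warning(s): @warnings";
```

Only the first print should produce a "Wide character in print" warning; the handle with the layer applied writes the character silently.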
