Merge lp:~percona-toolkit-dev/percona-toolkit/pxc-pt-heartbeat into lp:percona-toolkit/2.1

Proposed by Daniel Nichter
Status: Merged
Approved by: Daniel Nichter
Approved revision: 507
Merged at revision: 506
Proposed branch: lp:~percona-toolkit-dev/percona-toolkit/pxc-pt-heartbeat
Merge into: lp:percona-toolkit/2.1
Diff against target: 582 lines (+512/-5)
4 files modified
bin/pt-heartbeat (+117/-4)
sandbox/start-sandbox (+5/-1)
sandbox/test-env (+6/-0)
t/pt-heartbeat/pxc.t (+384/-0)
To merge this branch: bzr merge lp:~percona-toolkit-dev/percona-toolkit/pxc-pt-heartbeat
Reviewer Review Type Date Requested Status
Daniel Nichter Approve
Brian Fraser (community) Approve
Review via email: mp+139339@code.launchpad.net
Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

169 + /tmp/12345/stop >/dev/null
170 + /tmp/12345/start >/dev/null

Is that really needed in test-env and at the end of pxc.t?

327 +is(
328 + $output,
329 + "0.00s [ 0.00s, 0.00s, 0.00s ]\n",
330 + "--monitor works"
331 +);

A similar test for regular MySQL often fails because one of those 0.00 will be 0.01 or something.

333 +# Try to generate some lag between cluster nodes. Rather brittle at the moment.

I think we can remove code related to that because it's going to be quite slow and, as you note, brittle.

518 +diag(`$trunk/bin/pt-heartbeat --stop >/dev/null`);
519 +sleep 1;

Looks like a timing-related failure waiting to happen.

Need a "Percona XtraDB Cluster" section in pt-heartbeat mention what we talked about on IRC: that a cluster is as fast as the slowest node, but pt-heartbeat doesn't really report the cluster's after lag. Although --monitor processes on various nodes can reveal how fast events are replicating to that node from the --update process. etc. etc.

Revision history for this message
Brian Fraser (fraserbn) wrote :

> 169 + /tmp/12345/stop >/dev/null
> 170 + /tmp/12345/start >/dev/null
>
> Is that really needed in test-env and at the end of pxc.t?
>

"Yes", but also no. For test-env, that's done so that we can be actually sure that the cluster will have 3 members. If after stop/update the cluster has 1 or 2 members, something went awfully wrong. And not catching this here means that code later down the line stop sop/start on 12345 (to set up replication filters, for example) will break the sandbox.
For pxt.t, that's doen to really restore the original state of the node. It's a trivial thing, but it might end up breaking up tests: relay_master_log_file & exec_master_log_pos are both NULL previous to running the file, but become 0 after turning cmaster on & off and calling RESET SLAVE. stop/start return them to NULL.

An unrealistic but easy way of seeing that it might break things: remove those two lines from pxc.t, then run

$ prove t/pt-heartbeat/pxc.t t/pt-heartbeat/pxc.t

The first run will pass; the second will fail a test. So we could take those lines out, but we'd risk the admittedly small possibility of some unrelated test breaking later.

> 327 +is(
> 328 + $output,
> 329 + "0.00s [ 0.00s, 0.00s, 0.00s ]\n",
> 330 + "--monitor works"
> 331 +);
>
> A similar test for regular MySQL often fails because one of those 0.00 will be
> 0.01 or something.
>

That's preceded by $output =~ s/\d\.\d{2}/0.00/g, so it should be okay, although changing the is() to a like() might be a good idea.
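Something like this sketch, for example (assuming the --monitor output keeps the "X.XXs [ X.XXs, X.XXs, X.XXs ]" format), which would tolerate a 0.01 even without the substitution:

  like(
      $output,
      qr/\A\d+\.\d{2}s \[ \d+\.\d{2}s, \d+\.\d{2}s, \d+\.\d{2}s \]\n\z/,
      "--monitor works"
  );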

> 333 +# Try to generate some lag between cluster nodes. Rather brittle at
> the moment.
>
> I think we can remove code related to that because it's going to be quite slow
> and, as you note, brittle.
>

Bit of a sunk-cost argument here, but I spent quite a bit of time making that test work, so I'm going to argue against this. Actually, I ran pxc.t in a while true; loop for two hours, and that test never failed. So I think it's okay -- the comment was written against a version of pxc.t that only reloaded sakila, but combining it with the alter-active-table code seems to have hardened it quite a bit. I think that adding a /m to the regex would make it even more resilient.
Also, it's actually pretty fast: only the 5 seconds that --monitor is told to run, since the rest runs in the background.
If anything, can we try keeping it until it starts causing trouble?

> 518 +diag(`$trunk/bin/pt-heartbeat --stop >/dev/null`);
> 519 +sleep 1;
>
> Looks like a timing-related failure waiting to happen.
>

A definite yes here. I could increase the sleep time to something much bigger, like 15, since the only thing we care about in that part is that --stop actually works on instances of pt-heartbeat running on PXC, not that it stops them quickly.
The other option is doing away with those tests and just using wait_until(sub{!kill 0, $pid}) for @pids, but then we might hang / not be quite sure if --stop really works.
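For reference, a rough sketch of that alternative (using wait_until and the @exec_pids list already in pxc.t):

  # Wait for every --update process to exit instead of sleeping a fixed time.
  # If --stop doesn't actually work, this hangs rather than failing a test.
  foreach my $pid ( @exec_pids ) {
      PerconaTest::wait_until(sub { !kill 0, $pid });
  }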

> Need a "Percona XtraDB Cluster" section in pt-heartbeat mention what we talked
> about on IRC: that a cluster is as fast as the slowest node, but pt-heartbeat
> doesn't really report the cluster's after lag. Although --mon...


Revision history for this message
Daniel Nichter (daniel-nichter) wrote :

On Dec 11, 2012, at 9:47 PM, Brian Fraser wrote:

>> 169 + /tmp/12345/stop >/dev/null
>> 170 + /tmp/12345/start >/dev/null
>>
>> Is that really needed in test-env and at the end of pxc.t?
>>
>
> "Yes", but also no. For test-env, that's done so that we can be actually sure that the cluster will have 3 members. If after stop/update the cluster has 1 or 2 members, something went awfully wrong. And not catching this here means that code later down the line stop sop/start on 12345 (to set up replication filters, for example) will break the sandbox.
> For pxt.t, that's doen to really restore the original state of the node. It's a trivial thing, but it might end up breaking up tests: relay_master_log_file & exec_master_log_pos are both NULL previous to running the file, but become 0 after turning cmaster on & off and calling RESET SLAVE. stop/start return them to NULL.
>
> An unrealistic but easy way of seeing that it might break things: remove those two lines from pxc.t, then run
>
> $ prove t/pt-heartbeat/pxc.t t/pt-heartbeat/pxc.t
>
> The first run will pass; the second will fail a test. So we could take those lines out, but we'd risk the admittedly small possibility of some unrelated test breaking later.

Ok, we'll keep them in for now. I just hate to add any more delay to the test suite.

>> 327 +is(
>> 328 + $output,
>> 329 + "0.00s [ 0.00s, 0.00s, 0.00s ]\n",
>> 330 + "--monitor works"
>> 331 +);
>>
>> A similar test for regular MySQL often fails because one of those 0.00 will be
>> 0.01 or something.
>>
>
> That's preceded by $output =~ s/\d\.\d{2}/0.00/g, so it should be okay, although changing the is() to a like() might be a good idea.

Ok, that should work better then.

>> 333 +# Try to generate some lag between cluster nodes. Rather brittle at
>> the moment.
>>
>> I think we can remove code related to that because it's going to be quite slow
>> and, as you note, brittle.
>>
>
> Bit of a sunk-cost argument here, but I spent quite a bit of time making that test work, so I'm going to argue against this. Actually, I ran pxc.t in a while true; loop for two hours, and that test never failed. So I think it's okay -- the comment was written against a version of pxc.t that only reloaded sakila, but combining it with the alter-active-table code seems to have hardened it quite a bit. I think that adding a /m to the regex would make it even more resilient.
> Also, it's actually pretty fast: only the 5 seconds that --monitor is told to run, since the rest runs in the background.
> If anything, can we try keeping it until it starts causing trouble?

Alright, we'll keep it in until/if reasons arise to remove or change it.

>> 518 +diag(`$trunk/bin/pt-heartbeat --stop >/dev/null`);
>> 519 +sleep 1;
>>
>> Looks like a timing-related failure waiting to happen.
>>
>
> A definite yes here. I could increase the sleep time to something much bigger, like 15, since the only thing we care about in that part is that --stop actually works on instances of pt-heartbeat running on PXC, not that it stops them quickly.
> The other option is doing away with those tests and just using wait_until(sub{!kill 0, ...


Revision history for this message
Brian Fraser (fraserbn) wrote :

Fixed & documented.

Revision history for this message
Brian Fraser (fraserbn) :
review: Approve
507. By Daniel Nichter

Tweak Percona XtraDB Cluster docs a little.

Revision history for this message
Daniel Nichter (daniel-nichter) :
review: Approve

Preview Diff

1=== modified file 'bin/pt-heartbeat'
2--- bin/pt-heartbeat 2012-12-03 03:48:11 +0000
3+++ bin/pt-heartbeat 2012-12-14 00:42:20 +0000
4@@ -20,6 +20,7 @@
5 Daemon
6 Quoter
7 TableParser
8+ Retry
9 Transformers
10 VersionCheck
11 HTTPMicro
12@@ -2921,6 +2922,84 @@
13 # ###########################################################################
14
15 # ###########################################################################
16+# Retry package
17+# This package is a copy without comments from the original. The original
18+# with comments and its test file can be found in the Bazaar repository at,
19+# lib/Retry.pm
20+# t/lib/Retry.t
21+# See https://launchpad.net/percona-toolkit for more information.
22+# ###########################################################################
23+{
24+package Retry;
25+
26+use strict;
27+use warnings FATAL => 'all';
28+use English qw(-no_match_vars);
29+use constant PTDEBUG => $ENV{PTDEBUG} || 0;
30+
31+sub new {
32+ my ( $class, %args ) = @_;
33+ my $self = {
34+ %args,
35+ };
36+ return bless $self, $class;
37+}
38+
39+sub retry {
40+ my ( $self, %args ) = @_;
41+ my @required_args = qw(try fail final_fail);
42+ foreach my $arg ( @required_args ) {
43+ die "I need a $arg argument" unless $args{$arg};
44+ };
45+ my ($try, $fail, $final_fail) = @args{@required_args};
46+ my $wait = $args{wait} || sub { sleep 1; };
47+ my $tries = $args{tries} || 3;
48+
49+ my $last_error;
50+ my $tryno = 0;
51+ TRY:
52+ while ( ++$tryno <= $tries ) {
53+ PTDEBUG && _d("Try", $tryno, "of", $tries);
54+ my $result;
55+ eval {
56+ $result = $try->(tryno=>$tryno);
57+ };
58+ if ( $EVAL_ERROR ) {
59+ PTDEBUG && _d("Try code failed:", $EVAL_ERROR);
60+ $last_error = $EVAL_ERROR;
61+
62+ if ( $tryno < $tries ) { # more retries
63+ my $retry = $fail->(tryno=>$tryno, error=>$last_error);
64+ last TRY unless $retry;
65+ PTDEBUG && _d("Calling wait code");
66+ $wait->(tryno=>$tryno);
67+ }
68+ }
69+ else {
70+ PTDEBUG && _d("Try code succeeded");
71+ return $result;
72+ }
73+ }
74+
75+ PTDEBUG && _d('Try code did not succeed');
76+ return $final_fail->(error=>$last_error);
77+}
78+
79+sub _d {
80+ my ($package, undef, $line) = caller 0;
81+ @_ = map { (my $temp = $_) =~ s/\n/\n# /g; $temp; }
82+ map { defined $_ ? $_ : 'undef' }
83+ @_;
84+ print STDERR "# $package:$line $PID ", join(' ', @_), "\n";
85+}
86+
87+1;
88+}
89+# ###########################################################################
90+# End Retry package
91+# ###########################################################################
92+
93+# ###########################################################################
94 # Transformers package
95 # This package is a copy without comments from the original. The original
96 # with comments and its test file can be found in the Bazaar repository at,
97@@ -4920,10 +4999,31 @@
98 }
99 }
100
101- $sth->execute(ts(time), @vals);
102- PTDEBUG && _d($sth->{Statement});
103- $sth->finish();
104-
105+ my $retry = Retry->new();
106+ $retry->retry(
107+ tries => 3,
108+ wait => sub { sleep 0.25; return; },
109+ try => sub {
110+ $sth->execute(ts(time), @vals);
111+ PTDEBUG && _d($sth->{Statement});
112+ $sth->finish();
113+ },
114+ fail => sub {
115+ my (%args) = @_;
116+ my $error = $args{error};
117+ if ( $error =~ m/Deadlock found/ ) {
118+ return 1; # try again
119+ }
120+ else {
121+ return 0;
122+ }
123+ },
124+ final_fail => sub {
125+ my (%args) = @_;
126+ die $args{error};
127+ }
128+ );
129+
130 return;
131 };
132 }
133@@ -5387,6 +5487,19 @@
134 columns are optional. If any are present, their corresponding information
135 will be saved.
136
137+=head1 Percona XtraDB Cluster
138+
139+Although pt-heartbeat should work with all supported versions of Percona XtraDB
140+Cluster (PXC), we recommend using 5.5.28-23.7 and newer.
141+
142+If you are setting up heartbeat instances between cluster nodes, keep in mind
143+that, since the speed of the cluster is determined by its slowest node,
144+pt-heartbeat will not report how fast the cluster itself is, but only how
145+fast events are replicating from one node to another.
146+
147+You must specify L<"--master-server-id"> for L<"--monitor"> and L<"--check">
148+instances.
149+
150 =head1 OPTIONS
151
152 Specify at least one of L<"--stop">, L<"--update">, L<"--monitor">, or L<"--check">.
153
154=== modified file 'sandbox/start-sandbox'
155--- sandbox/start-sandbox 2012-11-16 19:08:49 +0000
156+++ sandbox/start-sandbox 2012-12-14 00:42:20 +0000
157@@ -52,6 +52,10 @@
158 if [ -n "${master_port}" ]; then
159 local master_listen_port=$(($master_port + 10))
160 cluster_address="gcomm://$ip:$master_listen_port"
161+
162+ local this_listen_port=$(($port + 10))
163+ local this_cluster_address="gcomm://$ip:$this_listen_port"
164+ sed -e "s!gcomm://\$!$this_cluster_address!g" -i.bak "/tmp/$master_port/my.sandbox.cnf"
165 fi
166
167 sed -e "s/ADDR/$ip/g" -i.bak "/tmp/$port/my.sandbox.cnf"
168@@ -118,7 +122,7 @@
169 debug_sandbox $port
170 exit 1
171 fi
172-
173+
174 # If the sandbox is a slave, start the slave.
175 if [ "$type" = "slave" ]; then
176 /tmp/$port/use -e "change master to master_host='127.0.0.1', master_user='msandbox', master_password='msandbox', master_port=$master_port"
177
178=== modified file 'sandbox/test-env'
179--- sandbox/test-env 2012-12-03 20:06:47 +0000
180+++ sandbox/test-env 2012-12-14 00:42:20 +0000
181@@ -299,6 +299,12 @@
182 exit_status=$((exit_status | $?))
183
184 if [ "${2:-""}" = "cluster" ]; then
185+ # Bit of magic here. 'start-sandbox cluster new_node old_node'
186+ # changes old_node's my.sandbox.cnf's wsrep_cluster_address to
187+ # point to new_node. This is especially useful because otherwise,
188+ # calling stop/start like below on 12345 would create a new cluster.
189+ /tmp/12345/stop >/dev/null
190+ /tmp/12345/start >/dev/null
191 echo -n "Checking that the cluster size is correct... "
192 size=$(/tmp/12345/use -ss -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | awk '{print $2}')
193 if [ ${size:-0} -ne 3 ]; then
194
195=== added file 't/pt-heartbeat/pxc.t'
196--- t/pt-heartbeat/pxc.t 1970-01-01 00:00:00 +0000
197+++ t/pt-heartbeat/pxc.t 2012-12-14 00:42:20 +0000
198@@ -0,0 +1,384 @@
199+#!/usr/bin/env perl
200+
201+BEGIN {
202+ die "The PERCONA_TOOLKIT_BRANCH environment variable is not set.\n"
203+ unless $ENV{PERCONA_TOOLKIT_BRANCH} && -d $ENV{PERCONA_TOOLKIT_BRANCH};
204+ unshift @INC, "$ENV{PERCONA_TOOLKIT_BRANCH}/lib";
205+};
206+
207+use strict;
208+use warnings FATAL => 'all';
209+use English qw(-no_match_vars);
210+use Test::More;
211+use Data::Dumper;
212+
213+use File::Temp qw(tempfile);
214+
215+use PerconaTest;
216+use Sandbox;
217+
218+require "$trunk/bin/pt-heartbeat";
219+# Do this after requiring pt-hb, since it uses Mo
220+require VersionParser;
221+
222+my $dp = new DSNParser(opts=>$dsn_opts);
223+my $sb = new Sandbox(basedir => '/tmp', DSNParser => $dp);
224+my $node1 = $sb->get_dbh_for('node1');
225+my $node2 = $sb->get_dbh_for('node2');
226+my $node3 = $sb->get_dbh_for('node3');
227+
228+if ( !$node1 ) {
229+ plan skip_all => 'Cannot connect to cluster node1';
230+}
231+elsif ( !$node2 ) {
232+ plan skip_all => 'Cannot connect to cluster node2';
233+}
234+elsif ( !$node3 ) {
235+ plan skip_all => 'Cannot connect to cluster node3';
236+}
237+
238+my $db_flavor = VersionParser->new($node1)->flavor();
239+if ( $db_flavor !~ /XtraDB Cluster/ ) {
240+ plan skip_all => "PXC tests";
241+}
242+
243+my $node1_dsn = $sb->dsn_for('node1');
244+my $node2_dsn = $sb->dsn_for('node2');
245+my $node3_dsn = $sb->dsn_for('node3');
246+my $node1_port = $sb->port_for('node1');
247+my $node2_port = $sb->port_for('node2');
248+my $node3_port = $sb->port_for('node3');
249+
250+my $output;
251+my $exit;
252+my $base_pidfile = (tempfile("/tmp/pt-heartbeat-test.XXXXXXXX", OPEN => 0, UNLINK => 0))[1];
253+my $sample = "t/pt-heartbeat/samples/";
254+
255+my $sentinel = '/tmp/pt-heartbeat-sentinel';
256+
257+diag(`rm -rf $sentinel >/dev/null 2>&1`);
258+$sb->create_dbs($node1, ['test']);
259+
260+my @exec_pids;
261+my @pidfiles;
262+
263+sub start_update_instance {
264+ my ($port) = @_;
265+ my $pidfile = "$base_pidfile.$port.pid";
266+ push @pidfiles, $pidfile;
267+
268+ my $pid = fork();
269+ die "Cannot fork: $OS_ERROR" unless defined $pid;
270+ if ( $pid == 0 ) {
271+ my $cmd = "$trunk/bin/pt-heartbeat";
272+ exec { $cmd } $cmd, qw(-h 127.0.0.1 -u msandbox -p msandbox -P), $port,
273+ qw(--database test --table heartbeat --create-table),
274+ qw(--update --interval 0.5 --pid), $pidfile;
275+ exit 1;
276+ }
277+ push @exec_pids, $pid;
278+
279+ PerconaTest::wait_for_files($pidfile);
280+ ok(
281+ -f $pidfile,
282+ "--update on $port started"
283+ );
284+}
285+
286+sub stop_all_instances {
287+ my @pids = @exec_pids, map { chomp; $_ } map { slurp_file($_) } @pidfiles;
288+ diag(`$trunk/bin/pt-heartbeat --stop >/dev/null`);
289+
290+ waitpid($_, 0) for @pids;
291+ PerconaTest::wait_until(sub{ !-e $_ }) for @pidfiles;
292+
293+ unlink $sentinel;
294+}
295+
296+foreach my $port ( map { $sb->port_for($_) } qw(node1 node2 node3) ) {
297+ start_update_instance($port);
298+}
299+
300+# #############################################################################
301+# Basic cluster tests
302+# #############################################################################
303+
304+my $rows = $node1->selectall_hashref("select * from test.heartbeat", 'server_id');
305+
306+is(
307+ scalar keys %$rows,
308+ 3,
309+ "Sanity check: All nodes are in the heartbeat table"
310+);
311+
312+my $only_slave_data = {
313+ map {
314+ $_ => {
315+ relay_master_log_file => $rows->{$_}->{relay_master_log_file},
316+ exec_master_log_pos => $rows->{$_}->{exec_master_log_pos},
317+ } } keys %$rows
318+};
319+
320+my $same_data = { relay_master_log_file => undef, exec_master_log_pos => undef };
321+is_deeply(
322+ $only_slave_data,
323+ {
324+ 12345 => $same_data,
325+ 12346 => $same_data,
326+ 12347 => $same_data,
327+ },
328+ "Sanity check: No slave data (relay log or master pos) is stored"
329+);
330+
331+$output = output(sub{
332+ pt_heartbeat::main($node1_dsn, qw(-D test --check)),
333+ },
334+ stderr => 1,
335+);
336+
337+like(
338+ $output,
339+ qr/\QThe --master-server-id option must be specified because the heartbeat table `test`.`heartbeat`/,
340+ "pt-heartbeat --check + PXC doesn't autodetect a master if there isn't any"
341+);
342+
343+$output = output(sub{
344+ pt_heartbeat::main($node1_dsn, qw(-D test --check),
345+ '--master-server-id', $node3_port),
346+ },
347+ stderr => 1,
348+);
349+
350+$output =~ s/\d\.\d{2}/0.00/g;
351+is(
352+ $output,
353+ "0.00\n",
354+ "pt-heartbeat --check + PXC works with --master-server-id"
355+);
356+
357+# Test --monitor
358+
359+$output = output(sub {
360+ pt_heartbeat::main($node1_dsn,
361+ qw(-D test --monitor --run-time 1s),
362+ '--master-server-id', $node3_port)
363+ },
364+ stderr => 1,
365+);
366+
367+$output =~ s/\d\.\d{2}/0.00/g;
368+is(
369+ $output,
370+ "0.00s [ 0.00s, 0.00s, 0.00s ]\n",
371+ "--monitor works"
372+);
373+
374+# Try to generate some lag between cluster nodes. Rather brittle at the moment.
375+
376+# Lifted from alter active table
377+my $pt_osc_sample = "t/pt-online-schema-change/samples";
378+
379+my $query_table_stop = "/tmp/query_table.$PID.stop";
380+my $query_table_pid = "/tmp/query_table.$PID.pid";
381+my $query_table_output = "/tmp/query_table.$PID.output";
382+
383+$sb->create_dbs($node1, ['pt_osc']);
384+$sb->load_file('master', "$pt_osc_sample/basic_no_fks_innodb.sql");
385+
386+$node1->do("USE pt_osc");
387+$node1->do("TRUNCATE TABLE t");
388+$node1->do("LOAD DATA INFILE '$trunk/$pt_osc_sample/basic_no_fks.data' INTO TABLE t");
389+$node1->do("ANALYZE TABLE t");
390+$sb->wait_for_slaves();
391+
392+diag(`rm -rf $query_table_stop`);
393+diag(`echo > $query_table_output`);
394+
395+my $cmd = "$trunk/$pt_osc_sample/query_table.pl";
396+system("$cmd 127.0.0.1 $node1_port pt_osc t id $query_table_stop $query_table_pid >$query_table_output 2>&1 &");
397+wait_until(sub{-e $query_table_pid});
398+
399+# Reload sakila
400+system "$trunk/sandbox/load-sakila-db $node1_port &";
401+
402+$output = output(sub {
403+ pt_heartbeat::main($node3_dsn,
404+ qw(-D test --monitor --run-time 5s),
405+ '--master-server-id', $node1_port)
406+ },
407+ stderr => 1,
408+);
409+
410+like(
411+ $output,
412+ qr/^(?:0\.(?:\d[1-9]|[1-9]\d)|\d*[1-9]\d*\.\d{2})s\s+\[/m,
413+ "pt-heartbeat can detect replication lag between nodes"
414+);
415+
416+diag(`touch $query_table_stop`);
417+chomp(my $p = slurp_file($query_table_pid));
418+wait_until(sub{!kill 0, $p});
419+
420+$node1->do(q{DROP DATABASE pt_osc});
421+
422+$sb->wait_for_slaves();
423+
424+# #############################################################################
425+# cluster, node1 -> slave, run on node1
426+# #############################################################################
427+
428+my ($slave_dbh, $slave_dsn) = $sb->start_sandbox(
429+ server => 'cslave1',
430+ type => 'slave',
431+ master => 'node1',
432+ env => q/BINLOG_FORMAT="ROW"/,
433+);
434+
435+$sb->create_dbs($slave_dbh, ['test']);
436+
437+start_update_instance($sb->port_for('cslave1'));
438+PerconaTest::wait_for_table($slave_dbh, "test.heartbeat", "1=1");
439+
440+$output = output(sub{
441+ pt_heartbeat::main($slave_dsn, qw(-D test --check)),
442+ },
443+ stderr => 1,
444+);
445+
446+like(
447+ $output,
448+ qr/\d\.\d{2}\n/,
449+ "pt-heartbeat --check works on a slave of a cluster node"
450+);
451+
452+$output = output(sub {
453+ pt_heartbeat::main($slave_dsn,
454+ qw(-D test --monitor --run-time 2s))
455+ },
456+ stderr => 1,
457+);
458+
459+like(
460+ $output,
461+ qr/^\d.\d{2}s\s+\[/,
462+ "pt-heartbeat --monitor + slave of a node1, without --master-server-id"
463+);
464+
465+$output = output(sub {
466+ pt_heartbeat::main($slave_dsn,
467+ qw(-D test --monitor --run-time 2s),
468+ '--master-server-id', $node3_port)
469+ },
470+ stderr => 1,
471+);
472+
473+like(
474+ $output,
475+ qr/^\d.\d{2}s\s+\[/,
476+ "pt-heartbeat --monitor + slave of node1, --master-server-id pointing to node3"
477+);
478+
479+# #############################################################################
480+# master -> node1 in cluster
481+# #############################################################################
482+
483+# CAREFUL! See the comments in t/pt-table-checksum/pxc.t about cmaster.
484+# Nearly everything applies here.
485+
486+my ($master_dbh, $master_dsn) = $sb->start_sandbox(
487+ server => 'cmaster',
488+ type => 'master',
489+ env => q/BINLOG_FORMAT="ROW"/,
490+);
491+
492+my $cmaster_port = $sb->port_for('cmaster');
493+
494+$sb->create_dbs($master_dbh, ['test']);
495+
496+$master_dbh->do("FLUSH LOGS");
497+$master_dbh->do("RESET MASTER");
498+
499+$sb->set_as_slave('node1', 'cmaster');
500+
501+start_update_instance($sb->port_for('cmaster'));
502+PerconaTest::wait_for_table($node1, "test.heartbeat", "server_id=$cmaster_port");
503+
504+$output = output(sub{
505+ pt_heartbeat::main($node1_dsn, qw(-D test --check --print-master-server-id)),
506+ },
507+ stderr => 1,
508+);
509+
510+like(
511+ $output,
512+ qr/^\d.\d{2} $cmaster_port$/,
513+ "--print-master-id works for master -> $node1_port, when run from $node1_port"
514+);
515+
516+# Wait until node2 & node3 get cmaster in their heartbeat tables
517+$sb->wait_for_slaves(master => 'node1', slave => 'node2');
518+$sb->wait_for_slaves(master => 'node1', slave => 'node3');
519+
520+foreach my $test (
521+ [ $node2_port, $node2_dsn, $node2 ],
522+ [ $node3_port, $node3_dsn, $node3 ],
523+) {
524+ my ($port, $dsn, $dbh) = @$test;
525+
526+ $output = output(sub{
527+ pt_heartbeat::main($dsn, qw(-D test --check --print-master-server-id)),
528+ },
529+ stderr => 1,
530+ );
531+
532+ # This could be made to work, see the node autodiscovery branch
533+ TODO: {
534+ local $::TODO = "cmaster -> node1, other nodes can't autodetect the master";
535+ like(
536+ $output,
537+ qr/$cmaster_port/,
538+ "--print-master-id works for master -> $node1_port, when run from $port"
539+ );
540+ }
541+
542+ $output = output(sub{
543+ pt_heartbeat::main($dsn, qw(-D test --check --master-server-id), $cmaster_port),
544+ },
545+ stderr => 1,
546+ );
547+
548+ $output =~ s/\d\.\d{2}/0.00/g;
549+ is(
550+ $output,
551+ "0.00\n",
552+ "--check + explicit --master-server-id work for master -> node1, run from $port"
553+ );
554+}
555+
556+# ############################################################################
557+# Stop the --update instances.
558+# ############################################################################
559+
560+stop_all_instances();
561+
562+# ############################################################################
563+# Disconnect & stop the two servers we started
564+# ############################################################################
565+
566+# We have to do this after the --stop, otherwise the --update processes will
567+# spew a bunch of warnings and clog
568+
569+$slave_dbh->disconnect;
570+$master_dbh->disconnect;
571+$sb->stop_sandbox('cslave1', 'cmaster');
572+$node1->do("STOP SLAVE");
573+$node1->do("RESET SLAVE");
574+
575+# #############################################################################
576+# Done.
577+# #############################################################################
578+$sb->wipe_clean($node1);
579+diag(`/tmp/12345/stop`);
580+diag(`/tmp/12345/start`);
581+ok($sb->ok(), "Sandbox servers") or BAIL_OUT(__FILE__ . " broke the sandbox");
582+done_testing;
