Merge lp:~vlad-lesin/percona-server/5.5-bugs-1268729-1268735 into lp:percona-server/5.5

Proposed by Vlad Lesin
Status: Merged
Approved by: Laurynas Biveinis
Approved revision: no longer in the source branch.
Merged at revision: 633
Proposed branch: lp:~vlad-lesin/percona-server/5.5-bugs-1268729-1268735
Merge into: lp:percona-server/5.5
Diff against target: 204 lines (+141/-11)
4 files modified
mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result (+30/-0)
mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test (+67/-0)
sql-common/client.c (+6/-0)
sql/slave.cc (+38/-11)
To merge this branch: bzr merge lp:~vlad-lesin/percona-server/5.5-bugs-1268729-1268735
Reviewer Review Type Date Requested Status
Laurynas Biveinis (community) Approve
Review via email: mp+204782@code.launchpad.net

Description of the change

The fix for bug #1268735.

Process the network error during setting @master_heartbeat_period in
get_master_version_and_clock() correctly.

The fix for bug #1268729:
Slave I/O thread won't attempt to automatically reconnect to the master.

The fix is in including ER_NET_READ_INTERRUPTED error code to the set of
network related error codes in is_network_error() function.

The test case tests a couple of cases when get_master_version_and_clock()
fails due to ER_NET_READ_INTERRUPTED network error.

http://jenkins.percona.com/view/PS%205.5/job/percona-server-5.5-param/940

To post a comment you must log in.
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

    - It is possible to make the code changes for error injection
      less invasive and IMHO easier to read. Instead of

      // Inject ER_NET_READ_INTERRUPTED network error for testing.
      DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                      mysql->net.last_errno= ER_NET_READ_INTERRUPTED;);
      if (DBUG_EVALUATE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                           false, true) &&
          !mysql_real_query(mysql, STRING_WITH_LEN("SELECT
      UNIX_TIMESTAMP()")) &&

      you can push down the ER_NET_READ_INTERRUPTED injection deeper
      all the way to e.g. my_net_read, and make it one-shot:

      ...
      DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED",
                      {
                         ...whatever action..., e.g. net->last_error= ER_NET...
                         DBUG_SET("-d,inject_ER_NET_READ_INTERRUPTED");
                         ...early return or whatever...
                      });

      and in the three original injection points you just prime the
      above injection:

      DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                      DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED"););
      if (!mysql_real_query ...

      and the surrounding code (the mysql_real_query call) itself is
      not touched.

    - The debug sync points and all their support code are not
      required. Even if START SLAVE is async (cannot recall at the
      moment whether it really is), you can use MTR include file
      start_slave_io.inc (I guess I/O thread is enough), which will
      wait until slave is up and running, and so the error injection
      point has passed.

    - Instead of own log grepper in Perl use one already provided:
      include/search_pattern.inc or search_pattern_in_file.inc,
      depending on whether you prefer testcase abort or a result file
      mismatch in case of the pattern not found. (s_p_in_file.inc
      exists in both 5.5 and 5.6, s_p.inc only in 5.6. In case you
      want that one, just bzr add it with the correct file id to
      5.5).

    - 58: s/binglog/binlog

    - 188, 259: spurious whitespace change

review: Needs Fixing
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

ST 38489

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> - It is possible to make the code changes for error injection
> less invasive and IMHO easier to read. Instead of
>
> // Inject ER_NET_READ_INTERRUPTED network error for testing.
> DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
> mysql->net.last_errno= ER_NET_READ_INTERRUPTED;);
> if
> (DBUG_EVALUATE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
> false, true) &&
> !mysql_real_query(mysql, STRING_WITH_LEN("SELECT
> UNIX_TIMESTAMP()")) &&
>
> you can push down the ER_NET_READ_INTERRUPTED injection deeper
> all the way to e.g. my_net_read, and make it one-shot:
>
> ...
> DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED",
> {
> ...whatever action..., e.g. net->last_error=
> ER_NET...
> DBUG_SET("-d,inject_ER_NET_READ_INTERRUPTED");
> ...early return or whatever...
> });
>
> and in the three original injection points you just prime the
> above injection:
>
> DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
> DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED"););
> if (!mysql_real_query ...
>
> and the surrounding code (the mysql_real_query call) itself is
> not touched.
>
> - The debug sync points and all their support code are not
> required. Even if START SLAVE is async (cannot recall at the
> moment whether it really is), you can use MTR include file
> start_slave_io.inc (I guess I/O thread is enough), which will
> wait until slave is up and running, and so the error injection
> point has passed.

No, this does not work. wait_for_slave_io_to_start invokes wait_for_slave_param. wait_for_slave_param polls "show slave status" periodically expecting Slave_IO_Running field equals to 'Yes'. Slave_IO_Running does not equal to 'Yes' if connection to master is not established. When get_master_version_and_clock() returns "network error" handle_slave_io() tries to reconnect. As @global.debug was set to "get_master_version.timestamp.ER_NET_READ_INTERRUPTED" or any other value for testing get_master_version_and_clock() returns "network error" again and again and connection can not be established. So we need to reset @global.debug when get_master_version_and_clock() is finished. That is why debug sync points are currently used. We could reset @global.debug in mysql_real_query() (or my_net_read() as you have suggested) in DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED", {...}) block but I don't think this is a good idea to place specific code which must test only get_master_version_and_clock() function to more common function. How do you think?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

IMHO there's no problem with placing a ER_NET_READ_INTERRUPTED-injecting block into my_net_read even though it's a common function. Such injection is not linked to the particular bug scenario, could be reused in testing some other code path, and is similar to the already-existing simulate_net_write_failure in my_net_write.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> IMHO there's no problem with placing a ER_NET_READ_INTERRUPTED-injecting block
> into my_net_read even though it's a common function. Such injection is not
> linked to the particular bug scenario, could be reused in testing some other
> code path, and is similar to the already-existing simulate_net_write_failure
> in my_net_write.

Sure, but I did not mean ER_NET_READ_INTERRUPTED in my previous comment.

We set @global.debug to certain values in test for DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED") being executed in certain places of get_master_version_and_clock(). For example we set:

SET @@global.debug="+d,get_master_version.timestamp.ER_NET_READ_INTERRUPTED"

to execute DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED") just before timestamp is gotten from master. And then we must reset get_master_version.timestamp.ER_NET_READ_INTERRUPTED to avoid execution of DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED") at the same place on reconnect. We can do this in mysql_real_query() what is not good because we place the certain test-specific code in common function. Or we can do this at the end of get_master_version_and_clock() in three places as this function has three end points and in this case some macros will be used to avoid copy-paste. But as I understood right the main idea of your comment was in removing such macros from get_master_version_and_clock().

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Can we just make the high-level injections one shot too?

      DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                      DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
                      DBUG_SET("-d,get_master_version.timestamp.ER_NET_READ_INTERRUPTED"););

Or this would not work?

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> Can we just make the high-level injections one shot too?
...
> Or this would not work?

Yes, this works, thanks.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> - Instead of own log grepper in Perl use one already provided:
> include/search_pattern.inc or search_pattern_in_file.inc,
> depending on whether you prefer testcase abort or a result file
> mismatch in case of the pattern not found. (s_p_in_file.inc
> exists in both 5.5 and 5.6, s_p.inc only in 5.6. In case you
> want that one, just bzr add it with the correct file id to
> 5.5).

I have not found include/search_pattern.inc in both 5.5 and 5.6, so include/log_grep.inc is used instead it.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

>
> > - Instead of own log grepper in Perl use one already provided:
> > include/search_pattern.inc or search_pattern_in_file.inc,
> > depending on whether you prefer testcase abort or a result file
> > mismatch in case of the pattern not found. (s_p_in_file.inc
> > exists in both 5.5 and 5.6, s_p.inc only in 5.6. In case you
> > want that one, just bzr add it with the correct file id to
> > 5.5).
>
> I have not found include/search_pattern.inc in both 5.5 and 5.6, so
> include/log_grep.inc is used instead it.

There is search_pattern_in_file.inc. log_grep.inc is OK too, except that it would be nice to remove it from the 5.7 port.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :

> There is search_pattern_in_file.inc. log_grep.inc is OK too, except that it
> would be nice to remove it from the 5.7 port.

search_pattern_in_file.inc dies when pattern is not found. So it can be used only if we want to check if the pattern is in file. But my case is checking if error messages are NOT in file. log_grep.inc count the number of pattern occurrences, that is why it suits for this case. If we want to get rid log_grep.inc in 5.7, I can add one more parameter in search_pattern_in_file.inc to check pattern absence.

Revision history for this message
Vlad Lesin (vlad-lesin) wrote :
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) :
review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
=== added file 'mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result'
--- mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result 2014-02-27 16:19:30 +0000
@@ -0,0 +1,30 @@
1include/master-slave.inc
2[connection master]
3call mtr.add_suppression("Slave I/O: Get master clock failed with error: , Error_code: 1159");
4call mtr.add_suppression("Get master SERVER_ID failed with error: , Error_code: 1159");
5call mtr.add_suppression("Slave I/O: SET @master_heartbeat_period to master failed with error: , Error_code: 1159");
6call mtr.add_suppression("Slave I/O: Master command COM_REGISTER_SLAVE failed: .*");
7call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
8SET @debug_saved=@@global.debug;
9include/stop_slave.inc
10SET @@global.debug="+d,get_master_version.heartbeat.ER_NET_READ_INTERRUPTED";
11START SLAVE;
12include/wait_for_slave_io_to_start.inc
13[log_grep.inc] file: pattern: The slave I/O thread stops because SET @master_heartbeat_period on master failed.
14[log_grep.inc] lines: 0
15include/stop_slave.inc
16SET @@global.debug="+d,get_master_version.serverid.ER_NET_READ_INTERRUPTED";
17START SLAVE;
18include/wait_for_slave_io_to_start.inc
19[log_grep.inc] file: pattern: fatal error is encountered when it try to get the value of SERVER_ID variable from master
20[log_grep.inc] lines: 0
21include/stop_slave.inc
22SET @@global.debug="+d,get_master_version.timestamp.ER_NET_READ_INTERRUPTED";
23START SLAVE;
24include/wait_for_slave_io_to_start.inc
25[log_grep.inc] file: pattern: "SELECT UNIX_TIMESTAMP.." failed on master, do not trust column
26[log_grep.inc] lines: 0
27include/stop_slave.inc
28SET @@global.debug='';
29include/start_slave.inc
30include/rpl_end.inc
031
=== added file 'mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test'
--- mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test 1970-01-01 00:00:00 +0000
+++ mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test 2014-02-27 16:19:30 +0000
@@ -0,0 +1,67 @@
1# Bug #1268735:
2# Slave I/O thread won't attempt to automatically reconnect to the master
3# on ER_NET_READ_INTERRUPTED network error.
4#
5# The bug is in get_master_version_and_clock(). There are two places in this
6# function where the bug takes place. The first one is getting master server id
7# and the second one is getting master time. The error was injected to the code
8# to test the bug fix.
9
10source include/master-slave.inc;
11# The error injection works only with debug facilities.
12source include/have_debug.inc;
13source include/have_debug_sync.inc;
14# The test is not supposed to have any binglog affairs.
15# Hence it's enough it to run only with one binlog format
16source include/have_binlog_format_mixed.inc;
17
18call mtr.add_suppression("Slave I/O: Get master clock failed with error: , Error_code: 1159");
19call mtr.add_suppression("Get master SERVER_ID failed with error: , Error_code: 1159");
20call mtr.add_suppression("Slave I/O: SET @master_heartbeat_period to master failed with error: , Error_code: 1159");
21call mtr.add_suppression("Slave I/O: Master command COM_REGISTER_SLAVE failed: .*");
22call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
23
24let log_file_full_path= `SELECT @@GLOBAL.log_error`;
25if(!$log_file_full_path)
26{
27 # MySQL Server on windows is started with --console and thus
28 # does not know the location of its .err log, use default location
29 let log_file_full_path = $MYSQLTEST_VARDIR/log/mysqld.2.err;
30}
31
32connection slave;
33SET @debug_saved=@@global.debug;
34
35# i=1 for activating injected error during getting timestamp from master
36# i=2 for activating injected error during getting server id from master
37# i=3 for activating injected error during setting @master_heartbeat_period
38let i=3;
39
40connection slave;
41source include/stop_slave.inc;
42let old_debug=`SELECT @@global.debug`;
43
44while($i)
45{
46 let debug_name=`SELECT CASE $i WHEN 1 THEN 'get_master_version.timestamp.ER_NET_READ_INTERRUPTED' WHEN 2 THEN 'get_master_version.serverid.ER_NET_READ_INTERRUPTED' WHEN 3 THEN 'get_master_version.heartbeat.ER_NET_READ_INTERRUPTED' END`;
47
48 # Stop slave and set debug variables
49 eval SET @@global.debug="+d,$debug_name";
50
51 START SLAVE;
52 # Crash if error during slave starting takes place
53 let $rpl_allow_error= 0;
54 source include/wait_for_slave_io_to_start.inc;
55
56 let grep_pattern=`SELECT CASE $i WHEN 1 THEN '"SELECT UNIX_TIMESTAMP.." failed on master, do not trust column' WHEN 2 THEN 'fatal error is encountered when it try to get the value of SERVER_ID variable from master' WHEN 3 THEN 'The slave I/O thread stops because SET @master_heartbeat_period on master failed.' END`;
57
58 source include/log_grep.inc;
59
60 source include/stop_slave.inc;
61 dec $i;
62}
63
64eval SET @@global.debug='$old_debug';
65
66source include/start_slave.inc;
67source include/rpl_end.inc;
068
=== modified file 'sql-common/client.c'
--- sql-common/client.c 2013-12-16 08:19:34 +0000
+++ sql-common/client.c 2014-02-27 16:19:30 +0000
@@ -3912,6 +3912,12 @@
3912 DBUG_ENTER("mysql_real_query");3912 DBUG_ENTER("mysql_real_query");
3913 DBUG_PRINT("enter",("handle: 0x%lx", (long) mysql));3913 DBUG_PRINT("enter",("handle: 0x%lx", (long) mysql));
3914 DBUG_PRINT("query",("Query = '%-.4096s'",query));3914 DBUG_PRINT("query",("Query = '%-.4096s'",query));
3915 DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED",
3916 {
3917 mysql->net.last_errno= ER_NET_READ_INTERRUPTED;
3918 DBUG_SET("-d,inject_ER_NET_READ_INTERRUPTED");
3919 DBUG_RETURN(1);
3920 });
39153921
3916 if (mysql_send_query(mysql,query,length))3922 if (mysql_send_query(mysql,query,length))
3917 DBUG_RETURN(1);3923 DBUG_RETURN(1);
39183924
=== modified file 'sql/slave.cc'
--- sql/slave.cc 2014-02-03 04:39:37 +0000
+++ sql/slave.cc 2014-02-27 16:19:30 +0000
@@ -1209,7 +1209,8 @@
1209 errorno == CR_SERVER_GONE_ERROR ||1209 errorno == CR_SERVER_GONE_ERROR ||
1210 errorno == CR_SERVER_LOST ||1210 errorno == CR_SERVER_LOST ||
1211 errorno == ER_CON_COUNT_ERROR ||1211 errorno == ER_CON_COUNT_ERROR ||
1212 errorno == ER_SERVER_SHUTDOWN)1212 errorno == ER_SERVER_SHUTDOWN ||
1213 errorno == ER_NET_READ_INTERRUPTED)
1213 return TRUE;1214 return TRUE;
12141215
1215 return FALSE; 1216 return FALSE;
@@ -1325,6 +1326,12 @@
1325 };);1326 };);
13261327
1327 master_res= NULL;1328 master_res= NULL;
1329 DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
1330 {
1331 DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
1332 DBUG_SET("-d,get_master_version.timestamp."
1333 "ER_NET_READ_INTERRUPTED");
1334 });
1328 if (!mysql_real_query(mysql, STRING_WITH_LEN("SELECT UNIX_TIMESTAMP()")) &&1335 if (!mysql_real_query(mysql, STRING_WITH_LEN("SELECT UNIX_TIMESTAMP()")) &&
1329 (master_res= mysql_store_result(mysql)) &&1336 (master_res= mysql_store_result(mysql)) &&
1330 (master_row= mysql_fetch_row(master_res)))1337 (master_row= mysql_fetch_row(master_res)))
@@ -1375,6 +1382,12 @@
1375 };);1382 };);
1376 master_res= NULL;1383 master_res= NULL;
1377 master_row= NULL;1384 master_row= NULL;
1385 DBUG_EXECUTE_IF("get_master_version.server_id.ER_NET_READ_INTERRUPTED",
1386 {
1387 DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
1388 DBUG_SET("-d,get_master_version.server_id."
1389 "ER_NET_READ_INTERRUPTED");
1390 });
1378 if (!mysql_real_query(mysql,1391 if (!mysql_real_query(mysql,
1379 STRING_WITH_LEN("SHOW VARIABLES LIKE 'SERVER_ID'")) &&1392 STRING_WITH_LEN("SHOW VARIABLES LIKE 'SERVER_ID'")) &&
1380 (master_res= mysql_store_result(mysql)) &&1393 (master_res= mysql_store_result(mysql)) &&
@@ -1564,17 +1577,31 @@
1564 */1577 */
1565 llstr((ulonglong) (mi->heartbeat_period*1000000000UL), llbuf);1578 llstr((ulonglong) (mi->heartbeat_period*1000000000UL), llbuf);
1566 sprintf(query, query_format, llbuf);1579 sprintf(query, query_format, llbuf);
1580 DBUG_EXECUTE_IF("get_master_version.heartbeat.ER_NET_READ_INTERRUPTED",
1581 {
1582 DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
1583 DBUG_SET("-d,get_master_version.heartbeat."
1584 "ER_NET_READ_INTERRUPTED");
1585 });
1586 if (mysql_real_query(mysql, query, strlen(query))) {
1587 if (is_network_error(mysql_errno(mysql)))
1588 {
1589 mi->report(WARNING_LEVEL, mysql_errno(mysql),
1590 "SET @master_heartbeat_period to master failed with error: %s",
1591 mysql_error(mysql));
1592 mysql_free_result(mysql_store_result(mysql));
1593 goto network_err;
1594 }
1595 else if (!check_io_slave_killed(mi->io_thd, mi, NULL)) {
1596 errmsg= "The slave I/O thread stops because SET @master_heartbeat_period "
1597 "on master failed.";
1598 err_code= ER_SLAVE_FATAL_ERROR;
1599 sprintf(err_buff, "%s Error: %s", errmsg, mysql_error(mysql));
1600 mysql_free_result(mysql_store_result(mysql));
1601 goto err;
1602 }
1603 }
15671604
1568 if (mysql_real_query(mysql, query, strlen(query))
1569 && !check_io_slave_killed(mi->io_thd, mi, NULL))
1570 {
1571 errmsg= "The slave I/O thread stops because SET @master_heartbeat_period "
1572 "on master failed.";
1573 err_code= ER_SLAVE_FATAL_ERROR;
1574 sprintf(err_buff, "%s Error: %s", errmsg, mysql_error(mysql));
1575 mysql_free_result(mysql_store_result(mysql));
1576 goto err;
1577 }
1578 mysql_free_result(mysql_store_result(mysql));1605 mysql_free_result(mysql_store_result(mysql));
1579 }1606 }
1580 1607

Subscribers

People subscribed via source and target branches