Merge lp:~vlad-lesin/percona-server/5.5-bugs-1268729-1268735 into lp:percona-server/5.5

Proposed by Vlad Lesin on 2014-02-04
Status: Merged
Approved by: Laurynas Biveinis on 2014-03-05
Approved revision: 608
Merged at revision: 633
Proposed branch: lp:~vlad-lesin/percona-server/5.5-bugs-1268729-1268735
Merge into: lp:percona-server/5.5
Diff against target: 204 lines (+141/-11)
4 files modified
mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result (+30/-0)
mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test (+67/-0)
sql-common/client.c (+6/-0)
sql/slave.cc (+38/-11)
To merge this branch: bzr merge lp:~vlad-lesin/percona-server/5.5-bugs-1268729-1268735
Reviewer Review Type Date Requested Status
Laurynas Biveinis (community) 2014-02-04 Approve on 2014-03-05
Review via email: mp+204782@code.launchpad.net

Description of the change

The fix for bug #1268735.

Process the network error during setting @master_heartbeat_period in
get_master_version_and_clock() correctly.

The fix for bug #1268729:
Slave I/O thread won't attempt to automatically reconnect to the master.

The fix is in including ER_NET_READ_INTERRUPTED error code to the set of
network related error codes in is_network_error() function.

The test case tests a couple of cases when get_master_version_and_clock()
fails due to ER_NET_READ_INTERRUPTED network error.

http://jenkins.percona.com/view/PS%205.5/job/percona-server-5.5-param/940

To post a comment you must log in.

    - It is possible to make the code changes for error injection
      less invasive and IMHO easier to read. Instead of

      // Inject ER_NET_READ_INTERRUPTED network error for testing.
      DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                      mysql->net.last_errno= ER_NET_READ_INTERRUPTED;);
      if (DBUG_EVALUATE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                           false, true) &&
          !mysql_real_query(mysql, STRING_WITH_LEN("SELECT
      UNIX_TIMESTAMP()")) &&

      you can push down the ER_NET_READ_INTERRUPTED injection deeper
      all the way to e.g. my_net_read, and make it one-shot:

      ...
      DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED",
                      {
                         ...whatever action..., e.g. net->last_error= ER_NET...
                         DBUG_SET("-d,inject_ER_NET_READ_INTERRUPTED");
                         ...early return or whatever...
                      });

      and in the three original injection points you just prime the
      above injection:

      DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                      DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED"););
      if (!mysql_real_query ...

      and the surrounding code (the mysql_real_query call) itself is
      not touched.

    - The debug sync points and all their support code are not
      required. Even if START SLAVE is async (cannot recall at the
      moment whether it really is), you can use MTR include file
      start_slave_io.inc (I guess I/O thread is enough), which will
      wait until slave is up and running, and so the error injection
      point has passed.

    - Instead of own log grepper in Perl use one already provided:
      include/search_pattern.inc or search_pattern_in_file.inc,
      depending on whether you prefer testcase abort or a result file
      mismatch in case of the pattern not found. (s_p_in_file.inc
      exists in both 5.5 and 5.6, s_p.inc only in 5.6. In case you
      want that one, just bzr add it with the correct file id to
      5.5).

    - 58: s/binglog/binlog

    - 188, 259: spurious whitespace change

review: Needs Fixing

ST 38489

Vlad Lesin (vlad-lesin) wrote :

> - It is possible to make the code changes for error injection
> less invasive and IMHO easier to read. Instead of
>
> // Inject ER_NET_READ_INTERRUPTED network error for testing.
> DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
> mysql->net.last_errno= ER_NET_READ_INTERRUPTED;);
> if
> (DBUG_EVALUATE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
> false, true) &&
> !mysql_real_query(mysql, STRING_WITH_LEN("SELECT
> UNIX_TIMESTAMP()")) &&
>
> you can push down the ER_NET_READ_INTERRUPTED injection deeper
> all the way to e.g. my_net_read, and make it one-shot:
>
> ...
> DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED",
> {
> ...whatever action..., e.g. net->last_error=
> ER_NET...
> DBUG_SET("-d,inject_ER_NET_READ_INTERRUPTED");
> ...early return or whatever...
> });
>
> and in the three original injection points you just prime the
> above injection:
>
> DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
> DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED"););
> if (!mysql_real_query ...
>
> and the surrounding code (the mysql_real_query call) itself is
> not touched.
>
> - The debug sync points and all their support code are not
> required. Even if START SLAVE is async (cannot recall at the
> moment whether it really is), you can use MTR include file
> start_slave_io.inc (I guess I/O thread is enough), which will
> wait until slave is up and running, and so the error injection
> point has passed.

No, this does not work. wait_for_slave_io_to_start invokes wait_for_slave_param. wait_for_slave_param polls "show slave status" periodically expecting Slave_IO_Running field equals to 'Yes'. Slave_IO_Running does not equal to 'Yes' if connection to master is not established. When get_master_version_and_clock() returns "network error" handle_slave_io() tries to reconnect. As @global.debug was set to "get_master_version.timestamp.ER_NET_READ_INTERRUPTED" or any other value for testing get_master_version_and_clock() returns "network error" again and again and connection can not be established. So we need to reset @global.debug when get_master_version_and_clock() is finished. That is why debug sync points are currently used. We could reset @global.debug in mysql_real_query() (or my_net_read() as you have suggested) in DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED", {...}) block but I don't think this is a good idea to place specific code which must test only get_master_version_and_clock() function to more common function. How do you think?

IMHO there's no problem with placing a ER_NET_READ_INTERRUPTED-injecting block into my_net_read even though it's a common function. Such injection is not linked to the particular bug scenario, could be reused in testing some other code path, and is similar to the already-existing simulate_net_write_failure in my_net_write.

Vlad Lesin (vlad-lesin) wrote :

> IMHO there's no problem with placing a ER_NET_READ_INTERRUPTED-injecting block
> into my_net_read even though it's a common function. Such injection is not
> linked to the particular bug scenario, could be reused in testing some other
> code path, and is similar to the already-existing simulate_net_write_failure
> in my_net_write.

Sure, but I did not mean ER_NET_READ_INTERRUPTED in my previous comment.

We set @global.debug to certain values in test for DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED") being executed in certain places of get_master_version_and_clock(). For example we set:

SET @@global.debug="+d,get_master_version.timestamp.ER_NET_READ_INTERRUPTED"

to execute DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED") just before timestamp is gotten from master. And then we must reset get_master_version.timestamp.ER_NET_READ_INTERRUPTED to avoid execution of DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED") at the same place on reconnect. We can do this in mysql_real_query() what is not good because we place the certain test-specific code in common function. Or we can do this at the end of get_master_version_and_clock() in three places as this function has three end points and in this case some macros will be used to avoid copy-paste. But as I understood right the main idea of your comment was in removing such macros from get_master_version_and_clock().

Can we just make the high-level injections one shot too?

      DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
                      DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
                      DBUG_SET("-d,get_master_version.timestamp.ER_NET_READ_INTERRUPTED"););

Or this would not work?

Vlad Lesin (vlad-lesin) wrote :

> Can we just make the high-level injections one shot too?
...
> Or this would not work?

Yes, this works, thanks.

Vlad Lesin (vlad-lesin) wrote :

> - Instead of own log grepper in Perl use one already provided:
> include/search_pattern.inc or search_pattern_in_file.inc,
> depending on whether you prefer testcase abort or a result file
> mismatch in case of the pattern not found. (s_p_in_file.inc
> exists in both 5.5 and 5.6, s_p.inc only in 5.6. In case you
> want that one, just bzr add it with the correct file id to
> 5.5).

I have not found include/search_pattern.inc in both 5.5 and 5.6, so include/log_grep.inc is used instead it.

>
> > - Instead of own log grepper in Perl use one already provided:
> > include/search_pattern.inc or search_pattern_in_file.inc,
> > depending on whether you prefer testcase abort or a result file
> > mismatch in case of the pattern not found. (s_p_in_file.inc
> > exists in both 5.5 and 5.6, s_p.inc only in 5.6. In case you
> > want that one, just bzr add it with the correct file id to
> > 5.5).
>
> I have not found include/search_pattern.inc in both 5.5 and 5.6, so
> include/log_grep.inc is used instead it.

There is search_pattern_in_file.inc. log_grep.inc is OK too, except that it would be nice to remove it from the 5.7 port.

Vlad Lesin (vlad-lesin) wrote :

> There is search_pattern_in_file.inc. log_grep.inc is OK too, except that it
> would be nice to remove it from the 5.7 port.

search_pattern_in_file.inc dies when pattern is not found. So it can be used only if we want to check if the pattern is in file. But my case is checking if error messages are NOT in file. log_grep.inc count the number of pattern occurrences, that is why it suits for this case. If we want to get rid log_grep.inc in 5.7, I can add one more parameter in search_pattern_in_file.inc to check pattern absence.

review: Approve

Preview Diff

[H/L] Next/Prev Comment, [J/K] Next/Prev File, [N/P] Next/Prev Hunk
1=== added file 'mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result'
2--- mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result 1970-01-01 00:00:00 +0000
3+++ mysql-test/suite/rpl/r/rpl_percona_gmvac_net_err.result 2014-02-27 16:19:30 +0000
4@@ -0,0 +1,30 @@
5+include/master-slave.inc
6+[connection master]
7+call mtr.add_suppression("Slave I/O: Get master clock failed with error: , Error_code: 1159");
8+call mtr.add_suppression("Get master SERVER_ID failed with error: , Error_code: 1159");
9+call mtr.add_suppression("Slave I/O: SET @master_heartbeat_period to master failed with error: , Error_code: 1159");
10+call mtr.add_suppression("Slave I/O: Master command COM_REGISTER_SLAVE failed: .*");
11+call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
12+SET @debug_saved=@@global.debug;
13+include/stop_slave.inc
14+SET @@global.debug="+d,get_master_version.heartbeat.ER_NET_READ_INTERRUPTED";
15+START SLAVE;
16+include/wait_for_slave_io_to_start.inc
17+[log_grep.inc] file: pattern: The slave I/O thread stops because SET @master_heartbeat_period on master failed.
18+[log_grep.inc] lines: 0
19+include/stop_slave.inc
20+SET @@global.debug="+d,get_master_version.serverid.ER_NET_READ_INTERRUPTED";
21+START SLAVE;
22+include/wait_for_slave_io_to_start.inc
23+[log_grep.inc] file: pattern: fatal error is encountered when it try to get the value of SERVER_ID variable from master
24+[log_grep.inc] lines: 0
25+include/stop_slave.inc
26+SET @@global.debug="+d,get_master_version.timestamp.ER_NET_READ_INTERRUPTED";
27+START SLAVE;
28+include/wait_for_slave_io_to_start.inc
29+[log_grep.inc] file: pattern: "SELECT UNIX_TIMESTAMP.." failed on master, do not trust column
30+[log_grep.inc] lines: 0
31+include/stop_slave.inc
32+SET @@global.debug='';
33+include/start_slave.inc
34+include/rpl_end.inc
35
36=== added file 'mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test'
37--- mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test 1970-01-01 00:00:00 +0000
38+++ mysql-test/suite/rpl/t/rpl_percona_gmvac_net_err.test 2014-02-27 16:19:30 +0000
39@@ -0,0 +1,67 @@
40+# Bug #1268735:
41+# Slave I/O thread won't attempt to automatically reconnect to the master
42+# on ER_NET_READ_INTERRUPTED network error.
43+#
44+# The bug is in get_master_version_and_clock(). There are two places in this
45+# function where the bug takes place. The first one is getting master server id
46+# and the second one is getting master time. The error was injected to the code
47+# to test the bug fix.
48+
49+source include/master-slave.inc;
50+# The error injection works only with debug facilities.
51+source include/have_debug.inc;
52+source include/have_debug_sync.inc;
53+# The test is not supposed to have any binglog affairs.
54+# Hence it's enough it to run only with one binlog format
55+source include/have_binlog_format_mixed.inc;
56+
57+call mtr.add_suppression("Slave I/O: Get master clock failed with error: , Error_code: 1159");
58+call mtr.add_suppression("Get master SERVER_ID failed with error: , Error_code: 1159");
59+call mtr.add_suppression("Slave I/O: SET @master_heartbeat_period to master failed with error: , Error_code: 1159");
60+call mtr.add_suppression("Slave I/O: Master command COM_REGISTER_SLAVE failed: .*");
61+call mtr.add_suppression("Slave SQL.*Request to stop slave SQL Thread received while applying a group that has non-transactional changes; waiting for completion of the group");
62+
63+let log_file_full_path= `SELECT @@GLOBAL.log_error`;
64+if(!$log_file_full_path)
65+{
66+ # MySQL Server on windows is started with --console and thus
67+ # does not know the location of its .err log, use default location
68+ let log_file_full_path = $MYSQLTEST_VARDIR/log/mysqld.2.err;
69+}
70+
71+connection slave;
72+SET @debug_saved=@@global.debug;
73+
74+# i=1 for activating injected error during getting timestamp from master
75+# i=2 for activating injected error during getting server id from master
76+# i=3 for activating injected error during setting @master_heartbeat_period
77+let i=3;
78+
79+connection slave;
80+source include/stop_slave.inc;
81+let old_debug=`SELECT @@global.debug`;
82+
83+while($i)
84+{
85+ let debug_name=`SELECT CASE $i WHEN 1 THEN 'get_master_version.timestamp.ER_NET_READ_INTERRUPTED' WHEN 2 THEN 'get_master_version.serverid.ER_NET_READ_INTERRUPTED' WHEN 3 THEN 'get_master_version.heartbeat.ER_NET_READ_INTERRUPTED' END`;
86+
87+ # Stop slave and set debug variables
88+ eval SET @@global.debug="+d,$debug_name";
89+
90+ START SLAVE;
91+ # Crash if error during slave starting takes place
92+ let $rpl_allow_error= 0;
93+ source include/wait_for_slave_io_to_start.inc;
94+
95+ let grep_pattern=`SELECT CASE $i WHEN 1 THEN '"SELECT UNIX_TIMESTAMP.." failed on master, do not trust column' WHEN 2 THEN 'fatal error is encountered when it try to get the value of SERVER_ID variable from master' WHEN 3 THEN 'The slave I/O thread stops because SET @master_heartbeat_period on master failed.' END`;
96+
97+ source include/log_grep.inc;
98+
99+ source include/stop_slave.inc;
100+ dec $i;
101+}
102+
103+eval SET @@global.debug='$old_debug';
104+
105+source include/start_slave.inc;
106+source include/rpl_end.inc;
107
108=== modified file 'sql-common/client.c'
109--- sql-common/client.c 2013-12-16 08:19:34 +0000
110+++ sql-common/client.c 2014-02-27 16:19:30 +0000
111@@ -3912,6 +3912,12 @@
112 DBUG_ENTER("mysql_real_query");
113 DBUG_PRINT("enter",("handle: 0x%lx", (long) mysql));
114 DBUG_PRINT("query",("Query = '%-.4096s'",query));
115+ DBUG_EXECUTE_IF("inject_ER_NET_READ_INTERRUPTED",
116+ {
117+ mysql->net.last_errno= ER_NET_READ_INTERRUPTED;
118+ DBUG_SET("-d,inject_ER_NET_READ_INTERRUPTED");
119+ DBUG_RETURN(1);
120+ });
121
122 if (mysql_send_query(mysql,query,length))
123 DBUG_RETURN(1);
124
125=== modified file 'sql/slave.cc'
126--- sql/slave.cc 2014-02-03 04:39:37 +0000
127+++ sql/slave.cc 2014-02-27 16:19:30 +0000
128@@ -1209,7 +1209,8 @@
129 errorno == CR_SERVER_GONE_ERROR ||
130 errorno == CR_SERVER_LOST ||
131 errorno == ER_CON_COUNT_ERROR ||
132- errorno == ER_SERVER_SHUTDOWN)
133+ errorno == ER_SERVER_SHUTDOWN ||
134+ errorno == ER_NET_READ_INTERRUPTED)
135 return TRUE;
136
137 return FALSE;
138@@ -1325,6 +1326,12 @@
139 };);
140
141 master_res= NULL;
142+ DBUG_EXECUTE_IF("get_master_version.timestamp.ER_NET_READ_INTERRUPTED",
143+ {
144+ DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
145+ DBUG_SET("-d,get_master_version.timestamp."
146+ "ER_NET_READ_INTERRUPTED");
147+ });
148 if (!mysql_real_query(mysql, STRING_WITH_LEN("SELECT UNIX_TIMESTAMP()")) &&
149 (master_res= mysql_store_result(mysql)) &&
150 (master_row= mysql_fetch_row(master_res)))
151@@ -1375,6 +1382,12 @@
152 };);
153 master_res= NULL;
154 master_row= NULL;
155+ DBUG_EXECUTE_IF("get_master_version.server_id.ER_NET_READ_INTERRUPTED",
156+ {
157+ DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
158+ DBUG_SET("-d,get_master_version.server_id."
159+ "ER_NET_READ_INTERRUPTED");
160+ });
161 if (!mysql_real_query(mysql,
162 STRING_WITH_LEN("SHOW VARIABLES LIKE 'SERVER_ID'")) &&
163 (master_res= mysql_store_result(mysql)) &&
164@@ -1564,17 +1577,31 @@
165 */
166 llstr((ulonglong) (mi->heartbeat_period*1000000000UL), llbuf);
167 sprintf(query, query_format, llbuf);
168+ DBUG_EXECUTE_IF("get_master_version.heartbeat.ER_NET_READ_INTERRUPTED",
169+ {
170+ DBUG_SET("+d,inject_ER_NET_READ_INTERRUPTED");
171+ DBUG_SET("-d,get_master_version.heartbeat."
172+ "ER_NET_READ_INTERRUPTED");
173+ });
174+ if (mysql_real_query(mysql, query, strlen(query))) {
175+ if (is_network_error(mysql_errno(mysql)))
176+ {
177+ mi->report(WARNING_LEVEL, mysql_errno(mysql),
178+ "SET @master_heartbeat_period to master failed with error: %s",
179+ mysql_error(mysql));
180+ mysql_free_result(mysql_store_result(mysql));
181+ goto network_err;
182+ }
183+ else if (!check_io_slave_killed(mi->io_thd, mi, NULL)) {
184+ errmsg= "The slave I/O thread stops because SET @master_heartbeat_period "
185+ "on master failed.";
186+ err_code= ER_SLAVE_FATAL_ERROR;
187+ sprintf(err_buff, "%s Error: %s", errmsg, mysql_error(mysql));
188+ mysql_free_result(mysql_store_result(mysql));
189+ goto err;
190+ }
191+ }
192
193- if (mysql_real_query(mysql, query, strlen(query))
194- && !check_io_slave_killed(mi->io_thd, mi, NULL))
195- {
196- errmsg= "The slave I/O thread stops because SET @master_heartbeat_period "
197- "on master failed.";
198- err_code= ER_SLAVE_FATAL_ERROR;
199- sprintf(err_buff, "%s Error: %s", errmsg, mysql_error(mysql));
200- mysql_free_result(mysql_store_result(mysql));
201- goto err;
202- }
203 mysql_free_result(mysql_store_result(mysql));
204 }
205

Subscribers

People subscribed via source and target branches