Merge into t12 : t12-crash-safe-slave : Code : Percona Server moved to https://jira.percona.com/projects/PS

Status:	Merged
Merge reported by:	George Ormond Lorch III
Merged at revision:	not available
Proposed branch:	lp:~percona-core/percona-server/t12-crash-safe-slave
Merge into:	lp:~percona-core/percona-server/t12
To merge this branch:	bzr merge lp:~percona-core/percona-server/t12-crash-safe-slave
Related bugs:	Link a bug report

Reviewer	Review Type	Date Requested	Status
Vlad Lesin (community)	g2		Approve on 2013-10-21
Sergei Glushchenko (community)	g2	2013-10-15	Approve on 2013-10-17
Review via email: mp+191254@code.launchpad.net

Description of the change

http://jenkins.percona.com/view/MySQL/job/mysql-5.5-param/9/

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-10-16:

#

Hi George!

I've looked at the patch and I have couple of questions.

   * MYSQL_SERVER is defined because we use some MySQL headers in
     ha_innodb.cc. Why do we use ifdef MYSQL_SERVER and #ifndef
     MYSQL_SERVER in ha_innodb.cc? We defined it unconditionally for
     this file, so ha_innodb.cc seems cannot be built MYSQL_CLIENT defined.
   * line 845 of the diff seems just adds an empty line
   * can you clarify what is the Crash Safe Slave feature and why it
     cannot be fully replaced with our Crash Resistant Replication?
     Why do we need to store additional set of log coordinates in trx
     record?

review: Needs Fixing (g2)

Revision history for this message

George Ormond Lorch III (gl-az) wrote on 2013-10-16:

#

> Hi George!
>
> I've looked at the patch and I have couple of questions.
>
> * MYSQL_SERVER is defined because we use some MySQL headers in
> ha_innodb.cc. Why do we use ifdef MYSQL_SERVER and #ifndef
> MYSQL_SERVER in ha_innodb.cc? We defined it unconditionally for
> this file, so ha_innodb.cc seems cannot be built MYSQL_CLIENT defined.

This was lifted directly from PS 5.5. I had wondered the same things but was unsure of why it was placed there originally but did not want to spend too much time investigating that. Maybe I can get Laurynas to comment on that.

> * line 845 of the diff seems just adds an empty line

Good catch, that was left behind from when I had debugging code in there. I thought I had everything scrubbed. I guess not :-\

> * can you clarify what is the Crash Safe Slave feature and why it
> cannot be fully replaced with our Crash Resistant Replication?
> Why do we need to store additional set of log coordinates in trx
> record?

Honestly, we do not know what the purpose of that feature is. There is an MTR test in the twitter code for that functionality and it fails if we completely replace it with our CSS implementation. The only place we could find any mention of it is here http://darnaut.blogspot.com/2012/06/changes-in-twitter-mysql-5523t6.html

"Export the last know good binlog position as a status variable from InnoDB
Introduced two new status variables that export the master binary log name and position of a slave as stored by InnoDB. Whenever the SQL thread commits a transaction, InnoDB also commits the master binary log name and position to the system tablespace. Now this information can be retrieved through the Innodb_mysql_master_log_file and Innodb_mysql_master_log_pos status variables.

It is worth noting that the position stored within InnoDB might not match the actual position of the SQL thread. The position is only updated when the SQL thread applies an event that uses an InnoDB table and causes a read-write transaction to be committed. For example, an UPDATE statement which does not change any rows is considered a read-only transaction."

The real problem with it is that is calculates/captures the master pos at a slightly different point than our CSS does and they are not compatible for the purpose of a real crash safe slave. After discussing a bit with Calvin we decided to go with this split implementation just to be safe so that anything that depends on that functionality in the twitter infrastructure will still work.

> Hi George!
> 
> I've looked at the patch and I have couple of questions.
> 
>    * MYSQL_SERVER is defined because we use some MySQL headers in
>      ha_innodb.cc.  Why do we use ifdef MYSQL_SERVER and #ifndef
>      MYSQL_SERVER in ha_innodb.cc? We defined it unconditionally for
>      this file, so ha_innodb.cc seems cannot be built MYSQL_CLIENT defined.

This was lifted directly from PS 5.5. I had wondered the same things but was unsure of why it was placed there originally but did not want to spend too much time investigating that. Maybe I can get Laurynas to comment on that.

>    * line 845 of the diff seems just adds an empty line

Good catch, that was left behind from when I had debugging code in there. I thought I had everything scrubbed. I guess not :-\

>    * can you clarify what is the Crash Safe Slave feature and why it
>      cannot be fully replaced with our Crash Resistant Replication?
>      Why do we need to store additional set of log coordinates in trx
>      record?

Honestly, we do not know what the purpose of that feature is. There is an MTR test in the twitter code for that functionality and it fails if we completely replace it with our CSS implementation. The only place we could find any mention of it is here http://darnaut.blogspot.com/2012/06/changes-in-twitter-mysql-5523t6.html

"Export the last know good binlog position as a status variable from InnoDB
Introduced two new status variables that export the master binary log name and position of a slave as stored by InnoDB. Whenever the SQL thread commits a transaction, InnoDB also commits the master binary log name and position to the system tablespace. Now this information can be retrieved through the Innodb_mysql_master_log_file and Innodb_mysql_master_log_pos status variables.

It is worth noting that the position stored within InnoDB might not match the actual position of the SQL thread. The position is only updated when the SQL thread applies an event that uses an InnoDB table and causes a read-write transaction to be committed. For example, an UPDATE statement which does not change any rows is considered a read-only transaction."

The real problem with it is that is calculates/captures the master pos at a slightly different point than our CSS does and they are not compatible for the purpose of a real crash safe slave. After discussing a bit with Calvin we decided to go with this split implementation just to be safe so that anything that depends on that functionality in the twitter infrastructure will still work.

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-10-17:

#

Seems like vanilla InnoDB stores pointer to the statement which already executed, while we store pointer to the statement which should be executed next. That is why Twitter's test has "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1".

If we have binlog like this:

BEGIN
INSERT INTO t1 VALUES (3,0)
COMMIT;/*1*/
BEGIN;/*2*/
INSERT INTO t1 VALUES (4,0);
COMMIT;

and we just executed COMMIT/*1*/, vanilla InnoDB will point to COMMIT/*1*/ while PS will point to BEGIN/*2*/.
So we can make this test case to work by just removing "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1".

If twitter really need pointer to COMMIT/*1*/, can we calculate it? If it is not trivial I agree we have to store it.

Revision history for this message

George Ormond Lorch III (gl-az) wrote on 2013-10-17:

#

Yup, that is precisely the difference and I thought I noted that (maybe
not quite as clearly as you illustrated) in a code comment.

The t12 code does more than just the vanilla InnoDB implementation, it
is almost as if it is 20% implementation of crash safe slave but without
the master/relay overwrite on startup after slave crash.

I don't think calculating backwards after the fact can be easily done
since at the point of calculation we don't have all of the necessary
context info to re-create a proper position. It is a good idea, but I
don't think it is possible.

I can appreciate the desire to not want to duplicate this data in the
trx header and use up even more space and that decision was not made in
haste. When working with this customer, the absolute worst thing we
could do is introduce a change/bug that breaks their existing
application. Since I could get no clear indication from them on how/what
this patch of theirs is used for, we chose to step entirely around it
rather than to try to integrate the two and potentially break whatever
it is they use it for.

Thanks for the thoughtful review Sergei!

On 10/17/2013 8:15 AM, Sergei Glushchenko wrote:
> Seems like vanilla InnoDB stores pointer to the statement which already executed, while we store pointer to the statement which should be executed next. That is why Twitter's test has "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1".
>
> If we have binlog like this:
>
> BEGIN
> INSERT INTO t1 VALUES (3,0)
> COMMIT;/*1*/
> BEGIN;/*2*/
> INSERT INTO t1 VALUES (4,0);
> COMMIT;
>
> and we just executed COMMIT/*1*/, vanilla InnoDB will point to COMMIT/*1*/ while PS will point to BEGIN/*2*/.
> So we can make this test case to work by just removing "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1".
>
> If twitter really need pointer to COMMIT/*1*/, can we calculate it? If it is not trivial I agree we have to store it.
>
>

--
George O. Lorch III
Software Engineer, Percona
+1-888-401-3401 x542 US/Arizona (GMT -7)
skype: george.ormond.lorch.iii

Yup, that is precisely the difference and I thought I noted that (maybe 
not quite as clearly as you illustrated) in a code comment.

The t12 code does more than just the vanilla InnoDB implementation, it 
is almost as if it is 20% implementation of crash safe slave but without 
the master/relay overwrite on startup after slave crash.

I don't think calculating backwards after the fact can be easily done 
since at the point of calculation we don't have all of the necessary 
context info to re-create a proper position. It is a good idea, but I 
don't think it is possible.

I can appreciate the desire to not want to duplicate this data in the 
trx header and use up even more space and that decision was not made in 
haste. When working with this customer, the absolute worst thing we 
could do is introduce a change/bug that breaks their existing 
application. Since I could get no clear indication from them on how/what 
this patch of theirs is used for, we chose to step entirely around it 
rather than to try to integrate the two and potentially break whatever 
it is they use it for.

Thanks for the thoughtful review Sergei!

On 10/17/2013 8:15 AM, Sergei Glushchenko wrote:
> Seems like vanilla InnoDB stores pointer to the statement which already executed, while we store pointer to the statement which should be executed next. That is why Twitter's test has "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1".
>
> If we have binlog like this:
>
> BEGIN
> INSERT INTO t1 VALUES (3,0)
> COMMIT;/*1*/
> BEGIN;/*2*/
> INSERT INTO t1 VALUES (4,0);
> COMMIT;
>
> and we just executed COMMIT/*1*/, vanilla InnoDB will point to COMMIT/*1*/ while PS will point to BEGIN/*2*/.
> So we can make this test case to work by just removing "SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1".
>
> If twitter really need pointer to COMMIT/*1*/, can we calculate it? If it is not trivial I agree we have to store it.
>
>

-- 
George O. Lorch III
Software Engineer, Percona
+1-888-401-3401 x542 US/Arizona (GMT -7)
skype: george.ormond.lorch.iii

Revision history for this message

Sergei Glushchenko (sergei.glushchenko) wrote on 2013-10-17:

#

Nice, George, sounds good for me. I haven't really had a hope that we can easily calculate one pos given the other, just wanted you to confirm. I saw you comment with two different calculations but I couldn't figure out where

rli->group_master_log_pos +
(rli->future_event_relay_log_pos - rli->group_relay_log_pos);

points to :)

Probably this formula should be commented better in our code as well.

I am approving this MP. Well done!

review: Approve (g2)

Revision history for this message

Vlad Lesin (vlad-lesin) wrote on 2013-10-21:

#

ha_innodb.cc:2709-2751

strncpy() here does not produce not null-terminated strings in a bound case only because my_b_gets() which is called from init_strvar_from_file() sets 0 at the end of a string if string length from file exceeds maximum length.

Such construction as "char a[LEN]; strncpy(a,b,LEN);" is wrong in common case because if length of 'b' exceeds 'LEN' 'a' will not be null-terminated (see manual page for strncpy()).

I understand this is just a copy from approved code and it works well because my_b_gets() processes bound case well, but such code attracts attention during review.

review: Approve (g2)

Percona Server moved to https://jira.percona.com/projects/PS

Merge lp:~percona-core/percona-server/t12-crash-safe-slave into lp:~percona-core/percona-server/t12

Commit message

Description of the change

Subscribers