wsrep_start_position does not work unless grastate.dat is parseable

Bug #1112724 reported by Jay Janssen
This bug affects 3 people
Affects                  Status        Importance  Assigned to  Milestone
Galera                   Confirmed     Medium      Unassigned
Percona XtraDB Cluster (moved to https://jira.percona.com/projects/PXC; status tracked in 5.6)
  5.5                    Confirmed     Medium      Unassigned
  5.6                    Fix Released  Medium      Unassigned

Bug Description

I would expect --wsrep_start_position to always apply and override whatever state grastate.dat is in. My use case is a datadir that was recovered from an xtrabackup which doesn't save the grastate.dat. I recovered the position with --wsrep-recover.

Submitting --wsrep_start_position when the datadir does not have a grastate.dat does not work; it forces a zero state and an SST instead.

130201 12:36:21 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130201 12:36:21 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.GQzLuIeAwV
130201 12:36:26 mysqld_safe WSREP: Recovered position 8d211006-5bf5-11e2-0800-067f71542765:426885
130201 12:36:26 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:426885'
130201 12:36:26 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:426885'
130201 12:36:26 [Note] WSREP: Read nil XID from storage engines, skipping position init
130201 12:36:26 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
130201 12:36:26 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy <email address hidden> loaded succesfully.
130201 12:36:26 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
130201 12:36:26 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
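
For anyone scripting this, a minimal sketch of pulling the recovered position out of the mysqld_safe output might look like the following. The regular expression and the function name `recovered_position` are assumptions made for illustration; only the "Recovered position" line format is taken from the log above.

```python
import re

# Matches the mysqld_safe line shown in the log above.
# The pattern itself is an assumption, not something taken from mysqld_safe.
RECOVERED_RE = re.compile(
    r"WSREP: Recovered position ([0-9a-fA-F-]{36}):(-?\d+)"
)


def recovered_position(log_text):
    """Return (uuid, seqno) from the last 'Recovered position' line, or None."""
    matches = RECOVERED_RE.findall(log_text)
    if not matches:
        return None
    uuid, seqno = matches[-1]
    return uuid, int(seqno)


sample = (
    "130201 12:36:26 mysqld_safe WSREP: Recovered position "
    "8d211006-5bf5-11e2-0800-067f71542765:426885"
)
print(recovered_position(sample))
```

The resulting pair is what would be passed as `<uuid>:<seqno>` to --wsrep_start_position.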

Further, if I create an empty grastate.dat, it also fails:

[root@node3 lib]# ls -lah mysql/grastate.dat
-rw-r--r--. 1 mysql mysql 0 Feb 1 12:42 mysql/grastate.dat

130201 12:43:41 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130201 12:43:41 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.pnE4L8ze9L
130201 12:43:46 mysqld_safe WSREP: Recovered position 8d211006-5bf5-11e2-0800-067f71542765:431680
130201 12:43:46 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:431680'
130201 12:43:46 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:431680'
130201 12:43:46 [Note] WSREP: Read nil XID from storage engines, skipping position init
130201 12:43:46 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
130201 12:43:46 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy <email address hidden> loaded succesfully.
130201 12:43:46 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

From what I can tell, wsrep_start_position only works if the grastate is present and has a parseable format:

[root@node3 lib]# cat mysql/grastate.dat
# GALERA saved state
version: 2.1
uuid: 8d211006-5bf5-11e2-0800-067f71542765
seqno: -1
cert_index:

[root@node3 lib]# service mysql start --wsrep_start_position=8d211006-5bf5-11e2-0800-067f71542765:434636

130201 12:49:09 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
130201 12:49:09 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.TbDFDzLI6T
130201 12:49:14 mysqld_safe WSREP: Recovered position 8d211006-5bf5-11e2-0800-067f71542765:434636
130201 12:49:14 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:434636'
130201 12:49:14 [Note] WSREP: wsrep_start_position var submitted: '8d211006-5bf5-11e2-0800-067f71542765:434636'
130201 12:49:14 [Note] WSREP: Read nil XID from storage engines, skipping position init
130201 12:49:14 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
130201 12:49:14 [Note] WSREP: wsrep_load(): Galera 2.3(r143) by Codership Oy <email address hidden> loaded succesfully.
130201 12:49:15 [Note] WSREP: Found saved state: 8d211006-5bf5-11e2-0800-067f71542765:-1
...
130201 12:49:15 [Note] WSREP: State transfer required:
 Group state: 8d211006-5bf5-11e2-0800-067f71542765:436317
 Local state: 8d211006-5bf5-11e2-0800-067f71542765:434636

In all other cases it does a zero state reset and forces SST. This will lead to unexpected results.
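
The behaviour observed in the three attempts can be modelled with a short sketch. This is not Galera's actual parser; it only mimics what the logs show, namely that a missing or empty grastate.dat yields the zero state while a file with `uuid:` and `seqno:` fields is read as given. The names `read_saved_state`, `UNDEFINED_UUID` and `UNDEFINED_SEQNO` are hypothetical.

```python
UNDEFINED_UUID = "00000000-0000-0000-0000-000000000000"
UNDEFINED_SEQNO = -1


def read_saved_state(text):
    """Model of the observed behaviour, not Galera's real code.

    text is the content of grastate.dat, or None if the file is missing.
    Returns (uuid, seqno); falls back to the zero state when the file is
    missing, empty, or lacks usable uuid/seqno fields.
    """
    if text is None:
        return UNDEFINED_UUID, UNDEFINED_SEQNO
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the '# GALERA saved state' header
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    try:
        return fields["uuid"], int(fields["seqno"])
    except (KeyError, ValueError):
        return UNDEFINED_UUID, UNDEFINED_SEQNO


example = """# GALERA saved state
version: 2.1
uuid: 8d211006-5bf5-11e2-0800-067f71542765
seqno: -1
cert_index:
"""
print(read_saved_state(None))     # missing file
print(read_saved_state(""))       # empty file
print(read_saved_state(example))  # parseable file
```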

Revision history for this message
Jay Janssen (jay-janssen) wrote :

Forgot to mention:

Centos 6.3
Server version: 5.5.29 Percona XtraDB Cluster (GPL), wsrep_23.7.1.r3843
| wsrep_provider_version | 2.3(r143) |

Revision history for this message
Jay Janssen (jay-janssen) wrote :

To be more concise:

130302 06:52:32 mysqld_safe WSREP: Running position recovery with --log_error=/tmp/tmp.WOzvdidmIe
130302 06:52:37 mysqld_safe WSREP: Recovered position 8797f811-7f73-11e2-0800-8b513b3819c1:314673
130302 6:52:37 [Note] WSREP: wsrep_start_position var submitted: '8797f811-7f73-11e2-0800-8b513b3819c1:314673'

130302 6:52:37 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
130302 6:52:37 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

130302 6:52:37 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1

Shouldn't the wsrep_start_position submitted here overrule a missing grastate.dat?

Fixing this would probably solve: https://bugs.launchpad.net/codership-mysql/+bug/1111706

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Jay,

traditionally, yes, a command line parameter should take precedence over any defaults, configs, etc.

The problem here is that this option is used in _automatic_ (read: unattended) node recovery: to pass a GTID value found via --wsrep-recover option from InnoDB table space. So it is not always user-supplied, and hence grastate.dat takes precedence, since InnoDB does not store DDL and other non-transactional GTIDs.

And I don't think that lp:1111706 can ever be fixed without a risk of inconsistency.

Revision history for this message
Jay Janssen (jay-janssen) wrote : Re: [Bug 1112724] wsrep_start_position does not work unless grastate.dat is parseable

On Mar 2, 2013, at 11:52 AM, Alex Yurchenko <email address hidden> wrote:

> The problem here is that this option is used in _automatic_ (read:
> unattended) node recovery: to pass a GTID value found via
> --wsrep-recover option from InnoDB table space. So it is not always
> user-supplied, and hence grastate.dat takes precedence, since InnoDB
> does not store DDL and other non-transactional GTIDs.

It's really not clear at what points grastate takes precedence over --wsrep_start_position.

AFAIK, there are three (maybe 4) possible grastate.dat states:

1) UUID set, seqno is >= 0, indicating either a clean shutdown, or someone manually tinkering with the file
2) UUID set, seqno is -1: indicating an unclean shutdown/crash
3) UUID zeroed: wsrep abort (?) (like lp:1111706?, and RBR errors?)
4) grastate.dat missing or unparseable: someone trying to build a node from a backup, someone manually tinkering with the file, or something horrible (filesystem corruption)

AFAICT, --wsrep_start_position only works in case #2, am I right?

I can accept that #3 would not accept wsrep_start_position, since RBR errors should trigger SST. However, there should be a clear log entry explaining why wsrep_start_position is getting ignored (in any and every case that it is ignored, BTW).

However, I think --wsrep_start_position should apply in #4 -- this would make manual node recovery (say from a backup) much easier, and if you wanted to try an auto-recovery in case of #3, all you would need to do is delete the grastate and let it try to recover.
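
The four cases and the two policies discussed here can be written down as a small sketch. Everything in it is illustrative: the function names are hypothetical, the "current" policy reflects only what was observed in this report (case #2), and the "proposed" policy is the suggestion made in this comment, not implemented behaviour.

```python
UNDEFINED_UUID = "00000000-0000-0000-0000-000000000000"


def classify_grastate(state):
    """Return the case number (1-4) from the list above.

    state is (uuid, seqno), or None when grastate.dat is missing or
    unparseable.
    """
    if state is None:
        return 4  # missing or unparseable file
    uuid, seqno = state
    if uuid == UNDEFINED_UUID:
        return 3  # UUID zeroed
    if seqno >= 0:
        return 1  # clean shutdown (or hand-edited file)
    return 2      # UUID set, seqno -1: unclean shutdown


def start_position_applies(state, proposed=False):
    """True if --wsrep_start_position would be honoured.

    proposed=False: behaviour as observed in this report (case 2 only).
    proposed=True:  the suggestion in this comment (cases 2 and 4).
    """
    case = classify_grastate(state)
    return case == 2 or (proposed and case == 4)


uuid = "8d211006-5bf5-11e2-0800-067f71542765"
for state in [(uuid, 100), (uuid, -1), (UNDEFINED_UUID, -1), None]:
    print(classify_grastate(state),
          start_position_applies(state),
          start_position_applies(state, proposed=True))
```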

Jay Janssen, MySQL Consulting Lead, Percona
http://about.me/jay.janssen
Percona Live in Santa Clara, CA April 22nd-25th 2013
http://www.percona.com/live/mysql-conference-2013/

Revision history for this message
Alex Yurchenko (ayurchen) wrote :

Jay,

Now that you put it this way, I can't find any more excuses except that we have a pile of other issues with higher priorities ATM :)

affects: codership-mysql → galera
Changed in galera:
importance: Undecided → Low
milestone: none → 3.0beta
status: New → Confirmed
Changed in galera:
importance: Low → Medium
Revision history for this message
Raghavendra D Prabhu (raghavendra-prabhu) wrote :

I was looking at this from POV of xtrabackup and SST, and

===========

     st_.get (uuid, seqno);

    if (0 != args->state_uuid &&
        *args->state_uuid != WSREP_UUID_UNDEFINED &&
        *args->state_uuid == uuid &&
        seqno == WSREP_SEQNO_UNDEFINED)
    {
        /* non-trivial recovery information provided on startup, and db is safe
         * so use recovered seqno value */
        seqno = args->state_seqno;
    }
    log_debug << "End state: " << uuid << ':' << seqno << " #################";
    update_state_uuid (uuid);

    cc_seqno_ = seqno; // is it needed here?
    apply_monitor_.set_initial_position(seqno);
    if (co_mode_ != CommitOrder::BYPASS)
        commit_monitor_.set_initial_position(seqno);
    cert_.assign_initial_position(seqno, trx_proto_ver_);

=======================

It looks like the provided position (with wsrep-start-position)
is allowed only when UUID matches the one in grastate.dat and
sequence number is -1 in grastate.dat
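
To make that condition easier to check against the logs in this report, here is a Python restatement of the `if` in the excerpt above. It is a model of the quoted logic only; the function and parameter names are hypothetical, and `None` stands in for a null `args->state_uuid` pointer.

```python
UNDEFINED_UUID = "00000000-0000-0000-0000-000000000000"
UNDEFINED_SEQNO = -1


def initial_seqno(saved_uuid, saved_seqno, start_uuid, start_seqno):
    """Return the seqno the node starts from, per the quoted condition.

    saved_*: state read from grastate.dat (zero state if unreadable).
    start_*: value passed via wsrep_start_position (start_uuid may be None).
    """
    if (start_uuid is not None
            and start_uuid != UNDEFINED_UUID
            and start_uuid == saved_uuid
            and saved_seqno == UNDEFINED_SEQNO):
        # Recovered position is used only when the UUIDs match and
        # grastate.dat carries seqno -1.
        return start_seqno
    return saved_seqno


uuid = "8d211006-5bf5-11e2-0800-067f71542765"
# grastate.dat present with matching UUID and seqno -1: position is used.
print(initial_seqno(uuid, -1, uuid, 434636))
# grastate.dat missing (zero state): position is ignored.
print(initial_seqno(UNDEFINED_UUID, -1, uuid, 426885))
```

The second call reproduces the missing-file case from the original report: the UUID comparison fails against the zero state, so the submitted position is dropped.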

Now, from Xtrabackup's perspective, --no-lock is only used when DDL and
non-transactional tables are not in effect (at the moment this needs
to be checked manually). So, doesn't that mean if that is taken care of
(automatically since SST runs unattended on donor) then grastate.dat
won't be needed? Regarding DDL, is it not possible for SST code in WSREP
to acquire a shared MDL lock (MDL_SHARED_READ or MDL_SHARED_HIGH_PRIO) which should block DDL?

Changed in galera:
milestone: 3.0beta → 3.0
Changed in galera:
milestone: 3.0-beta → 3.1
Changed in galera:
milestone: 25.3.1 → 25.3.2
Changed in galera:
milestone: 25.3.2 → 25.3.3
Changed in galera:
milestone: 25.3.3 → 25.3.4
no longer affects: galera/2.x
Changed in galera:
milestone: 25.3.4 → 25.3.5
Changed in galera:
milestone: 25.3.5 → 25.3.6
Revision history for this message
Nilnandan Joshi (nilnandan-joshi) wrote :

This is still happening.

[root@percona-pxc56-2 mysql]# /etc/init.d/mysql start --wsrep_start_position=82316fb9-f7c4-11e4-b421-2a1d77880e60:27501

150521 14:58:31 mysqld_safe mysqld from pid file /var/lib/mysql/percona-pxc56-2.pid ended
150521 14:59:55 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
150521 14:59:55 mysqld_safe Skipping wsrep-recover for empty datadir: /var/lib/mysql
150521 14:59:55 mysqld_safe Assigning 00000000-0000-0000-0000-000000000000:-1 to wsrep_start_position
2015-05-21 14:59:55 0 [Note] WSREP: wsrep_start_position var submitted: '82316fb9-f7c4-11e4-b421-2a1d77880e60:27501'
2015-05-21 14:59:55 0 [Note] WSREP: wsrep_start_position var submitted: '00000000-0000-0000-0000-000000000000:-1'
2015-05-21 14:59:55 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-05-21 14:59:55 2307 [Warning] You need to use --log-bin to make --log-slave-updates work.
2015-05-21 14:59:55 2307 [Note] WSREP: Read nil XID from storage engines, skipping position init
2015-05-21 14:59:55 2307 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/libgalera_smm.so'
2015-05-21 14:59:55 2307 [Note] WSREP: wsrep_load(): Galera 3.9(r93aca2d) by Codership Oy <email address hidden> loaded successfully.
2015-05-21 14:59:55 2307 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2015-05-21 14:59:55 2307 [Warning] WSREP: Could not open saved state file for reading: /var/lib/mysql//grastate.dat
2015-05-21 14:59:55 2307 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1

Revision history for this message
Hrvoje Matijakovic (hrvojem) wrote :
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PXC-1051
