RENAME/DROP crashes with innodb_track_changed_pages

Bug #1217002 reported by Hartmut Holzgraefe
22
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
Critical
Laurynas Biveinis
5.1
Fix Released
Critical
Laurynas Biveinis
5.5
Fix Released
Critical
Laurynas Biveinis
5.6
Fix Released
Critical
Laurynas Biveinis

Bug Description

The redo log reading code has been chaned in 5.5.30 with

  http://bazaar.launchpad.net/~mysql/mysql-server/5.5/revision/2661.834.1

Part of the change is that on MLOG_FILE_RENAME events the
recv_parse_or_apply_log_rec_body() does not only parse but
also applys the rename operation.

This can lead to InnoDB assertion failures if:

* innodb_track_changed_pages is enabled and parses the redo log
* a table is renamed
* and is dropped shortly after

The assertion messages are:

  InnoDB: Error: cannot find space id ### in the tablespace memory cache
  InnoDB: though the table '(name not specified)' in a rename operation should have that id
  InnoDB: Assertion failure in thread ### in file fil0fil.c line 2248

The related backtrace looks like this:

  ??:0(my_print_stacktrace)[0xa83a3e]
  ??:0(handle_fatal_signal)[0x6cc93b]
  ??:0(??)[0x3e9f00f500]
  ??:0(??)[0x3e9ec328a5]
  ??:0(??)[0x3e9ec34085]
  ??:0(fil_op_log_parse_or_replay)[0x8f7e37]
  ??:0(recv_parse_or_apply_log_rec_body)[0x92b936]
  ??:0(recv_parse_log_rec)[0x92be73]
  ??:0(log_online_follow_redo_log)[0x9334ab]
  ??:0(srv_redo_log_follow_thread)[0x86ecad]
  ??:0(??)[0x3e9f007851]
  ??:0(??)[0x3e9ece890d]

The important part of the InnoDB change is:

+ case MLOG_FILE_RENAME:
+ ptr = fil_op_log_parse_or_replay(ptr, end_ptr, type,
+ space_id, 0);
+ break;
        case MLOG_FILE_CREATE:
- case MLOG_FILE_RENAME:
        case MLOG_FILE_DELETE:
        case MLOG_FILE_CREATE2:
                ptr = fil_op_log_parse_or_replay(ptr, end_ptr, type, 0, 0);

which causes MLOG_FILE_RENAME events to be replayed when
the page change cache parses the logs. Now what can happen is
that a table gets renamed and then dropped shortly afterwards.
As the page change tracking thread reads the redo log with a
slight delay it may only end up reaching the RENAME event at
about the time when the DROP happens.

Now if this happens at just the "right" time then
fil_op_log_parse_or_replay() will execute
fil_tablespace_exists_in_mem(id) while the table still
exists, then check for fil_get_space_id_for_table(new_name)
right after it has been dropped, and comes to the wrong
conclusion that the rename has not happened yet and
needs to be recovered/redone. Then next it tries
fil_rename_tablespace() which fails as the space id for the
table no longer exists and eventually the

   cannot find space id ### in the tablespace memory cache

assertion is raised.

So actually a bug introduced by Oracle, but I'm reporting it
here only for now as it only affects the Percona page change
feature but has probably no negative effect on upstream
MySQL itself as the log parsing code is only used for crash
recovery or ibbackup postprocessing and never has to deal
with other DB activity going on in parallel ...

Related branches

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Hartmut -

Thank you for the bug report and the detailed analysis. Indeed we did not take into account this upstream change. But I'm not sure if there's a serious upstream bug to be found, the only thing I can think of is that recv_parse_or_apply_log_rec_body() header comment "File ops are parsed, but not applied in this function." is not true anymore.

tags: added: bitmap xtradb
Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

I've reported upstream bug about misleading comment: http://bugs.mysql.com/bug.php?id=70164

Revision history for this message
Valerii Kravchuk (valerii-kravchuk) wrote :

As upstream change happened in 5.1 and 5.6 code also, corresponding Percona server versions are also potentially affected.

Revision history for this message
Hartmut Holzgraefe (hartmut-php) wrote :

Yes, wording wasn't too clear in the original report, should have been "XtraDB assertion", not "InnoDB assertion" near the top, and "bug introduced by" should have been "problem caused by change made by" ...

On the Innobase/MySQL/Oracle side the code was written, and finally changed, under the assumption that it will only ever be executed single threaded during recovery operations (either on startup within the server itself, or when preparing an ibbackup for restore), so that no dictionary changes could possibly ever happen in parallel to the log reading ...

That's why I only reported the problem here, and not on bugs.mysql.com (although yes, some of the function names and comments were already confusing to begin with, and the "File ops are parsed, but not applied in this function." part now being outright wrong in the case of MLOG_FILE_RENAME

Revision history for this message
Hartmut Holzgraefe (hartmut-php) wrote :

Ah, and just now I see http://bugs.mysql.com/bug.php?id=70164 ... ;)

Revision history for this message
Hartmut Holzgraefe (hartmut-php) wrote :

What I've been wondering: is executing the rename really needed
for both code paths that call into recv_parse_or_apply_log_rec_body(),
or could it be limited to the case where it is called with block and
mtr arguments are !=NULL (the call path via recv_recover_page_func())
but not when called with block=NULL as the recv_parse_log_rec()
function that is also used by innodb_track_changed_pages?

If it could be limited to the first case with block!=NULL we'd have
an easy fix ...

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

See also bug 1213885.

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Hartmut -

Both recv_parse_or_apply_log_rec_body() callers need to execute MLOG_FILE_RENAME, actually the recv_parse_log_rec() even more so, as it is called during the initial log parsing, which must, among the other things, set up the data dictionary etc. so that the right pages are read and recovered later. Later is when the other caller recv_recover_page_func() is called, with records gathered for that page. MLOG_FILE_RENAME should be absent among them.

The easiest fix is to adjust recv_parse_or_apply_log_rec_body() to pass "recv_recovery_is_on() ? space_id : 0" instead of space_id to the fil_op_log_parse_or_replay() call.

summary: - RENAME/DROP crashes 5.5.30+ with innodb_track_changed_pages
+ RENAME/DROP crashes with innodb_track_changed_pages
Revision history for this message
Alexey Kopytov (akopytov) wrote :

Bug #1213885 was marked as a duplicate of this one.

tags: added: 56qual qa qablock
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-368

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.