fake-changes infinite loop on update, allocates many extents

Bug #917942 reported by Mark Callaghan
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Fix Released
High
Laurynas Biveinis
5.1
Fix Released
High
Laurynas Biveinis
5.5
Fix Released
High
Laurynas Biveinis

Bug Description

With the fake-changes patch, a transaction that does an UPDATE with innodb_fake_changes=1 can get into an infinite loop when btr_cur_pessimistic_update reserves free extents. The problem is that cursor->tree_height is used to determine the number of extents to allocate and this might be uninitialized when innodb_fake_changes=1. The result is a potentially huge value is used for the number of extents to reserve and the database file grows quickly (9MB to 200GB before we caught it). Running OPTIMIZE TABLE will reclaim the space.

The attached patch has two tests. The *bugs2 test reproduces the problem for update. The *bug3 test does not reproduce the problem for insert and it looks like cursor->tree_height is initialized when btr_cur_pessimistic_insert is called. I do not think the problem is possible for btr_cur_pessimistic_delete because that is called for purge and undo and innodb_fake_changes cannot be set in those cases.

Text from my patch:

Prevent space leaks when btr_cur_pessimistic_update references
cursor->tree_height and tree_height was not initialized. There are 1+
code paths (I found 1) by which this can happen. The workaround is
to hardwire the number of extents to reserve when it was not initialized.
When the new test is run tree_height is uninitialized on a few calls
to buf_cur_pessimistic_update. There is a test for pessimistic insert,
but the variable was initialized without this change. The pessimistic
delete function is called for undo and purge -- neither of which can
be done with fake-changes.

This happens when btr_pcur_restore_position_func is called with latch_mode=1
or 2 because latch_mode was downgraded from a stronger level when
fake_changes=1. When latch_mode == 1 or 2, btr_pcur_restore_position_func
has a path to return before calling btr_pcur_open_with_no_init_func ->
btr_cur_search_to_nth_level and tree_height might be set in
btr_cur_search_to_nth_level.
Note (1 == BTR_SEARCH_LEAF, 2 == BTR_MODIFY_LEAF, 33 == BTR_MODIFY_TREE).

The other problem is that btr_cur_update_alloc_zip should not call
page_zip_compress_write_log_no_data for a fake_changes transaction
as a fake-change shouldn't write the transaction log (hopefully
it didn't change anything other than getting a few locks).

Tags: contribution

Related branches

Revision history for this message
Mark Callaghan (mdcallag) wrote :
Changed in percona-server:
assignee: nobody → Valentine Gostev (longbow)
Stewart Smith (stewart)
Changed in percona-server:
status: New → Triaged
importance: Undecided → High
tags: added: contribution
Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :
Download full text (3.9 KiB)

Current Percona Server 5.1 debug build crashes on percona_innodb_fake_changes_bugs2.test this way:

121011 18:50:37 InnoDB: Assertion failure in thread 140724337829632 in file ./include/mtr0log.ic line 192
InnoDB: Failing assertion: mtr_memo_contains_page(mtr, ptr, 2)

Thread 1 (Thread 0x7f5068343700 (LWP 21949)):
#0 __pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:63
#1 0x0000000000a70b3d in my_write_core (sig=6) at stacktrace.c:437
#2 0x00000000007c580e in handle_fatal_signal (sig=6) at signal_handler.cc:249
#3 <signal handler called>
#4 0x00007f50693da445 in __GI_raise (sig=<optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#5 0x00007f50693ddbab in __GI_abort () at abort.c:91
#6 0x00000000008efd57 in mlog_write_initial_log_record_fast (ptr=0x7f5063b64000 "", type=51 '3', log_ptr=0x7f5068340810 "\a", mtr=0x7f50683405b0) at ./include/mtr0log.ic:192
#7 0x0000000000904369 in page_zip_compress_write_log (page_zip=0x7f505bfd7188, page=0x7f5063b64000 "", index=0x7f5038034198, mtr=0x7f50683405b0) at page/page0zip.c:351
#8 0x0000000000906b79 in page_zip_compress (page_zip=0x7f505bfd7188, page=0x7f5063b64000 "", index=0x7f5038034198, mtr=0x7f50683405b0) at page/page0zip.c:1445
#9 0x0000000000975bd2 in btr_cur_update_alloc_zip (page_zip=0x7f505bfd7188, block=0x7f505bfd7178, index=0x7f5038034198, length=6528, create=1, mtr=0x7f50683405b0) at btr/btr0cur.c:1781
#10 0x00000000009765cc in btr_cur_optimistic_update (flags=2, cursor=0x7f50380158b8, update=0x7f50380136f0, cmpl_info=1, thr=0x7f50380138f0, mtr=0x7f50683405b0) at btr/btr0cur.c:2062
#11 0x000000000093bafc in row_upd_clust_rec (node=0x7f50380135e8, index=0x7f5038034198, thr=0x7f50380138f0, mtr=0x7f50683405b0) at row/row0upd.c:1970
#12 0x000000000093c468 in row_upd_clust_step (node=0x7f50380135e8, thr=0x7f50380138f0) at row/row0upd.c:2251
#13 0x000000000093c6b4 in row_upd (node=0x7f50380135e8, thr=0x7f50380138f0) at row/row0upd.c:2331
#14 0x000000000093c9ac in row_upd_step (thr=0x7f50380138f0) at row/row0upd.c:2464
#15 0x000000000092656d in row_update_for_mysql (mysql_rec=0x7f5038036228 "\244", <incomplete sequence \310>, prebuilt=0x7f503800add8) at row/row0mysql.c:1416
#16 0x00000000008b8077 in ha_innobase::update_row (this=0x7f5038035cf8, old_row=0x7f5038036228 "\244", <incomplete sequence \310>, new_row=0x7f5038036148 "\244", <incomplete sequence \310>) at handler/ha_innodb.cc:5668
#17 0x00000000007aff6c in handler::ha_update_row (this=0x7f5038035cf8, old_data=0x7f5038036228 "\244", <incomplete sequence \310>, new_data=0x7f5038036148 "\244", <incomplete sequence \310>) at handler.cc:4837
#18 0x0000000000709812 in write_record (thd=0x2de19d8, table=0x7f5038031e08, info=0x7f5068340de0) at sql_insert.cc:1520
#19 0x0000000000707c02 in mysql_insert (thd=0x2de19d8, table_list=0x7f5038004aa8, fields=..., values_list=..., update_fields=..., update_values=..., duplic=DUP_UPDATE, ignore=false) at sql_insert.cc:863
#20 0x0000000000668503 in mysql_execute_command (thd=0x2de19d8) at sql_parse.cc:3436
#21 0x0000000000671085 in mysql_parse (thd=0x2de19d8, rawbuf=0x7f5038004908 "insert into t1 values (200, \"foo\", lpad(\...

Read more...

Revision history for this message
Mark Callaghan (mdcallag) wrote :

We have changed compression code in the FB patch and that includes several bug fixes that might not be in your 5.1 branch. Perhaps that is the reason we don't get a crash. Can you paste the code from page_zip_compress_log where it crashes?

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Mark -

This is an ut_ad() crash, and we only build with UNIV_DEBUG for debug configurations as of last week or so, thus this might very well be some UNIV_DEBUG fallout too. The mlog_write_initial_log_record_fast() failing assertion already looks suspicious in the context of fake updates: ut_ad(mtr_memo_contains_page(mtr, ptr, MTR_MEMO_PAGE_X_FIX)); I haven't tested the release build yet.

The page_zip_compress_log code is the following. The crashing call is couple of lines below the #if FIL_PAGE_DATA > PAGE_DATA block.

I will investigate this tomorrow.

static
void
page_zip_compress_write_log(
/*========================*/
 const page_zip_des_t* page_zip,/*!< in: compressed page */
 const page_t* page, /*!< in: uncompressed page */
 dict_index_t* index, /*!< in: index of the B-tree node */
 mtr_t* mtr) /*!< in: mini-transaction */
{
 byte* log_ptr;
 ulint trailer_size;

 ut_ad(!dict_index_is_ibuf(index));

 log_ptr = mlog_open(mtr, 11 + 2 + 2);

 if (!log_ptr) {

  return;
 }

 /* Read the number of user records. */
 trailer_size = page_dir_get_n_heap(page_zip->data)
  - PAGE_HEAP_NO_USER_LOW;
 /* Multiply by uncompressed of size stored per record */
 if (!page_is_leaf(page)) {
  trailer_size *= PAGE_ZIP_DIR_SLOT_SIZE + REC_NODE_PTR_SIZE;
 } else if (dict_index_is_clust(index)) {
  trailer_size *= PAGE_ZIP_DIR_SLOT_SIZE
   + DATA_TRX_ID_LEN + DATA_ROLL_PTR_LEN;
 } else {
  trailer_size *= PAGE_ZIP_DIR_SLOT_SIZE;
 }
 /* Add the space occupied by BLOB pointers. */
 trailer_size += page_zip->n_blobs * BTR_EXTERN_FIELD_REF_SIZE;
 ut_a(page_zip->m_end > PAGE_DATA);
#if FIL_PAGE_DATA > PAGE_DATA
# error "FIL_PAGE_DATA > PAGE_DATA"
#endif
 ut_a(page_zip->m_end + trailer_size <= page_zip_get_size(page_zip));

 log_ptr = mlog_write_initial_log_record_fast((page_t*) page,
           MLOG_ZIP_PAGE_COMPRESS,
           log_ptr, mtr);
 mach_write_to_2(log_ptr, page_zip->m_end - FIL_PAGE_TYPE);
 log_ptr += 2;
 mach_write_to_2(log_ptr, trailer_size);
 log_ptr += 2;
 mlog_close(mtr, log_ptr);

 /* Write FIL_PAGE_PREV and FIL_PAGE_NEXT */
 mlog_catenate_string(mtr, page_zip->data + FIL_PAGE_PREV, 4);
 mlog_catenate_string(mtr, page_zip->data + FIL_PAGE_NEXT, 4);
 /* Write most of the page header, the compressed stream and
 the modification log. */
 mlog_catenate_string(mtr, page_zip->data + FIL_PAGE_TYPE,
        page_zip->m_end - FIL_PAGE_TYPE);
 /* Write the uncompressed trailer of the compressed page. */
 mlog_catenate_string(mtr, page_zip->data + page_zip_get_size(page_zip)
        - trailer_size, trailer_size);
}

Revision history for this message
Laurynas Biveinis (laurynas-biveinis) wrote :

Doh! I was trying to run the testcase without patch applied to observe the issue, and the cause of this crash was stated right here all along:

  The other problem is that btr_cur_update_alloc_zip should not call
  page_zip_compress_write_log_no_data for a fake_changes transaction
  as a fake-change shouldn't write the transaction log (hopefully
  it didn't change anything other than getting a few locks).

Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-538

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.