Deadlock when trying to wake up transactions

Bug #310184 reported by Philip Stoev
Affects   Status          Importance   Assigned to           Milestone
PBXT      Fix Committed   Undecided    Vladimir Kolesnikov

Bug Description

When executing a concurrent workload, PBXT deadlocked in or around xt_xn_wakeup_transactions and related functions.

To reproduce

bzr branch lp:~randgen/randgen/main

and then execute:

$ perl runall.pl \
   --basedir=/build/mysql-5.1.30 \
   --mysqld=--plugin-dir=/build/pbxt-1.0.06-beta/src/.libs/ \
   --mysqld=--plugin-load=PBXT=libpbxt.so \
   --engine=PBXT \
   --grammar=conf/transactions.yy \
   --gendata=conf/transactions.zz \
   --reporters=Deadlock,Backtrace

Tags: deadlock rqg

Philip Stoev (pstoev) wrote :
Vladimir Kolesnikov (vkolesnikov) wrote :

Philip,

As far as I understand, you're talking about the engine freezing during test execution. Can you try to reproduce this problem on the latest version? I remember we fixed a similar issue after 1.0.06. I tried the test on the latest lp:pbxt and it works.

Thanks,

Changed in pbxt:
status: New → Incomplete
Philip Stoev (pstoev) wrote :

Yes, you are right: the original deadlock (0% CPU usage) appears to be gone. However, running the test now causes 100% CPU usage on one core, with the rest of the threads blocked. The backtrace of the offending thread is:

#0 0x00000000001673ac in XTTabCache::xt_tc_read_4 (this=0x28e25d0, file=0x7f8ca00008c0, ref_id=218, value=0x7f8cac343618, thread=0x29b7b40)
    at tabcache_xt.cc:278
#1 0x0000000000167e18 in xt_tab_get_row (ot=<value optimized out>, row_id=1, var_rec_id=0x29090f0) at table_xt.cc:3678
#2 0x00000000001685f9 in tab_visible (ot=0x7f8ca0006b50, rec_head=<value optimized out>, new_rec_id=0x7f8cac3436a4) at table_xt.cc:2810
#3 0x000000000016a6ab in xt_tab_seq_next (ot=0x7f8ca0006b50, buffer=0x7f8ca0012e00 "Ы\226", eof=0x7f8cac3436ec) at table_xt.cc:4852
#4 0x000000000014b001 in ha_pbxt::rnd_next (this=0x7f8ca0000ee0, buf=0xd <Address 0xd out of bounds>) at ha_pbxt.cc:3099
#5 0x00000000007027b6 in rr_sequential (info=0x7f8cac3438e0) at records.cc:381
#6 0x00000000006a8113 in mysql_update (thd=0x7f8ca80b4140, table_list=0x7f8ca0001ef0, fields=@0x7f8ca80b60e0, values=@0x7f8ca80b6520, conds=0x0,
    order_num=<value optimized out>, order=0x0, limit=18446744073709551517, handle_duplicates=DUP_ERROR, ignore=false) at sql_update.cc:571
#7 0x00000000006231e5 in mysql_execute_command (thd=0x7f8ca80b4140) at sql_parse.cc:2959
#8 0x00000000006295aa in mysql_parse (thd=0x7f8ca80b4140,
    inBuf=0x7f8ca0006620 "UPDATE `table10_pbxt_int_autoinc` SET `int` = `int` + 30, `int_key` = `int_key` - 30", length=84, found_semicolon=0x7f8cac344fb8)
    at sql_parse.cc:5787
#9 0x000000000062a6a8 in dispatch_command (command=COM_QUERY, thd=0x7f8ca80b4140,
    packet=0x7f8ca80b6b41 " UPDATE `table10_pbxt_int_autoinc` SET `int` = `int` + 30, `int_key` = `int_key` - 30 ", packet_length=<value optimized out>)
    at sql_parse.cc:1200
#10 0x000000000062b4e4 in do_command (thd=0x7f8ca80b4140) at sql_parse.cc:857
#11 0x000000000061c536 in handle_one_connection (arg=<value optimized out>) at sql_connect.cc:1115
#12 0x000000315b0073da in start_thread () from /lib64/libpthread.so.0
#13 0x000000315a4e627d in clone () from /lib64/libc.so.6

Does the test complete successfully for you?

Philip Stoev (pstoev) wrote :

The thread appears to be stuck in this loop forever:

2815 while (var_rec_id != ot->ot_curr_rec_id) {
(gdb)
2816 if (!var_rec_id) {
(gdb)
2822 if (!xt_tab_get_rec_data(ot, var_rec_id, sizeof(XTTabRecHeadDRec), (xtWord1 *) &var_head))
(gdb)
2829 if (XT_REC_IS_CLEAN(var_head.tr_rec_type_1)) {
(gdb)
2835 if (XT_REC_IS_FREE(var_head.tr_rec_type_1)) {
(gdb)
2840 if (invalid_rec != var_rec_id) {
(gdb)
2841 var_rec_id = invalid_rec;
(gdb)
2842 goto retry_3;
(gdb)
2810 if (!(xt_tab_get_row(ot, row_id, &var_rec_id)))
(gdb)
2815 while (var_rec_id != ot->ot_curr_rec_id) {
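
The trace above shows the visibility check jumping back to retry_3 each time it meets a free record, with nothing changing between passes. As a toy model of that control flow only, here is a minimal sketch of a chain walk that restarts on a free record but gives up after a bounded number of restarts instead of spinning. This is not PBXT's code and not the fix in the bug branch; every type and function name below (RecState, Record, Table, check_visible) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record states; illustrative names, not PBXT's macros.
enum class RecState { Clean, Free, Dirty };

struct Record {
    RecState state;
    std::uint32_t next_id;  // next record in the row's chain, 0 = end of chain
};

// Hypothetical single-row "table": records are indexed by id, id 0 is unused.
struct Table {
    std::vector<Record> records;
    std::uint32_t row_head;  // id of the first record in the row's chain
};

enum class Visibility { Visible, NotVisible, GaveUp };

// Walk the chain from the row head looking for curr_rec_id.
// A free record means the chain changed under us, so we restart from the
// head. Without the retry bound, a chain that always contains a free record
// before curr_rec_id would keep this function looping forever.
Visibility check_visible(const Table& t, std::uint32_t curr_rec_id, int max_retries)
{
    assert(max_retries >= 0);
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        std::uint32_t id = t.row_head;  // re-read the head on every attempt
        bool restart = false;
        while (id != curr_rec_id) {
            if (id == 0 || static_cast<std::size_t>(id) >= t.records.size())
                return Visibility::NotVisible;  // reached the end of the chain
            const Record& rec = t.records[id];
            if (rec.state == RecState::Free) {
                restart = true;  // stale chain: start over from the head
                break;
            }
            id = rec.next_id;
        }
        if (!restart)
            return Visibility::Visible;  // found curr_rec_id in the chain
    }
    return Visibility::GaveUp;  // retry budget exhausted; caller must handle it
}
```

With a chain 1 → 2 where record 2 is free and the current record is 3, every pass hits the free record, so check_visible returns GaveUp once the budget is spent rather than pinning a core. What the caller should do at that point (report an error, back off, or take a lock) is a design decision this sketch does not try to answer.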

Vladimir Kolesnikov (vkolesnikov) wrote :

Philip,
On the first machine where I tried it, the test finished successfully, but I have now tried it on another machine and can see the freeze with 100% CPU utilization.
Thanks for the report.

Changed in pbxt:
assignee: nobody → vkolesnikov
status: Incomplete → Confirmed
Changed in pbxt:
status: Confirmed → In Progress
Vladimir Kolesnikov (vkolesnikov) wrote :

At the moment the fix is available at lp:~vkolesnikov/pbxt/pbxt-bug-310184. I will post an update when it is merged into the trunk.

Changed in pbxt:
status: In Progress → Fix Committed