Percona XtraBackup moved to https://jira.percona.com/projects/PXB

Bug #1214274
Comment #5

Alexey Kopytov (akopytov) wrote on 2013-08-22: Re: [Bug 1214274] Re: kill_long_selects fails on slow Jenkins slaves

Hi Sergei,

Thanks, it makes more sense now.

On Wed, 21 Aug 2013 14:16:41 -0000, Sergei Glushchenko wrote:
> Hi Alexey,
>
>> I have to admit I don't understand anything from the above:
>
> I was not clear enough.
>
>> - what select and update queries are "forked" by xtrabackup?
>
> Not by xtrabackup, but by the test case. The test case executes selects/updates
> in the background, and xtrabackup should either kill them or not, depending
> on the arguments specified.
>
>> - why does xtrabackup starting before forked processes are forked lead
>> to a test failure? Which of those 2 failures does this case correspond
>> to?
>
> Xtrabackup may start before the queries forked by the test have started,
> simply because nothing prevents it.
>
>> - what is case #2 (the comment mentions case 1 twice)
>> - how can case #2 only happen for case #1?
>
> I mixed up case 1 (which is under the ====== case 1 ====== section of the test) and the two cases of failure :)
> Let's refer to the one with innobackupex started early as "a)" - it is the "ubuntu-raring-64bit" one.
> "b)" - centos5-32.
>
> a) tried to test the following:
> - run an update query which lasts 3 seconds
> - run a select query which lasts 3 seconds
> - run innobackupex, which will kill all queries 5 seconds after FTWRL
> - as a result, both queries and xtrabackup should succeed
> This failed because innobackupex started to work earlier than the background queries and killed one of them.
>

OK, so it is a pure test case problem. I don't see a way to implement
the necessary level of synchronization to make the test fully
deterministic. So let's just increase innobackupex timeouts.
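To illustrate why a larger timeout helps, here is a minimal timing sketch. It is a simplified model and not how innobackupex is implemented: it assumes a query is killed only if it is still running at the moment the kill deadline (FTWRL time plus timeout) fires. The function name `is_killed` and all of its parameters are hypothetical, and the start offsets are made up for illustration.

```python
def is_killed(query_start, query_duration, ftwrl_time, kill_timeout):
    """Return True if the query is still running when the kill deadline fires.

    Simplified model (assumption): the query runs during
    [query_start, query_start + query_duration) and the kill happens
    exactly at ftwrl_time + kill_timeout. All times are in seconds.
    """
    kill_time = ftwrl_time + kill_timeout
    query_end = query_start + query_duration
    return query_start <= kill_time < query_end


# Intended order: the 3-second query starts before FTWRL is issued.
intended = is_killed(query_start=0, query_duration=3, ftwrl_time=1, kill_timeout=5)

# Race on a slow slave: FTWRL is issued first and the query starts late.
raced = is_killed(query_start=3, query_duration=3, ftwrl_time=0, kill_timeout=5)

# Same race with a much larger (hypothetical) timeout.
raced_long_timeout = is_killed(query_start=3, query_duration=3, ftwrl_time=0, kill_timeout=30)
```

In this model the window in which a late-starting query can be hit shifts further away as the timeout grows, so a scheduling delay on a slow slave has to be much larger before it causes a failure. It does not make the test deterministic.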

>> - how can xtrabackup "kill" its own connection? innobackupex may kill
>> processes (but if it killed itself, it would fail differently?), and
>> xtrabackup doesn't "own" any connections and doesn't kill anything.
>
> I was referring to xtrabackup as the product. innobackupex killed its own connection, and the error is
> innobackupex: Error:
> Error executing 'SHOW MASTER STATUS': DBD::mysql::db selectrow_hashref failed: MySQL server has gone away
>

This one looks suspicious. Any idea how that could happen? I.e. how
could the innobackupex connection pass the is_query() /
is_select_query() checks in kill_long_queries()?

Besides I don't see any "Killing query ..." messages in the log. Is
there any chance they are not actually printed when a query is killed?

Anyway, this is a functionality issue and should be reported separately.
Can you analyze and report it?
