MySQL upstart stop job does not cleanly shutdown mysql
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
upstart | Invalid | Undecided | Unassigned | |
mysql-dfsg-5.1 (Ubuntu) | Fix Released | High | Clint Byrum | |
Lucid | Fix Released | High | Clint Byrum | |
Maverick | Fix Released | High | Clint Byrum | |
Bug Description
Release: Ubuntu 10.04 LTS
Package Version: 5.1.41-3ubuntu12.6
Stopping a MySQL server can take some time, especially on a busy database. If mysqld is killed while it still has open connections, many tables are likely to be corrupted at the next start.
The new upstart mechanism for starting/stopping mysql unfortunately behaves as follows on a "service mysql stop":
1. First it sends a TERM signal to mysqld, which is perfectly fine, since mysqld then does the same as it would for a mysqladmin shutdown command.
2. If the process has still not terminated after 5 seconds, a KILL signal is sent. This is a very serious problem, since on busy servers many MyISAM tables will be corrupted afterwards.
Expected behaviour:
Send the TERM signal and wait for a certain amount of time (at least 1 minute). If mysqld has still not stopped, simply report an error to the user.
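The expected behaviour can be sketched in shell roughly as follows. This is only an illustration of "send TERM, wait, report an error, never send KILL"; the function name `stop_gracefully` and its arguments are hypothetical and are not part of the mysql package or of upstart.

```sh
#!/bin/sh
# Sketch: send SIGTERM, wait up to TIMEOUT seconds, never send SIGKILL.
# stop_gracefully is a hypothetical helper, not something shipped by Ubuntu.
stop_gracefully() {
    pid=$1
    timeout=${2:-60}    # the report asks for at least 1 minute

    # kill fails if the process is already gone (or we lack permission);
    # this sketch treats both cases as "nothing to do".
    kill -TERM "$pid" 2>/dev/null || return 0

    waited=0
    # kill -0 sends no signal; it only checks that the process still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "process $pid still running after ${timeout}s; not killing it" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A call such as `stop_gracefully "$(cat /var/run/mysqld/mysqld.pid)" 300` would wait up to five minutes; the pid file path here is an assumption and may differ on a given system.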
== SRU REPORT ==
=== IMPACT ===
This has very little impact on a normally running system. It may cause shutdown to take up to 5 minutes, but that is an acceptable trade-off, as recovering a system with crashed tables can take much, much longer.
=== DEV FIX ===
Raising kill timeout to 300 seconds seems a good trade off between shutdown potentially taking forever, and trying not to corrupt tables.
=== PATCH ===
See merge proposal
=== TEST CASE: ===
This is a race condition, so it is hard to reproduce reliably. However, the following steps should trigger it:
1. create a very large innodb table
2. import a lot of data (inserting more MB of data than your drives can write in 5 seconds should work)
3. run a very long select on the data in one thread (SELECT * FROM table WHERE column like '%X%';)
4. DELETE that data in another thread
5. issue 'stop mysql' immediately after the delete returns.
6. check to see that mysqld was sent SIGKILL in /var/log/daemon.log
7. start mysql -- at this point InnoDB recovery will need to be run
Once the fix is applied, the SIGKILL should never be sent.
=== REGRESSION POTENTIAL ===
The only regression potential is mentioned in the IMPACT section, that the shutdown of a server or the mysql service may take up to 5 minutes.
Related branches
- Mathias Gug: Pending requested
- Diff: 27 lines (+9/-0), 2 files modified: debian/changelog (+6/-0), debian/mysql-server-5.1.mysql.upstart (+3/-0)
- Bryce Harrington: Approve (packaging)
- Ubuntu branches: Pending requested
- Diff: 27 lines (+9/-0), 2 files modified: debian/changelog (+6/-0), debian/mysql-server-5.1.mysql.upstart (+3/-0)
Changed in mysql-dfsg-5.1 (Ubuntu):
assignee: nobody → Clint Byrum (clint-fewbar)
tags: added: lucid
tags: added: server-mrs
Changed in upstart:
status: New → Invalid
Changed in mysql-dfsg-5.1 (Ubuntu Maverick):
status: Triaged → Confirmed
assignee: Clint Byrum (clint-fewbar) → nobody
Changed in mysql-dfsg-5.1 (Ubuntu Maverick):
assignee: nobody → Clint Byrum (clint-fewbar)
status: Confirmed → In Progress
description: updated
tags: added: verification-done; removed: verification-needed
tags: added: testcase
Thank you for your complete bug report.
It would seem this can be fixed in the upstart job with a line like 'kill timeout 300'. From man 5 init:
| kill timeout INTERVAL
| Specifies the interval between sending the job's main process
| the SIGTERM and SIGKILL signals when stopping the running job.
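To check whether an installed job actually carries such a stanza, something like the following could be used. It is a rough sketch: `job_kill_timeout` is a hypothetical helper, and the fallback of 5 seconds is the default delay described in this report.

```sh
#!/bin/sh
# Sketch: print the kill timeout (in seconds) set in an upstart job file.
# job_kill_timeout is a hypothetical helper, not an upstart command.
job_kill_timeout() {
    job_file=$1
    # Match a stanza such as "kill timeout 300" and keep only the number;
    # if the stanza appears more than once, use the last occurrence.
    value=$(sed -n 's/^[[:space:]]*kill[[:space:]]\{1,\}timeout[[:space:]]\{1,\}\([0-9]\{1,\}\).*$/\1/p' "$job_file" | tail -n 1)
    # With no stanza, fall back to the 5 second delay seen in this bug.
    echo "${value:-5}"
}
```

For example, `job_kill_timeout /etc/init/mysql.conf` should print 300 once the fix is installed, assuming the job file is installed at that path.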