Merge lp:~derks/mysql-mmm/angel-infinit-failures into lp:mysql-mmm

Proposed by Pascal Hofmann
Status: Merged
Merged at revision: not available
Proposed branch: lp:~derks/mysql-mmm/angel-infinit-failures
Merge into: lp:mysql-mmm
Diff against target: 60 lines (+21/-2)
1 file modified
lib/Common/Angel.pm (+21/-2)
To merge this branch: bzr merge lp:~derks/mysql-mmm/angel-infinit-failures
Reviewer          Review Type    Date Requested    Status
Pascal Hofmann                                     Approve
Review via email: mp+18509@code.launchpad.net

This proposal supersedes a proposal from 2009-11-11.

BJ Dierkes (derks) wrote : Posted in a previous version of this proposal

Submitted as a proposed solution for LP #473446, and implements the changes from the patch I posted to that bug.

Pascal Hofmann (pascalhofmann) wrote : Posted in a previous version of this proposal

I think a time condition should be added here. I don't want the angel to restart more than 10 times within 5 minutes, but I definitely want it to restart more than 10 times within hours/days/months.

review: Needs Fixing

BJ Dierkes (derks) wrote : Posted in a previous version of this proposal

> I think a time condition should be added here. I don't want the angel to
> restart more than 10 times within 5 minutes, but I definitely want it to
> restart more than 10 times within hours/days/months.

I agree with that point, though any time there is a successful run the 'attempts' variable is reset to 0. I will look into wrapping the logic in an 'if < 5 minutes since the last time $attempts was reset' check.
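
A minimal sketch of the check described above, assuming the limits discussed in this thread (10 attempts, 5 minutes). The names should_give_up, reset_window, MAX_ATTEMPTS and WINDOW_SECONDS are hypothetical and are not taken from Angel.pm; this only illustrates the idea of combining a failure counter with a time window.

```perl
use strict;
use warnings;

# Hypothetical limits; the values mirror the numbers discussed in this
# thread, but the constant names are illustrative only.
use constant MAX_ATTEMPTS   => 10;
use constant WINDOW_SECONDS => 300;

# Return 1 when the angel should stop restarting the child: the failure
# counter has reached the limit AND the window that started at the last
# reset is still younger than WINDOW_SECONDS. Otherwise return 0.
sub should_give_up {
    my ($attempts, $window_start, $now) = @_;
    return ($attempts >= MAX_ATTEMPTS && ($now - $window_start) < WINDOW_SECONDS) ? 1 : 0;
}

# Reset the counter and the window start, e.g. after a successful run.
# $state is a hash reference with the keys 'attempts' and 'window_start'.
sub reset_window {
    my ($state, $now) = @_;
    $state->{attempts}     = 0;
    $state->{window_start} = $now;
    return $state;
}
```

With these assumptions, ten failures within 100 seconds of the last reset stop the restarts, while ten failures spread over 300 seconds or more do not.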

BJ Dierkes (derks) wrote : Posted in a previous version of this proposal

I have added the logic to only give up if there have been 10 consecutive failures within 300 seconds (5 minutes). Please re-review the changes to lib/Common/Angel.pm from my temporary branch (at your convenience).

The result with these changes:

2009/11/17 12:23:18 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:23:29 FATAL Listener: Can't create socket!
2009/11/17 12:23:29 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:23:40 FATAL Listener: Can't create socket!
2009/11/17 12:23:40 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:23:51 FATAL Listener: Can't create socket!
2009/11/17 12:23:51 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:02 FATAL Listener: Can't create socket!
2009/11/17 12:24:02 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:13 FATAL Listener: Can't create socket!
2009/11/17 12:24:13 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:24 FATAL Listener: Can't create socket!
2009/11/17 12:24:24 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:35 FATAL Listener: Can't create socket!
2009/11/17 12:24:35 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:46 FATAL Listener: Can't create socket!
2009/11/17 12:24:47 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:58 FATAL Listener: Can't create socket!
2009/11/17 12:24:58 FATAL Child exited with exitcode 99 and has failed more than 10 times consecutively in the last 5 minutes, not restarting
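
As a quick check of the log above, the following sketch computes the time between the first and the last "Child exited" line. The helper log_epoch is hypothetical and only parses the "YYYY/MM/DD HH:MM:SS" stamps shown here; the stamps are treated as UTC, which does not matter because only their difference is used.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Convert a "YYYY/MM/DD HH:MM:SS" log timestamp to epoch seconds.
# Hypothetical helper: treats the stamp as UTC, since only the
# difference between two stamps is of interest.
sub log_epoch {
    my ($stamp) = @_;
    my ($y, $mo, $d, $h, $mi, $s) =
        $stamp =~ m{^(\d{4})/(\d{2})/(\d{2}) (\d{2}):(\d{2}):(\d{2})$}
        or die "unexpected timestamp: $stamp";
    return timegm($s, $mi, $h, $d, $mo - 1, $y);
}

# First and last "Child exited" lines from the log above.
my $first   = log_epoch('2009/11/17 12:23:18');
my $last    = log_epoch('2009/11/17 12:24:58');
my $elapsed = $last - $first;

printf "elapsed: %d seconds\n", $elapsed;   # prints "elapsed: 100 seconds"
```

The ten exits span 100 seconds, well inside the 300-second window, so the tenth one triggers the give-up branch. Note that the message says "more than 10 times" while the log shows the stop happening on the tenth failure.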

Pascal Hofmann (pascalhofmann) :
review: Approve

Preview Diff

=== modified file 'lib/Common/Angel.pm'
--- lib/Common/Angel.pm 2009-06-08 23:06:17 +0000
+++ lib/Common/Angel.pm 2010-02-03 09:59:10 +0000
@@ -10,12 +10,17 @@
 
 our $start_process;
 our $pid;
+our $attempts;
+our $starttime;
+
 
 sub Init($) {
     my $pidfile = shift;
 
 
     $MMM::Common::Angel::start_process = 1;
+    $MMM::Common::Angel::attempts = 0;
+    $MMM::Common::Angel::starttime = time();
     my $is_shutdown = 0;
 
     $pidfile->create() if (defined($pidfile));
@@ -25,6 +30,7 @@
     local $SIG{QUIT} = \&MMM::Common::Angel::SignalHandler;
 
     do {
+        $MMM::Common::Angel::attempts++;
 
         if ($MMM::Common::Angel::start_process) {
             $MMM::Common::Angel::start_process = 0;
@@ -41,6 +47,9 @@
 
         # Wait for child to exit
         if (waitpid($MMM::Common::Angel::pid, 0) == -1) {
+            # child exited clean, reset attempts and starttime
+            $MMM::Common::Angel::attempts = 0;
+            $MMM::Common::Angel::starttime = time();
             if ($ERRNO{ECHLD}) {
                 $is_shutdown = 1 unless ($MMM::Common::Angel::start_process);
             }
@@ -52,8 +61,18 @@
                     $is_shutdown = 1;
                 }
                 else {
-                    FATAL sprintf("Child exited with exitcode %s, restarting", WEXITSTATUS($?));
-                    $MMM::Common::Angel::start_process = 1;
+                    my $now = time();
+                    my $diff = $now - $MMM::Common::Angel::starttime;
+                    if ($MMM::Common::Angel::attempts >= 10 && $diff < 300) {
+                        FATAL sprintf("Child exited with exitcode %s and has failed more than 10 times consecutively in the last 5 minutes, not restarting", WEXITSTATUS($?));
+                        $MMM::Common::Angel::start_process = 0;
+                        $is_shutdown = 1;
+                    }
+                    else {
+                        FATAL sprintf("Child exited with exitcode %s, restarting after 10 second sleep", WEXITSTATUS($?));
+                        sleep(10);
+                        $MMM::Common::Angel::start_process = 1;
+                    }
                 }
             }
             if (WIFSIGNALED($CHILD_ERROR)) {
