Merge lp:~derks/mysql-mmm/angel-infinit-failures into lp:mysql-mmm

Proposed by Pascal Hofmann
Status: Merged
Merged at revision: not available
Proposed branch: lp:~derks/mysql-mmm/angel-infinit-failures
Merge into: lp:mysql-mmm
Diff against target: 60 lines (+21/-2)
1 file modified
lib/Common/Angel.pm (+21/-2)
To merge this branch: bzr merge lp:~derks/mysql-mmm/angel-infinit-failures
Reviewer          Review Type   Date Requested   Status
Pascal Hofmann                                   Approve
Review via email: mp+18509@code.launchpad.net

This proposal supersedes a proposal from 2009-11-11.

BJ Dierkes (derks) wrote : Posted in a previous version of this proposal

Submitted as a proposed solution for LP #473446, and implements the changes from the patch I posted to that bug.

Pascal Hofmann (pascalhofmann) wrote : Posted in a previous version of this proposal

I think a time condition should be added here. I don't want the angel to restart more than 10 times within 5 minutes, but I definitely want it to be able to restart more than 10 times over hours, days, or months.

review: Needs Fixing
BJ Dierkes (derks) wrote : Posted in a previous version of this proposal

> I think a time condition should be added here. I don't want the angel to
> restart more than 10 times within 5 minutes, but I definitely want it to
> be able to restart more than 10 times over hours, days, or months.

I agree with that point, though any time there is a successful run the 'attempts' variable is reset to 0. I will look into wrapping the logic in an 'if fewer than 5 minutes have passed since $attempts was last reset' check.

BJ Dierkes (derks) wrote : Posted in a previous version of this proposal

I have added the logic to give up only if there have been 10 consecutive failures within 300 seconds (5 minutes). Please re-review the changes to lib/Common/Angel.pm in my temporary branch at your convenience.

The result with these changes:

2009/11/17 12:23:18 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:23:29 FATAL Listener: Can't create socket!
2009/11/17 12:23:29 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:23:40 FATAL Listener: Can't create socket!
2009/11/17 12:23:40 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:23:51 FATAL Listener: Can't create socket!
2009/11/17 12:23:51 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:02 FATAL Listener: Can't create socket!
2009/11/17 12:24:02 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:13 FATAL Listener: Can't create socket!
2009/11/17 12:24:13 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:24 FATAL Listener: Can't create socket!
2009/11/17 12:24:24 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:35 FATAL Listener: Can't create socket!
2009/11/17 12:24:35 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:46 FATAL Listener: Can't create socket!
2009/11/17 12:24:47 FATAL Child exited with exitcode 99, restarting after 10 second sleep
2009/11/17 12:24:58 FATAL Listener: Can't create socket!
2009/11/17 12:24:58 FATAL Child exited with exitcode 99 and has failed more than 10 times consecutively in the last 5 minutes, not restarting
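
The give-up rule behind this log can be sketched in isolation. This is only an illustration of the condition described above (10 attempts within 300 seconds), not the code in lib/Common/Angel.pm. The names should_give_up, MAX_ATTEMPTS and WINDOW_SECONDS are hypothetical, and the 11-second spacing between failures is read off the timestamps in the log.

```perl
use strict;
use warnings;

# Hypothetical constants mirroring the thresholds quoted in the comment above.
use constant MAX_ATTEMPTS   => 10;
use constant WINDOW_SECONDS => 300;

# Returns 1 if the angel should stop restarting the child, 0 otherwise.
#   $attempts  - consecutive starts since the counter was last reset
#   $starttime - epoch seconds at the last reset
#   $now       - current epoch seconds
sub should_give_up {
    my ($attempts, $starttime, $now) = @_;
    return ($attempts >= MAX_ATTEMPTS && ($now - $starttime) < WINDOW_SECONDS) ? 1 : 0;
}

# Replay a child that fails roughly every 11 seconds, as in the log.
my $starttime = 0;
my $restarts  = 0;
my $gave_up_at;
for my $attempt (1 .. 20) {
    my $now = $starttime + 11 * $attempt;
    if (should_give_up($attempt, $starttime, $now)) {
        $gave_up_at = $attempt;
        last;
    }
    $restarts++;
}
printf "restarts=%d gave_up_at=%d\n", $restarts, $gave_up_at;
# prints "restarts=9 gave_up_at=10"
```

Nine restarts followed by a tenth failure that ends the loop matches the nine "restarting after 10 second sleep" lines and the final "not restarting" line above.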

Pascal Hofmann (pascalhofmann) :
review: Approve

Preview Diff

=== modified file 'lib/Common/Angel.pm'
--- lib/Common/Angel.pm 2009-06-08 23:06:17 +0000
+++ lib/Common/Angel.pm 2010-02-03 09:59:10 +0000
@@ -10,12 +10,17 @@
 
 our $start_process;
 our $pid;
+our $attempts;
+our $starttime;
+
 
 sub Init($) {
 	my $pidfile = shift;
 
 
 	$MMM::Common::Angel::start_process = 1;
+	$MMM::Common::Angel::attempts = 0;
+	$MMM::Common::Angel::starttime = time();
 	my $is_shutdown = 0;
 
 	$pidfile->create() if (defined($pidfile));
@@ -25,6 +30,7 @@
 	local $SIG{QUIT} = \&MMM::Common::Angel::SignalHandler;
 
 	do {
+		$MMM::Common::Angel::attempts++;
 
 		if ($MMM::Common::Angel::start_process) {
 			$MMM::Common::Angel::start_process = 0;
@@ -41,6 +47,9 @@
 
 		# Wait for child to exit
 		if (waitpid($MMM::Common::Angel::pid, 0) == -1) {
+			# child exited clean, reset attempts and starttime
+			$MMM::Common::Angel::attempts = 0;
+			$MMM::Common::Angel::starttime = time();
 			if ($ERRNO{ECHLD}) {
 				$is_shutdown = 1 unless ($MMM::Common::Angel::start_process);
 			}
@@ -52,8 +61,18 @@
 					$is_shutdown = 1;
 				}
 				else {
-					FATAL sprintf("Child exited with exitcode %s, restarting", WEXITSTATUS($?));
-					$MMM::Common::Angel::start_process = 1;
+					my $now = time();
+					my $diff = $now - $MMM::Common::Angel::starttime;
+					if ($MMM::Common::Angel::attempts >= 10 && $diff < 300) {
+						FATAL sprintf("Child exited with exitcode %s and has failed more than 10 times consecutively in the last 5 minutes, not restarting", WEXITSTATUS($?));
+						$MMM::Common::Angel::start_process = 0;
+						$is_shutdown = 1;
+					}
+					else {
+						FATAL sprintf("Child exited with exitcode %s, restarting after 10 second sleep", WEXITSTATUS($?));
+						sleep(10);
+						$MMM::Common::Angel::start_process = 1;
+					}
 				}
 			}
 			if (WIFSIGNALED($CHILD_ERROR)) {
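
To see the restart loop outside of MMM, here is a minimal, self-contained sketch of a supervisor that forks a child, waits for it, and applies the same attempt and time-window limit as the diff. It assumes a Unix-like system with fork. The function supervise and its options (max_attempts, window, delay) are hypothetical and do not exist in MMM::Common::Angel. Unlike the real module, it treats a child killed by a signal as an ordinary failure and ignores the pidfile and signal handlers.

```perl
use strict;
use warnings;
use POSIX qw(WIFEXITED WEXITSTATUS);

# Hypothetical helper: run $child_code in a forked child and restart it on
# failure. Returns how many times the child was started.
sub supervise {
    my ($child_code, %opt) = @_;
    my $max_attempts = $opt{max_attempts} // 10;   # same limit as the diff
    my $window       = $opt{window}       // 300;  # seconds
    my $delay        = $opt{delay}        // 10;   # sleep between restarts

    my $attempts  = 0;
    my $starttime = time();

    while (1) {
        $attempts++;
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: use the callback's return value as the exit code.
            # POSIX::_exit skips END blocks and buffered-output flushing.
            POSIX::_exit($child_code->());
        }

        # Parent: block until the child exits.
        waitpid($pid, 0);
        my $status = $?;

        # Exit code 0 means a normal shutdown, so stop supervising.
        return $attempts if WIFEXITED($status) && WEXITSTATUS($status) == 0;

        # Too many failures within the window: give up.
        return $attempts
            if $attempts >= $max_attempts && time() - $starttime < $window;

        sleep($delay);
    }
}

# A child that always exits with code 99; delay => 0 keeps the demo fast.
my $starts = supervise(sub { 99 }, delay => 0);
print "child started $starts times\n";
```

With the default limits, a child that always fails is started 10 times and then abandoned, which corresponds to the "not restarting" branch added in the diff.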
