Merge lp:~mysql-mmm-core/mysql-mmm/devel into lp:mysql-mmm

Proposed by Pascal Hofmann
Status: Merged
Merged at revision: not available
Proposed branch: lp:~mysql-mmm-core/mysql-mmm/devel
Merge into: lp:mysql-mmm
Diff against target: 1821 lines (+914/-368)
13 files modified
doc/mmm_configuration.texi (+23/-0)
doc/mmm_control.texi (+18/-95)
doc/mmm_monitor.texi (+19/-16)
lib/Agent/Agent.pm (+21/-2)
lib/Common/Config.pm (+4/-1)
lib/Monitor/Checker.pm (+2/-2)
lib/Monitor/Commands.pm (+92/-35)
lib/Monitor/Monitor.pm (+278/-190)
lib/Monitor/NetworkChecker.pm (+18/-15)
lib/Monitor/Roles.pm (+136/-2)
lib/Monitor/StartupStatus.pm (+298/-0)
lib/Monitor/t/Roles.t (+2/-2)
sbin/mmm_mond (+3/-8)
To merge this branch: bzr merge lp:~mysql-mmm-core/mysql-mmm/devel
Reviewer        Review Type  Date Requested  Status
Pascal Hofmann                               Approve
Review via email: mp+20955@code.launchpad.net

Commit message

  * Added manual mode (bug #531011), wait mode, config values 'mode' and 'wait_for_other_master'
  * Don't die at startup when no network connection is available - wait for it to appear instead (bug #416572)
  * Changed startup behaviour. mmm_mond will only go into passive mode if it detects the active_master_role on more than one host.
  * Added config value 'careful_startup' (bug #422549). If set to 0 mmm_mond won't ever switch into passive mode at startup.
  * Added check for invalid agent commands (prevents crash when mmmd_mon version 1.x talks to a 2.x agent).
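
For illustration, the new settings might be placed in the monitor configuration roughly as follows. This is only a sketch: the file name mmm_mon.conf, the <monitor> section, the key/value syntax and the '#' comments are assumed from the usual MMM 2.x layout and are not shown in this diff. The option names and allowed values are taken from the doc/mmm_configuration.texi hunk of this proposal.

```
# mmm_mon.conf (assumed file name and section; values are examples only)
<monitor>
    # Start in wait mode: behave like manual mode until both masters are
    # ONLINE or wait_for_other_master seconds have elapsed, then go active.
    mode                    wait
    wait_for_other_master   120

    # Switch into passive mode at startup if the writer role is found on
    # more than one host. Set to 0 to never go passive at startup.
    careful_startup         1
</monitor>
```

Setting careful_startup explicitly may be the safer choice here, since the documentation hunk lists 0 as its default while lib/Common/Config.pm sets the default to 1.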

Pascal Hofmann (pascalhofmann) :
review: Approve

Preview Diff

=== modified file 'doc/mmm_configuration.texi'
--- doc/mmm_configuration.texi 2010-02-05 17:27:32 +0000
+++ doc/mmm_configuration.texi 2010-03-09 10:25:26 +0000
@@ -231,6 +231,29 @@
 @item Used by: @tab monitor
 @end multitable
 
+@item @strong{careful_startup}
+@multitable @columnfractions 0.2 0.8
+@item Description: @tab Startup carefully i.e. switch into passive mode when writer role is configured on multiple hosts.
+@item Allowed values: @tab true/yes/1/on false/no/0/off
+@item Default value: @tab 0
+@item Used by: @tab monitor
+@end multitable
+
+@item @strong{mode}
+@multitable @columnfractions 0.2 0.8
+@item Description: @tab Default mode of monitor.
+@item Allowed values: @tab active manual wait passive
+@item Default value: @tab active
+@item Used by: @tab monitor
+@end multitable
+
+@item @strong{wait_for_other_master}
+@multitable @columnfractions 0.2 0.8
+@item Description: @tab How many seconds to wait for other master to become @code{ONLINE} before switching from mode @code{WAIT} to mode @code{ACTIVE}. 0 = infinite.
+@item Default value: @tab 120
+@item Used by: @tab monitor
+@end multitable
+
 @end itemize
 
 

=== modified file 'doc/mmm_control.texi'
--- doc/mmm_control.texi 2010-02-18 14:32:50 +0000
+++ doc/mmm_control.texi 2010-03-09 10:25:26 +0000
@@ -132,7 +132,7 @@
 @end example
 
 @noindent
-See @ref{Passive mode}.
+See @ref{Modes}.
 
 @section @code{set_active}
 Switch the monitor into @code{ACTIVE} mode:
@@ -143,7 +143,18 @@
 @end example
 
 @noindent
-See @ref{Passive mode}.
+See @ref{Modes}.
+
+@section @code{set_manual}
+Switch the monitor into @code{MANUAL} mode:
+
+@example
+# mmm_control set_manual
+OK: Switched into manual mode.
+@end example
+
+@noindent
+See @ref{Modes}.
 
 @section @code{set_passive}
 Switch the monitor into @code{PASSIVE} mode:
@@ -154,10 +165,10 @@
 @end example
 
 @noindent
-See @ref{Passive mode}.
+See @ref{Modes}.
 
 @section @code{move_role @var{role} @var{host}}
-Used to move an exclusive role between the cluster nodes. This command is available in @code{ACTIVE} mode only. Lets assume the following situation:
+Used to move an exclusive role between the cluster nodes. This command is not available in @code{PASSIVE} mode. Lets assume the following situation:
 
 @smallexample
 # mmm_control show
@@ -179,96 +190,8 @@
 @end smallexample
 
 @section @code{move_role --force @var{role} @var{host}}
-Can be used to move the @var{active_master_role} to a host with state @code{REPLICATION_FAIL} or @code{REPLICATION_BACKLOG}. Use this with caution! This command is available in @code{ACTIVE} mode only.
+Can be used to move the @var{active_master_role} to a host with state @code{REPLICATION_FAIL} or @code{REPLICATION_BACKLOG}. Use this with caution! This command is not available in @code{PASSIVE} mode.
 
 @section @code{set_ip @var{ip} @var{host}}
-@code{set_ip} can be used to manipulate the roles in @code{PASSIVE} mode. The changes won't be applied until the monitor is switched into @code{ACTIVE} mode via @code{set_active}.
+@code{set_ip} can be used to manipulate the roles in @code{PASSIVE} mode. The changes won't be applied until the monitor is switched into @code{ACTIVE} or @code{manual} mode via @code{set_active} or @code{set_manual}.
 
-@*
-Let's assume we have our cluster up and running with the following status:
-
-@smallexample
-# mmm_control show
- db1(192.168.0.31) master/ONLINE. Roles: writer(192.168.0.50)
- db2(192.168.0.32) master/ONLINE. Roles: reader(192.168.0.51)
- db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53)
-@end smallexample
-
-@noindent
-Now, several bad things happen:
-@enumerate
-@item network connection to db1 fails
-@item mmm_mond detects that db1 has failed
-@item mmm_mond moves the writer role to db2, but can't remove it from db1 (because it can't connect to it)
-@item mmm_mond crashes and the status file gets corrupted.
-@item network connection to db1 recovers
-@item The admin restarts mmm_mond
-@end enumerate
-
-mmm_mond has no status information now, and two nodes report, that they have the
-@code{writer} role, so mmm_mond doesn't know what it should do and will switch
-into @code{PASSIVE} mode.
-
-@smallexample
-# mmm_control mode
-PASSIVE
-
-# mmm_control show
-# --- Monitor is in PASSIVE MODE ---
-# Cause: Discrepancies between stored status, agent status and system status during startup.
-#
-# Stored status:
-# db1(192.168.0.31) master/UNKNOWN. Roles:
-# db2(192.168.0.32) master/UNKNOWN. Roles:
-# db3(192.168.0.33) slave/UNKNOWN. Roles:
-#
-# Agent status:
-# db1 ONLINE. Roles: writer(192.168.0.50). Master: ?
-# db2 ONLINE. Roles: writer(192.168.0.50), reader(192.168.0.51). Master: ?
-# db3 ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53). Master: db2
-#
-# System status:
-# db1 writable. Roles: writer(192.168.0.50)
-# db2 writable. Roles: writer(192.168.0.50), reader(192.168.0.51)
-# db3 readonly. Roles: reader(192.168.0.52), reader(192.168.0.53)
-#
- db1(192.168.0.31) master/ONLINE. Roles: writer(192.168.0.50)
- db2(192.168.0.32) master/ONLINE. Roles: reader(192.168.0.51)
- db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53)
-@end smallexample
-
-@noindent
-As you see, mmm_mond tries to recover the status as well as possible. But in this situation it's wrong so one must move the writer role to db2 manually:
-
-@smallexample
-# mmm_control set_ip 192.168.0.50 db2
-OK: Set role 'writer(192.168.0.50)' to host 'db2'.
-@end smallexample
-
-@noindent
-Now take a look at the status, everything looks ok:
-
-@smallexample
-# mmm_control show
-# --- Monitor is in PASSIVE MODE ---
-# [...]
- db1(192.168.0.31) master/ONLINE. Roles:
- db2(192.168.0.32) master/ONLINE. Roles: writer(192.168.0.50), reader(192.168.0.51)
- db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53)
-@end smallexample
-
-@noindent
-Finally switch the monitor into active mode, so that it will apply the roles:
-
-@smallexample
-# mmm_control set_active
-OK: Switched into active mode.
-
-# mmm_control show
- db1(192.168.0.31) master/ONLINE. Roles: reader(192.168.0.51)
- db2(192.168.0.32) master/ONLINE. Roles: writer(192.168.0.50)
- db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53)
-@end smallexample
-
-@*
-@strong{Note:} The role @code{reader(192.168.0.51)} has been moved to db1, because @code{reader} is a @code{balanced} role.

=== modified file 'doc/mmm_monitor.texi'
--- doc/mmm_monitor.texi 2010-02-05 17:27:32 +0000
+++ doc/mmm_monitor.texi 2010-03-09 10:25:26 +0000
@@ -108,7 +108,7 @@
 @end itemize
 
 @noindent
-If the network connection doesn't work during startup, mmm_mond will switch into passive mode (@pxref{Passive mode}).
+If the network connection doesn't work during startup, mmm_mond will delay startup until it's available again.
 
 
 @node Flapping
@@ -130,20 +130,22 @@
 If @var{auto_set_online} is > 0, flapping hosts will automatically be set to @code{ONLINE} 
 after @var{flap_duration} seconds.
 
-@node Passive mode
-@section Passive mode
-@cindex passive mode
+@node Modes
+@section Modes
+@cindex Modes
 
-entered if no network connection during startup
-entered if discrepancies are detected during startup
-entered with set_passive
-
-roles can be changed (unclean) with set_ip
-changed to active with set_active
-
-roles get never changed automatically
-nothing is send to agents
-status file won't be updated
+@subsection Active mode
+The monitor will remove roles from failed hosts and assign them to other hosts automatically.
+@subsection Manual mode
+The monitor will only distribute @code{balanced} roles across the hosts, but will not remove roles from failed hosts automatically. You can remove roles from failed hosts manually with @code{move_role}.
+@subsection Wait mode
+Like @code{MANUAL} mode, but mode will be changed into @code{ACTIVE} mode when both master hosts are @code{ONLINE} or after @code{wait_for_other_master} seconds have elapsed.
+@subsection Passive mode
+In passive mode the monitor doesn't change roles, update the status file nor send anything to agents.
+In passive mode you can modify roles (unclean) with @code{set_ip} - the changes won't be applied until the monitor is switched to @code{ACTIVE} or @code{MANUAL} mode with @code{set_active} or @code{set_manual}.
+Passive mode will be entered if conflicting roles are detected during startup. You should then analyze the situation, fix the role information (if needed) and switch into @code{ACTIVE} or @code{MANUAL} mode.
+It also can be entered manually with @code{set_passive}.
+
 
 @node Startup
 @section Startup
@@ -152,6 +154,7 @@
 @itemize
 
 @item Initial network check
+@item If network is down startup will be delayed until it's reachable again.
 @item Initial host checks
 @item reads status information from ...
 @itemize @minus
@@ -159,7 +162,7 @@
 @item agents (agent info)
 @item hosts (system info)
 @end itemize
-@item If status information doesn't match or network is down @code{PASSIVE} mode will be entered.
+and tries to figure out the cluster status.
 @end itemize
 
 @node Role transition

=== modified file 'lib/Agent/Agent.pm'
--- lib/Agent/Agent.pm 2009-10-30 07:19:35 +0000
+++ lib/Agent/Agent.pm 2010-03-09 10:25:26 +0000
@@ -4,7 +4,9 @@
 use warnings FATAL => 'all';
 use English qw(EVAL_ERROR);
 use Algorithm::Diff;
+use DBI;
 use Class::Struct;
+use Errno qw(EINTR);
 use Log::Log4perl qw(:easy);
 use MMM::Common::Role;
 use MMM::Common::Socket;
@@ -81,6 +83,7 @@
  DEBUG "Received Command $cmd";
  my ($cmd_name, $version, $host, @params) = split('\|', $cmd, -1);
 
+ return "ERROR: Invalid command '$cmd'!" unless (defined($host));
  return "ERROR: Invalid hostname in command ($host)! My name is '" . $self->name . "'" if ($host ne $self->name);
 
  if ($version > main::MMM_PROTOCOL_VERSION) {
@@ -114,7 +117,23 @@
 sub cmd_get_system_status($) {
  my $self = shift;
 
- # TODO maybe determine and send master info if we are a slave host.
+ # determine master info
+ my $dsn = sprintf("DBI:mysql:host=%s;port=%s;mysql_connect_timeout=3", $self->ip, $self->mysql_port);
+ my $eintr = EINTR;
+ my $master_ip = '';
+
+ my $dbh;
+CONNECT: {
+ DEBUG "Connecting to mysql";
+ $dbh = DBI->connect($dsn, $self->mysql_user, $self->mysql_password, { PrintError => 0 });
+ unless ($dbh) {
+ redo CONNECT if ($DBI::err == 2003 && $DBI::errstr =~ /\($eintr\)/);
+ WARN "Couldn't connect to mysql. Can't determine current master host." . $DBI::err . " " . $DBI::errstr;
+ }
+}
+
+ my $slave_status = $dbh->selectrow_hashref('SHOW SLAVE STATUS');
+ $master_ip = $slave_status->{Master_Host} if (defined($slave_status));
 
  my @roles;
  foreach my $role (keys(%{$main::config->{role}})) {
@@ -133,7 +152,7 @@
  return "ERROR: Could not check if MySQL is writable: $res" if ($ret == 255);
  my $writable = ($ret == 1);
 
- my $answer = join('|', ($writable, join(',', @roles)));
+ my $answer = join('|', ($writable, join(',', @roles), $master_ip));
  return "OK: Returning status!|$answer";
 }
 

=== modified file 'lib/Common/Config.pm'
--- lib/Common/Config.pm 2010-02-03 09:06:11 +0000
+++ lib/Common/Config.pm 2010-03-09 10:25:26 +0000
@@ -37,7 +37,10 @@
  'flap_duration' => { 'default' => 60 * 60 },
  'flap_count' => { 'default' => 3 },
  'auto_set_online' => { 'default' => 0 },
- 'kill_host_bin' => { 'default' => 'kill_host' }
+ 'kill_host_bin' => { 'default' => 'kill_host' },
+ 'careful_startup' => { 'default' => 1, 'boolean' => 1 },
+ 'mode' => { 'default' => 'active', 'values' => ['passive', 'active', 'manual', 'wait'] },
+ 'wait_for_other_master' => { 'default' => 120 }
  }
  },
  'socket' => { 'create_if_empty' => ['AGENT', 'CONTROL', 'MONITOR'], 'section' => {

=== modified file 'lib/Monitor/Checker.pm'
--- lib/Monitor/Checker.pm 2010-02-08 15:06:09 +0000
+++ lib/Monitor/Checker.pm 2010-03-09 10:25:26 +0000
@@ -184,7 +184,7 @@
  my $self = shift;
  my $name = $self->{name};
 
- DEBUG "Pinging checker '$name'...";
+# DEBUG "Pinging checker '$name'...";
 
  my $reader = $self->{reader};
  my $writer = $self->{writer};
@@ -202,7 +202,7 @@
  return 0;
  }
 
- DEBUG "Checker '$name' is OK ($recv_res)";
+# DEBUG "Checker '$name' is OK ($recv_res)";
  return 1;
 }
 

=== modified file 'lib/Monitor/Commands.pm'
--- lib/Monitor/Commands.pm 2010-03-03 00:34:21 +0000
+++ lib/Monitor/Commands.pm 2010-03-09 10:25:26 +0000
@@ -61,7 +61,7 @@
  my $roles = MMM::Monitor::Roles->instance();
 
  my $ret = '';
- if ($monitor->passive) {
+ if ($monitor->is_passive) {
  $ret .= "--- Monitor is in PASSIVE MODE ---\n";
  $ret .= sprintf("Cause: %s\n", $monitor->passive_info);
  $ret =~ s/^/# /mg;
@@ -193,7 +193,7 @@
 
  FATAL "Admin changed state of '$host' from $host_state to ADMIN_OFFLINE";
  $agents->set_state($host, 'ADMIN_OFFLINE');
- MMM::Monitor::Roles->instance()->clear_host_roles($host);
+ MMM::Monitor::Roles->instance()->clear_roles($host);
  MMM::Monitor::Monitor->instance()->send_agent_status($host);
 
  return "OK: State of '$host' changed to ADMIN_OFFLINE. Now you can wait some time and check all roles!";
@@ -203,7 +203,7 @@
  my $ip = shift;
  my $host = shift;
 
- return "ERROR: This command is only allowed in passive mode" unless (MMM::Monitor::Monitor->instance()->passive);
+ return "ERROR: This command is only allowed in passive mode" unless (MMM::Monitor::Monitor->instance()->is_passive);
 
  my $agents = MMM::Monitor::Agents->instance();
  my $roles = MMM::Monitor::Roles->instance();
@@ -239,14 +239,19 @@
  my $role = shift;
  my $host = shift;
 
- return "ERROR: This command is only allowed in active mode" if (MMM::Monitor::Monitor->instance()->passive);
+ my $monitor = MMM::Monitor::Monitor->instance();
+ return "ERROR: This command is not allowed in passive mode" if ($monitor->is_passive);
 
  my $agents = MMM::Monitor::Agents->instance();
  my $roles = MMM::Monitor::Roles->instance();
 
  return "ERROR: Unknown role name '$role'!" unless ($roles->exists($role));
  return "ERROR: Unknown host name '$host'!" unless ($agents->exists($host));
- return "ERROR: move_role may be used for exclusive roles only!" unless ($roles->is_exclusive($role));
+
+ unless ($roles->is_exclusive($role)) {
+ $roles->clear_balanced_role($host, $role);
+ return "OK: Balanced role $role has been removed from host '$host'. Now you can wait some time and check new roles info!";
+ }
 
  my $host_state = $agents->state($host);
  return "ERROR: Can't move role to host with state $host_state." unless ($host_state eq 'ONLINE');
@@ -261,7 +266,9 @@
  my $agent = MMM::Monitor::Agents->instance()->get($host);
  return "ERROR: Can't reach agent daemon on '$host'! Can't move roles there!" unless ($agent->cmd_ping());
 
- return "ERROR: Role '$role' is assigned to preferred host '$old_owner'. Can't move it!" if ($roles->assigned_to_preferred_host($role));
+ if ($monitor->is_active && $roles->assigned_to_preferred_host($role)) {
+ return "ERROR: Role '$role' is assigned to preferred host '$old_owner'. Can't move it!";
+ }
 
  my $ip = $roles->get_exclusive_role_ip($role);
  return "Error: Role $role has no IP." unless ($ip);
@@ -272,13 +279,13 @@
  $roles->set_role($role, $ip, $host);
 
  # Notify old host (if is_active_master_role($role) this will make the host non writable)
- MMM::Monitor::Monitor->instance()->send_agent_status($old_owner);
+ $monitor->send_agent_status($old_owner);
 
  # Notify slaves (this will make them switch the master)
- MMM::Monitor::Monitor->instance()->notify_slaves($host) if ($roles->is_active_master_role($role));
+ $monitor->notify_slaves($host) if ($roles->is_active_master_role($role));
 
  # Notify new host (if is_active_master_role($role) this will make the host writable)
- MMM::Monitor::Monitor->instance()->send_agent_status($host);
+ $monitor->send_agent_status($host);
 
  return "OK: Role '$role' has been moved from '$old_owner' to '$host'. Now you can wait some time and check new roles info!";
 
@@ -288,7 +295,8 @@
  my $role = shift;
  my $host = shift;
 
- return "ERROR: This command is only allowed in active mode" if (MMM::Monitor::Monitor->instance()->passive);
+ my $monitor = MMM::Monitor::Monitor->instance();
+ return "ERROR: This command is not allowed in passive mode" if (MMM::Monitor::Monitor->instance()->is_passive);
 
  my $agents = MMM::Monitor::Agents->instance();
  my $roles = MMM::Monitor::Roles->instance();
@@ -328,12 +336,12 @@
  if (!$checks->rep_threads($old_owner)) {
  FATAL "State of host '$old_owner' changed from ONLINE to REPLICATION_FAIL (because of move_role --force)";
  $old_agent->state('REPLICATION_FAIL');
- $roles->clear_host_roles($old_owner);
+ $roles->clear_roles($old_owner) if ($monitor->is_active);
  }
  elsif (!$checks->rep_backlog($old_owner)) {
  FATAL "State of host '$old_owner' changed from ONLINE to REPLICATION_BACKLOG (because of move_role --force)";
  $old_agent->state('REPLICATION_BACKLOG');
- $roles->clear_host_roles($old_owner);
+ $roles->clear_roles($old_owner) if ($monitor->is_active);
  }
 
  # Notify old host (this will make the host non writable)
@@ -352,13 +360,13 @@
 
 =item mode
 
-Get information about current mode (active or passive)
+Get information about current mode (active, manual or passive)
 
 =cut
 
 sub mode() {
- return 'PASSIVE' if (MMM::Monitor::Monitor->instance()->passive);
- return 'ACTIVE';
+ my $monitor = MMM::Monitor::Monitor->instance();
+ return $monitor->get_mode_string();
 }
 
 
@@ -369,26 +377,69 @@
 =cut
 
 sub set_active() {
- return 'OK: Already in active mode.' unless (MMM::Monitor::Monitor->instance()->passive);
-
-
- # Send status to agents
- MMM::Monitor::Monitor->instance()->send_status_to_agents();
-
- # Clear 'bad' roles
- my $agents = MMM::Monitor::Agents->instance();
- foreach my $host (keys(%{$main::config->{host}})) {
- my $agent = $agents->get($host);
- $agent->cmd_clear_bad_roles(); # TODO check result
- }
-
-
- MMM::Monitor::Monitor->instance()->passive(0);
- MMM::Monitor::Monitor->instance()->passive_info('');
+ my $monitor = MMM::Monitor::Monitor->instance();
+
+ return 'OK: Already in active mode.' if ($monitor->is_active);
+
+ my $old_mode = $monitor->get_mode_string();
+ INFO "Admin changed mode from '$old_mode' to 'ACTIVE'";
+
+ if ($monitor->is_passive) {
+ $monitor->set_active(); # so that we can send status to agents
+ $monitor->cleanup_and_send_status();
+ $monitor->passive_info('');
+ }
+ elsif ($monitor->is_manual) {
+ # remove all roles from hosts which are not ONLINE
+ my $roles = MMM::Monitor::Roles->instance();
+ my $agents = MMM::Monitor::Agents->instance();
+ my $checks = MMM::Monitor::ChecksStatus->instance();
+ foreach my $host (keys(%{$main::config->{host}})) {
+ my $host_state = $agents->state($host);
+ next if ($host_state eq 'ONLINE' || $roles->get_host_roles($host) == 0);
+ my $agent = $agents->get($host);
+ $roles->clear_roles($host);
+ my $ret = $monitor->send_agent_status($host);
+# next if ($host_state eq 'REPLICATION_FAIL');
+# next if ($host_state eq 'REPLICATION_BACKLOG');
+ # NOTE host_state should never be ADMIN_OFFLINE at this point
+ if (!$ret) {
+ ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);
+ $monitor->_kill_host($host, $checks->ping($host));
+ }
+ }
+ }
+
+ $monitor->set_active();
  return 'OK: Switched into active mode.';
 }
 
 
+=item set_manual
+
+Switch to manual mode.
+
+=cut
+
+sub set_manual() {
+ my $monitor = MMM::Monitor::Monitor->instance();
+
+ return 'OK: Already in manual mode.' if ($monitor->is_manual);
+
+ my $old_mode = $monitor->get_mode_string();
+ INFO "Admin changed mode from '$old_mode' to 'MANUAL'";
+
+ if ($monitor->is_passive) {
+ $monitor->set_manual(); # so that we can send status to agents
+ $monitor->cleanup_and_send_status();
+ $monitor->passive_info('');
+ }
+
+ $monitor->set_manual();
+ return 'OK: Switched into manual mode.';
+}
+
+
 =item set_passive
 
 Switch to passive mode.
@@ -396,10 +447,15 @@
 =cut
 
 sub set_passive() {
- return 'OK: Already in passive mode.' if (MMM::Monitor::Monitor->instance()->passive);
-
- MMM::Monitor::Monitor->instance()->passive(1);
- MMM::Monitor::Monitor->instance()->passive_info('Admin switched to passive mode.');
+ my $monitor = MMM::Monitor::Monitor->instance();
+
+ return 'OK: Already in passive mode.' if ($monitor->is_passive);
+
+ my $old_mode = $monitor->get_mode_string();
+ INFO "Admin changed mode from '$old_mode' to 'PASSIVE'";
+
+ $monitor->set_passive();
+ $monitor->passive_info('Admin switched to passive mode.');
  return 'OK: Switched into passive mode.';
 }
 
@@ -413,6 +469,7 @@
  set_offline <host> - set host <host> offline
  mode - print current mode.
  set_active - switch into active mode.
+ set_manual - switch into manual mode.
  set_passive - switch into passive mode.
  move_role [--force] <role> <host> - move exclusive role <role> to host <host>
  (Only use --force if you know what you are doing!)

=== modified file 'lib/Monitor/Monitor.pm'
--- lib/Monitor/Monitor.pm 2010-02-11 01:05:09 +0000
+++ lib/Monitor/Monitor.pm 2010-03-09 10:25:26 +0000
@@ -19,6 +19,7 @@
 use MMM::Monitor::NetworkChecker;
 use MMM::Monitor::Role;
 use MMM::Monitor::Roles;
+use MMM::Monitor::StartupStatus;
 
 =head1 NAME
 
@@ -28,6 +29,11 @@
 
 our $VERSION = '0.01';
 
+use constant MMM_MONITOR_MODE_PASSIVE => 0;
+use constant MMM_MONITOR_MODE_ACTIVE => 1;
+use constant MMM_MONITOR_MODE_MANUAL => 2;
+use constant MMM_MONITOR_MODE_WAIT => 3;
+
 use Class::Struct;
 
 sub instance() {
@@ -40,12 +46,13 @@
  command_queue => 'Thread::Queue',
  result_queue => 'Thread::Queue',
  roles => 'MMM::Monitor::Roles',
- passive => '$',
+ mode => '$',
  passive_info => '$',
  kill_host_bin => '$'
 };
 
 
+
 =head1 FUNCTIONS
 
 =over 4
@@ -59,6 +66,24 @@
 sub init($) {
  my $self = shift;
 
+ #___________________________________________________________________________
+ #
+ # Wait until network connection is available
+ #___________________________________________________________________________
+
+ INFO "Waiting for network connection...";
+ unless (MMM::Monitor::NetworkChecker->wait_for_network()) {
+ INFO "Received shutdown request while waiting for network connection.";
+ return 0;
+ }
+ INFO "Network connection is available.";
+
+
+ #___________________________________________________________________________
+ #
+ # Create thread queues and other stuff...
+ #___________________________________________________________________________
+
  my $agents = MMM::Monitor::Agents->instance();
 
  $self->checker_queue(new Thread::Queue::);
@@ -68,6 +93,23 @@
  $self->roles(MMM::Monitor::Roles->instance());
  $self->passive_info('');
 
+ if ($main::config->{monitor}->{mode} eq 'active') {
+ $self->mode(MMM_MONITOR_MODE_ACTIVE);
+ }
+ elsif ($main::config->{monitor}->{mode} eq 'manual') {
+ $self->mode(MMM_MONITOR_MODE_MANUAL);
+ }
+ elsif ($main::config->{monitor}->{mode} eq 'wait') {
+ $self->mode(MMM_MONITOR_MODE_WAIT);
+ }
+ elsif ($main::config->{monitor}->{mode} eq 'passive') {
+ $self->mode(MMM_MONITOR_MODE_PASSIVE);
+ $self->passive_info('Configured to start up in passive mode.');
+ }
+ else {
+ LOGDIE "Something very, very strange just happend - dieing..."
+ }
+
 
  #___________________________________________________________________________
  #
@@ -89,14 +131,6 @@
 
  my $checks = $self->checks_status;
 
- #___________________________________________________________________________
- #
- # Go into passive mode if we have no network connection at startup
- #___________________________________________________________________________
-
- $self->passive(!$main::have_net);
- $self->passive_info('No network connection during startup.') unless ($main::have_net);
-
 
  #___________________________________________________________________________
  #
@@ -108,21 +142,21 @@
108142
109 #___________________________________________________________________________143 #___________________________________________________________________________
110 #144 #
111 # Figure out current status. Go into passive mode if there are discrepancies145 # Fetch stored status, agent status and system status
112 #___________________________________________________________________________146 #___________________________________________________________________________
113147
114 $agents->load_status();148 $agents->load_status(); # load stored status
115149
116 my $system_status = {};150
117 my $agent_status = {};151 my $startup_status = new MMM::Monitor::StartupStatus;
118 my $status = 1;152
119 my $res;153 my $res;
120154
121 foreach my $host (keys(%{$main::config->{host}})) {155 foreach my $host (keys(%{$main::config->{host}})) {
122156
123 my $agent = $agents->get($host);157 my $agent = $agents->get($host);
124 my $host_status = 1;
125158
159 $startup_status->set_stored_status($host, $agent->state, $agent->roles);
126160
127 #_______________________________________________________________________161 #_______________________________________________________________________
128 #162 #
@@ -132,28 +166,23 @@
132 $res = $agent->cmd_get_agent_status(2);166 $res = $agent->cmd_get_agent_status(2);
133167
134 if ($res =~ /^OK/) {168 if ($res =~ /^OK/) {
135
136 my ($msg, $state, $roles_str, $master) = split('\|', $res);169 my ($msg, $state, $roles_str, $master) = split('\|', $res);
137 my @roles_str_arr = sort(split(/\,/, $roles_str));170 my @roles_str_arr = sort(split(/\,/, $roles_str));
138 my @roles;171 my @roles;
139172
140 foreach my $role_str (@roles_str_arr) {173 foreach my $role_str (@roles_str_arr) {
141 my $role = MMM::Monitor::Role->from_string($role_str);174 my $role = MMM::Monitor::Role->from_string($role_str);
142 if (defined($role)) {175 push(@roles, $role) if (defined($role));
143 push @roles, $role;
144 }
145 }176 }
146177
147 $agent_status->{$host} = { state => $state, roles => \@roles, master => $master };178 $startup_status->set_agent_status($host, $state, \@roles, $master);
148 }179 }
149 elsif ($agent->state ne 'ADMIN_OFFLINE') {180 elsif ($agent->state ne 'ADMIN_OFFLINE') {
150 if ($checks->ping($host) && $checks->mysql($host) && !$agent->agent_down()) {181 if ($checks->ping($host) && $checks->mysql($host) && !$agent->agent_down()) {
151 ERROR "Can't reach agent on host '$host'";182 ERROR "Can't reach agent on host '$host'";
152 $agent->agent_down(1);183 $agent->agent_down(1);
153 }184 }
154 ERROR "Switching to passive mode: The status of the agent on host '$host' could not be determined (answer was: $res).";185 ERROR "The status of the agent on host '$host' could not be determined (answer was: $res).";
155 $status = 0;
156 $host_status = 0;
157 }186 }
158 187
159188
@@ -163,180 +192,61 @@
163 #_______________________________________________________________________192 #_______________________________________________________________________
164193
165 $res = $agent->cmd_get_system_status(2);194 $res = $agent->cmd_get_system_status(2);
195
166 if ($res =~ /^OK/) {196 if ($res =~ /^OK/) {
167 my ($msg, $writable, $roles_str) = split('\|', $res);197 my ($msg, $writable, $roles_str, $master_ip) = split('\|', $res);
168 my @roles_str_arr = sort(split(/\,/, $roles_str));198 my @roles_str_arr = sort(split(/\,/, $roles_str));
169 my @roles;199 my @roles;
200
170 foreach my $role_str (@roles_str_arr) {201 foreach my $role_str (@roles_str_arr) {
171 my $role = MMM::Monitor::Role->from_string($role_str);202 my $role = MMM::Monitor::Role->from_string($role_str);
172 if (defined($role)) {203 push(@roles, $role) if (defined($role));
173 push @roles, $role;204 }
205
206 my $master = '';
207 if (defined($master_ip)) {
208 foreach my $a_host (keys(%{$main::config->{host}})) {
209 $master = $a_host if ($main::config->{host}->{$a_host}->{ip} eq $master_ip);
174 }210 }
175 }211 }
176 $system_status->{$host} = {212 $startup_status->set_system_status($host, $writable, \@roles, $master);
177 writable => $writable,
178 roles => \@roles
179 };
180 }213 }
181 elsif ($agent->state ne 'ADMIN_OFFLINE') {214 elsif ($agent->state ne 'ADMIN_OFFLINE') {
182 if ($checks->ping($host) && $checks->mysql($host) && !$agent->agent_down()) {215 if ($checks->ping($host) && $checks->mysql($host) && !$agent->agent_down()) {
183 ERROR "Can't reach agent on host '$host'";216 ERROR "Can't reach agent on host '$host'";
184 $agent->agent_down(1);217 $agent->agent_down(1);
185 }218 }
186 ERROR "Switching to passive mode: The status of the system '$host' could not be determined (answer was: $res).";219 ERROR "The status of the system '$host' could not be determined (answer was: $res).";
187 $status = 0;220 }
188 $host_status = 0;221 }
189222
190 }223 my $conflict = $startup_status->determine_status();
191224
192225 DEBUG "STATE INFO\n", Data::Dumper->Dump([$startup_status], ['Startup status']);
193 #_______________________________________________________________________226 INFO $startup_status->to_string();
194 #227
195 # Skip comparison, if we could not fetch AGENT/SYSTEM status228 foreach my $host (keys(%{$startup_status->{result}})) {
196 #_______________________________________________________________________
197
198 next unless (defined($agent_status->{$host}));
199 next unless (defined($system_status->{$host}));
200
201
202 #_______________________________________________________________________
203 #
204 # Compare agent and system status ...
205 #_______________________________________________________________________
206
207 if ($agent_status->{$host}->{state} ne 'UNKNOWN' && $agent_status->{$host}->{state} ne $agent->state) {
208 ERROR "Switching to passive mode: Agent state '", $agent_status->{$host}->{state}, "' differs from stored one '", $agent->state, "' for host '$host'.";
209 $status = 0;
210 $host_status = 0;
211 next;
212 }
213
214
215 #_______________________________________________________________________
216 #
217 # ... determine if roles differ
218 #_______________________________________________________________________
219
220 my $changes = 0;
221 my $diff = new Algorithm::Diff:: (
222 $system_status->{$host}->{roles},
223 $agent->roles,
224 { keyGen => \&MMM::Common::Role::to_string }
225 );
226
227 while ($diff->Next) {
228 next if ($diff->Same);
229
230 ERROR sprintf(
231 "Switching to passive mode: Roles of host '$host' [%s] differ from stored ones [%s]",
232 join(', ', @{$system_status->{$host}->{roles}}),
233 join(', ', @{$agent->roles})
234 );
235 $status = 0;
236 $host_status = 0;
237 last;
238 }
239
240 next unless ($host_status);
241 foreach my $role (@{$agent->roles}) {
242 next unless ($self->roles->is_active_master_role($role->name));
243 next if ($system_status->{$host}->{writable});
244 WARN "Active master $host was not writable at monitor startup. (Don't mind, the host will be made writable soon)"
245 }
246
247 }
248
249 DEBUG "STATE INFO\n", Data::Dumper->Dump([$agents, $agent_status, $system_status], ['Stored status', 'Agent status', 'System status']);
250
251
252 #___________________________________________________________________________
253 #
254 # Maybe switch into passive mode?
255 #___________________________________________________________________________
256
257 unless ($status) {
258 # Enter PASSIVE MODE
259 $self->passive(1);
260 my $agent_status_str = '';
261 foreach my $host (sort(keys(%{$agent_status}))) {
262 $agent_status_str .= sprintf(
263 " %s %s. Roles: %s. Master: %s\n",
264 $host,
265 $agent_status->{$host}->{state},
266 scalar(@{$agent_status->{$host}->{roles}}) > 0 ? join(', ', sort(@{$agent_status->{$host}->{roles}})) : 'none',
267 $agent_status->{$host}->{master} ? $agent_status->{$host}->{master} : '?'
268 );
269 }
270 my $system_status_str = '';
271 foreach my $host (sort(keys(%{$system_status}))) {
272 $system_status_str .= sprintf(
273 " %s %s. Roles: %s\n",
274 $host,
275 $system_status->{$host}->{writable} ? 'writable' : 'readonly',
276 scalar(@{$system_status->{$host}->{roles}}) > 0 ? join(', ', sort(@{$system_status->{$host}->{roles}})) : 'none'
277 );
278 }
279 my $status_str = sprintf("\nStored status:\n%s\nAgent status:\n%s\nSystem status:\n%s", $agents->get_status_info(), $agent_status_str, $system_status_str);
280 $self->passive_info("Discrepancies between stored status, agent status and system status during startup.\n" . $status_str);
281 FATAL "Switching to passive mode now. See output of 'mmm_control show' for details.";
282 INFO $status_str;
283
284 foreach my $host (keys(%{$main::config->{host}})) {
285 my $agent = $agents->get($host);
286
287 # Set all unknown hosts to AWAITING_RECOVERY
288 $agent->state('AWAITING_RECOVERY') if ($agent->state eq 'UNKNOWN');
289
290 next unless ($system_status->{$host});
291 next unless (scalar(@{$system_status->{$host}->{roles}}));
292 # Set status restored from agent systems
293 $agent->state('ONLINE');
294 foreach my $role (@{$system_status->{$host}->{roles}}) {
295 next unless ($self->roles->exists_ip($role->name, $role->ip));
296 next unless ($self->roles->can_handle($role->name, $host));
297 $self->roles->set_role($role->name, $role->ip, $host);
298 }
299 }
300
301 # propagate roles to agent objects
302 foreach my $host (keys(%{$main::config->{host}})) {
303 my $agent = $agents->get($host);
304 my @roles = sort($self->roles->get_host_roles($host));
305 $agent->roles(\@roles);
306 }
307
308 WARN "Monitor started in passive mode.";
309
310 return;
311 }
312
313 # Stay in ACTIVE MODE
314 # Everything is okay, apply roles from status file.
315 foreach my $host (keys(%{$main::config->{host}})) {
316 my $agent = $agents->get($host);229 my $agent = $agents->get($host);
317230 $agent->state($startup_status->{result}->{$host}->{state});
318 # Set new hosts to AWAITING_RECOVERY231 foreach my $role (@{$startup_status->{result}->{$host}->{roles}}) {
319 if ($agent->state eq 'UNKNOWN') {
320 WARN "Detected new host '$host': Setting its initial state to 'AWAITING_RECOVERY'. Use 'mmm_control set_online $host' to switch it online.";
321 $agent->state('AWAITING_RECOVERY');
322 }
323
324 # Apply roles loaded from status file
325 foreach my $role (@{$agent->roles}) {
326 unless ($self->roles->exists_ip($role->name, $role->ip)) {
327 WARN "Detected change in role definitions: Role '$role' was removed.";
328 next;
329 }
330 unless ($self->roles->can_handle($role->name, $host)) {
331 WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore.";
332 next;
333 }
334 $self->roles->set_role($role->name, $role->ip, $host);232 $self->roles->set_role($role->name, $role->ip, $host);
335 }233 }
336 }234 }
337235
338 INFO "Monitor started in active mode." unless ($self->passive);236 if ($conflict && $main::config->{monitor}->{careful_startup}) {
339 WARN "Monitor started in passive mode." if ($self->passive);237 $self->set_passive();
238 $self->passive_info("Conflicting roles during startup:\n\n" . $startup_status->to_string());
239 }
240 elsif (!$self->is_passive) {
241 $self->cleanup_and_send_status();
242 }
243
244 INFO "Monitor started in active mode." if ($self->mode == MMM_MONITOR_MODE_ACTIVE);
245 INFO "Monitor started in manual mode." if ($self->mode == MMM_MONITOR_MODE_MANUAL);
246 INFO "Monitor started in wait mode." if ($self->mode == MMM_MONITOR_MODE_WAIT);
247 INFO "Monitor started in passive mode." if ($self->mode == MMM_MONITOR_MODE_PASSIVE);
248
249 return 1;
340}250}
341251
342sub check_master_configuration($) {252sub check_master_configuration($) {
@@ -507,7 +417,7 @@
507417
508 foreach my $host (keys(%{$main::config->{host}})) {418 foreach my $host (keys(%{$main::config->{host}})) {
509419
510 $agents->save_status() unless ($self->passive);420 $agents->save_status() unless ($self->is_passive);
511421
512 my $agent = $agents->get($host);422 my $agent = $agents->get($host);
513 my $state = $agent->state;423 my $state = $agent->state;
@@ -539,7 +449,8 @@
539 unless ($ping && $mysql) {449 unless ($ping && $mysql) {
540 FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK'));450 FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK'));
541 $agent->state('HARD_OFFLINE');451 $agent->state('HARD_OFFLINE');
542 $self->roles->clear_host_roles($host);452 next if ($self->is_manual);
453 $self->roles->clear_roles($host);
543 if (!$self->send_agent_status($host)) {454 if (!$self->send_agent_status($host)) {
544 ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);455 ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);
545 $self->_kill_host($host, $checks->ping($host));456 $self->_kill_host($host, $checks->ping($host));
@@ -557,8 +468,12 @@
557 if ($ping && $mysql && !$rep_threads && $peer_state eq 'ONLINE' && $checks->ping($peer) && $checks->mysql($peer)) {468 if ($ping && $mysql && !$rep_threads && $peer_state eq 'ONLINE' && $checks->ping($peer) && $checks->mysql($peer)) {
558 FATAL "State of host '$host' changed from $state to REPLICATION_FAIL";469 FATAL "State of host '$host' changed from $state to REPLICATION_FAIL";
559 $agent->state('REPLICATION_FAIL');470 $agent->state('REPLICATION_FAIL');
560 $self->roles->clear_host_roles($host);471 next if ($self->is_manual);
561 $self->send_agent_status($host);472 $self->roles->clear_roles($host);
473 if (!$self->send_agent_status($host)) {
474 ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);
475 $self->_kill_host($host, $checks->ping($host));
476 }
562 next;477 next;
563 }478 }
564479
@@ -566,8 +481,12 @@
566 if ($ping && $mysql && !$rep_backlog && $rep_threads && $peer_state eq 'ONLINE' && $checks->ping($peer) && $checks->mysql($peer)) {481 if ($ping && $mysql && !$rep_backlog && $rep_threads && $peer_state eq 'ONLINE' && $checks->ping($peer) && $checks->mysql($peer)) {
567 FATAL "State of host '$host' changed from $state to REPLICATION_DELAY";482 FATAL "State of host '$host' changed from $state to REPLICATION_DELAY";
568 $agent->state('REPLICATION_DELAY');483 $agent->state('REPLICATION_DELAY');
569 $self->roles->clear_host_roles($host);484 next if ($self->is_manual);
570 $self->send_agent_status($host);485 $self->roles->clear_roles($host);
486 if (!$self->send_agent_status($host)) {
487 ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);
488 $self->_kill_host($host, $checks->ping($host));
489 }
571 next;490 next;
572 }491 }
573 next;492 next;
@@ -711,7 +630,47 @@
711 next;630 next;
712 }631 }
713 }632 }
714 $agents->save_status() unless ($self->passive);633
634 if ($self->mode == MMM_MONITOR_MODE_WAIT) {
635 my $master_one = $self->roles->get_first_master();
636 my $master_two = $self->roles->get_second_master();
637 my $state_one = $agents->state($master_one);
638 my $state_two = $agents->state($master_two);
639
640 if ($state_one eq 'ONLINE' && $state_two eq 'ONLINE') {
641 INFO "Nodes $master_one and $master_two are ONLINE, switching from mode 'WAIT' to 'ACTIVE'.";
642 $self->set_active();
643 }
644 elsif ($main::config->{monitor}->{wait_for_other_master} > 0 && ($state_one eq 'ONLINE' || $state_two eq 'ONLINE')) {
645 my $living_master = $state_one eq 'ONLINE' ? $master_one : $master_two;
646 my $dead_master = $state_one eq 'ONLINE' ? $master_two : $master_one;
647
648 if ($main::config->{monitor}->{wait_for_other_master} <= time() - $agents->online_since($living_master)) {
649 $self->set_active();
650 WARN sprintf("Master $dead_master did not come online for %d(wait_for_other_master) seconds. Switching from mode 'WAIT' to 'ACTIVE'", $main::config->{monitor}->{wait_for_other_master});
651 }
652
653 }
654 if ($self->is_active) {
655 # cleanup
656 foreach my $host (keys(%{$main::config->{host}})) {
657 my $host_state = $agents->state($host);
658 next if ($host_state eq 'ONLINE' || $self->roles->get_host_roles($host) == 0);
659 my $agent = $agents->get($host);
660 $self->roles->clear_roles($host);
661 my $ret = $self->send_agent_status($host);
662# next if ($host_state eq 'REPLICATION_FAIL');
663# next if ($host_state eq 'REPLICATION_BACKLOG');
664 # NOTE host_state should never be ADMIN_OFFLINE at this point
665 if (!$ret) {
666 ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host);
667 $self->_kill_host($host, $checks->ping($host));
668 }
669 }
670 }
671 }
672
673 $agents->save_status() unless ($self->is_passive);
715}674}
716675
717676
@@ -725,7 +684,7 @@
725 my $self = shift;684 my $self = shift;
726685
727 # Never change roles if we are in PASSIVE mode686 # Never change roles if we are in PASSIVE mode
728 return if ($self->passive);687 return if ($self->is_passive);
729688
730 my $old_active_master = $self->roles->get_active_master();689 my $old_active_master = $self->roles->get_active_master();
731 690
@@ -734,7 +693,7 @@
734 $self->roles->process_orphans('balanced');693 $self->roles->process_orphans('balanced');
735694
736 # obey preferences695 # obey preferences
737 $self->roles->obey_preferences();696 $self->roles->obey_preferences() if ($self->is_active);
738697
739 # Balance roles698 # Balance roles
740 $self->roles->balance();699 $self->roles->balance();
@@ -749,6 +708,46 @@
749}708}
750709
751710
711=item cleanup_and_send_status()
712
713Send status information to all agents and clean up old roles.
714
715=cut
716sub cleanup_and_send_status($) {
717 my $self = shift;
718
719 my $agents = MMM::Monitor::Agents->instance();
720 my $roles = MMM::Monitor::Roles->instance();
721
722 my $active_master = $roles->get_active_master();
723 my $passive_master = $roles->get_passive_master();
724
725 # Notify passive master first
726 if ($passive_master ne '') {
727 my $host = $passive_master;
728 $self->send_agent_status($host);
729 my $agent = $agents->get($host);
730 $agent->cmd_clear_bad_roles(); # TODO check result
731 }
732
733 # Notify all slave hosts
734 foreach my $host (keys(%{$main::config->{host}})) {
735 next if ($self->roles->is_master($host));
736 $self->send_agent_status($host);
737 my $agent = $agents->get($host);
738 $agent->cmd_clear_bad_roles(); # TODO check result
739 }
740
741 # Notify active master at the end
742 if ($active_master ne '') {
743 my $host = $active_master;
744 $self->send_agent_status($host);
745 my $agent = $agents->get($host);
746 $agent->cmd_clear_bad_roles(); # TODO check result
747 }
748}
749
750
752=item send_status_to_agents751=item send_status_to_agents
753752
754Send status information to all agents.753Send status information to all agents.
@@ -797,7 +796,7 @@
797796
798 # Never send anything to agents if we are in PASSIVE mode797 # Never send anything to agents if we are in PASSIVE mode
799 # Never send anything to agents if we have no network connection798 # Never send anything to agents if we have no network connection
800 return if ($self->passive || !$main::have_net);799 return if ($self->is_passive || !$main::have_net);
801800
802 # Determine active master if it was not passed801 # Determine active master if it was not passed
803 $master = $self->roles->get_active_master() unless (defined($master));802 $master = $self->roles->get_active_master() unless (defined($master));
@@ -903,6 +902,7 @@
903 elsif ($command eq 'mode' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::mode(); }902 elsif ($command eq 'mode' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::mode(); }
904 elsif ($command eq 'set_active' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_active(); }903 elsif ($command eq 'set_active' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_active(); }
905 elsif ($command eq 'set_passive' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_passive(); }904 elsif ($command eq 'set_passive' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_passive(); }
905 elsif ($command eq 'set_manual' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_manual(); }
906 elsif ($command eq 'set_online' && $arg_cnt == 1) { $res = MMM::Monitor::Commands::set_online ($args[0]); }906 elsif ($command eq 'set_online' && $arg_cnt == 1) { $res = MMM::Monitor::Commands::set_online ($args[0]); }
907 elsif ($command eq 'set_offline' && $arg_cnt == 1) { $res = MMM::Monitor::Commands::set_offline($args[0]); }907 elsif ($command eq 'set_offline' && $arg_cnt == 1) { $res = MMM::Monitor::Commands::set_offline($args[0]); }
908 elsif ($command eq 'move_role' && $arg_cnt == 2) { $res = MMM::Monitor::Commands::move_role($args[0], $args[1]); }908 elsif ($command eq 'move_role' && $arg_cnt == 2) { $res = MMM::Monitor::Commands::move_role($args[0], $args[1]); }
@@ -917,5 +917,93 @@
917 }917 }
918}918}
919919
920
921=item is_active()
922
923Check if monitor is in active mode
924
925=cut
926
927sub is_active($$) {
928 my $self = shift;
929 return ($self->mode == MMM_MONITOR_MODE_ACTIVE);
930}
931
932
933=item is_manual()
934
935Check if monitor is in manual mode
936
937=cut
938
939sub is_manual($$) {
940 my $self = shift;
941 return ($self->mode == MMM_MONITOR_MODE_MANUAL || $self->mode == MMM_MONITOR_MODE_WAIT);
942}
943
944
945=item is_passive()
946
947Check if monitor is in passive mode
948
949=cut
950
951sub is_passive($$) {
952 my $self = shift;
953 return ($self->mode == MMM_MONITOR_MODE_PASSIVE);
954}
955
956
957=item set_active()
958
959Set mode to active
960
961=cut
962
963sub set_active($$) {
964 my $self = shift;
965 $self->mode(MMM_MONITOR_MODE_ACTIVE);
966}
967
968
969=item set_manual()
970
971Set mode to manual
972
973=cut
974
975sub set_manual($$) {
976 my $self = shift;
977 $self->mode(MMM_MONITOR_MODE_MANUAL);
978}
979
980
981=item set_passive()
982
983Set mode to passive
984
985=cut
986
987sub set_passive($$) {
988 my $self = shift;
989 $self->mode(MMM_MONITOR_MODE_PASSIVE);
990}
991
992
993=item get_mode_string()
994
995Get string representation of current mode
996
997=cut
998
999sub get_mode_string($) {
1000 my $self = shift;
1001 return 'ACTIVE' if ($self->mode == MMM_MONITOR_MODE_ACTIVE);
1002 return 'MANUAL' if ($self->mode == MMM_MONITOR_MODE_MANUAL);
1003 return 'WAIT' if ($self->mode == MMM_MONITOR_MODE_WAIT);
1004 return 'PASSIVE' if ($self->mode == MMM_MONITOR_MODE_PASSIVE);
1005 return 'UNKNOWN'; # should never happen
1006}
1007
9201;10081;
9211009
9221010
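The WAIT-mode check in the Monitor.pm hunk above (the block guarded by MMM_MONITOR_MODE_WAIT) can be read as a small decision function. What follows is a minimal sketch under assumptions, not code from this branch: should_leave_wait_mode and its named arguments are hypothetical, and the agent lookups (state, online_since) are replaced by plain values passed in by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MMM. Returns 1 if a monitor in WAIT mode
# should switch to ACTIVE, 0 if it should keep waiting.
#   states                => array ref with the states of both masters
#   wait_for_other_master => configured timeout in seconds
#   online_for            => seconds the living master has been ONLINE
sub should_leave_wait_mode {
    my (%args) = @_;
    my ($state_one, $state_two) = @{ $args{states} };
    my $wait       = $args{wait_for_other_master};
    my $online_for = $args{online_for};

    # Both masters are ONLINE: leave WAIT mode immediately.
    return 1 if ($state_one eq 'ONLINE' && $state_two eq 'ONLINE');

    # The timeout branch is only taken for a positive timeout ...
    return 0 unless ($wait > 0);

    # ... and only if at least one master is ONLINE.
    return 0 unless ($state_one eq 'ONLINE' || $state_two eq 'ONLINE');

    # Give up on the other master once the living one has been up long enough.
    return ($wait <= $online_for) ? 1 : 0;
}

my $decision = should_leave_wait_mode(
    states                => ['ONLINE', 'HARD_OFFLINE'],
    wait_for_other_master => 120,
    online_for            => 30,
);
print $decision ? "switch to ACTIVE\n" : "stay in WAIT\n";
```

Keeping the decision free of side effects makes it easy to check the corner cases (timeout of 0, no master online) without a running monitor.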
=== modified file 'lib/Monitor/NetworkChecker.pm'
--- lib/Monitor/NetworkChecker.pm 2009-02-10 08:18:57 +0000
+++ lib/Monitor/NetworkChecker.pm 2010-03-09 10:25:26 +0000
@@ -54,29 +54,32 @@
54 $checker->shutdown();54 $checker->shutdown();
55}55}
5656
57sub initial_check() {57sub wait_for_network() {
58 my @ips = @{$main::config->{monitor}->{ping_ips}};58 my @ips = @{$main::config->{monitor}->{ping_ips}};
59 my $state = 0;
60 59
61 # Create checker60 # Create checker
62 my $checker = new MMM::Monitor::Checker::('ping_ip');61 my $checker = new MMM::Monitor::Checker::('ping_ip');
6362
64 # Ping all ips63 while (!$main::shutdown) {
65 foreach my $ip (@ips) {64 # Ping all ips
66 # Ping checker65 foreach my $ip (@ips) {
67 $checker->spawn() unless $checker->ping();66 last if ($main::shutdown);
6867 # Ping checker
69 my $res = $checker->check($ip);68 $checker->spawn() unless $checker->ping();
70 if ($res =~ /^OK/) {69
71 DEBUG "IP '$ip' is reachable: $res";70 my $res = $checker->check($ip);
72 $state = 1;71 if ($res =~ /^OK/) {
73 last;72 DEBUG "IP '$ip' is reachable: $res";
73 $checker->shutdown();
74 return 1;
75 }
74 }76 }
75 DEBUG "IP '$ip' is not reachable: $res";77
78 # Sleep a while before checking every ip again
79 sleep($main::config->{monitor}->{ping_interval});
76 }80 }
77 $checker->shutdown();81 $checker->shutdown();
7882 return 0;
79 return $state;
80}83}
8184
821;851;
8386
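The control flow of wait_for_network() above (probe every IP, return on the first success, pause, repeat until shutdown) can be sketched in isolation as below. This is an illustration under assumptions: wait_until_reachable and its three callbacks are hypothetical names, standing in for the ping_ip checker, the $main::shutdown flag and the sleep on ping_interval.

```perl
use strict;
use warnings;

# Hypothetical stand-in for wait_for_network(), not part of MMM.
#   $ips         array ref of addresses to probe
#   $probe       code ref, returns true if the given address is reachable
#   $should_stop code ref, returns true once a shutdown was requested
#   $pause       code ref, called between two rounds of probing
sub wait_until_reachable {
    my ($ips, $probe, $should_stop, $pause) = @_;

    until ($should_stop->()) {
        foreach my $ip (@{$ips}) {
            last if ($should_stop->());
            # The first reachable address is enough.
            return 1 if ($probe->($ip));
        }
        # Nothing reachable in this round: wait before trying again.
        $pause->();
    }
    # Shutdown was requested before any address answered.
    return 0;
}

# Example run: the fake probe succeeds on its third call.
my $attempts = 0;
my $reachable = wait_until_reachable(
    ['192.0.2.1', '192.0.2.2'],
    sub { return ++$attempts >= 3 },
    sub { return 0 },
    sub { },
);
print "reachable after $attempts probes\n" if ($reachable);
```

Passing the probe and the pause as callbacks is only a convenience for trying the loop without real pings or sleeps.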
=== modified file 'lib/Monitor/Roles.pm'
--- lib/Monitor/Roles.pm 2009-10-29 15:27:32 +0000
+++ lib/Monitor/Roles.pm 2010-03-09 10:25:26 +0000
@@ -112,6 +112,29 @@
112}112}
113113
114114
115=item host_has_roles($host)
116
117Check whether there are roles assigned to host $host
118
119=cut
120
121sub host_has_roles($$) {
122 my $self = shift;
123 my $host = shift;
124
125 return 0 unless (defined($host));
126
127 foreach my $role (keys(%$self)) {
128 my $role_info = $self->{$role};
129 foreach my $ip (keys(%{$role_info->{ips}})) {
130 my $ip_info = $role_info->{ips}->{$ip};
131 return 1 if ($ip_info->{assigned_to} eq $host);
132 }
133 }
134 return 0;
135}
136
137
115=item count_host_roles($host)138=item count_host_roles($host)
116139
117Count all roles assigned to host $host140Count all roles assigned to host $host
@@ -155,6 +178,74 @@
155}178}
156179
157180
181=item get_passive_master
182
183Get the passive master
184
185=cut
186
187sub get_passive_master($) {
188 my $self = shift;
189
190 my $role = $self->{$main::config->{active_master_role}};
191 my $active_master = $self->get_active_master();
192 return '' unless $role;
193 return '' unless $active_master;
194
195 foreach my $host ( @{ $role->{hosts} } ) {
196 return $host if ($host ne $active_master);
197 }
198 return '';
199}
200
201
202=item get_first_master
203
204Get the first master
205
206=cut
207
208sub get_first_master($) {
209 my $self = shift;
210
211 my $role = $self->{$main::config->{active_master_role}};
212 return '' unless $role;
213 return '' unless $role->{hosts}[0];
214 return $role->{hosts}[0];
215}
216
217
218=item get_second_master
219
220Get the second master
221
222=cut
223
224sub get_second_master($) {
225 my $self = shift;
226
227 my $role = $self->{$main::config->{active_master_role}};
228 return '' unless $role;
229 return '' unless $role->{hosts}[1];
230 return $role->{hosts}[1];
231}
232
233
234=item get_master_hosts
235
236Get the hosts which can handle the active master-role
237
238=cut
239
240sub get_master_hosts($) {
241 my $self = shift;
242
243 my $role = $self->{$main::config->{active_master_role}};
244 return '' unless $role;
245 return $role->{hosts};
246}
247
248
158=item get_exclusive_role_owner($role)249=item get_exclusive_role_owner($role)
159250
160Get the host which has the exclusive role $role assigned251Get the host which has the exclusive role $role assigned
@@ -211,13 +302,13 @@
211}302}
212303
213304
214=item clear_host_roles($host)305=item clear_roles($host)
215306
216Remove all roles from host $host.307Remove all roles from host $host.
217308
218=cut309=cut
219310
220sub clear_host_roles($$) {311sub clear_roles($$) {
221 my $self = shift;312 my $self = shift;
222 my $host = shift;313 my $host = shift;
223314
@@ -238,6 +329,34 @@
238}329}
239330
240331
332=item clear_balanced_role($host, $role)
333
334Remove balanced role $role from host $host.
335
336=cut
337
338sub clear_balanced_role($$$) {
339 my $self = shift;
340 my $host = shift;
341 my $role = shift;
342
343 INFO "Removing balanced role $role from host '$host':";
344
345 my $role_info = $self->{$role};
346 return 0 unless $role_info;
347 my $cnt = 0;
348 next unless ($role_info->{mode} eq 'balanced');
349 foreach my $ip (keys(%{$role_info->{ips}})) {
350 my $ip_info = $role_info->{ips}->{$ip};
351 next unless ($ip_info->{assigned_to} eq $host);
352 $cnt++;
353 INFO " Removed role '$role($ip)' from host '$host'";
354 $ip_info->{assigned_to} = '';
355 }
356 return $cnt;
357}
358
359
241=item find_eligible_host($role)360=item find_eligible_host($role)
242361
243find host which can take over the role $role362find host which can take over the role $role
@@ -562,6 +681,21 @@
562}681}
563682
564683
684=item is_master($host)
685
686Check whether host $host can handle the active master role.
687
688=cut
689
690sub is_master($$) {
691 my $self = shift;
692 my $host = shift;
693 my $role = $self->{$main::config->{active_master_role}};
694 return 0 unless defined($role);
695 return grep({$_ eq $host} @{$role->{hosts}});
696}
697
698
565=item is_active_master_role($role)699=item is_active_master_role($role)
566700
567Check whether $role is the active master role.701Check whether $role is the active master role.
568702
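The role bookkeeping used by host_has_roles() and clear_balanced_role() above is a nested hash of the form role -> ips -> ip -> assigned_to. The sketch below works on a plain hash instead of the Roles singleton. The function name clear_balanced_role_sketch, the role names, hosts and IP addresses are all made up for illustration, and non-balanced roles are handled with an early return.

```perl
use strict;
use warnings;

# Made-up example data in the shape the Roles.pm hunk above iterates over.
my $roles = {
    writer => {
        mode  => 'exclusive',
        hosts => ['db1', 'db2'],
        ips   => { '192.0.2.10' => { assigned_to => 'db1' } },
    },
    reader => {
        mode  => 'balanced',
        hosts => ['db1', 'db2'],
        ips   => {
            '192.0.2.11' => { assigned_to => 'db1' },
            '192.0.2.12' => { assigned_to => 'db2' },
        },
    },
};

# Hypothetical helper, not part of MMM: remove all IPs of balanced role $role
# from host $host and return the number of IPs that were unassigned.
sub clear_balanced_role_sketch {
    my ($roles, $host, $role) = @_;

    my $role_info = $roles->{$role};
    return 0 unless ($role_info);
    return 0 unless ($role_info->{mode} eq 'balanced');

    my $cnt = 0;
    foreach my $ip (keys(%{ $role_info->{ips} })) {
        my $ip_info = $role_info->{ips}->{$ip};
        next unless ($ip_info->{assigned_to} eq $host);
        $ip_info->{assigned_to} = '';
        $cnt++;
    }
    return $cnt;
}

my $removed = clear_balanced_role_sketch($roles, 'db1', 'reader');
print "removed $removed reader address(es) from db1\n";
```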
=== added file 'lib/Monitor/StartupStatus.pm'
--- lib/Monitor/StartupStatus.pm 1970-01-01 00:00:00 +0000
+++ lib/Monitor/StartupStatus.pm 2010-03-09 10:25:26 +0000
@@ -0,0 +1,298 @@
1package MMM::Monitor::StartupStatus;
2
3use strict;
4use warnings FATAL => 'all';
5use List::Util qw(max);
6use Log::Log4perl qw(:easy);
7use MMM::Common::Role;
8use MMM::Monitor::Role;
9use MMM::Monitor::Roles;
10
11our $VERSION = '0.01';
12
13=head1 NAME
14
15MMM::Monitor::StartupStatus - holds information about agent/system/stored status during startup
16
17=cut
18
19sub new($) {
20 my $class = shift;
21
22 my $self = {
23 roles => {},
24 hosts => {},
25 result=> {}
26 };
27 return bless $self, $class;
28}
29
30
31=head1 FUNCTIONS
32
33=over 4
34
35=item set_agent_status($host, $state, $roles, $master)
36
37Set agent status
38
39=cut
40
41sub set_agent_status($$\@$) {
42 my $self = shift;
43 my $host = shift;
44 my $state = shift;
45 my $roles = shift;
46 my $master = shift;
47
48 $self->{hosts}->{$host} = {} unless (defined($self->{hosts}->{$host}));
49 $self->{hosts}->{$host}->{agent} = {
50 state => $state,
51 master => $master
52 };
53 foreach my $role (@{$roles}) {
54 unless (MMM::Monitor::Roles->instance()->exists_ip($role->name, $role->ip)) {
55 WARN "Detected change in role definitions: Role '$role' was removed.";
56 next;
57 }
58 unless (MMM::Monitor::Roles->instance()->can_handle($role->name, $host)) {
59 WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore.";
60 next;
61 }
62 my $role_str = $role->to_string();
63 $self->{roles}->{$role_str} = {} unless (defined($self->{roles}->{$role_str}));
64 $self->{roles}->{$role_str}->{$host} = {} unless (defined($self->{roles}->{$role_str}->{$host}));
65 $self->{roles}->{$role_str}->{$host}->{agent} = 1;
66 }
67}
68
69
70=item set_stored_status($host, $state, $roles)
71
72Set stored status
73
74=cut
75
76sub set_stored_status($$\@$) {
77 my $self = shift;
78 my $host = shift;
79 my $state = shift;
80 my $roles = shift;
81
82 $self->{hosts}->{$host} = {} unless (defined($self->{hosts}->{$host}));
83 $self->{hosts}->{$host}->{stored} = {
84 state => $state,
85 };
86 foreach my $role (@{$roles}) {
87 unless (MMM::Monitor::Roles->instance()->exists_ip($role->name, $role->ip)) {
88 WARN "Detected change in role definitions: Role '$role' was removed.";
89 next;
90 }
91 unless (MMM::Monitor::Roles->instance()->can_handle($role->name, $host)) {
92 WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore.";
93 next;
94 }
95 my $role_str = $role->to_string();
96 $self->{roles}->{$role_str} = {} unless (defined($self->{roles}->{$role_str}));
97 $self->{roles}->{$role_str}->{$host} = {} unless (defined($self->{roles}->{$role_str}->{$host}));
98 $self->{roles}->{$role_str}->{$host}->{stored} = 1;
99 }
100}
101
102
+=item set_system_status($host, $writable, $roles, $master)
+
+Set system status
+
+=cut
+
+sub set_system_status($$\@$) {
+	my $self = shift;
+	my $host = shift;
+	my $writable= shift;
+	my $roles = shift;
+	my $master = shift;
+
+	$self->{hosts}->{$host} = {} unless (defined($self->{hosts}->{$host}));
+	$self->{hosts}->{$host}->{system} = {
+		writable=> $writable,
+		master => $master
+	};
+	foreach my $role (@{$roles}) {
+		unless (MMM::Monitor::Roles->instance()->exists_ip($role->name, $role->ip)) {
+			WARN "Detected change in role definitions: Role '$role' was removed.";
+			next;
+		}
+		unless (MMM::Monitor::Roles->instance()->can_handle($role->name, $host)) {
+			WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore.";
+			next;
+		}
+		my $role_str = $role->to_string();
+		$self->{roles}->{$role_str} = {} unless (defined($self->{roles}->{$role_str}));
+		$self->{roles}->{$role_str}->{$host} = {} unless (defined($self->{roles}->{$role_str}->{$host}));
+		$self->{roles}->{$role_str}->{$host}->{system} = 1;
+	}
+}
+
+sub determine_status() {
+	my $self = shift;
+	my $roles = MMM::Monitor::Roles->instance();
+
+	my $is_manual = MMM::Monitor::Monitor->instance()->is_manual();
+
+	my $conflict = 0;
+
+	foreach my $host (keys(%{$main::config->{host}})) {
+
+		# Figure out host state
+
+		my $stored_state = 'UNKNOWN';
+		my $agent_state = 'UNKNOWN';
+		my $state;
+
+		$stored_state = $self->{hosts}->{$host}->{stored}->{state} if (defined($self->{hosts}->{$host}->{stored}->{state}));
+		$agent_state = $self->{hosts}->{$host}->{agent}->{state} if (defined($self->{hosts}->{$host}->{agent}->{state} ));
+
+		if ( $stored_state eq 'ADMIN_OFFLINE' || $agent_state eq 'ADMIN_OFFLINE' ) { $state = 'ADMIN_OFFLINE'; }
+		elsif ($stored_state eq 'HARD_OFFLINE' || $agent_state eq 'HARD_OFFLINE' ) { $state = 'HARD_OFFLINE'; }
+		elsif ($stored_state eq 'REPLICATION_FAIL' || $agent_state eq 'REPLICATION_FAIL' ) { $state = 'REPLICATION_FAIL'; }
+		elsif ($stored_state eq 'REPLICATION_DELAY' || $agent_state eq 'REPLICATION_DELAY') { $state = 'REPLICATION_DELAY'; }
+		elsif ($stored_state eq 'ONLINE' || $agent_state eq 'ONLINE' ) { $state = 'ONLINE'; }
+		else { $state = 'AWAITING_RECOVERY'; }
+
+		$self->{result}->{$host} = { state => $state, roles => [] };
+	}
+
+	foreach my $role_str (keys(%{$self->{roles}})) {
+		my $role = MMM::Monitor::Role->from_string($role_str);
+		next unless(defined($role));
+
+		if ($roles->is_active_master_role($role->name)) {
+			# active master role
+			my $max = 0;
+			my $target = undef;
+			my $system_cnt = 0;
+			foreach my $host (keys(%{$self->{roles}->{$role_str}})) {
+				my $votes = 0;
+				my $info = $self->{roles}->{$role_str}->{$host};
+				my $host_info = $self->{hosts}->{$host};
+
+				# host is writable
+				$votes += 4 if (defined($host_info->{system}->{writable}) && $host_info->{system}->{writable});
+
+				# IP is configured
+				if (defined($info->{system})) {
+					$votes += 2;
+					$system_cnt++;
+				}
+
+				$votes += 1 if (defined($info->{stored}));
+				$votes += 1 if (defined($info->{agent}));
+
+				foreach my $slave_host (keys(%{$self->{hosts}})) {
+					my $slave_info = $self->{hosts}->{$slave_host};
+					next if MMM::Monitor::Roles->instance()->is_master($slave_host);
+					$votes++ if (defined($slave_info->{system}->{master}) && $slave_info->{system}->{master} eq $host);
+				}
+
+
+				my $state = $self->{result}->{$host}->{state};
+				$votes = 0 if ($state eq 'ADMIN_OFFLINE');
+				$votes = 0 if ($state eq 'HARD_OFFLINE' && !$is_manual);
+
+				if ($votes > $max) {
+					$target = $host;
+					$max = $votes;
+				}
+			}
+			if ($system_cnt > 1) {
+				WARN "Role '$role_str' was configured on $system_cnt hosts during monitor startup.";
+				$conflict = 1;
+			}
+			if (defined($target)) {
+				push (@{$self->{result}->{$target}->{roles}}, $role);
+				my $state = $self->{result}->{$target}->{state};
+				$self->{result}->{$target}->{state} = 'ONLINE' if (!$is_manual || $state eq 'REPLICATION_FAIL' || $state eq 'REPLICATION_DELAY');
+			}
+			next;
+		}
+
+		# Handle non-writer roles
+		my $max = 0;
+		my $target = undef;
+		my $system_cnt = 0;
+		foreach my $host (keys(%{$self->{roles}->{$role_str}})) {
+			my $votes = 0;
+			my $info = $self->{roles}->{$role_str}->{$host};
+
+			# IP is configured
+			if (defined($info->{system})) {
+				$votes += 4;
+				$system_cnt++;
+			}
+
+			$votes += 2 if (defined($info->{stored}));
+			$votes += 1 if (defined($info->{agent}));
+
+
+			my $state = $self->{result}->{$host}->{state};
+			if ($state eq 'ADMIN_OFFLINE' || (!$is_manual && $state ne 'ONLINE' && $state ne 'AWAITING_RECOVERY')) {
+				$votes = 0;
+			}
+			if ($votes > $max) {
+				$target = $host;
+				$max = $votes;
+			}
+		}
+		if ($system_cnt > 1) {
+			WARN "Role '$role_str' was configured on $system_cnt hosts during monitor startup.";
+		}
+		if (defined($target)) {
+			push (@{$self->{result}->{$target}->{roles}}, $role);
+			$self->{result}->{$target}->{state} = 'ONLINE' if ($self->{result}->{$target}->{state} eq 'AWAITING_RECOVERY');
+		}
+	}
+	return $conflict;
+}
+
+
+sub to_string($) {
+	my $self = shift;
+	my $ret = "Startup status:\n";
+	$ret .= "\nRoles:\n";
+
+	my $role_len = 4; # "Role"
+	my $host_len = 6; # "Master"
+
+	foreach my $role (keys(%{$main::config->{role}})) { $role_len = max($role_len, length $role) }
+	foreach my $host (keys(%{$main::config->{host}})) { $host_len = max($host_len, length $host) }
+	$role_len += 17; # "(999.999.999.999)"
+
+	$ret .= sprintf(" %-*s %-*s %-6s %-6s %-5s\n", $role_len, 'Role', $host_len, 'Host', 'Stored', 'System', 'Agent');
+	foreach my $role (keys(%{$self->{roles}})) {
+		foreach my $host (keys(%{$self->{roles}->{$role}})) {
+			my $info = $self->{roles}->{$role}->{$host};
+			$ret .= sprintf(" %-*s %-*s %-6s %-6s %-5s\n", $role_len, $role, $host_len, $host,
+				defined($info->{stored}) ? 'Yes' : '-',
+				defined($info->{system}) ? 'Yes' : '-',
+				defined($info->{agent}) ? 'Yes' : '-'
+			);
+		}
+	}
+
+	$ret .= "\nHosts:\n";
+	$ret .= sprintf(" %-*s %-*s %-8s %-16s %-16s\n", $host_len, 'Host', $host_len, 'Master', 'Writable', 'Stored state', 'Agent state');
+	foreach my $host (keys(%{$self->{hosts}})) {
+		my $info = $self->{hosts}->{$host};
+		my $is_master = MMM::Monitor::Roles->instance()->is_master($host);
+		$ret .= sprintf(" %-*s %-*s %-8s %-16s %-16s\n", $host_len, $host, $host_len,
+			$is_master ? '-' : (defined($info->{system}->{master}) ? $info->{system}->{master} : '?'),
+			defined($info->{system}->{writable}) ? ($info->{system}->{writable} ? 'Yes' : 'No') : '?',
+			defined($info->{stored}->{state}) ? $info->{stored}->{state} : '?',
+			defined($info->{agent}->{state}) ? $info->{agent}->{state} : '?',
+		);
+	}
+	return $ret;
+}
+
+1;
=== modified file 'lib/Monitor/t/Roles.t'
--- lib/Monitor/t/Roles.t 2009-02-05 08:43:52 +0000
+++ lib/Monitor/t/Roles.t 2010-03-09 10:25:26 +0000
@@ -55,7 +55,7 @@
 $roles->assign($role_writer, 'db1');
 is($roles->get_active_master(), 'db1', 'Active master after assigning writer role');
 
-$roles->clear_host_roles($roles->get_active_master());
+$roles->clear_roles($roles->get_active_master());
 is($roles->get_active_master(), '', 'No active master with active master host cleared');
 
 $roles->assign($role_writer, 'db2');
@@ -84,7 +84,7 @@
 is($roles->count_host_roles('db2'), 2, 'balance roles (role count db2)');
 
 $agents->{db2}->state('HARD_OFFLINE');
-$roles->clear_host_roles('db2');
+$roles->clear_roles('db2');
 $roles->process_orphans('exclusive');
 $roles->process_orphans('balanced');
 is($roles->count_host_roles('db1'), 4, 'process orphans assigns all orphaned roles');
 
=== modified file 'sbin/mmm_mond'
--- sbin/mmm_mond 2010-02-11 02:23:38 +0000
+++ sbin/mmm_mond 2010-03-09 10:25:26 +0000
@@ -72,11 +72,6 @@
 
 our $monitor = new MMM::Monitor::Monitor::();
 
-if (!MMM::Monitor::NetworkChecker->initial_check()) {
-	LOGDIE "None of the 'ping_ips' could be reached during startup. Network seems to be down - mmm_mond will shutdown now.";
-}
-
-
 my $pidfilename = $config->{monitor}->{pid_path};
 my $pidfile = new MMM::Common::PidFile:: $pidfilename;
 
@@ -106,9 +101,9 @@
 $SIG{PIPE} = 'IGNORE';
 $SIG{CHLD} = \&ChildHandler;
 
-$monitor->init();
-
-$monitor->main();
+if ($monitor->init()) {
+	$monitor->main();
+}
 
 INFO 'END';
 exit(0);
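
Note on the startup voting in lib/Monitor/StartupStatus.pm: the active-master branch of determine_status() weights its evidence as writable +4, role IP configured +2, stored status +1, agent status +1, plus one vote per slave replicating from the host. The votes are reset to zero for ADMIN_OFFLINE hosts, and for HARD_OFFLINE hosts unless the monitor is in manual mode. The following is only a standalone sketch of that scoring for experimenting with the weights. The names master_votes and pick_master and the flat argument hash are hypothetical and not part of the merged code, and hosts are visited in sorted order here so ties resolve deterministically, which the merged loop over keys() does not guarantee.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the merged branch): score one candidate host
# for the active master role using the weights seen in determine_status().
sub master_votes {
    my (%arg) = @_;
    my $votes = 0;
    $votes += 4 if $arg{writable};    # host is writable
    $votes += 2 if $arg{system};      # role IP is configured on the host
    $votes += 1 if $arg{stored};      # role present in the stored status
    $votes += 1 if $arg{agent};       # role reported by the agent
    $votes += $arg{slaves} || 0;      # one vote per slave replicating from it

    # Offline hosts lose all votes; HARD_OFFLINE only when not in manual mode.
    return 0 if $arg{state} eq 'ADMIN_OFFLINE';
    return 0 if $arg{state} eq 'HARD_OFFLINE' && !$arg{manual};
    return $votes;
}

# Hypothetical helper: return the host with the most votes, or undef when
# every candidate scores zero.
sub pick_master {
    my (%candidates) = @_;
    my ($target, $max) = (undef, 0);
    foreach my $host (sort keys %candidates) {
        my $votes = master_votes(%{ $candidates{$host} });
        ($target, $max) = ($host, $votes) if $votes > $max;
    }
    return $target;
}

my $winner = pick_master(
    db1 => { state => 'ONLINE', writable => 1, system => 1, stored => 1, agent => 1, slaves => 1 },
    db2 => { state => 'ONLINE', system => 1 },
);
print "$winner\n";    # db1 scores 9, db2 scores 2
```

With these weights a writable host outranks a read-only host that merely has the role IP configured, which seems to match the intent of avoiding a writer switch at monitor startup.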
