Merge lp:~mysql-mmm-core/mysql-mmm/devel into lp:mysql-mmm
- devel
- Merge into trunk-2.x
Proposed by
Pascal Hofmann
| Status: | Merged |
|---|---|
| Merged at revision: | not available |
| Proposed branch: | lp:~mysql-mmm-core/mysql-mmm/devel |
| Merge into: | lp:mysql-mmm |
| Diff against target: | 1821 lines (+914/-368), 13 files modified: doc/mmm_configuration.texi (+23/-0), doc/mmm_control.texi (+18/-95), doc/mmm_monitor.texi (+19/-16), lib/Agent/Agent.pm (+21/-2), lib/Common/Config.pm (+4/-1), lib/Monitor/Checker.pm (+2/-2), lib/Monitor/Commands.pm (+92/-35), lib/Monitor/Monitor.pm (+278/-190), lib/Monitor/NetworkChecker.pm (+18/-15), lib/Monitor/Roles.pm (+136/-2), lib/Monitor/StartupStatus.pm (+298/-0), lib/Monitor/t/Roles.t (+2/-2), sbin/mmm_mond (+3/-8) |
| To merge this branch: | bzr merge lp:~mysql-mmm-core/mysql-mmm/devel |
| Related bugs: |
| Reviewer | Review Type | Date Requested | Status |
|---|---|---|---|
| Pascal Hofmann | Approve | ||
Commit message
* Added manual mode (bug #531011), wait mode, config values 'mode' and 'wait_for_
* Don't die at startup when no network connection is available - wait for it to appear instead (bug #416572)
* Changed startup behaviour. mmm_mond will only go into passive mode if it detects the active_master_role on more than one host.
* Added config value 'careful_startup' (bug #422549). If set to 0 mmm_mond won't ever switch into passive mode at startup.
* Added check for invalid agent commands (prevents a crash when mmmd_mon version 1.x talks to a 2.x agent).
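
A minimal sketch of how the new options might look in the monitor configuration, assuming the usual `<monitor>` section syntax of mysql-mmm 2.x and that `#` comment lines are accepted. The file name `mmm_mon.conf` and the chosen values are illustrative and not taken from this branch. Option names and allowed values come from the diff below. Note that the documentation hunk lists the default of `careful_startup` as 0, while `lib/Common/Config.pm` sets it to 1.

```
# mmm_mon.conf (hypothetical excerpt; other monitor settings omitted)
<monitor>
    # Start in WAIT mode: behaves like MANUAL until both masters are ONLINE
    # or wait_for_other_master seconds have elapsed, then switches to ACTIVE.
    # Allowed values: active, manual, wait, passive
    mode                    wait

    # Seconds to wait for the other master; 0 means wait indefinitely
    wait_for_other_master   120

    # Switch into PASSIVE mode at startup if the writer role is found
    # on more than one host; set to 0 to never do this
    careful_startup         1
</monitor>
```

Whatever mode is configured here only applies at startup. According to the documentation in this branch, the mode can still be changed at runtime with `mmm_control set_active`, `mmm_control set_manual` or `mmm_control set_passive`.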
Description of the change
Pascal Hofmann (pascalhofmann):
review: Approve
Preview Diff
| 1 | === modified file 'doc/mmm_configuration.texi' |
| 2 | --- doc/mmm_configuration.texi 2010-02-05 17:27:32 +0000 |
| 3 | +++ doc/mmm_configuration.texi 2010-03-09 10:25:26 +0000 |
| 4 | @@ -231,6 +231,29 @@ |
| 5 | @item Used by: @tab monitor |
| 6 | @end multitable |
| 7 | |
| 8 | +@item @strong{careful_startup} |
| 9 | +@multitable @columnfractions 0.2 0.8 |
| 10 | +@item Description: @tab Startup carefully i.e. switch into passive mode when writer role is configured on multiple hosts. |
| 11 | +@item Allowed values: @tab true/yes/1/on false/no/0/off |
| 12 | +@item Default value: @tab 0 |
| 13 | +@item Used by: @tab monitor |
| 14 | +@end multitable |
| 15 | + |
| 16 | +@item @strong{mode} |
| 17 | +@multitable @columnfractions 0.2 0.8 |
| 18 | +@item Description: @tab Default mode of monitor. |
| 19 | +@item Allowed values: @tab active manual wait passive |
| 20 | +@item Default value: @tab active |
| 21 | +@item Used by: @tab monitor |
| 22 | +@end multitable |
| 23 | + |
| 24 | +@item @strong{wait_for_other_master} |
| 25 | +@multitable @columnfractions 0.2 0.8 |
| 26 | +@item Description: @tab How many seconds to wait for other master to become @code{ONLINE} before switching from mode @code{WAIT} to mode @code{ACTIVE}. 0 = infinite. |
| 27 | +@item Default value: @tab 120 |
| 28 | +@item Used by: @tab monitor |
| 29 | +@end multitable |
| 30 | + |
| 31 | @end itemize |
| 32 | |
| 33 | |
| 34 | |
| 35 | === modified file 'doc/mmm_control.texi' |
| 36 | --- doc/mmm_control.texi 2010-02-18 14:32:50 +0000 |
| 37 | +++ doc/mmm_control.texi 2010-03-09 10:25:26 +0000 |
| 38 | @@ -132,7 +132,7 @@ |
| 39 | @end example |
| 40 | |
| 41 | @noindent |
| 42 | -See @ref{Passive mode}. |
| 43 | +See @ref{Modes}. |
| 44 | |
| 45 | @section @code{set_active} |
| 46 | Switch the monitor into @code{ACTIVE} mode: |
| 47 | @@ -143,7 +143,18 @@ |
| 48 | @end example |
| 49 | |
| 50 | @noindent |
| 51 | -See @ref{Passive mode}. |
| 52 | +See @ref{Modes}. |
| 53 | + |
| 54 | +@section @code{set_manual} |
| 55 | +Switch the monitor into @code{MANUAL} mode: |
| 56 | + |
| 57 | +@example |
| 58 | +# mmm_control set_manual |
| 59 | +OK: Switched into manual mode. |
| 60 | +@end example |
| 61 | + |
| 62 | +@noindent |
| 63 | +See @ref{Modes}. |
| 64 | |
| 65 | @section @code{set_passive} |
| 66 | Switch the monitor into @code{PASSIVE} mode: |
| 67 | @@ -154,10 +165,10 @@ |
| 68 | @end example |
| 69 | |
| 70 | @noindent |
| 71 | -See @ref{Passive mode}. |
| 72 | +See @ref{Modes}. |
| 73 | |
| 74 | @section @code{move_role @var{role} @var{host}} |
| 75 | -Used to move an exclusive role between the cluster nodes. This command is available in @code{ACTIVE} mode only. Lets assume the following situation: |
| 76 | +Used to move an exclusive role between the cluster nodes. This command is not available in @code{PASSIVE} mode. Lets assume the following situation: |
| 77 | |
| 78 | @smallexample |
| 79 | # mmm_control show |
| 80 | @@ -179,96 +190,8 @@ |
| 81 | @end smallexample |
| 82 | |
| 83 | @section @code{move_role --force @var{role} @var{host}} |
| 84 | -Can be used to move the @var{active_master_role} to a host with state @code{REPLICATION_FAIL} or @code{REPLICATION_BACKLOG}. Use this with caution! This command is available in @code{ACTIVE} mode only. |
| 85 | +Can be used to move the @var{active_master_role} to a host with state @code{REPLICATION_FAIL} or @code{REPLICATION_BACKLOG}. Use this with caution! This command is not available in @code{PASSIVE} mode. |
| 86 | |
| 87 | @section @code{set_ip @var{ip} @var{host}} |
| 88 | -@code{set_ip} can be used to manipulate the roles in @code{PASSIVE} mode. The changes won't be applied until the monitor is switched into @code{ACTIVE} mode via @code{set_active}. |
| 89 | - |
| 90 | -@* |
| 91 | -Let's assume we have our cluster up and running with the following status: |
| 92 | - |
| 93 | -@smallexample |
| 94 | -# mmm_control show |
| 95 | - db1(192.168.0.31) master/ONLINE. Roles: writer(192.168.0.50) |
| 96 | - db2(192.168.0.32) master/ONLINE. Roles: reader(192.168.0.51) |
| 97 | - db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53) |
| 98 | -@end smallexample |
| 99 | - |
| 100 | -@noindent |
| 101 | -Now, several bad things happen: |
| 102 | -@enumerate |
| 103 | -@item network connection to db1 fails |
| 104 | -@item mmm_mond detects that db1 has failed |
| 105 | -@item mmm_mond moves the writer role to db2, but can't remove it from db1 (because it can't connect to it) |
| 106 | -@item mmm_mond crashes and the status file gets corrupted. |
| 107 | -@item network connection to db1 recovers |
| 108 | -@item The admin restarts mmm_mond |
| 109 | -@end enumerate |
| 110 | - |
| 111 | -mmm_mond has no status information now, and two nodes report, that they have the |
| 112 | -@code{writer} role, so mmm_mond doesn't know what it should do and will switch |
| 113 | -into @code{PASSIVE} mode. |
| 114 | - |
| 115 | -@smallexample |
| 116 | -# mmm_control mode |
| 117 | -PASSIVE |
| 118 | - |
| 119 | -# mmm_control show |
| 120 | -# --- Monitor is in PASSIVE MODE --- |
| 121 | -# Cause: Discrepancies between stored status, agent status and system status during startup. |
| 122 | -# |
| 123 | -# Stored status: |
| 124 | -# db1(192.168.0.31) master/UNKNOWN. Roles: |
| 125 | -# db2(192.168.0.32) master/UNKNOWN. Roles: |
| 126 | -# db3(192.168.0.33) slave/UNKNOWN. Roles: |
| 127 | -# |
| 128 | -# Agent status: |
| 129 | -# db1 ONLINE. Roles: writer(192.168.0.50). Master: ? |
| 130 | -# db2 ONLINE. Roles: writer(192.168.0.50), reader(192.168.0.51). Master: ? |
| 131 | -# db3 ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53). Master: db2 |
| 132 | -# |
| 133 | -# System status: |
| 134 | -# db1 writable. Roles: writer(192.168.0.50) |
| 135 | -# db2 writable. Roles: writer(192.168.0.50), reader(192.168.0.51) |
| 136 | -# db3 readonly. Roles: reader(192.168.0.52), reader(192.168.0.53) |
| 137 | -# |
| 138 | - db1(192.168.0.31) master/ONLINE. Roles: writer(192.168.0.50) |
| 139 | - db2(192.168.0.32) master/ONLINE. Roles: reader(192.168.0.51) |
| 140 | - db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53) |
| 141 | -@end smallexample |
| 142 | - |
| 143 | -@noindent |
| 144 | -As you see, mmm_mond tries to recover the status as well as possible. But in this situation it's wrong so one must move the writer role to db2 manually: |
| 145 | - |
| 146 | -@smallexample |
| 147 | -# mmm_control set_ip 192.168.0.50 db2 |
| 148 | -OK: Set role 'writer(192.168.0.50)' to host 'db2'. |
| 149 | -@end smallexample |
| 150 | - |
| 151 | -@noindent |
| 152 | -Now take a look at the status, everything looks ok: |
| 153 | - |
| 154 | -@smallexample |
| 155 | -# mmm_control show |
| 156 | -# --- Monitor is in PASSIVE MODE --- |
| 157 | -# [...] |
| 158 | - db1(192.168.0.31) master/ONLINE. Roles: |
| 159 | - db2(192.168.0.32) master/ONLINE. Roles: writer(192.168.0.50), reader(192.168.0.51) |
| 160 | - db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53) |
| 161 | -@end smallexample |
| 162 | - |
| 163 | -@noindent |
| 164 | -Finally switch the monitor into active mode, so that it will apply the roles: |
| 165 | - |
| 166 | -@smallexample |
| 167 | -# mmm_control set_active |
| 168 | -OK: Switched into active mode. |
| 169 | - |
| 170 | -# mmm_control show |
| 171 | - db1(192.168.0.31) master/ONLINE. Roles: reader(192.168.0.51) |
| 172 | - db2(192.168.0.32) master/ONLINE. Roles: writer(192.168.0.50) |
| 173 | - db3(192.168.0.33) slave/ONLINE. Roles: reader(192.168.0.52), reader(192.168.0.53) |
| 174 | -@end smallexample |
| 175 | - |
| 176 | -@* |
| 177 | -@strong{Note:} The role @code{reader(192.168.0.51)} has been moved to db1, because @code{reader} is a @code{balanced} role. |
| 178 | +@code{set_ip} can be used to manipulate the roles in @code{PASSIVE} mode. The changes won't be applied until the monitor is switched into @code{ACTIVE} or @code{manual} mode via @code{set_active} or @code{set_manual}. |
| 179 | + |
| 180 | |
| 181 | === modified file 'doc/mmm_monitor.texi' |
| 182 | --- doc/mmm_monitor.texi 2010-02-05 17:27:32 +0000 |
| 183 | +++ doc/mmm_monitor.texi 2010-03-09 10:25:26 +0000 |
| 184 | @@ -108,7 +108,7 @@ |
| 185 | @end itemize |
| 186 | |
| 187 | @noindent |
| 188 | -If the network connection doesn't work during startup, mmm_mond will switch into passive mode (@pxref{Passive mode}). |
| 189 | +If the network connection doesn't work during startup, mmm_mond will delay startup until it's available again. |
| 190 | |
| 191 | |
| 192 | @node Flapping |
| 193 | @@ -130,20 +130,22 @@ |
| 194 | If @var{auto_set_online} is > 0, flapping hosts will automatically be set to @code{ONLINE} |
| 195 | after @var{flap_duration} seconds. |
| 196 | |
| 197 | -@node Passive mode |
| 198 | -@section Passive mode |
| 199 | -@cindex passive mode |
| 200 | - |
| 201 | -entered if no network connection during startup |
| 202 | -entered if discrepancies are detected during startup |
| 203 | -entered with set_passive |
| 204 | - |
| 205 | -roles can be changed (unclean) with set_ip |
| 206 | -changed to active with set_active |
| 207 | - |
| 208 | -roles get never changed automatically |
| 209 | -nothing is send to agents |
| 210 | -status file won't be updated |
| 211 | +@node Modes |
| 212 | +@section Modes |
| 213 | +@cindex Modes |
| 214 | + |
| 215 | +@subsection Active mode |
| 216 | +The monitor will remove roles from failed hosts and assign them to other hosts automatically. |
| 217 | +@subsection Manual mode |
| 218 | +The monitor will only distribute @code{balanced} roles across the hosts, but will not remove roles from failed hosts automatically. You can remove roles from failed hosts manually with @code{move_role}. |
| 219 | +@subsection Wait mode |
| 220 | +Like @code{MANUAL} mode, but mode will be changed into @code{ACTIVE} mode when both master hosts are @code{ONLINE} or after @code{wait_for_other_master} seconds have elapsed. |
| 221 | +@subsection Passive mode |
| 222 | +In passive mode the monitor doesn't change roles, update the status file nor send anything to agents. |
| 223 | +In passive mode you can modify roles (unclean) with @code{set_ip} - the changes won't be applied until the monitor is switched to @code{ACTIVE} or @code{MANUAL} mode with @code{set_active} or @code{set_manual}. |
| 224 | +Passive mode will be entered if conflicting roles are detected during startup. You should then analyze the situation, fix the role information (if needed) and switch into @code{ACTIVE} or @code{MANUAL} mode. |
| 225 | +It also can be entered manually with @code{set_passive}. |
| 226 | + |
| 227 | |
| 228 | @node Startup |
| 229 | @section Startup |
| 230 | @@ -152,6 +154,7 @@ |
| 231 | @itemize |
| 232 | |
| 233 | @item Initial network check |
| 234 | +@item If network is down startup will be delayed until it's reachable again. |
| 235 | @item Initial host checks |
| 236 | @item reads status information from ... |
| 237 | @itemize @minus |
| 238 | @@ -159,7 +162,7 @@ |
| 239 | @item agents (agent info) |
| 240 | @item hosts (system info) |
| 241 | @end itemize |
| 242 | -@item If status information doesn't match or network is down @code{PASSIVE} mode will be entered. |
| 243 | +and tries to figure out the cluster status. |
| 244 | @end itemize |
| 245 | |
| 246 | @node Role transition |
| 247 | |
| 248 | === modified file 'lib/Agent/Agent.pm' |
| 249 | --- lib/Agent/Agent.pm 2009-10-30 07:19:35 +0000 |
| 250 | +++ lib/Agent/Agent.pm 2010-03-09 10:25:26 +0000 |
| 251 | @@ -4,7 +4,9 @@ |
| 252 | use warnings FATAL => 'all'; |
| 253 | use English qw(EVAL_ERROR); |
| 254 | use Algorithm::Diff; |
| 255 | +use DBI; |
| 256 | use Class::Struct; |
| 257 | +use Errno qw(EINTR); |
| 258 | use Log::Log4perl qw(:easy); |
| 259 | use MMM::Common::Role; |
| 260 | use MMM::Common::Socket; |
| 261 | @@ -81,6 +83,7 @@ |
| 262 | DEBUG "Received Command $cmd"; |
| 263 | my ($cmd_name, $version, $host, @params) = split('\|', $cmd, -1); |
| 264 | |
| 265 | + return "ERROR: Invalid command '$cmd'!" unless (defined($host)); |
| 266 | return "ERROR: Invalid hostname in command ($host)! My name is '" . $self->name . "'" if ($host ne $self->name); |
| 267 | |
| 268 | if ($version > main::MMM_PROTOCOL_VERSION) { |
| 269 | @@ -114,7 +117,23 @@ |
| 270 | sub cmd_get_system_status($) { |
| 271 | my $self = shift; |
| 272 | |
| 273 | - # TODO maybe determine and send master info if we are a slave host. |
| 274 | + # determine master info |
| 275 | + my $dsn = sprintf("DBI:mysql:host=%s;port=%s;mysql_connect_timeout=3", $self->ip, $self->mysql_port); |
| 276 | + my $eintr = EINTR; |
| 277 | + my $master_ip = ''; |
| 278 | + |
| 279 | + my $dbh; |
| 280 | +CONNECT: { |
| 281 | + DEBUG "Connecting to mysql"; |
| 282 | + $dbh = DBI->connect($dsn, $self->mysql_user, $self->mysql_password, { PrintError => 0 }); |
| 283 | + unless ($dbh) { |
| 284 | + redo CONNECT if ($DBI::err == 2003 && $DBI::errstr =~ /\($eintr\)/); |
| 285 | + WARN "Couldn't connect to mysql. Can't determine current master host." . $DBI::err . " " . $DBI::errstr; |
| 286 | + } |
| 287 | +} |
| 288 | + |
| 289 | + my $slave_status = $dbh->selectrow_hashref('SHOW SLAVE STATUS'); |
| 290 | + $master_ip = $slave_status->{Master_Host} if (defined($slave_status)); |
| 291 | |
| 292 | my @roles; |
| 293 | foreach my $role (keys(%{$main::config->{role}})) { |
| 294 | @@ -133,7 +152,7 @@ |
| 295 | return "ERROR: Could not check if MySQL is writable: $res" if ($ret == 255); |
| 296 | my $writable = ($ret == 1); |
| 297 | |
| 298 | - my $answer = join('|', ($writable, join(',', @roles))); |
| 299 | + my $answer = join('|', ($writable, join(',', @roles), $master_ip)); |
| 300 | return "OK: Returning status!|$answer"; |
| 301 | } |
| 302 | |
| 303 | |
| 304 | === modified file 'lib/Common/Config.pm' |
| 305 | --- lib/Common/Config.pm 2010-02-03 09:06:11 +0000 |
| 306 | +++ lib/Common/Config.pm 2010-03-09 10:25:26 +0000 |
| 307 | @@ -37,7 +37,10 @@ |
| 308 | 'flap_duration' => { 'default' => 60 * 60 }, |
| 309 | 'flap_count' => { 'default' => 3 }, |
| 310 | 'auto_set_online' => { 'default' => 0 }, |
| 311 | - 'kill_host_bin' => { 'default' => 'kill_host' } |
| 312 | + 'kill_host_bin' => { 'default' => 'kill_host' }, |
| 313 | + 'careful_startup' => { 'default' => 1, 'boolean' => 1 }, |
| 314 | + 'mode' => { 'default' => 'active', 'values' => ['passive', 'active', 'manual', 'wait'] }, |
| 315 | + 'wait_for_other_master' => { 'default' => 120 } |
| 316 | } |
| 317 | }, |
| 318 | 'socket' => { 'create_if_empty' => ['AGENT', 'CONTROL', 'MONITOR'], 'section' => { |
| 319 | |
| 320 | === modified file 'lib/Monitor/Checker.pm' |
| 321 | --- lib/Monitor/Checker.pm 2010-02-08 15:06:09 +0000 |
| 322 | +++ lib/Monitor/Checker.pm 2010-03-09 10:25:26 +0000 |
| 323 | @@ -184,7 +184,7 @@ |
| 324 | my $self = shift; |
| 325 | my $name = $self->{name}; |
| 326 | |
| 327 | - DEBUG "Pinging checker '$name'..."; |
| 328 | +# DEBUG "Pinging checker '$name'..."; |
| 329 | |
| 330 | my $reader = $self->{reader}; |
| 331 | my $writer = $self->{writer}; |
| 332 | @@ -202,7 +202,7 @@ |
| 333 | return 0; |
| 334 | } |
| 335 | |
| 336 | - DEBUG "Checker '$name' is OK ($recv_res)"; |
| 337 | +# DEBUG "Checker '$name' is OK ($recv_res)"; |
| 338 | return 1; |
| 339 | } |
| 340 | |
| 341 | |
| 342 | === modified file 'lib/Monitor/Commands.pm' |
| 343 | --- lib/Monitor/Commands.pm 2010-03-03 00:34:21 +0000 |
| 344 | +++ lib/Monitor/Commands.pm 2010-03-09 10:25:26 +0000 |
| 345 | @@ -61,7 +61,7 @@ |
| 346 | my $roles = MMM::Monitor::Roles->instance(); |
| 347 | |
| 348 | my $ret = ''; |
| 349 | - if ($monitor->passive) { |
| 350 | + if ($monitor->is_passive) { |
| 351 | $ret .= "--- Monitor is in PASSIVE MODE ---\n"; |
| 352 | $ret .= sprintf("Cause: %s\n", $monitor->passive_info); |
| 353 | $ret =~ s/^/# /mg; |
| 354 | @@ -193,7 +193,7 @@ |
| 355 | |
| 356 | FATAL "Admin changed state of '$host' from $host_state to ADMIN_OFFLINE"; |
| 357 | $agents->set_state($host, 'ADMIN_OFFLINE'); |
| 358 | - MMM::Monitor::Roles->instance()->clear_host_roles($host); |
| 359 | + MMM::Monitor::Roles->instance()->clear_roles($host); |
| 360 | MMM::Monitor::Monitor->instance()->send_agent_status($host); |
| 361 | |
| 362 | return "OK: State of '$host' changed to ADMIN_OFFLINE. Now you can wait some time and check all roles!"; |
| 363 | @@ -203,7 +203,7 @@ |
| 364 | my $ip = shift; |
| 365 | my $host = shift; |
| 366 | |
| 367 | - return "ERROR: This command is only allowed in passive mode" unless (MMM::Monitor::Monitor->instance()->passive); |
| 368 | + return "ERROR: This command is only allowed in passive mode" unless (MMM::Monitor::Monitor->instance()->is_passive); |
| 369 | |
| 370 | my $agents = MMM::Monitor::Agents->instance(); |
| 371 | my $roles = MMM::Monitor::Roles->instance(); |
| 372 | @@ -239,14 +239,19 @@ |
| 373 | my $role = shift; |
| 374 | my $host = shift; |
| 375 | |
| 376 | - return "ERROR: This command is only allowed in active mode" if (MMM::Monitor::Monitor->instance()->passive); |
| 377 | + my $monitor = MMM::Monitor::Monitor->instance(); |
| 378 | + return "ERROR: This command is not allowed in passive mode" if ($monitor->is_passive); |
| 379 | |
| 380 | my $agents = MMM::Monitor::Agents->instance(); |
| 381 | my $roles = MMM::Monitor::Roles->instance(); |
| 382 | |
| 383 | return "ERROR: Unknown role name '$role'!" unless ($roles->exists($role)); |
| 384 | return "ERROR: Unknown host name '$host'!" unless ($agents->exists($host)); |
| 385 | - return "ERROR: move_role may be used for exclusive roles only!" unless ($roles->is_exclusive($role)); |
| 386 | + |
| 387 | + unless ($roles->is_exclusive($role)) { |
| 388 | + $roles->clear_balanced_role($host, $role); |
| 389 | + return "OK: Balanced role $role has been removed from host '$host'. Now you can wait some time and check new roles info!"; |
| 390 | + } |
| 391 | |
| 392 | my $host_state = $agents->state($host); |
| 393 | return "ERROR: Can't move role to host with state $host_state." unless ($host_state eq 'ONLINE'); |
| 394 | @@ -261,7 +266,9 @@ |
| 395 | my $agent = MMM::Monitor::Agents->instance()->get($host); |
| 396 | return "ERROR: Can't reach agent daemon on '$host'! Can't move roles there!" unless ($agent->cmd_ping()); |
| 397 | |
| 398 | - return "ERROR: Role '$role' is assigned to preferred host '$old_owner'. Can't move it!" if ($roles->assigned_to_preferred_host($role)); |
| 399 | + if ($monitor->is_active && $roles->assigned_to_preferred_host($role)) { |
| 400 | + return "ERROR: Role '$role' is assigned to preferred host '$old_owner'. Can't move it!"; |
| 401 | + } |
| 402 | |
| 403 | my $ip = $roles->get_exclusive_role_ip($role); |
| 404 | return "Error: Role $role has no IP." unless ($ip); |
| 405 | @@ -272,13 +279,13 @@ |
| 406 | $roles->set_role($role, $ip, $host); |
| 407 | |
| 408 | # Notify old host (if is_active_master_role($role) this will make the host non writable) |
| 409 | - MMM::Monitor::Monitor->instance()->send_agent_status($old_owner); |
| 410 | + $monitor->send_agent_status($old_owner); |
| 411 | |
| 412 | # Notify slaves (this will make them switch the master) |
| 413 | - MMM::Monitor::Monitor->instance()->notify_slaves($host) if ($roles->is_active_master_role($role)); |
| 414 | + $monitor->notify_slaves($host) if ($roles->is_active_master_role($role)); |
| 415 | |
| 416 | # Notify new host (if is_active_master_role($role) this will make the host writable) |
| 417 | - MMM::Monitor::Monitor->instance()->send_agent_status($host); |
| 418 | + $monitor->send_agent_status($host); |
| 419 | |
| 420 | return "OK: Role '$role' has been moved from '$old_owner' to '$host'. Now you can wait some time and check new roles info!"; |
| 421 | |
| 422 | @@ -288,7 +295,8 @@ |
| 423 | my $role = shift; |
| 424 | my $host = shift; |
| 425 | |
| 426 | - return "ERROR: This command is only allowed in active mode" if (MMM::Monitor::Monitor->instance()->passive); |
| 427 | + my $monitor = MMM::Monitor::Monitor->instance(); |
| 428 | + return "ERROR: This command is not allowed in passive mode" if (MMM::Monitor::Monitor->instance()->is_passive); |
| 429 | |
| 430 | my $agents = MMM::Monitor::Agents->instance(); |
| 431 | my $roles = MMM::Monitor::Roles->instance(); |
| 432 | @@ -328,12 +336,12 @@ |
| 433 | if (!$checks->rep_threads($old_owner)) { |
| 434 | FATAL "State of host '$old_owner' changed from ONLINE to REPLICATION_FAIL (because of move_role --force)"; |
| 435 | $old_agent->state('REPLICATION_FAIL'); |
| 436 | - $roles->clear_host_roles($old_owner); |
| 437 | + $roles->clear_roles($old_owner) if ($monitor->is_active); |
| 438 | } |
| 439 | elsif (!$checks->rep_backlog($old_owner)) { |
| 440 | FATAL "State of host '$old_owner' changed from ONLINE to REPLICATION_BACKLOG (because of move_role --force)"; |
| 441 | $old_agent->state('REPLICATION_BACKLOG'); |
| 442 | - $roles->clear_host_roles($old_owner); |
| 443 | + $roles->clear_roles($old_owner) if ($monitor->is_active); |
| 444 | } |
| 445 | |
| 446 | # Notify old host (this will make the host non writable) |
| 447 | @@ -352,13 +360,13 @@ |
| 448 | |
| 449 | =item mode |
| 450 | |
| 451 | -Get information about current mode (active or passive) |
| 452 | +Get information about current mode (active, manual or passive) |
| 453 | |
| 454 | =cut |
| 455 | |
| 456 | sub mode() { |
| 457 | - return 'PASSIVE' if (MMM::Monitor::Monitor->instance()->passive); |
| 458 | - return 'ACTIVE'; |
| 459 | + my $monitor = MMM::Monitor::Monitor->instance(); |
| 460 | + return $monitor->get_mode_string(); |
| 461 | } |
| 462 | |
| 463 | |
| 464 | @@ -369,26 +377,69 @@ |
| 465 | =cut |
| 466 | |
| 467 | sub set_active() { |
| 468 | - return 'OK: Already in active mode.' unless (MMM::Monitor::Monitor->instance()->passive); |
| 469 | - |
| 470 | - |
| 471 | - # Send status to agents |
| 472 | - MMM::Monitor::Monitor->instance()->send_status_to_agents(); |
| 473 | - |
| 474 | - # Clear 'bad' roles |
| 475 | - my $agents = MMM::Monitor::Agents->instance(); |
| 476 | - foreach my $host (keys(%{$main::config->{host}})) { |
| 477 | - my $agent = $agents->get($host); |
| 478 | - $agent->cmd_clear_bad_roles(); # TODO check result |
| 479 | - } |
| 480 | - |
| 481 | - |
| 482 | - MMM::Monitor::Monitor->instance()->passive(0); |
| 483 | - MMM::Monitor::Monitor->instance()->passive_info(''); |
| 484 | + my $monitor = MMM::Monitor::Monitor->instance(); |
| 485 | + |
| 486 | + return 'OK: Already in active mode.' if ($monitor->is_active); |
| 487 | + |
| 488 | + my $old_mode = $monitor->get_mode_string(); |
| 489 | + INFO "Admin changed mode from '$old_mode' to 'ACTIVE'"; |
| 490 | + |
| 491 | + if ($monitor->is_passive) { |
| 492 | + $monitor->set_active(); # so that we can send status to agents |
| 493 | + $monitor->cleanup_and_send_status(); |
| 494 | + $monitor->passive_info(''); |
| 495 | + } |
| 496 | + elsif ($monitor->is_manual) { |
| 497 | + # remove all roles from hosts which are not ONLINE |
| 498 | + my $roles = MMM::Monitor::Roles->instance(); |
| 499 | + my $agents = MMM::Monitor::Agents->instance(); |
| 500 | + my $checks = MMM::Monitor::ChecksStatus->instance(); |
| 501 | + foreach my $host (keys(%{$main::config->{host}})) { |
| 502 | + my $host_state = $agents->state($host); |
| 503 | + next if ($host_state eq 'ONLINE' || $roles->get_host_roles($host) == 0); |
| 504 | + my $agent = $agents->get($host); |
| 505 | + $roles->clear_roles($host); |
| 506 | + my $ret = $monitor->send_agent_status($host); |
| 507 | +# next if ($host_state eq 'REPLICATION_FAIL'); |
| 508 | +# next if ($host_state eq 'REPLICATION_BACKLOG'); |
| 509 | + # NOTE host_state should never be ADMIN_OFFLINE at this point |
| 510 | + if (!$ret) { |
| 511 | + ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host); |
| 512 | + $monitor->_kill_host($host, $checks->ping($host)); |
| 513 | + } |
| 514 | + } |
| 515 | + } |
| 516 | + |
| 517 | + $monitor->set_active(); |
| 518 | return 'OK: Switched into active mode.'; |
| 519 | } |
| 520 | |
| 521 | |
| 522 | +=item set_manual |
| 523 | + |
| 524 | +Switch to manual mode. |
| 525 | + |
| 526 | +=cut |
| 527 | + |
| 528 | +sub set_manual() { |
| 529 | + my $monitor = MMM::Monitor::Monitor->instance(); |
| 530 | + |
| 531 | + return 'OK: Already in manual mode.' if ($monitor->is_manual); |
| 532 | + |
| 533 | + my $old_mode = $monitor->get_mode_string(); |
| 534 | + INFO "Admin changed mode from '$old_mode' to 'MANUAL'"; |
| 535 | + |
| 536 | + if ($monitor->is_passive) { |
| 537 | + $monitor->set_manual(); # so that we can send status to agents |
| 538 | + $monitor->cleanup_and_send_status(); |
| 539 | + $monitor->passive_info(''); |
| 540 | + } |
| 541 | + |
| 542 | + $monitor->set_manual(); |
| 543 | + return 'OK: Switched into manual mode.'; |
| 544 | +} |
| 545 | + |
| 546 | + |
| 547 | =item set_passive |
| 548 | |
| 549 | Switch to passive mode. |
| 550 | @@ -396,10 +447,15 @@ |
| 551 | =cut |
| 552 | |
| 553 | sub set_passive() { |
| 554 | - return 'OK: Already in passive mode.' if (MMM::Monitor::Monitor->instance()->passive); |
| 555 | - |
| 556 | - MMM::Monitor::Monitor->instance()->passive(1); |
| 557 | - MMM::Monitor::Monitor->instance()->passive_info('Admin switched to passive mode.'); |
| 558 | + my $monitor = MMM::Monitor::Monitor->instance(); |
| 559 | + |
| 560 | + return 'OK: Already in passive mode.' if ($monitor->is_passive); |
| 561 | + |
| 562 | + my $old_mode = $monitor->get_mode_string(); |
| 563 | + INFO "Admin changed mode from '$old_mode' to 'PASSIVE'"; |
| 564 | + |
| 565 | + $monitor->set_passive(); |
| 566 | + $monitor->passive_info('Admin switched to passive mode.'); |
| 567 | return 'OK: Switched into passive mode.'; |
| 568 | } |
| 569 | |
| 570 | @@ -413,6 +469,7 @@ |
| 571 | set_offline <host> - set host <host> offline |
| 572 | mode - print current mode. |
| 573 | set_active - switch into active mode. |
| 574 | + set_manual - switch into manual mode. |
| 575 | set_passive - switch into passive mode. |
| 576 | move_role [--force] <role> <host> - move exclusive role <role> to host <host> |
| 577 | (Only use --force if you know what you are doing!) |
| 578 | |
| 579 | === modified file 'lib/Monitor/Monitor.pm' |
| 580 | --- lib/Monitor/Monitor.pm 2010-02-11 01:05:09 +0000 |
| 581 | +++ lib/Monitor/Monitor.pm 2010-03-09 10:25:26 +0000 |
| 582 | @@ -19,6 +19,7 @@ |
| 583 | use MMM::Monitor::NetworkChecker; |
| 584 | use MMM::Monitor::Role; |
| 585 | use MMM::Monitor::Roles; |
| 586 | +use MMM::Monitor::StartupStatus; |
| 587 | |
| 588 | =head1 NAME |
| 589 | |
| 590 | @@ -28,6 +29,11 @@ |
| 591 | |
| 592 | our $VERSION = '0.01'; |
| 593 | |
| 594 | +use constant MMM_MONITOR_MODE_PASSIVE => 0; |
| 595 | +use constant MMM_MONITOR_MODE_ACTIVE => 1; |
| 596 | +use constant MMM_MONITOR_MODE_MANUAL => 2; |
| 597 | +use constant MMM_MONITOR_MODE_WAIT => 3; |
| 598 | + |
| 599 | use Class::Struct; |
| 600 | |
| 601 | sub instance() { |
| 602 | @@ -40,12 +46,13 @@ |
| 603 | command_queue => 'Thread::Queue', |
| 604 | result_queue => 'Thread::Queue', |
| 605 | roles => 'MMM::Monitor::Roles', |
| 606 | - passive => '$', |
| 607 | + mode => '$', |
| 608 | passive_info => '$', |
| 609 | kill_host_bin => '$' |
| 610 | }; |
| 611 | |
| 612 | |
| 613 | + |
| 614 | =head1 FUNCTIONS |
| 615 | |
| 616 | =over 4 |
| 617 | @@ -59,6 +66,24 @@ |
| 618 | sub init($) { |
| 619 | my $self = shift; |
| 620 | |
| 621 | + #___________________________________________________________________________ |
| 622 | + # |
| 623 | + # Wait until network connection is available |
| 624 | + #___________________________________________________________________________ |
| 625 | + |
| 626 | + INFO "Waiting for network connection..."; |
| 627 | + unless (MMM::Monitor::NetworkChecker->wait_for_network()) { |
| 628 | + INFO "Received shutdown request while waiting for network connection."; |
| 629 | + return 0; |
| 630 | + } |
| 631 | + INFO "Network connection is available."; |
| 632 | + |
| 633 | + |
| 634 | + #___________________________________________________________________________ |
| 635 | + # |
| 636 | + # Create thread queues and other stuff... |
| 637 | + #___________________________________________________________________________ |
| 638 | + |
| 639 | my $agents = MMM::Monitor::Agents->instance(); |
| 640 | |
| 641 | $self->checker_queue(new Thread::Queue::); |
| 642 | @@ -68,6 +93,23 @@ |
| 643 | $self->roles(MMM::Monitor::Roles->instance()); |
| 644 | $self->passive_info(''); |
| 645 | |
| 646 | + if ($main::config->{monitor}->{mode} eq 'active') { |
| 647 | + $self->mode(MMM_MONITOR_MODE_ACTIVE); |
| 648 | + } |
| 649 | + elsif ($main::config->{monitor}->{mode} eq 'manual') { |
| 650 | + $self->mode(MMM_MONITOR_MODE_MANUAL); |
| 651 | + } |
| 652 | + elsif ($main::config->{monitor}->{mode} eq 'wait') { |
| 653 | + $self->mode(MMM_MONITOR_MODE_WAIT); |
| 654 | + } |
| 655 | + elsif ($main::config->{monitor}->{mode} eq 'passive') { |
| 656 | + $self->mode(MMM_MONITOR_MODE_PASSIVE); |
| 657 | + $self->passive_info('Configured to start up in passive mode.'); |
| 658 | + } |
| 659 | + else { |
| 660 | + LOGDIE "Something very, very strange just happend - dieing..." |
| 661 | + } |
| 662 | + |
| 663 | |
| 664 | #___________________________________________________________________________ |
| 665 | # |
| 666 | @@ -89,14 +131,6 @@ |
| 667 | |
| 668 | my $checks = $self->checks_status; |
| 669 | |
| 670 | - #___________________________________________________________________________ |
| 671 | - # |
| 672 | - # Go into passive mode if we have no network connection at startup |
| 673 | - #___________________________________________________________________________ |
| 674 | - |
| 675 | - $self->passive(!$main::have_net); |
| 676 | - $self->passive_info('No network connection during startup.') unless ($main::have_net); |
| 677 | - |
| 678 | |
| 679 | #___________________________________________________________________________ |
| 680 | # |
| 681 | @@ -108,21 +142,21 @@ |
| 682 | |
| 683 | #___________________________________________________________________________ |
| 684 | # |
| 685 | - # Figure out current status. Go into passive mode if there are discrepancies |
| 686 | + # Fetch stored status, agent status and system status |
| 687 | #___________________________________________________________________________ |
| 688 | |
| 689 | - $agents->load_status(); |
| 690 | - |
| 691 | - my $system_status = {}; |
| 692 | - my $agent_status = {}; |
| 693 | - my $status = 1; |
| 694 | + $agents->load_status(); # load stored status |
| 695 | + |
| 696 | + |
| 697 | + my $startup_status = new MMM::Monitor::StartupStatus; |
| 698 | + |
| 699 | my $res; |
| 700 | |
| 701 | foreach my $host (keys(%{$main::config->{host}})) { |
| 702 | |
| 703 | my $agent = $agents->get($host); |
| 704 | - my $host_status = 1; |
| 705 | |
| 706 | + $startup_status->set_stored_status($host, $agent->state, $agent->roles); |
| 707 | |
| 708 | #_______________________________________________________________________ |
| 709 | # |
| 710 | @@ -132,28 +166,23 @@ |
| 711 | $res = $agent->cmd_get_agent_status(2); |
| 712 | |
| 713 | if ($res =~ /^OK/) { |
| 714 | - |
| 715 | my ($msg, $state, $roles_str, $master) = split('\|', $res); |
| 716 | my @roles_str_arr = sort(split(/\,/, $roles_str)); |
| 717 | my @roles; |
| 718 | |
| 719 | foreach my $role_str (@roles_str_arr) { |
| 720 | my $role = MMM::Monitor::Role->from_string($role_str); |
| 721 | - if (defined($role)) { |
| 722 | - push @roles, $role; |
| 723 | - } |
| 724 | + push(@roles, $role) if (defined($role)); |
| 725 | } |
| 726 | |
| 727 | - $agent_status->{$host} = { state => $state, roles => \@roles, master => $master }; |
| 728 | + $startup_status->set_agent_status($host, $state, \@roles, $master); |
| 729 | } |
| 730 | elsif ($agent->state ne 'ADMIN_OFFLINE') { |
| 731 | if ($checks->ping($host) && $checks->mysql($host) && !$agent->agent_down()) { |
| 732 | ERROR "Can't reach agent on host '$host'"; |
| 733 | $agent->agent_down(1); |
| 734 | } |
| 735 | - ERROR "Switching to passive mode: The status of the agent on host '$host' could not be determined (answer was: $res)."; |
| 736 | - $status = 0; |
| 737 | - $host_status = 0; |
| 738 | + ERROR "The status of the agent on host '$host' could not be determined (answer was: $res)."; |
| 739 | } |
| 740 | |
| 741 | |
| 742 | @@ -163,180 +192,61 @@ |
| 743 | #_______________________________________________________________________ |
| 744 | |
| 745 | $res = $agent->cmd_get_system_status(2); |
| 746 | + |
| 747 | if ($res =~ /^OK/) { |
| 748 | - my ($msg, $writable, $roles_str) = split('\|', $res); |
| 749 | + my ($msg, $writable, $roles_str, $master_ip) = split('\|', $res); |
| 750 | my @roles_str_arr = sort(split(/\,/, $roles_str)); |
| 751 | my @roles; |
| 752 | + |
| 753 | foreach my $role_str (@roles_str_arr) { |
| 754 | my $role = MMM::Monitor::Role->from_string($role_str); |
| 755 | - if (defined($role)) { |
| 756 | - push @roles, $role; |
| 757 | + push(@roles, $role) if (defined($role)); |
| 758 | + } |
| 759 | + |
| 760 | + my $master = ''; |
| 761 | + if (defined($master_ip)) { |
| 762 | + foreach my $a_host (keys(%{$main::config->{host}})) { |
| 763 | + $master = $a_host if ($main::config->{host}->{$a_host}->{ip} eq $master_ip); |
| 764 | } |
| 765 | } |
| 766 | - $system_status->{$host} = { |
| 767 | - writable => $writable, |
| 768 | - roles => \@roles |
| 769 | - }; |
| 770 | + $startup_status->set_system_status($host, $writable, \@roles, $master); |
| 771 | } |
| 772 | elsif ($agent->state ne 'ADMIN_OFFLINE') { |
| 773 | if ($checks->ping($host) && $checks->mysql($host) && !$agent->agent_down()) { |
| 774 | ERROR "Can't reach agent on host '$host'"; |
| 775 | $agent->agent_down(1); |
| 776 | } |
| 777 | - ERROR "Switching to passive mode: The status of the system '$host' could not be determined (answer was: $res)."; |
| 778 | - $status = 0; |
| 779 | - $host_status = 0; |
| 780 | - |
| 781 | - } |
| 782 | - |
| 783 | - |
| 784 | - #_______________________________________________________________________ |
| 785 | - # |
| 786 | - # Skip comparison, if we coult not fetch AGENT/SYSTEM status |
| 787 | - #_______________________________________________________________________ |
| 788 | - |
| 789 | - next unless (defined($agent_status->{$host})); |
| 790 | - next unless (defined($system_status->{$host})); |
| 791 | - |
| 792 | - |
| 793 | - #_______________________________________________________________________ |
| 794 | - # |
| 795 | - # Compare agent and system status ... |
| 796 | - #_______________________________________________________________________ |
| 797 | - |
| 798 | - if ($agent_status->{$host}->{state} ne 'UNKNOWN' && $agent_status->{$host}->{state} ne $agent->state) { |
| 799 | - ERROR "Switching to passive mode: Agent state '", $agent_status->{$host}->{state}, "' differs from stored one '", $agent->state, "' for host '$host'."; |
| 800 | - $status = 0; |
| 801 | - $host_status = 0; |
| 802 | - next; |
| 803 | - } |
| 804 | - |
| 805 | - |
| 806 | - #_______________________________________________________________________ |
| 807 | - # |
| 808 | - # ... determine if roles differ |
| 809 | - #_______________________________________________________________________ |
| 810 | - |
| 811 | - my $changes = 0; |
| 812 | - my $diff = new Algorithm::Diff:: ( |
| 813 | - $system_status->{$host}->{roles}, |
| 814 | - $agent->roles, |
| 815 | - { keyGen => \&MMM::Common::Role::to_string } |
| 816 | - ); |
| 817 | - |
| 818 | - while ($diff->Next) { |
| 819 | - next if ($diff->Same); |
| 820 | - |
| 821 | - ERROR sprintf( |
| 822 | - "Switching to passive mode: Roles of host '$host' [%s] differ from stored ones [%s]", |
| 823 | - join(', ', @{$system_status->{$host}->{roles}}), |
| 824 | - join(', ', @{$agent->roles}) |
| 825 | - ); |
| 826 | - $status = 0; |
| 827 | - $host_status = 0; |
| 828 | - last; |
| 829 | - } |
| 830 | - |
| 831 | - next unless ($host_status); |
| 832 | - foreach my $role (@{$agent->roles}) { |
| 833 | - next unless ($self->roles->is_active_master_role($role->name)); |
| 834 | - next if ($system_status->{$host}->{writable}); |
| 835 | - WARN "Active master $host was not writable at monitor startup. (Don't mind, the host will be made writable soon)" |
| 836 | - } |
| 837 | - |
| 838 | - } |
| 839 | - |
| 840 | - DEBUG "STATE INFO\n", Data::Dumper->Dump([$agents, $agent_status, $system_status], ['Stored status', 'Agent status', 'System status']); |
| 841 | - |
| 842 | - |
| 843 | - #___________________________________________________________________________ |
| 844 | - # |
| 845 | - # Maybe switch into passive mode? |
| 846 | - #___________________________________________________________________________ |
| 847 | - |
| 848 | - unless ($status) { |
| 849 | - # Enter PASSIVE MODE |
| 850 | - $self->passive(1); |
| 851 | - my $agent_status_str = ''; |
| 852 | - foreach my $host (sort(keys(%{$agent_status}))) { |
| 853 | - $agent_status_str .= sprintf( |
| 854 | - " %s %s. Roles: %s. Master: %s\n", |
| 855 | - $host, |
| 856 | - $agent_status->{$host}->{state}, |
| 857 | - scalar(@{$agent_status->{$host}->{roles}}) > 0 ? join(', ', sort(@{$agent_status->{$host}->{roles}})) : 'none', |
| 858 | - $agent_status->{$host}->{master} ? $agent_status->{$host}->{master} : '?' |
| 859 | - ); |
| 860 | - } |
| 861 | - my $system_status_str = ''; |
| 862 | - foreach my $host (sort(keys(%{$system_status}))) { |
| 863 | - $system_status_str .= sprintf( |
| 864 | - " %s %s. Roles: %s\n", |
| 865 | - $host, |
| 866 | - $system_status->{$host}->{writable} ? 'writable' : 'readonly', |
| 867 | - scalar(@{$system_status->{$host}->{roles}}) > 0 ? join(', ', sort(@{$system_status->{$host}->{roles}})) : 'none' |
| 868 | - ); |
| 869 | - } |
| 870 | - my $status_str = sprintf("\nStored status:\n%s\nAgent status:\n%s\nSystem status:\n%s", $agents->get_status_info(), $agent_status_str, $system_status_str); |
| 871 | - $self->passive_info("Discrepancies between stored status, agent status and system status during startup.\n" . $status_str); |
| 872 | - FATAL "Switching to passive mode now. See output of 'mmm_control show' for details."; |
| 873 | - INFO $status_str; |
| 874 | - |
| 875 | - foreach my $host (keys(%{$main::config->{host}})) { |
| 876 | - my $agent = $agents->get($host); |
| 877 | - |
| 878 | - # Set all unknown hosts to AWAITING_RECOVERY |
| 879 | - $agent->state('AWAITING_RECOVERY') if ($agent->state eq 'UNKNOWN'); |
| 880 | - |
| 881 | - next unless ($system_status->{$host}); |
| 882 | - next unless (scalar(@{$system_status->{$host}->{roles}})); |
| 883 | - # Set status restored from agent systems |
| 884 | - $agent->state('ONLINE'); |
| 885 | - foreach my $role (@{$system_status->{$host}->{roles}}) { |
| 886 | - next unless ($self->roles->exists_ip($role->name, $role->ip)); |
| 887 | - next unless ($self->roles->can_handle($role->name, $host)); |
| 888 | - $self->roles->set_role($role->name, $role->ip, $host); |
| 889 | - } |
| 890 | - } |
| 891 | - |
| 892 | - # propagate roles to agent objects |
| 893 | - foreach my $host (keys(%{$main::config->{host}})) { |
| 894 | - my $agent = $agents->get($host); |
| 895 | - my @roles = sort($self->roles->get_host_roles($host)); |
| 896 | - $agent->roles(\@roles); |
| 897 | - } |
| 898 | - |
| 899 | - WARN "Monitor started in passive mode."; |
| 900 | - |
| 901 | - return; |
| 902 | - } |
| 903 | - |
| 904 | - # Stay in ACTIVE MODE |
| 905 | - # Everything is okay, apply roles from status file. |
| 906 | - foreach my $host (keys(%{$main::config->{host}})) { |
| 907 | + ERROR "The status of the system '$host' could not be determined (answer was: $res)."; |
| 908 | + } |
| 909 | + } |
| 910 | + |
| 911 | + my $conflict = $startup_status->determine_status(); |
| 912 | + |
| 913 | + DEBUG "STATE INFO\n", Data::Dumper->Dump([$startup_status], ['Startup status']); |
| 914 | + INFO $startup_status->to_string(); |
| 915 | + |
| 916 | + foreach my $host (keys(%{$startup_status->{result}})) { |
| 917 | my $agent = $agents->get($host); |
| 918 | - |
| 919 | - # Set new hosts to AWAITING_RECOVERY |
| 920 | - if ($agent->state eq 'UNKNOWN') { |
| 921 | - WARN "Detected new host '$host': Setting its initial state to 'AWAITING_RECOVERY'. Use 'mmm_control set_online $host' to switch it online."; |
| 922 | - $agent->state('AWAITING_RECOVERY'); |
| 923 | - } |
| 924 | - |
| 925 | - # Apply roles loaded from status file |
| 926 | - foreach my $role (@{$agent->roles}) { |
| 927 | - unless ($self->roles->exists_ip($role->name, $role->ip)) { |
| 928 | - WARN "Detected change in role definitions: Role '$role' was removed."; |
| 929 | - next; |
| 930 | - } |
| 931 | - unless ($self->roles->can_handle($role->name, $host)) { |
| 932 | - WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore."; |
| 933 | - next; |
| 934 | - } |
| 935 | + $agent->state($startup_status->{result}->{$host}->{state}); |
| 936 | + foreach my $role (@{$startup_status->{result}->{$host}->{roles}}) { |
| 937 | $self->roles->set_role($role->name, $role->ip, $host); |
| 938 | } |
| 939 | } |
| 940 | |
| 941 | - INFO "Monitor started in active mode." unless ($self->passive); |
| 942 | - WARN "Monitor started in passive mode." if ($self->passive); |
| 943 | + if ($conflict && $main::config->{monitor}->{careful_startup}) { |
| 944 | + $self->set_passive(); |
| 945 | + $self->passive_info("Conflicting roles during startup:\n\n" . $startup_status->to_string()); |
| 946 | + } |
| 947 | + elsif (!$self->is_passive) { |
| 948 | + $self->cleanup_and_send_status(); |
| 949 | + } |
| 950 | + |
| 951 | + INFO "Monitor started in active mode." if ($self->mode == MMM_MONITOR_MODE_ACTIVE); |
| 952 | + INFO "Monitor started in manual mode." if ($self->mode == MMM_MONITOR_MODE_MANUAL); |
| 953 | + INFO "Monitor started in wait mode." if ($self->mode == MMM_MONITOR_MODE_WAIT); |
| 954 | + INFO "Monitor started in passive mode." if ($self->mode == MMM_MONITOR_MODE_PASSIVE); |
| 955 | + |
| 956 | + return 1; |
| 957 | } |
| 958 | |
| 959 | sub check_master_configuration($) { |
| 960 | @@ -507,7 +417,7 @@ |
| 961 | |
| 962 | foreach my $host (keys(%{$main::config->{host}})) { |
| 963 | |
| 964 | - $agents->save_status() unless ($self->passive); |
| 965 | + $agents->save_status() unless ($self->is_passive); |
| 966 | |
| 967 | my $agent = $agents->get($host); |
| 968 | my $state = $agent->state; |
| 969 | @@ -539,7 +449,8 @@ |
| 970 | unless ($ping && $mysql) { |
| 971 | FATAL sprintf("State of host '%s' changed from %s to HARD_OFFLINE (ping: %s, mysql: %s)", $host, $state, ($ping? 'OK' : 'not OK'), ($mysql? 'OK' : 'not OK')); |
| 972 | $agent->state('HARD_OFFLINE'); |
| 973 | - $self->roles->clear_host_roles($host); |
| 974 | + next if ($self->is_manual); |
| 975 | + $self->roles->clear_roles($host); |
| 976 | if (!$self->send_agent_status($host)) { |
| 977 | ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host); |
| 978 | $self->_kill_host($host, $checks->ping($host)); |
| 979 | @@ -557,8 +468,12 @@ |
| 980 | if ($ping && $mysql && !$rep_threads && $peer_state eq 'ONLINE' && $checks->ping($peer) && $checks->mysql($peer)) { |
| 981 | FATAL "State of host '$host' changed from $state to REPLICATION_FAIL"; |
| 982 | $agent->state('REPLICATION_FAIL'); |
| 983 | - $self->roles->clear_host_roles($host); |
| 984 | - $self->send_agent_status($host); |
| 985 | + next if ($self->is_manual); |
| 986 | + $self->roles->clear_roles($host); |
| 987 | + if (!$self->send_agent_status($host)) { |
| 988 | + ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host); |
| 989 | + $self->_kill_host($host, $checks->ping($host)); |
| 990 | + } |
| 991 | next; |
| 992 | } |
| 993 | |
| 994 | @@ -566,8 +481,12 @@ |
| 995 | if ($ping && $mysql && !$rep_backlog && $rep_threads && $peer_state eq 'ONLINE' && $checks->ping($peer) && $checks->mysql($peer)) { |
| 996 | FATAL "State of host '$host' changed from $state to REPLICATION_DELAY"; |
| 997 | $agent->state('REPLICATION_DELAY'); |
| 998 | - $self->roles->clear_host_roles($host); |
| 999 | - $self->send_agent_status($host); |
| 1000 | + next if ($self->is_manual); |
| 1001 | + $self->roles->clear_roles($host); |
| 1002 | + if (!$self->send_agent_status($host)) { |
| 1003 | + ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host); |
| 1004 | + $self->_kill_host($host, $checks->ping($host)); |
| 1005 | + } |
| 1006 | next; |
| 1007 | } |
| 1008 | next; |
| 1009 | @@ -711,7 +630,47 @@ |
| 1010 | next; |
| 1011 | } |
| 1012 | } |
| 1013 | - $agents->save_status() unless ($self->passive); |
| 1014 | + |
| 1015 | + if ($self->mode == MMM_MONITOR_MODE_WAIT) { |
| 1016 | + my $master_one = $self->roles->get_first_master(); |
| 1017 | + my $master_two = $self->roles->get_second_master(); |
| 1018 | + my $state_one = $agents->state($master_one); |
| 1019 | + my $state_two = $agents->state($master_two); |
| 1020 | + |
| 1021 | + if ($state_one eq 'ONLINE' && $state_two eq 'ONLINE') { |
| 1022 | + INFO "Nodes $master_one and $master_two are ONLINE, switching from mode 'WAIT' to 'ACTIVE'."; |
| 1023 | + $self->set_active(); |
| 1024 | + } |
| 1025 | + elsif ($main::config->{monitor}->{wait_for_other_master} > 0 && ($state_one eq 'ONLINE' || $state_two eq 'ONLINE')) { |
| 1026 | + my $living_master = $state_one eq 'ONLINE' ? $master_one : $master_two; |
| 1027 | + my $dead_master = $state_one eq 'ONLINE' ? $master_two : $master_one; |
| 1028 | + |
| 1029 | + if ($main::config->{monitor}->{wait_for_other_master} <= time() - $agents->online_since($living_master)) { |
| 1030 | + $self->set_active(); |
| 1031 | + WARN sprintf("Master $dead_master did not come online within %d seconds (wait_for_other_master). Switching from mode 'WAIT' to 'ACTIVE'.", $main::config->{monitor}->{wait_for_other_master}); |
| 1032 | + } |
| 1033 | + |
| 1034 | + } |
| 1035 | + if ($self->is_active) { |
| 1036 | + # cleanup |
| 1037 | + foreach my $host (keys(%{$main::config->{host}})) { |
| 1038 | + my $host_state = $agents->state($host); |
| 1039 | + next if ($host_state eq 'ONLINE' || $self->roles->get_host_roles($host) == 0); |
| 1040 | + my $agent = $agents->get($host); |
| 1041 | + $self->roles->clear_roles($host); |
| 1042 | + my $ret = $self->send_agent_status($host); |
| 1043 | +# next if ($host_state eq 'REPLICATION_FAIL'); |
| 1044 | +# next if ($host_state eq 'REPLICATION_BACKLOG'); |
| 1045 | + # NOTE host_state should never be ADMIN_OFFLINE at this point |
| 1046 | + if (!$ret) { |
| 1047 | + ERROR sprintf("Can't send offline status notification to '%s' - killing it!", $host); |
| 1048 | + $self->_kill_host($host, $checks->ping($host)); |
| 1049 | + } |
| 1050 | + } |
| 1051 | + } |
| 1052 | + } |
| 1053 | + |
| 1054 | + $agents->save_status() unless ($self->is_passive); |
| 1055 | } |
| 1056 | |
| 1057 | |
| 1058 | @@ -725,7 +684,7 @@ |
| 1059 | my $self = shift; |
| 1060 | |
| 1061 | # Never change roles if we are in PASSIVE mode |
| 1062 | - return if ($self->passive); |
| 1063 | + return if ($self->is_passive); |
| 1064 | |
| 1065 | my $old_active_master = $self->roles->get_active_master(); |
| 1066 | |
| 1067 | @@ -734,7 +693,7 @@ |
| 1068 | $self->roles->process_orphans('balanced'); |
| 1069 | |
| 1070 | # obey preferences |
| 1071 | - $self->roles->obey_preferences(); |
| 1072 | + $self->roles->obey_preferences() if ($self->is_active); |
| 1073 | |
| 1074 | # Balance roles |
| 1075 | $self->roles->balance(); |
| 1076 | @@ -749,6 +708,46 @@ |
| 1077 | } |
| 1078 | |
| 1079 | |
| 1080 | +=item cleanup_and_send_status() |
| 1081 | + |
| 1082 | +Send status information to all agents and clean up old roles. |
| 1083 | + |
| 1084 | +=cut |
| 1085 | +sub cleanup_and_send_status($) { |
| 1086 | + my $self = shift; |
| 1087 | + |
| 1088 | + my $agents = MMM::Monitor::Agents->instance(); |
| 1089 | + my $roles = MMM::Monitor::Roles->instance(); |
| 1090 | + |
| 1091 | + my $active_master = $roles->get_active_master(); |
| 1092 | + my $passive_master = $roles->get_passive_master(); |
| 1093 | + |
| 1094 | + # Notify passive master first |
| 1095 | + if ($passive_master ne '') { |
| 1096 | + my $host = $passive_master; |
| 1097 | + $self->send_agent_status($host); |
| 1098 | + my $agent = $agents->get($host); |
| 1099 | + $agent->cmd_clear_bad_roles(); # TODO check result |
| 1100 | + } |
| 1101 | + |
| 1102 | + # Notify all slave hosts |
| 1103 | + foreach my $host (keys(%{$main::config->{host}})) { |
| 1104 | + next if ($self->roles->is_master($host)); |
| 1105 | + $self->send_agent_status($host); |
| 1106 | + my $agent = $agents->get($host); |
| 1107 | + $agent->cmd_clear_bad_roles(); # TODO check result |
| 1108 | + } |
| 1109 | + |
| 1110 | + # Notify active master at the end |
| 1111 | + if ($active_master ne '') { |
| 1112 | + my $host = $active_master; |
| 1113 | + $self->send_agent_status($host); |
| 1114 | + my $agent = $agents->get($host); |
| 1115 | + $agent->cmd_clear_bad_roles(); # TODO check result |
| 1116 | + } |
| 1117 | +} |
| 1118 | + |
| 1119 | + |
| 1120 | =item send_status_to_agents |
| 1121 | |
| 1122 | Send status information to all agents. |
| 1123 | @@ -797,7 +796,7 @@ |
| 1124 | |
| 1125 | # Never send anything to agents if we are in PASSIVE mode |
| 1126 | # Never send anything to agents if we have no network connection |
| 1127 | - return if ($self->passive || !$main::have_net); |
| 1128 | + return if ($self->is_passive || !$main::have_net); |
| 1129 | |
| 1130 | # Determine active master if it was not passed |
| 1131 | $master = $self->roles->get_active_master() unless (defined($master)); |
| 1132 | @@ -903,6 +902,7 @@ |
| 1133 | elsif ($command eq 'mode' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::mode(); } |
| 1134 | elsif ($command eq 'set_active' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_active(); } |
| 1135 | elsif ($command eq 'set_passive' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_passive(); } |
| 1136 | + elsif ($command eq 'set_manual' && $arg_cnt == 0) { $res = MMM::Monitor::Commands::set_manual(); } |
| 1137 | elsif ($command eq 'set_online' && $arg_cnt == 1) { $res = MMM::Monitor::Commands::set_online ($args[0]); } |
| 1138 | elsif ($command eq 'set_offline' && $arg_cnt == 1) { $res = MMM::Monitor::Commands::set_offline($args[0]); } |
| 1139 | elsif ($command eq 'move_role' && $arg_cnt == 2) { $res = MMM::Monitor::Commands::move_role($args[0], $args[1]); } |
| 1140 | @@ -917,5 +917,93 @@ |
| 1141 | } |
| 1142 | } |
| 1143 | |
| 1144 | + |
| 1145 | +=item is_active() |
| 1146 | + |
| 1147 | +Check if monitor is in active mode |
| 1148 | + |
| 1149 | +=cut |
| 1150 | + |
| 1151 | +sub is_active($) { |
| 1152 | + my $self = shift; |
| 1153 | + return ($self->mode == MMM_MONITOR_MODE_ACTIVE); |
| 1154 | +} |
| 1155 | + |
| 1156 | + |
| 1157 | +=item is_manual() |
| 1158 | + |
| 1159 | +Check if monitor is in manual mode (wait mode counts as manual, too) |
| 1160 | + |
| 1161 | +=cut |
| 1162 | + |
| 1163 | +sub is_manual($) { |
| 1164 | + my $self = shift; |
| 1165 | + return ($self->mode == MMM_MONITOR_MODE_MANUAL || $self->mode == MMM_MONITOR_MODE_WAIT); |
| 1166 | +} |
| 1167 | + |
| 1168 | + |
| 1169 | +=item is_passive() |
| 1170 | + |
| 1171 | +Check if monitor is in passive mode |
| 1172 | + |
| 1173 | +=cut |
| 1174 | + |
| 1175 | +sub is_passive($) { |
| 1176 | + my $self = shift; |
| 1177 | + return ($self->mode == MMM_MONITOR_MODE_PASSIVE); |
| 1178 | +} |
| 1179 | + |
| 1180 | + |
| 1181 | +=item set_active() |
| 1182 | + |
| 1183 | +Set mode to active |
| 1184 | + |
| 1185 | +=cut |
| 1186 | + |
| 1187 | +sub set_active($) { |
| 1188 | + my $self = shift; |
| 1189 | + $self->mode(MMM_MONITOR_MODE_ACTIVE); |
| 1190 | +} |
| 1191 | + |
| 1192 | + |
| 1193 | +=item set_manual() |
| 1194 | + |
| 1195 | +Set mode to manual |
| 1196 | + |
| 1197 | +=cut |
| 1198 | + |
| 1199 | +sub set_manual($) { |
| 1200 | + my $self = shift; |
| 1201 | + $self->mode(MMM_MONITOR_MODE_MANUAL); |
| 1202 | +} |
| 1203 | + |
| 1204 | + |
| 1205 | +=item set_passive() |
| 1206 | + |
| 1207 | +Set mode to passive |
| 1208 | + |
| 1209 | +=cut |
| 1210 | + |
| 1211 | +sub set_passive($) { |
| 1212 | + my $self = shift; |
| 1213 | + $self->mode(MMM_MONITOR_MODE_PASSIVE); |
| 1214 | +} |
| 1215 | + |
| 1216 | + |
| 1217 | +=item get_mode_string() |
| 1218 | + |
| 1219 | +Get string representation of current mode |
| 1220 | + |
| 1221 | +=cut |
| 1222 | + |
| 1223 | +sub get_mode_string($) { |
| 1224 | + my $self = shift; |
| 1225 | + return 'ACTIVE' if ($self->mode == MMM_MONITOR_MODE_ACTIVE); |
| 1226 | + return 'MANUAL' if ($self->mode == MMM_MONITOR_MODE_MANUAL); |
| 1227 | + return 'WAIT' if ($self->mode == MMM_MONITOR_MODE_WAIT); |
| 1228 | + return 'PASSIVE' if ($self->mode == MMM_MONITOR_MODE_PASSIVE); |
| 1229 | + return 'UNKNOWN'; # should never happen |
| 1230 | +} |
| 1231 | + |
| 1232 | 1; |
| 1233 | |
| 1234 | |
| 1235 | === modified file 'lib/Monitor/NetworkChecker.pm' |
| 1236 | --- lib/Monitor/NetworkChecker.pm 2009-02-10 08:18:57 +0000 |
| 1237 | +++ lib/Monitor/NetworkChecker.pm 2010-03-09 10:25:26 +0000 |
| 1238 | @@ -54,29 +54,32 @@ |
| 1239 | $checker->shutdown(); |
| 1240 | } |
| 1241 | |
| 1242 | -sub initial_check() { |
| 1243 | +sub wait_for_network() { |
| 1244 | my @ips = @{$main::config->{monitor}->{ping_ips}}; |
| 1245 | - my $state = 0; |
| 1246 | |
| 1247 | # Create checker |
| 1248 | my $checker = new MMM::Monitor::Checker::('ping_ip'); |
| 1249 | |
| 1250 | - # Ping all ips |
| 1251 | - foreach my $ip (@ips) { |
| 1252 | - # Ping checker |
| 1253 | - $checker->spawn() unless $checker->ping(); |
| 1254 | - |
| 1255 | - my $res = $checker->check($ip); |
| 1256 | - if ($res =~ /^OK/) { |
| 1257 | - DEBUG "IP '$ip' is reachable: $res"; |
| 1258 | - $state = 1; |
| 1259 | - last; |
| 1260 | + while (!$main::shutdown) { |
| 1261 | + # Ping all ips |
| 1262 | + foreach my $ip (@ips) { |
| 1263 | + last if ($main::shutdown); |
| 1264 | + # Ping checker |
| 1265 | + $checker->spawn() unless $checker->ping(); |
| 1266 | + |
| 1267 | + my $res = $checker->check($ip); |
| 1268 | + if ($res =~ /^OK/) { |
| 1269 | + DEBUG "IP '$ip' is reachable: $res"; |
| 1270 | + $checker->shutdown(); |
| 1271 | + return 1; |
| 1272 | + } |
| 1273 | } |
| 1274 | - DEBUG "IP '$ip' is not reachable: $res"; |
| 1275 | + |
| 1276 | + # Sleep a while before checking every ip again |
| 1277 | + sleep($main::config->{monitor}->{ping_interval}); |
| 1278 | } |
| 1279 | $checker->shutdown(); |
| 1280 | - |
| 1281 | - return $state; |
| 1282 | + return 0; |
| 1283 | } |
| 1284 | |
| 1285 | 1; |
| 1286 | |
| 1287 | === modified file 'lib/Monitor/Roles.pm' |
| 1288 | --- lib/Monitor/Roles.pm 2009-10-29 15:27:32 +0000 |
| 1289 | +++ lib/Monitor/Roles.pm 2010-03-09 10:25:26 +0000 |
| 1290 | @@ -112,6 +112,29 @@ |
| 1291 | } |
| 1292 | |
| 1293 | |
| 1294 | +=item host_has_roles($host) |
| 1295 | + |
| 1296 | +Check whether there are roles assigned to host $host |
| 1297 | + |
| 1298 | +=cut |
| 1299 | + |
| 1300 | +sub host_has_roles($$) { |
| 1301 | + my $self = shift; |
| 1302 | + my $host = shift; |
| 1303 | + |
| 1304 | + return 0 unless (defined($host)); |
| 1305 | + |
| 1306 | + foreach my $role (keys(%$self)) { |
| 1307 | + my $role_info = $self->{$role}; |
| 1308 | + foreach my $ip (keys(%{$role_info->{ips}})) { |
| 1309 | + my $ip_info = $role_info->{ips}->{$ip}; |
| 1310 | + return 1 if ($ip_info->{assigned_to} eq $host); |
| 1311 | + } |
| 1312 | + } |
| 1313 | + return 0; |
| 1314 | +} |
| 1315 | + |
| 1316 | + |
| 1317 | =item count_host_roles($host) |
| 1318 | |
| 1319 | Count all roles assigned to host $host |
| 1320 | @@ -155,6 +178,74 @@ |
| 1321 | } |
| 1322 | |
| 1323 | |
| 1324 | +=item get_passive_master |
| 1325 | + |
| 1326 | +Get the passive master |
| 1327 | + |
| 1328 | +=cut |
| 1329 | + |
| 1330 | +sub get_passive_master($) { |
| 1331 | + my $self = shift; |
| 1332 | + |
| 1333 | + my $role = $self->{$main::config->{active_master_role}}; |
| 1334 | + my $active_master = $self->get_active_master(); |
| 1335 | + return '' unless $role; |
| 1336 | + return '' unless $active_master; |
| 1337 | + |
| 1338 | + foreach my $host ( @{ $role->{hosts} } ) { |
| 1339 | + return $host if ($host ne $active_master); |
| 1340 | + } |
| 1341 | + return ''; |
| 1342 | +} |
| 1343 | + |
| 1344 | + |
| 1345 | +=item get_first_master |
| 1346 | + |
| 1347 | +Get the first master |
| 1348 | + |
| 1349 | +=cut |
| 1350 | + |
| 1351 | +sub get_first_master($) { |
| 1352 | + my $self = shift; |
| 1353 | + |
| 1354 | + my $role = $self->{$main::config->{active_master_role}}; |
| 1355 | + return '' unless $role; |
| 1356 | + return '' unless $role->{hosts}[0]; |
| 1357 | + return $role->{hosts}[0]; |
| 1358 | +} |
| 1359 | + |
| 1360 | + |
| 1361 | +=item get_second_master |
| 1362 | + |
| 1363 | +Get the second master |
| 1364 | + |
| 1365 | +=cut |
| 1366 | + |
| 1367 | +sub get_second_master($) { |
| 1368 | + my $self = shift; |
| 1369 | + |
| 1370 | + my $role = $self->{$main::config->{active_master_role}}; |
| 1371 | + return '' unless $role; |
| 1372 | + return '' unless $role->{hosts}[1]; |
| 1373 | + return $role->{hosts}[1]; |
| 1374 | +} |
| 1375 | + |
| 1376 | + |
| 1377 | +=item get_master_hosts |
| 1378 | + |
| 1379 | +Get the hosts which can handle the active master-role |
| 1380 | + |
| 1381 | +=cut |
| 1382 | + |
| 1383 | +sub get_master_hosts($) { |
| 1384 | + my $self = shift; |
| 1385 | + |
| 1386 | + my $role = $self->{$main::config->{active_master_role}}; |
| 1387 | + return '' unless $role; |
| 1388 | + return $role->{hosts}; |
| 1389 | +} |
| 1390 | + |
| 1391 | + |
| 1392 | =item get_exclusive_role_owner($role) |
| 1393 | |
| 1394 | Get the host which has the exclusive role $role assigned |
| 1395 | @@ -211,13 +302,13 @@ |
| 1396 | } |
| 1397 | |
| 1398 | |
| 1399 | -=item clear_host_roles($host) |
| 1400 | +=item clear_roles($host) |
| 1401 | |
| 1402 | Remove all roles from host $host. |
| 1403 | |
| 1404 | =cut |
| 1405 | |
| 1406 | -sub clear_host_roles($$) { |
| 1407 | +sub clear_roles($$) { |
| 1408 | my $self = shift; |
| 1409 | my $host = shift; |
| 1410 | |
| 1411 | @@ -238,6 +329,34 @@ |
| 1412 | } |
| 1413 | |
| 1414 | |
| 1415 | +=item clear_balanced_role($host, $role) |
| 1416 | + |
| 1417 | +Remove balanced role $role from host $host. |
| 1418 | + |
| 1419 | +=cut |
| 1420 | + |
| 1421 | +sub clear_balanced_role($$$) { |
| 1422 | + my $self = shift; |
| 1423 | + my $host = shift; |
| 1424 | + my $role = shift; |
| 1425 | + |
| 1426 | + INFO "Removing balanced role $role from host '$host':"; |
| 1427 | + |
| 1428 | + my $role_info = $self->{$role}; |
| 1429 | + return 0 unless $role_info; |
| 1430 | + my $cnt = 0; |
| 1431 | + return 0 unless ($role_info->{mode} eq 'balanced'); |
| 1432 | + foreach my $ip (keys(%{$role_info->{ips}})) { |
| 1433 | + my $ip_info = $role_info->{ips}->{$ip}; |
| 1434 | + next unless ($ip_info->{assigned_to} eq $host); |
| 1435 | + $cnt++; |
| 1436 | + INFO " Removed role '$role($ip)' from host '$host'"; |
| 1437 | + $ip_info->{assigned_to} = ''; |
| 1438 | + } |
| 1439 | + return $cnt; |
| 1440 | +} |
| 1441 | + |
| 1442 | + |
| 1443 | =item find_eligible_host($role) |
| 1444 | |
| 1445 | find host which can take over the role $role |
| 1446 | @@ -562,6 +681,21 @@ |
| 1447 | } |
| 1448 | |
| 1449 | |
| 1450 | +=item is_master($host) |
| 1451 | + |
| 1452 | +Check if host $host can handle the active master role. |
| 1453 | + |
| 1454 | +=cut |
| 1455 | + |
| 1456 | +sub is_master($$) { |
| 1457 | + my $self = shift; |
| 1458 | + my $host = shift; |
| 1459 | + my $role = $self->{$main::config->{active_master_role}}; |
| 1460 | + return 0 unless defined($role); |
| 1461 | + return grep({$_ eq $host} @{$role->{hosts}}); |
| 1462 | +} |
| 1463 | + |
| 1464 | + |
| 1465 | =item is_active_master_role($role) |
| 1466 | |
| 1467 | Check whether $role is the active master role. |
| 1468 | |
| 1469 | === added file 'lib/Monitor/StartupStatus.pm' |
| 1470 | --- lib/Monitor/StartupStatus.pm 1970-01-01 00:00:00 +0000 |
| 1471 | +++ lib/Monitor/StartupStatus.pm 2010-03-09 10:25:26 +0000 |
| 1472 | @@ -0,0 +1,298 @@ |
| 1473 | +package MMM::Monitor::StartupStatus; |
| 1474 | + |
| 1475 | +use strict; |
| 1476 | +use warnings FATAL => 'all'; |
| 1477 | +use List::Util qw(max); |
| 1478 | +use Log::Log4perl qw(:easy); |
| 1479 | +use MMM::Common::Role; |
| 1480 | +use MMM::Monitor::Role; |
| 1481 | +use MMM::Monitor::Roles; |
| 1482 | + |
| 1483 | +our $VERSION = '0.01'; |
| 1484 | + |
| 1485 | +=head1 NAME |
| 1486 | + |
| 1487 | +MMM::Monitor::StartupStatus - holds information about agent/system/stored status during startup |
| 1488 | + |
| 1489 | +=cut |
| 1490 | + |
| 1491 | +sub new($) { |
| 1492 | + my $class = shift; |
| 1493 | + |
| 1494 | + my $self = { |
| 1495 | + roles => {}, |
| 1496 | + hosts => {}, |
| 1497 | + result=> {} |
| 1498 | + }; |
| 1499 | + return bless $self, $class; |
| 1500 | +} |
| 1501 | + |
| 1502 | + |
| 1503 | +=head1 FUNCTIONS |
| 1504 | + |
| 1505 | +=over 4 |
| 1506 | + |
| 1507 | +=item set_agent_status($host, $state, $roles, $master) |
| 1508 | + |
| 1509 | +Set agent status |
| 1510 | + |
| 1511 | +=cut |
| 1512 | + |
| 1513 | +sub set_agent_status($$$$$) { |
| 1514 | + my $self = shift; |
| 1515 | + my $host = shift; |
| 1516 | + my $state = shift; |
| 1517 | + my $roles = shift; |
| 1518 | + my $master = shift; |
| 1519 | + |
| 1520 | + $self->{hosts}->{$host} = {} unless (defined($self->{hosts}->{$host})); |
| 1521 | + $self->{hosts}->{$host}->{agent} = { |
| 1522 | + state => $state, |
| 1523 | + master => $master |
| 1524 | + }; |
| 1525 | + foreach my $role (@{$roles}) { |
| 1526 | + unless (MMM::Monitor::Roles->instance()->exists_ip($role->name, $role->ip)) { |
| 1527 | + WARN "Detected change in role definitions: Role '$role' was removed."; |
| 1528 | + next; |
| 1529 | + } |
| 1530 | + unless (MMM::Monitor::Roles->instance()->can_handle($role->name, $host)) { |
| 1531 | + WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore."; |
| 1532 | + next; |
| 1533 | + } |
| 1534 | + my $role_str = $role->to_string(); |
| 1535 | + $self->{roles}->{$role_str} = {} unless (defined($self->{roles}->{$role_str})); |
| 1536 | + $self->{roles}->{$role_str}->{$host} = {} unless (defined($self->{roles}->{$role_str}->{$host})); |
| 1537 | + $self->{roles}->{$role_str}->{$host}->{agent} = 1; |
| 1538 | + } |
| 1539 | +} |
| 1540 | + |
| 1541 | + |
| 1542 | +=item set_stored_status($host, $state, $roles) |
| 1543 | + |
| 1544 | +Set stored status |
| 1545 | + |
| 1546 | +=cut |
| 1547 | + |
| 1548 | +sub set_stored_status($$\@$) { |
| 1549 | + my $self = shift; |
| 1550 | + my $host = shift; |
| 1551 | + my $state = shift; |
| 1552 | + my $roles = shift; |
| 1553 | + |
| 1554 | + $self->{hosts}->{$host} = {} unless (defined($self->{hosts}->{$host})); |
| 1555 | + $self->{hosts}->{$host}->{stored} = { |
| 1556 | + state => $state, |
| 1557 | + }; |
| 1558 | + foreach my $role (@{$roles}) { |
| 1559 | + unless (MMM::Monitor::Roles->instance()->exists_ip($role->name, $role->ip)) { |
| 1560 | + WARN "Detected change in role definitions: Role '$role' was removed."; |
| 1561 | + next; |
| 1562 | + } |
| 1563 | + unless (MMM::Monitor::Roles->instance()->can_handle($role->name, $host)) { |
| 1564 | + WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore."; |
| 1565 | + next; |
| 1566 | + } |
| 1567 | + my $role_str = $role->to_string(); |
| 1568 | + $self->{roles}->{$role_str} = {} unless (defined($self->{roles}->{$role_str})); |
| 1569 | + $self->{roles}->{$role_str}->{$host} = {} unless (defined($self->{roles}->{$role_str}->{$host})); |
| 1570 | + $self->{roles}->{$role_str}->{$host}->{stored} = 1; |
| 1571 | + } |
| 1572 | +} |
| 1573 | + |
| 1574 | + |
| 1575 | +=item set_system_status($host, $writable, $roles, $master) |
| 1576 | + |
| 1577 | +Set system status |
| 1578 | + |
| 1579 | +=cut |
| 1580 | + |
| 1581 | +sub set_system_status($$\@$) { |
| 1582 | + my $self = shift; |
| 1583 | + my $host = shift; |
| 1584 | + my $writable= shift; |
| 1585 | + my $roles = shift; |
| 1586 | + my $master = shift; |
| 1587 | + |
| 1588 | + $self->{hosts}->{$host} = {} unless (defined($self->{hosts}->{$host})); |
| 1589 | + $self->{hosts}->{$host}->{system} = { |
| 1590 | + writable=> $writable, |
| 1591 | + master => $master |
| 1592 | + }; |
| 1593 | + foreach my $role (@{$roles}) { |
| 1594 | + unless (MMM::Monitor::Roles->instance()->exists_ip($role->name, $role->ip)) { |
| 1595 | + WARN "Detected change in role definitions: Role '$role' was removed."; |
| 1596 | + next; |
| 1597 | + } |
| 1598 | + unless (MMM::Monitor::Roles->instance()->can_handle($role->name, $host)) { |
| 1599 | + WARN "Detected change in role definitions: Host '$host' can't handle role '$role' anymore."; |
| 1600 | + next; |
| 1601 | + } |
| 1602 | + my $role_str = $role->to_string(); |
| 1603 | + $self->{roles}->{$role_str} = {} unless (defined($self->{roles}->{$role_str})); |
| 1604 | + $self->{roles}->{$role_str}->{$host} = {} unless (defined($self->{roles}->{$role_str}->{$host})); |
| 1605 | + $self->{roles}->{$role_str}->{$host}->{system} = 1; |
| 1606 | + } |
| 1607 | +} |
| 1608 | + |
| 1609 | +sub determine_status() { |
| 1610 | + my $self = shift; |
| 1611 | + my $roles = MMM::Monitor::Roles->instance(); |
| 1612 | + |
| 1613 | + my $is_manual = MMM::Monitor::Monitor->instance()->is_manual(); |
| 1614 | + |
| 1615 | + my $conflict = 0; |
| 1616 | + |
| 1617 | + foreach my $host (keys(%{$main::config->{host}})) { |
| 1618 | + |
| 1619 | + # Figure out host state |
| 1620 | + |
| 1621 | + my $stored_state = 'UNKNOWN'; |
| 1622 | + my $agent_state = 'UNKNOWN'; |
| 1623 | + my $state; |
| 1624 | + |
| 1625 | + $stored_state = $self->{hosts}->{$host}->{stored}->{state} if (defined($self->{hosts}->{$host}->{stored}->{state})); |
| 1626 | + $agent_state = $self->{hosts}->{$host}->{agent}->{state} if (defined($self->{hosts}->{$host}->{agent}->{state} )); |
| 1627 | + |
| 1628 | + if ( $stored_state eq 'ADMIN_OFFLINE' || $agent_state eq 'ADMIN_OFFLINE' ) { $state = 'ADMIN_OFFLINE'; } |
| 1629 | + elsif ($stored_state eq 'HARD_OFFLINE' || $agent_state eq 'HARD_OFFLINE' ) { $state = 'HARD_OFFLINE'; } |
| 1630 | + elsif ($stored_state eq 'REPLICATION_FAIL' || $agent_state eq 'REPLICATION_FAIL' ) { $state = 'REPLICATION_FAIL'; } |
| 1631 | + elsif ($stored_state eq 'REPLICATION_DELAY' || $agent_state eq 'REPLICATION_DELAY') { $state = 'REPLICATION_DELAY'; } |
| 1632 | + elsif ($stored_state eq 'ONLINE' || $agent_state eq 'ONLINE' ) { $state = 'ONLINE'; } |
| 1633 | + else { $state = 'AWAITING_RECOVERY'; } |
| 1634 | + |
| 1635 | + $self->{result}->{$host} = { state => $state, roles => [] }; |
| 1636 | + } |
| 1637 | + |
| 1638 | + foreach my $role_str (keys(%{$self->{roles}})) { |
| 1639 | + my $role = MMM::Monitor::Role->from_string($role_str); |
| 1640 | + next unless(defined($role)); |
| 1641 | + |
| 1642 | + if ($roles->is_active_master_role($role->name)) { |
| 1643 | + # active master role |
| 1644 | + my $max = 0; |
| 1645 | + my $target = undef; |
| 1646 | + my $system_cnt = 0; |
| 1647 | + foreach my $host (keys(%{$self->{roles}->{$role_str}})) { |
| 1648 | + my $votes = 0; |
| 1649 | + my $info = $self->{roles}->{$role_str}->{$host}; |
| 1650 | + my $host_info = $self->{hosts}->{$host}; |
| 1651 | + |
| 1652 | + # host is writable |
| 1653 | + $votes += 4 if (defined($host_info->{system}->{writable}) && $host_info->{system}->{writable}); |
| 1654 | + |
| 1655 | + # IP is configured |
| 1656 | + if (defined($info->{system})) { |
| 1657 | + $votes += 2; |
| 1658 | + $system_cnt++; |
| 1659 | + } |
| 1660 | + |
| 1661 | + $votes += 1 if (defined($info->{stored})); |
| 1662 | + $votes += 1 if (defined($info->{agent})); |
| 1663 | + |
| 1664 | + foreach my $slave_host (keys(%{$self->{hosts}})) { |
| 1665 | + my $slave_info = $self->{hosts}->{$slave_host}; |
| 1666 | + next if MMM::Monitor::Roles->instance()->is_master($slave_host); |
| 1667 | + $votes++ if (defined($slave_info->{system}->{master}) && $slave_info->{system}->{master} eq $host); |
| 1668 | + } |
| 1669 | + |
| 1670 | + |
| 1671 | + my $state = $self->{result}->{$host}->{state}; |
| 1672 | + $votes = 0 if ($state eq 'ADMIN_OFFLINE'); |
| 1673 | + $votes = 0 if ($state eq 'HARD_OFFLINE' && !$is_manual); |
| 1674 | + |
| 1675 | + if ($votes > $max) { |
| 1676 | + $target = $host; |
| 1677 | + $max = $votes; |
| 1678 | + } |
| 1679 | + } |
| 1680 | + if ($system_cnt > 1) { |
| 1681 | + WARN "Role '$role_str' was configured on $system_cnt hosts during monitor startup."; |
| 1682 | + $conflict = 1; |
| 1683 | + } |
| 1684 | + if (defined($target)) { |
| 1685 | + push (@{$self->{result}->{$target}->{roles}}, $role); |
| 1686 | + my $state = $self->{result}->{$target}->{state}; |
| 1687 | + $self->{result}->{$target}->{state} = 'ONLINE' if (!$is_manual || $state eq 'REPLICATION_FAIL' || $state eq 'REPLICATION_DELAY'); |
| 1688 | + } |
| 1689 | + next; |
| 1690 | + } |
| 1691 | + |
| 1692 | + # Handle non-writer roles |
| 1693 | + my $max = 0; |
| 1694 | + my $target = undef; |
| 1695 | + my $system_cnt = 0; |
| 1696 | + foreach my $host (keys(%{$self->{roles}->{$role_str}})) { |
| 1697 | + my $votes = 0; |
| 1698 | + my $info = $self->{roles}->{$role_str}->{$host}; |
| 1699 | + |
| 1700 | + # IP is configured |
| 1701 | + if (defined($info->{system})) { |
| 1702 | + $votes += 4; |
| 1703 | + $system_cnt++; |
| 1704 | + } |
| 1705 | + |
| 1706 | + $votes += 2 if (defined($info->{stored})); |
| 1707 | + $votes += 1 if (defined($info->{agent})); |
| 1708 | + |
| 1709 | + |
| 1710 | + my $state = $self->{result}->{$host}->{state}; |
| 1711 | + if ($state eq 'ADMIN_OFFLINE' || (!$is_manual && $state ne 'ONLINE' && $state ne 'AWAITING_RECOVERY')) { |
| 1712 | + $votes = 0; |
| 1713 | + } |
| 1714 | + if ($votes > $max) { |
| 1715 | + $target = $host; |
| 1716 | + $max = $votes; |
| 1717 | + } |
| 1718 | + } |
| 1719 | + if ($system_cnt > 1) { |
| 1720 | + WARN "Role '$role_str' was configured on $system_cnt hosts during monitor startup."; |
| 1721 | + } |
| 1722 | + if (defined($target)) { |
| 1723 | + push (@{$self->{result}->{$target}->{roles}}, $role); |
| 1724 | + $self->{result}->{$target}->{state} = 'ONLINE' if ($self->{result}->{$target}->{state} eq 'AWAITING_RECOVERY'); |
| 1725 | + } |
| 1726 | + } |
| 1727 | + return $conflict; |
| 1728 | +} |
| 1729 | + |
| 1730 | + |
| 1731 | +sub to_string($) { |
| 1732 | + my $self = shift; |
| 1733 | + my $ret = "Startup status:\n"; |
| 1734 | + $ret .= "\nRoles:\n"; |
| 1735 | + |
| 1736 | + my $role_len = 4; # "Role" |
| 1737 | + my $host_len = 6; # "Master" |
| 1738 | + |
| 1739 | + foreach my $role (keys(%{$main::config->{role}})) { $role_len = max($role_len, length $role) } |
| 1740 | + foreach my $host (keys(%{$main::config->{host}})) { $host_len = max($host_len, length $host) } |
| 1741 | + $role_len += 17; # "(999.999.999.999)" |
| 1742 | + |
| 1743 | + $ret .= sprintf(" %-*s %-*s %-6s %-6s %-5s\n", $role_len, 'Role', $host_len, 'Host', 'Stored', 'System', 'Agent'); |
| 1744 | + foreach my $role (keys(%{$self->{roles}})) { |
| 1745 | + foreach my $host (keys(%{$self->{roles}->{$role}})) { |
| 1746 | + my $info = $self->{roles}->{$role}->{$host}; |
| 1747 | + $ret .= sprintf(" %-*s %-*s %-6s %-6s %-5s\n", $role_len, $role, $host_len, $host, |
| 1748 | + defined($info->{stored}) ? 'Yes' : '-', |
| 1749 | + defined($info->{system}) ? 'Yes' : '-', |
| 1750 | + defined($info->{agent}) ? 'Yes' : '-' |
| 1751 | + ); |
| 1752 | + } |
| 1753 | + } |
| 1754 | + |
| 1755 | + $ret .= "\nHosts:\n"; |
| 1756 | + $ret .= sprintf(" %-*s %-*s %-8s %-16s %-16s\n", $host_len, 'Host', $host_len, 'Master', 'Writable', 'Stored state', 'Agent state'); |
| 1757 | + foreach my $host (keys(%{$self->{hosts}})) { |
| 1758 | + my $info = $self->{hosts}->{$host}; |
| 1759 | + my $is_master = MMM::Monitor::Roles->instance()->is_master($host); |
| 1760 | + $ret .= sprintf(" %-*s %-*s %-8s %-16s %-16s\n", $host_len, $host, $host_len, |
| 1761 | + $is_master ? '-' : (defined($info->{system}->{master}) ? $info->{system}->{master} : '?'), |
| 1762 | + defined($info->{system}->{writable}) ? ($info->{system}->{writable} ? 'Yes' : 'No') : '?', |
| 1763 | + defined($info->{stored}->{state}) ? $info->{stored}->{state} : '?', |
| 1764 | + defined($info->{agent}->{state}) ? $info->{agent}->{state} : '?', |
| 1765 | + ); |
| 1766 | + } |
| 1767 | + return $ret; |
| 1768 | +} |
| 1769 | + |
| 1770 | +1; |
| 1771 | |
| 1772 | === modified file 'lib/Monitor/t/Roles.t' |
| 1773 | --- lib/Monitor/t/Roles.t 2009-02-05 08:43:52 +0000 |
| 1774 | +++ lib/Monitor/t/Roles.t 2010-03-09 10:25:26 +0000 |
| 1775 | @@ -55,7 +55,7 @@ |
| 1776 | $roles->assign($role_writer, 'db1'); |
| 1777 | is($roles->get_active_master(), 'db1', 'Active master after assigning writer role'); |
| 1778 | |
| 1779 | -$roles->clear_host_roles($roles->get_active_master()); |
| 1780 | +$roles->clear_roles($roles->get_active_master()); |
| 1781 | is($roles->get_active_master(), '', 'No active master with active master host cleared'); |
| 1782 | |
| 1783 | $roles->assign($role_writer, 'db2'); |
| 1784 | @@ -84,7 +84,7 @@ |
| 1785 | is($roles->count_host_roles('db2'), 2, 'balance roles (role count db2)'); |
| 1786 | |
| 1787 | $agents->{db2}->state('HARD_OFFLINE'); |
| 1788 | -$roles->clear_host_roles('db2'); |
| 1789 | +$roles->clear_roles('db2'); |
| 1790 | $roles->process_orphans('exclusive'); |
| 1791 | $roles->process_orphans('balanced'); |
| 1792 | is($roles->count_host_roles('db1'), 4, 'process orphans assigns all orphaned roles'); |
| 1793 | |
| 1794 | === modified file 'sbin/mmm_mond' |
| 1795 | --- sbin/mmm_mond 2010-02-11 02:23:38 +0000 |
| 1796 | +++ sbin/mmm_mond 2010-03-09 10:25:26 +0000 |
| 1797 | @@ -72,11 +72,6 @@ |
| 1798 | |
| 1799 | our $monitor = new MMM::Monitor::Monitor::(); |
| 1800 | |
| 1801 | -if (!MMM::Monitor::NetworkChecker->initial_check()) { |
| 1802 | - LOGDIE "None of the 'ping_ips' could be reached during startup. Network seems to be down - mmm_mond will shutdown now."; |
| 1803 | -} |
| 1804 | - |
| 1805 | - |
| 1806 | my $pidfilename = $config->{monitor}->{pid_path}; |
| 1807 | my $pidfile = new MMM::Common::PidFile:: $pidfilename; |
| 1808 | |
| 1809 | @@ -106,9 +101,9 @@ |
| 1810 | $SIG{PIPE} = 'IGNORE'; |
| 1811 | $SIG{CHLD} = \&ChildHandler; |
| 1812 | |
| 1813 | -$monitor->init(); |
| 1814 | - |
| 1815 | -$monitor->main(); |
| 1816 | +if ($monitor->init()) { |
| 1817 | + $monitor->main(); |
| 1818 | +} |
| 1819 | |
| 1820 | INFO 'END'; |
| 1821 | exit(0); |
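
The vote weighting that `determine_status()` in lib/Monitor/StartupStatus.pm applies to the active master role can be hard to follow inside the diff. Below is a minimal standalone sketch of that weighting, not the merged code itself. The helper names `writer_votes` and `pick_writer_host`, and the flat per-host hash with the keys `writable`, `ip_configured`, `stored`, `agent`, `slaves` and `state`, are assumptions made for illustration; the real module reads these values from `$self->{hosts}` and `$self->{roles}` and counts slaves by inspecting each non-master host's replication master.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MMM: score one candidate host for the
# active master (writer) role using the weights from determine_status().
sub writer_votes {
    my ($info, $is_manual) = @_;

    my $votes = 0;
    $votes += 4 if $info->{writable};        # MySQL on the host is writable
    $votes += 2 if $info->{ip_configured};   # role IP is configured on the host
    $votes += 1 if $info->{stored};          # role appears in the stored status
    $votes += 1 if $info->{agent};           # agent reports the role
    $votes += $info->{slaves} // 0;          # one vote per non-master host replicating from this host

    # Offline hosts lose all votes; HARD_OFFLINE hosts keep them in manual mode.
    my $state = $info->{state} // 'AWAITING_RECOVERY';
    $votes = 0 if $state eq 'ADMIN_OFFLINE';
    $votes = 0 if $state eq 'HARD_OFFLINE' && !$is_manual;
    return $votes;
}

# Return the host with the most votes, or undef if every host scored 0.
# Hosts are visited in sorted order so ties resolve predictably; the
# merged code iterates over hash keys in whatever order Perl returns them.
sub pick_writer_host {
    my ($candidates, $is_manual) = @_;

    my ($target, $max) = (undef, 0);
    foreach my $host (sort keys %{$candidates}) {
        my $votes = writer_votes($candidates->{$host}, $is_manual);
        ($target, $max) = ($host, $votes) if $votes > $max;
    }
    return $target;
}

# Example data (invented): db1 holds the role on paper, db2 is the writable host.
my %candidates = (
    db1 => { writable => 0, ip_configured => 1, stored => 1, agent => 1, slaves => 0, state => 'ONLINE' },
    db2 => { writable => 1, ip_configured => 1, stored => 0, agent => 0, slaves => 1, state => 'ONLINE' },
);

my $winner = pick_writer_host(\%candidates, 0);
print defined $winner ? "$winner\n" : "none\n";
```

In this example db1 collects 4 votes (IP, stored, agent) and db2 collects 7 (writable, IP, one slave), so db2 is chosen. Non-writer roles in the diff use a different weighting (4 for a configured IP, 2 for stored, 1 for agent) and do not consider writability or slaves.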