ganglia-monitor will not start on boot

Bug #656427 reported by Mark Russell
This bug affects 7 people
Affects: ganglia (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Jonathan Davies

Bug Description

ganglia-monitor can be started manually after booting, and it also runs properly right after installation.

However, on boot it fails every time with one of a couple of network-related messages. With the default /etc/ganglia/gmond.conf settings, the error message in /var/log/daemon.log is:

Oct 7 12:41:19 myhostname /usr/sbin/gmond[701]: Error creating multicast server mcast_join=239.2.11.71 port=8649 mcast_if=NULL family='inet4'. Exiting.#012

Another configuration, for example one without multicast settings, produces this error instead:

Oct 7 11:49:01 dl3801-1 /usr/sbin/gmond[1165]: Unable to create UDP client for ganglia-host:8649. Exiting.#012

It appears that gmond tries to start before the network is available, fails to bind, and exits.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: ganglia-monitor 3.1.7-1
ProcVersionSignature: Ubuntu 2.6.35-22.33-generic 2.6.35.4
Uname: Linux 2.6.35-22-generic x86_64
Architecture: amd64
Date: Thu Oct 7 13:06:15 2010
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release Candidate amd64 (20100928)
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: ganglia

Mark Russell (marrusl) wrote :

I checked the init script and it does contain:

# Required-Start: $network $named $remote_fs $syslog

Mark Russell (marrusl) wrote :

Here is a workaround:

I added a tiny script to /etc/network/if-up.d/ that runs "/etc/init.d/ganglia-monitor restart". Scripts in this directory are run whenever an interface is brought up, and it seems to do the trick.
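A minimal sketch of such an if-up.d hook is below. The GMOND_INIT override and the restart_gmond_for function are hypothetical names chosen for illustration. It relies on ifupdown exporting IFACE (the name of the interface just brought up) to hook scripts, and it skips lo, since the loopback interface comes up first and does not give gmond anything to bind to. run-parts ignores file names containing a dot, so the file would need a plain name (for example ganglia-monitor) and the executable bit.

```shell
#!/bin/sh
# Sketch of an /etc/network/if-up.d/ hook that restarts gmond once a
# non-loopback interface is up. ifupdown exports IFACE to these scripts.

# Hypothetical override, handy for testing; defaults to the real init script.
GMOND_INIT=${GMOND_INIT:-/etc/init.d/ganglia-monitor}

restart_gmond_for() {
    iface=$1
    # The loopback interface says nothing about external connectivity.
    if [ "$iface" = "lo" ]; then
        return 0
    fi
    # Do nothing if the package has been removed but this hook remains.
    if [ ! -x "$GMOND_INIT" ]; then
        return 0
    fi
    "$GMOND_INIT" restart
}

restart_gmond_for "${IFACE:-}"
```

On a host with several interfaces the restart runs once per interface brought up, which is redundant but harmless.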

Not sure what's wrong with the package. I did not have this problem with ganglia-monitor 3.1.2 on Lucid.

Clint Byrum (clint-fewbar) wrote :

Hi Mark,

I may be wrong, but I believe $network doesn't exist anymore as an init.d service, so the Required-Start: $network won't do anything.

The issue is that sysvinit scripts are started as soon as lo is up:

start on filesystem and net-device-up IFACE=lo

The simplest workaround is to change IFACE=lo to IFACE=ethX, but if there are any sysvinit scripts required to bring up ethX this may be problematic.

The eventual way to solve this is to make gmond an upstart-controlled service; then you can just put

start on net-device-up IFACE=ethX

in the upstart job alone.

Changed in ganglia (Ubuntu):
status: New → Confirmed
Clint Byrum (clint-fewbar) wrote :

Sorry, to clarify, the workaround of

start on filesystem and net-device-up IFACE=ethX

 would go in

/etc/init/rc-sysinit.conf
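For reference, that rc-sysinit.conf edit can be scripted as below. This is a sketch, not part of any package: retarget_rc_sysinit is a hypothetical helper, it only rewrites a 'start on' line that exactly matches the one quoted above, and the caveat about sysvinit scripts being needed to bring up ethX still applies.

```shell
#!/bin/sh
# Hypothetical helper: swap IFACE=lo for a real interface on the
# 'start on' line of rc-sysinit.conf. A .bak copy of the file is kept.
retarget_rc_sysinit() {
    conf=$1    # normally /etc/init/rc-sysinit.conf
    iface=$2   # e.g. eth0; must be an interface that actually comes up
    sed -i.bak "s/^\(start on filesystem and net-device-up\) IFACE=lo\$/\1 IFACE=$iface/" "$conf"
}
```

Usage would be along the lines of `retarget_rc_sysinit /etc/init/rc-sysinit.conf eth0`, run as root.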

Mark Russell (marrusl) wrote :

Removed the init scripts for gmond (ganglia-monitor) and gmetad and replaced them with upstart scripts (provided by Bernard Li of the Ganglia project).

Mark Russell (marrusl) wrote :

This version checks to make sure you are building for Ubuntu 9.10 and above. If so, it creates upstart jobs for ganglia-monitor and gmetad. Otherwise it will use the original init scripts. This should make the change Debian-friendly.
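The patch's actual build-time check is not reproduced in this report. A minimal sketch of the version comparison it describes, assuming the release number is available as MAJOR.MINOR (for example from `lsb_release -rs` on the build host), could look like this; release_supports_upstart_jobs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical check: is an Ubuntu release number (e.g. "10.10")
# 9.10 or later, i.e. a release that should get the Upstart jobs?
release_supports_upstart_jobs() {
    major=${1%%.*}
    minor=${1#*.}
    minor=${minor%%.*}   # drop a point-release suffix such as ".1"
    [ "$major" -gt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -ge 10 ]; }
}
```

A caller would branch on it, e.g. `if release_supports_upstart_jobs "$(lsb_release -rs)"; then ...; fi`, installing the Upstart jobs in one branch and the init scripts in the other.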

Clint Byrum (clint-fewbar) wrote :

Mark, IIRC, if there is both a .init and a .upstart file, Ubuntu's dh_installinit will use the .upstart file, while Debian and older Ubuntu releases will just ignore it and use the .init file.

So I'm not sure you actually need all of that code. Did you try just putting both in and building it on Debian and Hardy?

I ask because this is how the current MongoDB packages are managed and it seems to work fine.

Changed in ganglia (Ubuntu):
importance: Undecided → Medium
tags: added: patch
Clint Byrum (clint-fewbar) wrote :

Oh first, Mark, thanks so much for the patch!

Also, 'start on started network' will not solve the issue that you reported, as it will just start gmond as soon as the interfaces have been started, which doesn't mean they're up. You will need to do

start on net-device-up

Plus, for your particular issue this alone wouldn't work, because lo would also fire the 'net-device-up' event, so you'd probably need to add IFACE=eth0. However, that is too specific to your setup, so it probably wouldn't be acceptable in packages.

I would hold off on any changes, though, as we'll be defining some best practices for upstart jobs, and possibly more generic events (like 'multicast-ready' or something like that), during the upcoming UDS (which starts this Monday).

Jonathan Davies (jpds) wrote :

Up-to-date debdiff with the necessary changes.

Changed in ganglia (Ubuntu):
status: Confirmed → In Progress
assignee: nobody → Jonathan Davies (jpds)
Clint Byrum (clint-fewbar) wrote :

First off, thanks for the great work on this Jonathan. We discussed in a private chat, but just to document the debdiff:

* start on net-device-up IFACE!=lo is not really a valid option. Servers often have *many* devices to bring up, so starting on the first non-loopback interface is not a valid solution. Since 11.10, runlevel 2 waits for all interfaces listed in /etc/network/interfaces to come up (or for 2 minutes to pass while trying to bring them up). Given that, you want

start on runlevel [2345]
stop on runlevel [^2345]

* Just as a general convention, I'd expect 'respawn' up in the same block as 'expect fork'. Usually the exec of the primary process is the last stanza.

* It's not clear at all why the debdiff adds a sysvinit script. Please explain.
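Taken together, the review points describe a job roughly like the one written by the hypothetical helper below. This is a sketch, not the job that was eventually uploaded: it assumes, as the reviewed debdiff does, that gmond forks once when daemonizing (hence 'expect fork'), and it takes the binary path /usr/sbin/gmond from the log lines earlier in this report. The stop condition uses [!2345], the negated-class form used by Ubuntu's own Upstart jobs such as cron.conf.

```shell
#!/bin/sh
# Hypothetical helper: write an Upstart job for gmond into a directory
# (normally /etc/init). Start/stop follow the runlevel advice above,
# 'respawn' sits next to 'expect fork', and the exec stanza comes last.
write_gmond_job() {
    dir=${1:-/etc/init}
    cat > "$dir/ganglia-monitor.conf" <<'EOF'
description "Ganglia monitoring daemon (gmond)"

start on runlevel [2345]
stop on runlevel [!2345]

expect fork
respawn

exec /usr/sbin/gmond
EOF
}
```

Run as root with no argument, it writes /etc/init/ganglia-monitor.conf, after which `start ganglia-monitor` should launch the daemon. Any sysvinit script left over from the older package would also need to be disabled so gmond is not started twice.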

Jonathan Davies (jpds)
Changed in ganglia (Ubuntu):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ganglia - 3.3.8-1ubuntu1

---------------
ganglia (3.3.8-1ubuntu1) raring; urgency=low

  * Upstartify init scripts (LP: #656427, LP: #1078711).
 -- Jonathan Davies <email address hidden> Wed, 14 Nov 2012 15:23:46 +0000

Changed in ganglia (Ubuntu):
status: Fix Committed → Fix Released