MAAS-dhcpd fails to start, commissioning fails

Bug #1792965 reported by Chris Gregan
This bug affects 2 people
Affects    Status        Importance  Assigned to        Milestone
MAAS       Fix Released  Medium      Andres Rodriguez
  2.5      Fix Released  Medium      Andres Rodriguez

Bug Description

[error] Service 'maas-dhcpd' failed to start. Its current state is 'dead' and 'Result: exit-code'.

Sep 17 09:43:29 leafeon systemd[1]: Starting MAAS instance of ISC DHCP server for IPv4...
Sep 17 09:43:29 leafeon systemd[1]: Started MAAS instance of ISC DHCP server for IPv4.
Sep 17 09:43:29 leafeon maas.service_monitor: [info] Service 'maas-dhcpd' has been started and is 'running'.
Sep 17 09:43:29 leafeon dhcpd[26246]: There's already a DHCP server running.
Sep 17 09:43:29 leafeon dhcpd[26246]:
Sep 17 09:43:29 leafeon dhcpd[26246]: If you think you have received this message due to a bug rather
Sep 17 09:43:29 leafeon dhcpd[26246]: than a configuration issue please read the section on submitting
Sep 17 09:43:29 leafeon dhcpd[26246]: bugs on either our web page at www.isc.org or in the README file
Sep 17 09:43:29 leafeon dhcpd[26246]: before submitting a bug. These pages explain the proper
Sep 17 09:43:29 leafeon dhcpd[26246]: process and the information we find helpful for debugging..
Sep 17 09:43:29 leafeon dhcpd[26246]:
Sep 17 09:43:29 leafeon dhcpd[26246]: exiting.
Sep 17 09:43:29 leafeon systemd[1]: maas-dhcpd.service: Main process exited, code=exited, status=1/FAILURE
Sep 17 09:43:29 leafeon systemd[1]: maas-dhcpd.service: Unit entered failed state.
Sep 17 09:43:29 leafeon systemd[1]: maas-dhcpd.service: Failed with result 'exit-code'.

Revision history for this message
Chris Gregan (cgregan) wrote :
tags: added: cdo-qa foundation-engine
Revision history for this message
Christian Reis (kiko) wrote :
Revision history for this message
Chris Gregan (cgregan) wrote : Re: [Bug 1792965] Re: MAAS-dhcpd fails to start, commissioning fails

Yes, this is where we found it, and it is the only instance of this failure we
have seen.

On Tue, Sep 18, 2018 at 7:25 PM Christian Reis <email address hidden> wrote:

> This killed MAAS 2.3.5 on xenial/queens on this run:
>
> https://solutions.qa.canonical.com/#/qa/testRun/3ec4e343-06df-4666-8a59-356a7de066f7

--
Chris Gregan
Quality Assurance Manager
Field Engineering/CPE
<email address hidden>

Revision history for this message
Andres Rodriguez (andreserl) wrote :

As per discussion on the call:

1. This issue has been seen multiple times recently.
2. Solutions QA has changed the way they install and configure MAAS, by re-using previously used hardware.
3. MAAS 2.3.5 has not changed in quite a while.
4. Since there have been environmental changes, those could be related and could be the reason this issue surfaces.

As such, given that the error message clearly shows that there's another DHCP server running when MAAS' DHCP is being brought up, could you please provide the following information:

1. The steps you take to install MAAS on your re-usable environment, and any logs of the installation process (e.g. /var/log/apt/{history,term}.log of the installation)
2. The steps you take to clean up your environment
3. After cleaning up your environment, is isc-dhcp installed? Is isc-dhcp left running? (ps faux | grep dhcpd && dpkg -l | grep isc-dhcp)
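
A minimal sketch of question 3 as a reusable check. It assumes the package of interest is `isc-dhcp-server` (the comment only says "isc-dhcp") and that `pgrep` is installed; `check_isc_dhcp` is a hypothetical name. Matching with `pgrep -x dhcpd` instead of `ps faux | grep dhcpd` keeps the check from matching its own `grep` process.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reports whether isc-dhcp-server is installed and
# whether any dhcpd process is still running after cleanup.
check_isc_dhcp() {
    # dpkg-query prints "install ok installed" for an installed package;
    # removed packages (e.g. "deinstall ok config-files") or unknown ones
    # count as not installed.
    if dpkg-query -W -f='${Status}' isc-dhcp-server 2>/dev/null \
        | grep -q 'install ok installed'; then
        echo "isc-dhcp-server installed: yes"
    else
        echo "isc-dhcp-server installed: no"
    fi

    # -x matches the exact process name, so the check never matches itself.
    if pgrep -x dhcpd >/dev/null 2>&1; then
        echo "dhcpd running: yes"
    else
        echo "dhcpd running: no"
    fi
}
```

Run after cleanup, `check_isc_dhcp` should print `dhcpd running: no`; a `yes` means a DHCP server survived the cleanup.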

Changed in maas:
status: New → Incomplete
Revision history for this message
Chris Gregan (cgregan) wrote :

Just for the record, this is not a 2.3.5 bug; it is occurring on 2.4.2.

On Wed, Nov 21, 2018 at 9:30 AM Andres Rodriguez <email address hidden>
wrote:

--

Chris Gregan
Engineering Manager
Solutions QA/Techops
<email address hidden>

Revision history for this message
Andres Rodriguez (andreserl) wrote :

@Chris,

Comment #2 implies that this issue was seen when installing 2.3.5. In light of it being 2.4.2, I would also request this information:

1. The version of MAAS installed /before/ the issue happens.
2. The removal process for that installation (the steps you take to remove it, whether DHCP keeps running, etc., including apt logs).
3. The exact steps you take to install the new version (which seems to differ from 1), and logs of that installation (including apt logs).

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Some of the requested logs are here: https://bugs.launchpad.net/maas/+bug/1794114/comments/20

Additionally, here is the output of dpkg -l, ps auxfww, and ls -laR /var/lib/maas, captured after cleaning and before the installation that failed:
https://pastebin.canonical.com/p/3yBwCqJXXz/

The attached tarball has ls -laR output at /var/log/maas/var.lib.maas.log and ps auxf output at /var/log/maas/psauxf.log, captured a few minutes after the failure, for each server. The tarball also contains /var/log/journal for each maas server.

In this run the failure was at:
10.244.40.30/var/log/syslog:Dec 12 15:31:18 leafeon dhcpd[5224]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:31:20 leafeon dhcpd[5460]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:31:20 leafeon dhcpd[5492]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:31:50 leafeon dhcpd[7799]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:32:20 leafeon dhcpd[10360]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:32:50 leafeon dhcpd[13714]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:33:20 leafeon dhcpd[17031]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:33:50 leafeon dhcpd[19712]: There's already a DHCP server running.
10.244.40.30/var/log/syslog:Dec 12 15:34:20 leafeon dhcpd[21468]: There's already a DHCP server running.
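
The listing above looks like `grep -H` output over one directory per server. A minimal sketch that counts the same message per host, assuming the tarball is extracted so that each server's logs sit under `<host>/var/log/syslog`; `count_dhcpd_conflicts` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the dhcpd conflict message in each host's syslog.
# Assumes the layout <root>/<host>/var/log/syslog.
count_dhcpd_conflicts() {
    local root=${1:-.}
    local msg="There's already a DHCP server running."
    local log host n
    for log in "$root"/*/var/log/syslog; do
        [ -f "$log" ] || continue          # the glob matched nothing
        host=${log#"$root"/}               # strip the root prefix
        host=${host%%/*}                   # keep only the host directory
        # -F treats the message as a fixed string, so "." is literal;
        # -c prints 0 (and exits 1) when there are no matches.
        n=$(grep -cF -- "$msg" "$log")
        echo "$host $n"
    done
}
```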

Changed in maas:
status: Incomplete → New
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Sorry, the pastebin in #7 only had the output for one of the hosts; here it is for all three: https://pastebin.canonical.com/p/kDVFcz5DMb/

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Subscribed to field-critical, as this is causing a very high number of Solutions QA test failures and we don't have a workaround for it.

Changed in maas:
assignee: nobody → Newell Jensen (newell-jensen)
Revision history for this message
Andres Rodriguez (andreserl) wrote :

After reviewing all the logs, I still don't have enough information to determine why DHCP is failing to start with the message "There's already a DHCP server running.". I've tried to reproduce it locally and have been unsuccessful.

Could you please try something after you remove/purge MAAS:

rm -rf /run/maas/dhcp

That should ensure the pidfile is removed, which I would think is the only state kept on disk that could prevent isc-dhcp from starting with the message that another DHCP server is already running.
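
If we read ISC dhcpd's startup logic correctly (an assumption, not something established in this report), it reads its previous pid file and refuses to start with "There's already a DHCP server running." when the PID in that file still exists. PIDs get reused, so a leftover dhcpd.pid can point at an unrelated process. The sketch below mimics that check in shell; `pidfile_blocks_start` is hypothetical and is not dhcpd's code. Note that `kill -0` run as a non-root user also fails for other users' processes, whereas dhcpd runs as root.

```bash
#!/usr/bin/env bash
# Sketch (assumption) of dhcpd's startup test on its previous pid file.
# Returns 0 ("blocked") if the recorded PID is alive, 1 otherwise.
pidfile_blocks_start() {
    local pidfile=$1 line pid
    [ -s "$pidfile" ] || return 1          # missing or empty: nothing to check
    read -r line < "$pidfile" || true      # read strips surrounding blanks;
                                           # the file may lack a final newline
    pid=${line%%[!0-9]*}                   # keep only the leading digits
    [ -n "$pid" ] || return 1              # no usable PID in the file
    [ "$pid" -ne 0 ] || return 1           # kill -0 0 would test our own group
    # Blocked if that PID exists now, whatever process it belongs to.
    kill -0 "$pid" 2>/dev/null
}
```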

Changed in maas:
assignee: Newell Jensen (newell-jensen) → Andres Rodriguez (andreserl)
status: New → Triaged
Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

So the dhcpd.pid file is indeed being left there:

[10.244.40.31] out: -rw-r--r-- 1 root root 6 Jan 24 19:38 dhcpd.pid

We have a workaround in place now to remove it during cleanup, but this should be fixed in the appropriate package.
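
A sketch of the clean-step workaround, assuming the pid file is `/run/maas/dhcp/dhcpd.pid` (the directory from the suggested `rm -rf /run/maas/dhcp` plus the file name in the listing) and that the MAAS DHCP process is named `dhcpd`. Instead of deleting unconditionally, it keeps the file only while it names a running `dhcpd`, checked through the Linux-specific `/proc/<pid>/comm`. `clean_stale_dhcpd_pidfile` is hypothetical, not the packaged fix.

```bash
#!/usr/bin/env bash
# Hypothetical clean-step helper: remove a stale MAAS dhcpd pid file.
# The default path is an assumption; pass another path as $1 (e.g. for tests).
clean_stale_dhcpd_pidfile() {
    local pidfile=${1:-/run/maas/dhcp/dhcpd.pid} pid comm
    [ -e "$pidfile" ] || return 0          # nothing to clean
    read -r pid < "$pidfile" || true       # empty file leaves pid empty
    pid=${pid%%[!0-9]*}                    # leading digits only
    if [ -n "$pid" ] && [ -r "/proc/$pid/comm" ]; then
        read -r comm < "/proc/$pid/comm" || true
        if [ "$comm" = dhcpd ]; then
            echo "keeping $pidfile: dhcpd (pid $pid) is still running"
            return 0
        fi
    fi
    # Empty file, dead PID, or PID reused by another program: treat as stale.
    rm -f -- "$pidfile"
    echo "removed stale $pidfile"
}
```

Called with no arguments it acts on the assumed default path; it prints what it did and returns 0 either way.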

Revision history for this message
Jason Hobbs (jason-hobbs) wrote :

Dropped to field-high since we have a workaround.

Changed in maas:
status: Triaged → In Progress
importance: Undecided → Medium
milestone: none → 2.6.0alpha2
Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released