Comment 27 for bug 1743249

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

We're working on getting a packet capture right now.

On Wed, Jan 31, 2018 at 9:08 AM, Blake Rouse <email address hidden> wrote:
> I know it's almost impossible for you, but we need a network capture. If
> we could reproduce this locally and reliably I would not be asking for
> it. But since you all are the only ones running into this issue, we
> must have a network capture.
>
> I know that TFTP uses random ports after the initial request to port 69,
> so it will be a lot of traffic, but you need to capture all UDP traffic.
>
> We also need to know the MAC addresses or received IP addresses of the
> machines that are failing, along with the regiond, rackd, maas logs for
> that time range to correlate.
>
> This sounds like I am asking for the world, but that is the only way I
> know to look into this further. With no guarantee that it will tell me
> anything more.
>
> ** Changed in: maas
> Importance: Undecided => Critical
>
> ** Changed in: maas
> Status: New => Incomplete
>
> ** Changed in: maas
> Milestone: None => 2.4.x
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1743249
>
> Title:
> Failed Deployment after timeout trying to retrieve grub cfg
>
> Status in MAAS:
> Incomplete
>
> Bug description:
> A node failed to deploy after it failed to retrieve a grub.cfg from
> MAAS due to a timeout. In the logs, it's clear that the server tried
> to retrieve the grub cfg many times, over about 30 seconds:
>
> http://paste.ubuntu.com/26387256/
>
> We see the same thing for other hosts around the same time:
>
> http://paste.ubuntu.com/26387262/
>
> It seems like MAAS is taking way too long to respond to these
> requests.
>
> This is very similar to bug 1724677, which was happening
> pre-Meltdown/Spectre. The only difference is we don't see "[critical] TFTP
> back-end failed" in the logs anymore.
>
> I connected to the console on this system and it had errors about
> timing out retrieving the grub.cfg, then it had an error message along
> the lines of "error not an ip" and then "double free". After I
> connected, but before I could get a screenshot, the system rebooted and
> was directed by MAAS to power off, which it did successfully after
> booting to Linux.
>
> Full logs are available here:
> https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
>
> This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
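
For anyone reading the eventual capture: the request above to capture all UDP traffic, rather than only port 69, follows from how TFTP (RFC 1350) works. Only the initial read request goes to port 69; the server then answers from a port it picks itself, and the rest of the transfer stays on that port pair. The sketch below is illustrative only and is not MAAS code. The port numbers, the file name `grub/grub.cfg`, and the payload contents are made-up assumptions; only the opcode values and the packet layout come from the RFCs.

```python
import struct

# TFTP opcodes from RFC 1350, plus OACK from RFC 2347.
TFTP_OPCODES = {1: "RRQ", 2: "WRQ", 3: "DATA", 4: "ACK", 5: "ERROR", 6: "OACK"}


def describe_tftp(payload: bytes) -> str:
    """Return a short description of a UDP payload interpreted as TFTP."""
    if len(payload) < 2:
        return "not TFTP (too short)"
    (opcode,) = struct.unpack("!H", payload[:2])
    name = TFTP_OPCODES.get(opcode)
    if name is None:
        return f"unknown opcode {opcode}"
    if name in ("RRQ", "WRQ"):
        # Request layout: opcode | filename | 0 | mode | 0
        fields = payload[2:].split(b"\x00")
        filename = fields[0].decode("ascii", "replace")
        mode = fields[1].decode("ascii", "replace") if len(fields) > 1 else ""
        return f"{name} file={filename} mode={mode}"
    if name in ("DATA", "ACK") and len(payload) >= 4:
        (block,) = struct.unpack("!H", payload[2:4])
        return f"{name} block={block}"
    return name


# Hypothetical exchange: (source port, destination port, UDP payload).
# 49152 and 50321 are invented ephemeral ports for the client and server.
exchange = [
    (49152, 69, b"\x00\x01grub/grub.cfg\x00octet\x00"),   # client -> server:69
    (50321, 49152, b"\x00\x03\x00\x01set default=0\n"),   # server replies from a new port
    (49152, 50321, b"\x00\x04\x00\x01"),                  # client acknowledges block 1
]

# A capture restricted to port 69 would keep only the first packet.
port_69_only = [pkt for pkt in exchange if 69 in (pkt[0], pkt[1])]

for src, dst, payload in exchange:
    print(f"{src} -> {dst}: {describe_tftp(payload)}")
print(f"packets visible with a port-69-only filter: {len(port_69_only)} of {len(exchange)}")
```

In this made-up exchange a port-69-only filter would show the request but none of the data or acknowledgement packets, and a timeout would show up in exactly those packets.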