Comment 11 for bug 1743249

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1743249] Re: Failed Deployment after timeout trying to retrieve grub cfg

The servers are booting in UEFI mode, but without secure boot. They do not
have TPMs, so they are not capable of secure boot.
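
A rough sketch of the timeout argument from the quoted message below. Only
the 30 second timeout comes from the example there; the response times and
the 2 second slowdown are made-up numbers for illustration, not measurements
from MAAS.

```python
TIMEOUT_S = 30.0  # timeout figure from the example in the quoted message


def timed_out(latencies_s, timeout_s=TIMEOUT_S):
    """Return the response times that exceed the timeout."""
    return [t for t in latencies_s if t > timeout_s]


# Hypothetical grub cfg response times in seconds (illustrative only).
before = [27.5, 28.0, 29.0, 29.5]

# Assumed small, uniform slowdown, e.g. from a performance regression.
slowdown_s = 2.0
after = [t + slowdown_s for t in before]

print("failures before:", len(timed_out(before)))
print("failures after:", len(timed_out(after)))
```

No single request got much slower, but the ones that were already close to
the limit now cross it, so failures show up without any new bug elsewhere.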

On Thu, Jan 18, 2018 at 1:44 PM, Jason Hobbs <email address hidden> wrote:
> That's not a sound assumption. It's a race condition - if MAAS has had
> an issue where it responds slowly to grub cfg requests for a long time,
> and it's taking just a little longer sometimes now than it used to, that
> can be enough to push it over the edge to take too long. If it used to
> take 29 seconds, the timeout is 30 seconds, and it takes 31 seconds
> sometimes now, we will see failures where we didn't use to, due to the
> 1-second change in time - not necessarily a new bug anywhere else. We did
> just get kernel patches for Meltdown/Spectre that reduce performance...
> that seems like a possible culprit to me.
>
> Note also that one system made it to running cloud-init and was unable
> to retrieve network config info from the maas data source. How could
> that possibly be a grub (or secure boot) issue?
>
> Can you please explain the relevance of secure boot? It's running grub
> it retrieved from the network, and the systems usually boot - it's a
> race condition where it doesn't sometimes. It's not something that
> either works or it doesn't - I don't see how secure boot could possibly
> be related.
>
> ** Changed in: maas
> Status: Incomplete => New
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1743249
>
> Title:
> Failed Deployment after timeout trying to retrieve grub cfg
>
> Status in MAAS:
> New
>
> Bug description:
> A node failed to deploy after it failed to retrieve a grub.cfg from
> MAAS due to a timeout. In the logs, it's clear that the server tried
> to retrieve the grub cfg many times, over about 30 seconds:
>
> http://paste.ubuntu.com/26387256/
>
> We see the same thing for other hosts around the same time:
>
> http://paste.ubuntu.com/26387262/
>
> It seems like MAAS is taking way too long to respond to these
> requests.
>
> I connected to the console on this system and it had errors about
> timing out retrieving the grub-cfg, then it had an error message along
> the lines of "error not an ip" and then "double free". After I
> connected, but before I could get a screenshot, the system rebooted and
> was directed by MAAS to power off, which it did successfully after
> booting to Linux.
>
> Full logs are available here:
> https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
>
> This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions