The servers are booting in UEFI, but not secure boot. They do not
have TPMs so they are not capable of secure boot.
On Thu, Jan 18, 2018 at 1:44 PM, Jason Hobbs <email address hidden> wrote:
> That's not a sound assumption. It's a race condition - if MAAS has had
> an issue where it responds slowly to grub cfg requests for a long time,
> and it's taking just a little longer sometimes now than it used to, that
> can be enough to push it over the edge to take too long. If it used to
> take 29 seconds, the timeout is 30 seconds, and it takes 31 seconds
> sometimes now, we will see failures where we didn't used to due to the 1
> second change in time - not necessarily a new bug anywhere else. We did
> just get kernel patches for Meltdown/Spectre that reduce performance...
> that seems like a possible culprit to me.
>
> Note also that one system made it to running cloud-init and was unable
> to retrieve network config info from the maas data source. How could
> that possibly be a grub (or secure boot) issue?
>
> Can you please explain the relevance of secure boot? It's running grub
> it retrieved from the network, and the systems usually boot - it's a
> race condition where it doesn't sometimes. It's not something that
> either works or it doesn't - I don't see how secure boot could possibly
> be related.
>
> ** Changed in: maas
> Status: Incomplete => New
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1743249
>
> Title:
> Failed Deployment after timeout trying to retrieve grub cfg
>
> Status in MAAS:
> New
>
> Bug description:
> A node failed to deploy after it failed to retrieve a grub.cfg from
> MAAS due to a timeout. In the logs, it's clear that the server tried
> to retrieve the grub cfg many times, over about 30 seconds:
>
> http://paste.ubuntu.com/26387256/
>
> We see the same thing for other hosts around the same time:
>
> http://paste.ubuntu.com/26387262/
>
> It seems like MAAS is taking way too long to respond to these
> requests.
>
> I connected to the console on this system and it had errors about
> timing out retrieving the grub-cfg, then it had an error message along
> the lines of "error not an ip" and then "double free". After I
> connected but before I could get a screenshot the system rebooted and
> was directed by maas to power off, which it did successfully after
> booting to linux.
>
> Full logs are available here:
> https://10.245.162.101/artifacts/14a34b5a-9321-4d1a-b2fa-ed277a020e7c/cpe_cloud_395/infra-logs.tar
>
> This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/maas/+bug/1743249/+subscriptions
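
For what it's worth, the timing argument in the quoted message can be sketched in a few lines of Python. This is only an illustration: the latency values and the function name are hypothetical and not measured from MAAS, and the 30-second timeout is the figure used in the quoted example, not a confirmed grub setting.

```python
def fraction_timed_out(latencies, timeout=30.0):
    """Return the fraction of requests whose latency exceeds the timeout."""
    if not latencies:
        return 0.0
    return sum(1 for t in latencies if t > timeout) / len(latencies)


# Hypothetical response times (seconds) that sit just under the timeout.
before = [28.5, 29.0, 29.5, 29.9]

# The same requests after an assumed uniform 1-second slowdown.
after = [t + 1.0 for t in before]

print("timed out before slowdown:", fraction_timed_out(before))
print("timed out after slowdown: ", fraction_timed_out(after))
```

With these made-up numbers nothing times out before the slowdown and half of the requests time out after it, even though nothing else changed.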