tripleo nova containers restarting in a loop causing - 503 errors - starting container trunk.registry.rdoproject.org/tripleomaster/centos-binary-nova-compute-ironic

Bug #1775698 reported by Ronelle Landy
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: yatin

Bug Description

Jobs in the master promotion have been failing since 06/07, possibly because of an issue with nova quotas. The undercloud installation fails with HTTP 503 errors (variations of):

2018-06-07 19:11:59 | "Status: Downloaded newer image for trunk.registry.rdoproject.org/tripleomaster/centos-binary-nova-compute-ironic:d596a4ea9f0b0d9255f21bbd4c3757eec6017669_ceffe88f",
2018-06-07 19:11:59 | "",
2018-06-07 19:11:59 | "stderr: ",
2018-06-07 19:11:59 | "stdout: 948fd04b70f8e36b43dc4ac2c61555418e2f07f5c33790628e7c9e246030f62e",
2018-06-07 19:11:59 | "stdout: ",
2018-06-07 19:11:59 | "stderr: /usr/lib/python2.7/site-packages/openstack/_meta/connection.py:122: ImportWarning: Could not import data-processing service filter: No module named data_processing.data_processing_service",
.
.
.
.
2018-06-07 19:11:59 | "Skipping cell0 since it does not contain hosts.",
2018-06-07 19:11:59 | "Getting computes from cell 'default': c2984f55-d8b8-462e-85e5-bcad8f29140b",
2018-06-07 19:11:59 | "Creating host mapping for service undercloud.localdomain",
2018-06-07 19:11:59 | "Found 1 unmapped computes in cell: c2984f55-d8b8-462e-85e5-bcad8f29140b",
2018-06-07 19:11:59 | "stderr: Unknown Error (HTTP 503)",
2018-06-07 19:11:59 | "Unknown Error (HTTP 503)"
2018-06-07 19:11:59 | ]
2018-06-07 19:11:59 | }

...

Unable to establish connection to http://192.168.24.1:8774/v2.1/os-quota-sets/e68b399367024f39849bf1deaccb67fe: HTTPConnectionPool(host='192.168.24.1', port=8774): Max retries exceeded with url: /v2.1/os-quota-sets/e68b399367024f39849bf1deaccb67fe (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7ff21a83dd10>: Failed to establish a new connection: [Errno 111] Connection refused',))"

2018-06-07 19:12:08 | TASK [Run deployment UndercloudPostDeployment] *********************************
"++ openstack project show admin", "++ awk '$2==\"id\" {print $4}'", "+ openstack quota set --cores -1 --instances -1 --ram -1 5e38b3f0fccb448386cdb39ba3d45d8b", "Unknown Error (HTTP 503)", "", "[2018-06-07 19:12:13,557] (heat-config) [ERROR] Error running /var/lib/heat-config/heat-config-script/6a41a6d0-19b9-49f9-a942-b15782c2a730. [1]", "", "", "[2018-06-07 19:12:13,562] (heat-config)

....

The undercloud deployment stops at:
http://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/extraconfig/post_deploy/undercloud_post.sh#n69

See the logs for jobs:

fs001
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/60b4155/undercloud/home/jenkins/undercloud_install.log.txt.gz#_2018-06-07_19_11_59

fs016
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-master/cc496dd/undercloud/home/jenkins/undercloud_install.log.txt.gz#_2018-06-07_19_15_39

fs018
https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset018-master/2721c57/undercloud/home/jenkins/undercloud_install.log.txt.gz#_2018-06-07_19_24_12

Ronelle Landy (rlandy)
Changed in tripleo:
milestone: none → rocky-3
importance: Undecided → Critical
status: New → Triaged
tags: added: alert ci promotion-blocker
wes hayutin (weshayutin) wrote :

Last known working
===================================

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/0830d9b/undercloud/home/jenkins/undercloud_install.log.txt.gz#_2018-06-05_05_55_03

openstack-nova-18.0.0-0.20180601221704.f902e0d.el7.noarch.rpm
===================================

The latest failed job installs nova at head as of 6/07/2018:
https://github.com/openstack/nova/commit/54c9a944c618dc173bd1214be4de9a44479c8959
https://trunk.rdoproject.org/centos7-master/d5/96/d596a4ea9f0b0d9255f21bbd4c3757eec6017669_ceffe88f/
openstack-nova-18.0.0-0.20180607163313.54c9a94.el7.noarch.rpm
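The two package names above bound the regression window. As a minimal sketch, the snippet below pulls the build timestamp and short commit hash out of each file name and prints the matching nova commit range. It assumes the release field has the layout `0.<YYYYMMDDhhmmss>.<short hash>` seen in these two names; `parse_snapshot_rpm` is a hypothetical helper, not part of any packaging tool.

```python
import re
from datetime import datetime

# Assumed layout: <name>-<version>-0.<YYYYMMDDhhmmss>.<short commit hash>.<dist>...
RPM_RE = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-0\.(?P<stamp>\d{14})\.(?P<commit>[0-9a-f]{7,})\."
)


def parse_snapshot_rpm(filename):
    """Split a snapshot RPM file name into name, version, build time and commit."""
    match = RPM_RE.match(filename)
    if match is None:
        raise ValueError("unexpected RPM file name: {}".format(filename))
    return {
        "name": match.group("name"),
        "version": match.group("version"),
        "built": datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S"),
        "commit": match.group("commit"),
    }


# File names copied from the last known working and the latest failed job.
good = parse_snapshot_rpm("openstack-nova-18.0.0-0.20180601221704.f902e0d.el7.noarch.rpm")
bad = parse_snapshot_rpm("openstack-nova-18.0.0-0.20180607163313.54c9a94.el7.noarch.rpm")

print("window:", good["built"], "->", bad["built"])
# Command to list the nova commits between the two builds (run in a nova checkout).
print("git log --oneline {}..{}".format(good["commit"], bad["commit"]))
```

Note that this only narrows the range on the nova side; the regression may just as well come from another package or from the container images.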

Ronelle Landy (rlandy)
description: updated
wes hayutin (weshayutin) wrote :

Nova is clearly down or unreachable

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset016-master/cc496dd/undercloud/var/log/extra/nova_list.txt.gz

ERROR (ConnectFailure): Unable to establish connection to http://192.168.24.1:8774/v2.1: HTTPConnectionPool(host='192.168.24.1', port=8774): Max retries exceeded with url: /v2.1 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fabdd786190>: Failed to establish a new connection: [Errno 111] Connection refused',))
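"[Errno 111] Connection refused" means that nothing accepted the TCP connection on that address and port, which is a different symptom from an HTTP 503 returned by something that is listening. A minimal sketch for telling the cases apart is shown below; `probe_tcp` is a hypothetical helper and the two-second timeout is an arbitrary choice.

```python
import socket


def probe_tcp(host, port, timeout=2.0):
    """Try a plain TCP connect and report 'open', 'refused', 'timeout' or an error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but no process is listening on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (filtered or unreachable).
        return "timeout"
    except OSError as exc:
        return "error: {}".format(exc)
```

On the undercloud this could be called as `probe_tcp("192.168.24.1", 8774)` to check the endpoint from the error above.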

wes hayutin (weshayutin)
summary: - [master promotion] undercloud install is failing - 503 errors -
+ tripleo nova containers restarting in a loop causing - 503 errors -
starting container trunk.registry.rdoproject.org/tripleomaster/centos-
binary-nova-compute-ironic
Bogdan Dobrelya (bogdando) wrote :

The issue does not look related to ironic. Instead, the nova-api container, and a few others, fail their healthcheck a few seconds after starting. Strangely, no logs are stored in the nova api log file, and the container stdout looks normal.

Bogdan Dobrelya (bogdando) wrote :

What I can see with the nova api container recreated via paunch, with health checks disabled for debugging:

* nova api works normally when I start it after the kolla init steps in the container.
* the healthcheck script connects to undercloud.internalapi.localdomain:8774, taken from /etc/httpd/conf.d/10-nova_api_wsgi.conf, which resolves to 192.168.24.1:8774
* nothing listens there (haproxy issue?); only 192.168.24.3:8774 is available.

haproxy config stanza:
 listen nova_osapi
   bind 192.168.24.2:13774 transparent ssl crt /etc/pki/tls/private/overcloud_endpoint.pem
   bind 192.168.24.3:8774 transparent
   mode http
   http-request set-header X-Forwarded-Proto https if { ssl_fc }
   http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
   option httpchk
   option httplog
   redirect scheme https code 301 if { hdr(host) -i 192.168.24.2 } !{ ssl_fc }
   rsprep ^Location:\ http:(.*) Location:\ https:\1
   server undercloud.internalapi.localdomain 192.168.24.1:8774 check fall 5 inter 2000 rise 2
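To make the addresses in this stanza easier to compare, here is a minimal sketch that lists the frontend `bind` endpoints and the backend `server` endpoints. It assumes simple whitespace-separated directives and is not a full haproxy config parser; `endpoints` is a hypothetical helper.

```python
# Stanza copied from the haproxy config above (raw string keeps the backslashes).
STANZA = r"""
listen nova_osapi
  bind 192.168.24.2:13774 transparent ssl crt /etc/pki/tls/private/overcloud_endpoint.pem
  bind 192.168.24.3:8774 transparent
  mode http
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  option httpchk
  option httplog
  redirect scheme https code 301 if { hdr(host) -i 192.168.24.2 } !{ ssl_fc }
  rsprep ^Location:\ http:(.*) Location:\ https:\1
  server undercloud.internalapi.localdomain 192.168.24.1:8774 check fall 5 inter 2000 rise 2
"""


def endpoints(stanza):
    """Return (bind addresses, [(server name, server address)]) from one stanza."""
    binds, servers = [], []
    for line in stanza.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "bind":
            # bind <address:port> [options...]
            binds.append(fields[1])
        elif fields[0] == "server":
            # server <name> <address:port> [options...]
            servers.append((fields[1], fields[2]))
    return binds, servers


binds, servers = endpoints(STANZA)
print("haproxy listens on:", binds)
print("haproxy forwards to:", servers)
```

The output shows haproxy listening on 192.168.24.2:13774 and 192.168.24.3:8774 and forwarding to 192.168.24.1:8774, which is the same address the healthcheck resolves to, so the backend itself has to be listening there.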

Bogdan Dobrelya (bogdando) wrote :

The commands I used to start a debug container without healthchecks (after manually editing /var/lib/tripleo-config/docker-container-startup-config-step_4.json for nova_api):

$ sudo paunch --debug debug --file /var/lib/tripleo-config/docker-container-startup-config-step_4.json --interactive --shell --user root --container nova_api --action run #--config-id tripleo_step4 --managed-by tripleo-Controller

$ (in the container) sudo -E kolla_set_configs; nova-api

yatin (yatinkarel) wrote :

This started happening after the switch to containerized undercloud: https://review.openstack.org/#/c/571529/
It happens only in promotion jobs (new container image for nova-api) and is caused by https://review.openstack.org/#/c/558765. That patch is wrong; I am proposing a fix for it, let's see how it goes.

yatin (yatinkarel) wrote :
yatin (yatinkarel)
Changed in tripleo:
assignee: nobody → yatin (yatinkarel)
Matt Young (halcyondude)
tags: removed: alert
Arx Cruz (arxcruz) wrote :

Fix released, master promoted.

Changed in tripleo:
status: Triaged → Fix Committed
Changed in tripleo:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla 7.0.0.0b3

This issue was fixed in the openstack/kolla 7.0.0.0b3 development milestone.
