deferToThread cannot wait for a thread in the same threadpool

Bug #1447208 reported by Blake Rouse
This bug affects 3 people
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Blake Rouse
Milestone: (none)

Bug Description

This is the core issue we identified at the sprint. We need to make sure this is not being done, as it will cause regiond to deadlock and can also cause timeout errors.

Any deferToThread call made from inside a thread, where that thread then waits for the result, can fail. Only 10 threads can run at once, so if all 10 threads in the pool are waiting on deferred calls that cannot run until a thread becomes free, a deadlock occurs.

There are two ways around this:

1. Do not wait for a thread inside of a thread.
2. Defer to a different threadpool than the one the thread is using. *We might be able to alternate between pools to fix this issue.*

Tags: oil

description: updated
summary: - deferToThread should not wait for a thread in the same threadpool
+ deferToThread cannot wait for a thread in the same threadpool
Changed in maas:
status: Triaged → In Progress
assignee: nobody → Blake Rouse (blake-rouse)
tags: added: oil
Andres Rodriguez (andreserl) wrote :

I just came across this issue today again:

https://bugs.launchpad.net/maas/+bug/1446915

==> /var/log/maas/regiond.log <==
2015-04-27 15:08:53 [-] 127.0.0.1 - - [27/Apr/2015:15:08:52 +0000] "GET /MAAS/metadata//2012-03-01/user-data HTTP/1.1" 200 3439 "-" "python-requests/2.2.1 CPython/2.7.6 Linux/3.13.0-35-generic"
2015-04-27 15:08:55 [-] 127.0.0.1 - - [27/Apr/2015:15:08:54 +0000] "GET /MAAS/api/1.0/nodes/?nodes=node-a0754dc0-c4cd-11e3-824b-00163efc5068&op=deployment_status HTTP/1.1" 200 64 "-" "Go 1.1 package http"
2015-04-27 15:08:58 [-] 127.0.0.1 - - [27/Apr/2015:15:08:58 +0000] "GET /MAAS/api/1.0/nodes/?agent_name=a4a9be1b-5c0a-424d-81d7-35a43d9478b1&id=node-a5224922-ae98-11e3-b194-00163efc5068&op=list HTTP/1.1" 200 2 "-" "Go 1.1 package http"
2015-04-27 15:08:59 [root] ERROR:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/django/core/handlers/base.py", line 112, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/api/support.py", line 52, in __call__
    response = upcall(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/views/decorators/vary.py", line 19, in inner_func
    response = func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 167, in __call__
    result = self.error_handler(e, request, meth, em_format)
  File "/usr/lib/python2.7/dist-packages/piston/resource.py", line 165, in __call__
    result = meth(request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/api/support.py", line 200, in dispatch
    return function(self, request, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/api/nodes.py", line 412, in start
    node.start(request.user, user_data=user_data)
  File "/usr/lib/python2.7/dist-packages/maasserver/utils/orm.py", line 399, in call_within_transaction
    return func_within_txn(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/django/db/transaction.py", line 339, in inner
    return func(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/node.py", line 1920, in start
    claims = self.claim_static_ip_addresses()
  File "/usr/lib/python2.7/dist-packages/maasserver/models/node.py", line 1742, in claim_static_ip_addresses
    alloc_type=alloc_type, requested_address=requested_address)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/macaddress.py", line 358, in claim_static_ips
    interface, alloc_type, requested_address, user=user)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/macaddress.py", line 247, in _allocate_static_address
    user=user)
  File "/usr/lib/python2.7/dist-packages/maasserver/models/staticipaddress.py", line 179, in allocate_new
    alloc_type, user, hostname=hostname).wait(30)
  File "/usr/lib/python2.7/dist-packages/crochet/_eventloop.py", line 217, in wait
    result = self._result(timeout)
  File "/usr/lib/python2.7/dist-packages/crochet/_eventloop.py", line 195, in _result
    raise TimeoutError()
TimeoutError

I also came across: https://bugs.launchpad.net/maas/+bug/1379370 ... I wonder if there's something related t...


Jason Hobbs (jason-hobbs) wrote :

We hit an issue tonight where nodes are stuck in Releasing, and the cluster was disconnected from the region. I had to manually restart the cluster to get it reconnected, and the nodes are still stuck in Releasing.

Blake Rouse (blake-rouse) wrote :

Marking this "Fix Committed". Change the status back if this issue occurs again.

Changed in maas:
status: In Progress → Fix Committed
Changed in maas:
status: Fix Committed → Fix Released