Merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8
- pool-size-1.8
- Merge into 1.8
Status: | Merged |
---|---|
Approved by: | Raphaël Badin |
Approved revision: | no longer in the source branch. |
Merged at revision: | 4021 |
Proposed branch: | lp:~rvb/maas/pool-size-1.8 |
Merge into: | lp:maas/1.8 |
Diff against target: |
193 lines (+44/-10) 6 files modified
src/maasserver/api/nodes.py (+2/-3) src/maasserver/plugin.py (+14/-0) src/maasserver/tests/test_plugin.py (+14/-2) src/maasserver/websockets/handlers/node.py (+3/-4) src/provisioningserver/drivers/power/__init__.py (+9/-0) src/provisioningserver/rpc/power.py (+2/-1) |
To merge this branch: | bzr merge lp:~rvb/maas/pool-size-1.8 |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
Raphaël Badin (community) | Approve | ||
Review via email: mp+264390@code.launchpad.net |
Commit message
Backport 4072: Set the maximum thread pool size to 100: this is meant as a workaround for deadlocks such as the one described in the linked bug. It's not meant to be a permanent solution but rather to buy us time to come up with a proper solution. Unify the power query timeout: use a unique timeout across the board and use a value that can accommodate seamicro's tendency to take 30s to reply to a power query.
Description of the change
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Ign http://
Get:2 http://
Get:3 http://
Get:4 http://
Get:5 http://
Get:6 http://
Hit http://
Hit http://
Ign http://
Hit http://
Get:7 http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Get:13 http://
Get:14 http://
Get:15 http://
Hit http://
Hit http://
Get:16 http://
Get:17 http://
Ign http://
Ign http://
Fetched 1,155 kB in 16min 35s (1,159 B/s)
W: Failed to fetch http://
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Ign http://
Ign http://
Hit http://
Get:3 http://
Get:4 http://
Get:5 http://
Get:6 http://
Hit http://
Hit http://
Get:7 http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Get:13 http://
Hit http://
Hit http://
Get:14 http://
Ign http://
Ign http://
Fetched 1,604 kB in 11min 55s (2,243 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Get:3 http://
Get:4 http://
Get:5 http://
Get:6 http://
Hit http://
Hit http://
Ign http://
Ign http://
Hit http://
Get:7 http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Hit http://
Hit http://
Get:13 http://
Ign http://
Ign http://
Fetched 1,882 kB in 9min 57s (3,151 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Ign http://
Ign http://
Hit http://
Get:3 http://
Hit http://
Get:4 http://
Get:5 http://
Get:6 http://
Get:7 http://
Get:8 http://
Hit http://
Hit http://
Get:9 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:10 http://
Get:11 http://
Get:12 http://
Get:13 http://
Hit http://
Hit http://
Get:14 http://
Ign http://
Ign http://
Fetched 1,732 kB in 11min 26s (2,521 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
E: Could not get lock /var/lib/
E: Unable to lock directory /var/lib/apt/lists/
MAAS Lander (maas-lander) wrote : | # |
The attempt to merge lp:~rvb/maas/pool-size-1.8 into lp:maas/1.8 failed. Below is the output from the failed tests.
Ign http://
Get:1 http://
Get:2 http://
Ign http://
Ign http://
Hit http://
Get:3 http://
Hit http://
Get:4 http://
Get:5 http://
Get:6 http://
Get:7 http://
Hit http://
Hit http://
Get:8 http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Hit http://
Get:9 http://
Get:10 http://
Get:11 http://
Get:12 http://
Hit http://
Hit http://
Get:13 http://
Ign http://
Ign http://
Fetched 1,663 kB in 9min 19s (2,974 B/s)
W: Failed to fetch http://
E: Some index files failed to download. They have been ignored, or old ones used instead.
Preview Diff
1 | === modified file 'src/maasserver/api/nodes.py' |
2 | --- src/maasserver/api/nodes.py 2015-05-28 09:38:40 +0000 |
3 | +++ src/maasserver/api/nodes.py 2015-07-10 09:11:31 +0000 |
4 | @@ -77,6 +77,7 @@ |
5 | from maasserver.utils import find_nodegroup |
6 | from maasserver.utils.orm import get_first |
7 | from piston.utils import rc |
8 | +from provisioningserver.drivers.power import POWER_QUERY_TIMEOUT |
9 | from provisioningserver.power.poweraction import ( |
10 | PowerActionFail, |
11 | UnknownPowerType, |
12 | @@ -696,9 +697,7 @@ |
13 | power_type=power_info.power_type, |
14 | context=power_info.power_parameters) |
15 | try: |
16 | - # Allow 30 seconds for the power query max as we're holding |
17 | - # up an appserver thread here. |
18 | - state = call.wait(30) |
19 | + state = call.wait(POWER_QUERY_TIMEOUT) |
20 | except crochet.TimeoutError: |
21 | maaslog.error( |
22 | "%s: Timed out waiting for power response in Node.power_state", |
23 | |
24 | === modified file 'src/maasserver/plugin.py' |
25 | --- src/maasserver/plugin.py 2015-05-07 18:14:38 +0000 |
26 | +++ src/maasserver/plugin.py 2015-07-10 09:11:31 +0000 |
27 | @@ -24,6 +24,7 @@ |
28 | from twisted.plugin import IPlugin |
29 | from twisted.python import usage |
30 | from zope.interface import implementer |
31 | +from twisted.internet import reactor |
32 | |
33 | |
34 | def serverFromString(description): |
35 | @@ -45,6 +46,14 @@ |
36 | ] |
37 | |
38 | |
39 | +# The maximum number of threads used by the default twisted thread pool. |
40 | +# This value is a trade-off between a small value (such as the default: 10) |
41 | +# which can create deadlocks (see 1470013) and a huge value which can cause |
42 | +# MAAS to hit other limitations such as the number of open files or the |
43 | +# number of concurrent database connexions. |
44 | +MAX_THREADS = 100 |
45 | + |
46 | + |
47 | @implementer(IServiceMaker, IPlugin) |
48 | class RegionServiceMaker: |
49 | """Create a service for the Twisted plugin.""" |
50 | @@ -76,6 +85,10 @@ |
51 | import crochet |
52 | crochet.no_setup() |
53 | |
54 | + def _configurePoolSize(self): |
55 | + threadpool = reactor.getThreadPool() |
56 | + threadpool.adjustPoolsize(10, MAX_THREADS) |
57 | + |
58 | def _makeIntrospectionService(self, endpoint): |
59 | from provisioningserver.utils import introspect |
60 | introspect_service = ( |
61 | @@ -91,6 +104,7 @@ |
62 | self._configureLogging() |
63 | self._configureDjango() |
64 | self._configureCrochet() |
65 | + self._configurePoolSize() |
66 | |
67 | # Populate the region's event-loop with services. |
68 | from maasserver import eventloop |
69 | |
70 | === modified file 'src/maasserver/tests/test_plugin.py' |
71 | --- src/maasserver/tests/test_plugin.py 2015-05-07 18:14:38 +0000 |
72 | +++ src/maasserver/tests/test_plugin.py 2015-07-10 09:11:31 +0000 |
73 | @@ -17,6 +17,7 @@ |
74 | import crochet |
75 | from maasserver import eventloop |
76 | from maasserver.plugin import ( |
77 | + MAX_THREADS, |
78 | Options, |
79 | RegionServiceMaker, |
80 | ) |
81 | @@ -24,11 +25,13 @@ |
82 | from maastesting.testcase import MAASTestCase |
83 | from provisioningserver import logger |
84 | from provisioningserver.utils.twisted import asynchronous |
85 | +from testtools.matchers import GreaterThan |
86 | from twisted.application.service import MultiService |
87 | +from twisted.internet import reactor |
88 | |
89 | |
90 | class TestOptions(MAASTestCase): |
91 | - """Tests for `provisioningserver.plugin.Options`.""" |
92 | + """Tests for `maasserver.plugin.Options`.""" |
93 | |
94 | def test_defaults(self): |
95 | options = Options() |
96 | @@ -42,7 +45,7 @@ |
97 | |
98 | |
99 | class TestRegionServiceMaker(MAASTestCase): |
100 | - """Tests for `provisioningserver.plugin.RegionServiceMaker`.""" |
101 | + """Tests for `maasserver.plugin.RegionServiceMaker`.""" |
102 | |
103 | def setUp(self): |
104 | super(TestRegionServiceMaker, self).setUp() |
105 | @@ -78,3 +81,12 @@ |
106 | "Not all services are named.") |
107 | self.assertThat(logger.basicConfig, MockCalledOnceWith()) |
108 | self.assertThat(crochet.no_setup, MockCalledOnceWith()) |
109 | + |
110 | + @asynchronous(timeout=5) |
111 | + def test__sets_pool_size(self): |
112 | + service_maker = RegionServiceMaker("Harry", "Hill") |
113 | + service_maker.makeService(Options()) |
114 | + threadpool = reactor.getThreadPool() |
115 | + self.assertEqual(MAX_THREADS, threadpool.max) |
116 | + # Max threads is reasonable. |
117 | + self.assertThat(threadpool.max, GreaterThan(50)) |
118 | |
119 | === modified file 'src/maasserver/websockets/handlers/node.py' |
120 | --- src/maasserver/websockets/handlers/node.py 2015-05-28 15:07:01 +0000 |
121 | +++ src/maasserver/websockets/handlers/node.py 2015-07-10 09:11:31 +0000 |
122 | @@ -48,6 +48,7 @@ |
123 | ) |
124 | from metadataserver.enum import RESULT_TYPE |
125 | from metadataserver.models import NodeResult |
126 | +from provisioningserver.drivers.power import POWER_QUERY_TIMEOUT |
127 | from provisioningserver.logger import get_maas_logger |
128 | from provisioningserver.power.poweraction import ( |
129 | PowerActionFail, |
130 | @@ -547,9 +548,7 @@ |
131 | power_type=power_info.power_type, |
132 | context=power_info.power_parameters) |
133 | try: |
134 | - # Allow 15 seconds for the power query max as we're holding |
135 | - # up a thread waiting. |
136 | - state = call.wait(15)['state'] |
137 | + state = call.wait(POWER_QUERY_TIMEOUT)['state'] |
138 | except crochet.TimeoutError: |
139 | maaslog.error( |
140 | "%s: Timed out waiting for power response in " |
141 | @@ -569,5 +568,5 @@ |
142 | # sure the change is committed and retried if required. Not pushing |
143 | # this to another thread, would result in the entire power query being |
144 | # performed again. |
145 | - update_power_state(state).wait(15) |
146 | + update_power_state(state).wait(POWER_QUERY_TIMEOUT) |
147 | return state |
148 | |
149 | === modified file 'src/provisioningserver/drivers/power/__init__.py' |
150 | --- src/provisioningserver/drivers/power/__init__.py 2015-05-07 18:14:38 +0000 |
151 | +++ src/provisioningserver/drivers/power/__init__.py 2015-07-10 09:11:31 +0000 |
152 | @@ -13,6 +13,7 @@ |
153 | |
154 | __metaclass__ = type |
155 | __all__ = [ |
156 | + "POWER_QUERY_TIMEOUT", |
157 | "PowerActionError", |
158 | "PowerAuthError", |
159 | "PowerConnError", |
160 | @@ -29,6 +30,7 @@ |
161 | abstractmethod, |
162 | abstractproperty, |
163 | ) |
164 | +from datetime import timedelta |
165 | |
166 | from jsonschema import validate |
167 | from provisioningserver.drivers import ( |
168 | @@ -52,6 +54,13 @@ |
169 | } |
170 | |
171 | |
172 | +# Timeout for the power query action. We might be holding up a thread for that |
173 | +# long but some BMCs (notably seamicro) can take a long time to respond to |
174 | +# a power query request. |
175 | +# This should be configurable per-BMC. |
176 | +POWER_QUERY_TIMEOUT = timedelta(seconds=45).total_seconds() |
177 | + |
178 | + |
179 | class PowerError(Exception): |
180 | """Base error for all power driver failure commands.""" |
181 | |
182 | |
183 | === modified file 'src/provisioningserver/rpc/power.py' |
184 | --- src/provisioningserver/rpc/power.py 2015-05-14 07:22:16 +0000 |
185 | +++ src/provisioningserver/rpc/power.py 2015-07-10 09:11:31 +0000 |
186 | @@ -78,7 +78,8 @@ |
187 | |
188 | |
189 | # Timeout for change_power_state(). We set it to 5 minutes by default, |
190 | -# but it would be lovely if this was configurable. |
191 | +# but it would be lovely if this was configurable. This is only a backstop |
192 | +# meant to cope with broken BMCs. |
193 | CHANGE_POWER_STATE_TIMEOUT = timedelta(minutes=5).total_seconds() |
194 | |
195 |
Simple backport, self-approving. My testing in one of our lab (ivok's lab) indicates this definitely improves the situation.