Merge lp:~hazmat/pyjuju/security-specification into lp:pyjuju

Proposed by Kapil Thangavelu
Status: Work in progress
Proposed branch: lp:~hazmat/pyjuju/security-specification
Merge into: lp:pyjuju
Diff against target: 264 lines (+260/-0)
1 file modified
docs/source/drafts/security.rst (+260/-0)
To merge this branch: bzr merge lp:~hazmat/pyjuju/security-specification
Reviewer: Gustavo Niemeyer (status: Needs Fixing)
Review via email: mp+63921@code.launchpad.net

Description of the change

A basic overview of the security architecture, attack vectors, and next steps. It's a bit rambling, but hopefully it makes a good start.

Revision history for this message
Gustavo Niemeyer (niemeyer) wrote :

This is going into a very good direction.

Here are some comments on a first read:

[1]

47 +to each node. Every zookeeper connection can associate principal
48 +credentials to its connection, and all access by that connection is
49 +validated against the per node ACL mapping.

This would be a good place to link some upstream documentation for zk.
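To make the per-node ACL validation quoted above concrete, here is a simplified, in-memory Python model of how a connection's authenticated identities are checked against a node's ACL list. The permission bit values mirror ZooKeeper's `ZooDefs.Perms` constants. The `ACL` dataclass and the `is_allowed` helper are hypothetical illustrations, not part of ZooKeeper or Ensemble. Real enforcement happens inside the ZooKeeper server.

```python
from dataclasses import dataclass
from enum import IntFlag


class Perm(IntFlag):
    # Bit values mirror ZooKeeper's ZooDefs.Perms constants.
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16
    ALL = 31


@dataclass(frozen=True)
class ACL:
    perms: Perm
    scheme: str  # e.g. "digest" or "world"
    id: str      # e.g. "unit-0:<identity token>" or "anyone"


def is_allowed(acls, auth_ids, wanted: Perm) -> bool:
    """Simplified model: does any ACL entry grant `wanted` to this connection?

    `auth_ids` is the set of (scheme, id) pairs the connection has
    authenticated as; every connection also matches world:anyone.
    """
    identities = set(auth_ids) | {("world", "anyone")}
    return any(
        (acl.scheme, acl.id) in identities and (acl.perms & wanted) == wanted
        for acl in acls
    )


node_acl = [
    ACL(Perm.READ, "world", "anyone"),               # anyone may read
    ACL(Perm.ALL, "digest", "unit-0:hypothetical"),  # the owner has full access
]
print(is_allowed(node_acl, set(), Perm.WRITE))                                # False
print(is_allowed(node_acl, {("digest", "unit-0:hypothetical")}, Perm.WRITE))  # True
```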

[2]

Just in terms of style, on the big reorganization I've standardized the headings
on the three styles:

  1) ===
  2) ---
  3) ~~~

I suggest we try to keep this convention across all documents, otherwise it's
hard to follow what's a sub-header or not (e.g. I wouldn't expect === as a
sub-header of ---)

[3]

55 +An additional zookeeper connected actor responsible for creating principals
56 +and providing an up to date token database.

Do we need to manage an external agent for that?

[4]

65 +Each actor employs a security policy, to determine the ACL map for a given
66 +node path that may create. The policy simply takes the path to the node
67 +to be created, and returns back an ACL map that can be set on the node.

This wasn't very clear on a first read. Some ideas:

- "employs" could mean different things in that context, so I suggest replacing by
  a less open synonym ("uses"?).

- "that may create" feels like missing the subject ("that the actor may
  create?")

- At least the first comma should be dropped.

- s/can be set/must be set/?
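Here is a minimal sketch of such a path-based policy. It assumes a hypothetical node layout `/relations/<relation-id>/settings/<unit-name>` and follows the relation rule described later in the draft: each unit may write only its own settings node, and every participating unit may read it. `relation_settings_policy`, the layout, and the dictionaries passed in are illustrative assumptions. The permission values mirror ZooKeeper's `ZooDefs.Perms`.

```python
from enum import IntFlag


class Perm(IntFlag):
    # Bit values mirror ZooKeeper's ZooDefs.Perms constants.
    READ = 1
    WRITE = 2
    CREATE = 4
    DELETE = 8
    ADMIN = 16


def relation_settings_policy(path, relation_members, token_db):
    """Hypothetical policy: map a relation settings node path to an ACL list.

    Assumed layout: /relations/<relation-id>/settings/<unit-name>.
    relation_members: relation id -> unit names participating in it.
    token_db: principal name -> identity token (ZooKeeper digest id).
    """
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "relations" or parts[2] != "settings":
        raise ValueError(f"no policy for path {path!r}")
    relation_id, owner = parts[1], parts[3]
    members = relation_members.get(relation_id, [])
    if owner not in members:
        raise PermissionError(f"{owner} is not part of relation {relation_id}")

    acl = []
    for unit in members:
        # Each unit may write only its own node; peers in the relation may read it.
        perms = (Perm.READ | Perm.WRITE) if unit == owner else Perm.READ
        acl.append((perms, "digest", token_db[unit]))
    return acl
```

Units outside the relation get no ACL entry at all, so the node is unreadable to them.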

[5]

73 Every actor in the system needs its own
74 +unique principal, to provide an auth identity, the credentials for a
75 +principal are known only to the actor utilizing them and transiently
76 +the security agent when they are created.

This sentence isn't reading well.

[6]

78 +Instead of passing principals credentials directly via insecure
79 +channels, an actor creating another actor also establishes a principal
80 +creation token via the security agent. The principal creation token is
81 +a one time use string which can be used to create a principal and its
82 +password, and update the token database.
83 +
84 +The security agent has a simple policy in place regarding principal
85 +names and which actors can create them, ie. a provisioning agent can
86 +create machine principals, but not service unit principals.

It's not clear from the document how that's going to work, and it also feels a
little bit like delegating permissions could be more complexity than is
necessary (e.g. what about the agent itself creating these identities instead
of providing a one time token?). This area needs some debate.
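The one-time token flow under discussion can be pictured with the in-memory sketch below. `SecurityAgent`, `CREATION_POLICY`, `digest_identity`, and the actor kind names are hypothetical. Tokens and passwords come from Python's `secrets` module. A token is removed from the pending set the first time it is redeemed, and a reused or unknown token is logged, as the draft suggests. The stored identity uses ZooKeeper's digest format: the username plus the base64-encoded SHA-1 of `username:password`.

```python
import base64
import hashlib
import logging
import secrets

log = logging.getLogger("security-agent")

# Hypothetical policy: which principal kinds each kind of actor may request.
CREATION_POLICY = {
    "provisioning-agent": {"machine"},
    "machine-agent": {"unit"},
}


def digest_identity(name: str, password: str) -> str:
    # ZooKeeper digest-scheme id: name + base64(SHA-1("name:password")).
    raw = hashlib.sha1(f"{name}:{password}".encode()).digest()
    return f"{name}:{base64.b64encode(raw).decode()}"


class SecurityAgent:
    """In-memory sketch of one-time principal creation tokens."""

    def __init__(self):
        self._pending = {}   # creation token -> principal name
        self.token_db = {}   # principal name -> identity token

    def issue_token(self, requester_kind: str, kind: str, name: str) -> str:
        if kind not in CREATION_POLICY.get(requester_kind, set()):
            raise PermissionError(f"{requester_kind} may not create {kind} principals")
        token = secrets.token_urlsafe(32)
        self._pending[token] = name
        return token

    def redeem(self, token: str):
        """Consume a token exactly once; return the new principal's credentials."""
        name = self._pending.pop(token, None)
        if name is None:
            # Unknown or already used token: keep a record for forensics.
            log.warning("rejected principal creation token")
            raise PermissionError("invalid or already used token")
        password = secrets.token_urlsafe(32)
        self.token_db[name] = digest_identity(name, password)
        return name, password
```

In this sketch the creating actor hands only the token to the new actor, which calls `redeem` itself. The password never passes through the creator.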

[7]

88 +If a malicious user intercepts the token and uses it, compared with
89 +passing credentials directly it minimizes the time that a third party
90 +has to perform such an interception. Moreover invalid use of a token
91 +can be logged as foresenic information.

This assumes knowledge about how such interceptions would take place,
or why it minimizes the time, that isn't available in the document up to
that point.

Also, we should design the security system in a way that such interceptions
can't take place, rather than minimizing the time an interception can be
made (ssh vs. telnet).

[8]

118 +Additionally services utilize relations to communic...


review: Needs Fixing
Clint Byrum (clint-fewbar) wrote :

Hi Kapil.

I noticed you're suggesting MD5 for the password hashes. I'd suggest going one step further and using multiple iterations of MD5. Grid computing has made cracking a single MD5 password trivial. Hash 200,000 times, and at least you require 200,000 times more power to do a mass dictionary attack (and it shouldn't add much time, considering how seldom the actual password will need to be checked).
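Iterating a hash by hand is usually done with a standard key-stretching construction instead. The sketch below swaps in PBKDF2-HMAC-SHA256 via Python's `hashlib.pbkdf2_hmac` and uses the 200,000 iteration count from the comment. It illustrates the idea only: as far as we understand, ZooKeeper's built-in digest scheme would not accept such a value. The helper names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # work factor taken from the comment above


def stretch(password: bytes, salt: bytes, iterations: int = ITERATIONS) -> bytes:
    # PBKDF2-HMAC-SHA256: every guess costs `iterations` HMAC computations.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations)


def verify(password: bytes, salt: bytes, expected: bytes,
           iterations: int = ITERATIONS) -> bool:
    # Constant-time comparison of the derived key with the stored one.
    return hmac.compare_digest(stretch(password, salt, iterations), expected)


salt = os.urandom(16)            # per-principal random salt
stored = stretch(b"s3cret", salt)
print(verify(b"s3cret", salt, stored), verify(b"guess", salt, stored))  # True False
```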

Gustavo Niemeyer (niemeyer) wrote :

200k+MD5 is a bit too much, but I agree that using HMAC (perhaps HMAC+SHA1) rather than plain MD5 would be good. HMAC is targeted at specifically this use case too.
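For comparison, here is a sketch of an HMAC-SHA1 token built with Python's standard `hmac` module. HMAC needs a secret key, which the comment doesn't specify. Here `SERVER_KEY` is a hypothetical key held only by the verifying component, and `hmac_token` and `check` are illustrative names.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key, known only to the component that verifies tokens.
SERVER_KEY = secrets.token_bytes(32)


def hmac_token(username: str, password: str, key: bytes = SERVER_KEY) -> str:
    # HMAC-SHA1 over "username:password", hex-encoded (40 characters).
    mac = hmac.new(key, f"{username}:{password}".encode(), hashlib.sha1)
    return f"{username}:{mac.hexdigest()}"


def check(username: str, password: str, stored: str, key: bytes = SERVER_KEY) -> bool:
    # compare_digest avoids leaking, via timing, how much of the token matched.
    candidate = hmac_token(username, password, key)
    return hmac.compare_digest(candidate.encode(), stored.encode())
```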

Kapil Thangavelu (hazmat) wrote :

Excerpts from Clint Byrum's message of Thu Jun 09 23:51:31 UTC 2011:
> Hi Kapil.
>
> I noticed you're suggesting MD5 for the password hashes. I'd suggest going 1 step further and using multiple iterations of MD5. Grid computing has made cracking a single MD5 password trivial. Hash 200,000 times, and at least you require 200,000 times more power to do a mass dictionary attack (and it shouldn't add much time considering how seldom the actual password will need to be checked.

It's actually not something we control explicitly, as what we set for the ACL identity token (the username:md5hash) needs to match the enforcement side, which is provided by zookeeper. In future we could play around with a zk authentication plugin for which we provide custom logic, but I think we've deemed modification of zk out of scope for the moment. I agree, though, that things like CUDA/GPGPU programming, and the fact that we're manipulating cloud environments, make brute force attacks more likely, and it would be nice to have a more resistant approach.
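For reference, the identity string that ZooKeeper's built-in `digest` scheme compares against can be computed as below. To our understanding, the server-side provider (`DigestAuthenticationProvider`) stores `username:` followed by the base64-encoded SHA-1 digest of `username:password`, not an MD5 hash. Please double-check this against the ZooKeeper version in use. The function names are hypothetical. The `digest:<id>:cdrwa` string is the form accepted by the `zkCli` `setAcl` command.

```python
import base64
import hashlib


def zk_digest_identity(username: str, password: str) -> str:
    """ACL id for ZooKeeper's digest scheme: user:base64(sha1("user:password"))."""
    raw = f"{username}:{password}".encode("utf-8")
    digest = hashlib.sha1(raw).digest()
    return f"{username}:{base64.b64encode(digest).decode('ascii')}"


def acl_expression(username: str, password: str, perms: str = "cdrwa") -> str:
    # scheme:id:perms, e.g. for `setAcl /some/node digest:unit-0:...:cdrwa`.
    return f"digest:{zk_digest_identity(username, password)}:{perms}"
```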

Gustavo Niemeyer (niemeyer) wrote :

Good point. On the bright side, these tokens are going to be automatically generated. We can easily make them long and random enough to invalidate any brute force attacks (not counting fancy quantum computing advancements ;-).
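Here is a sketch of generating such credentials with Python's `secrets` module, plus the back-of-the-envelope arithmetic behind the brute-force claim. The 32-byte (256-bit) size and the rate of 10^12 guesses per second are assumptions chosen for illustration.

```python
import secrets

TOKEN_BYTES = 32  # 256 bits of randomness; an assumed size, not taken from the draft


def new_secret() -> str:
    # token_urlsafe reads from the OS CSPRNG and base64url-encodes the bytes
    # without padding, so 32 bytes become a 43-character string.
    return secrets.token_urlsafe(TOKEN_BYTES)


def years_to_exhaust(bits: int, guesses_per_second: float) -> float:
    # Time to try every value in a keyspace of 2**bits at a fixed guess rate.
    seconds = 2 ** bits / guesses_per_second
    return seconds / (365 * 24 * 3600)


print(f"{years_to_exhaust(TOKEN_BYTES * 8, 1e12):.1e}")  # 3.7e+57
```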

242. By Kapil Thangavelu

address some formatting comments

243. By Kapil Thangavelu

merge trunk and address some formatting comments

Unmerged revisions

243. By Kapil Thangavelu

merge trunk and address some formatting comments

242. By Kapil Thangavelu

address some formatting comments

241. By Kapil Thangavelu

additional escalation scenarios, outline next steps broadly

240. By Kapil Thangavelu

security overview and todo spec draft

Preview Diff

1=== added file 'docs/source/drafts/security.rst'
2--- docs/source/drafts/security.rst 1970-01-01 00:00:00 +0000
3+++ docs/source/drafts/security.rst 2012-06-22 15:29:20 +0000
4@@ -0,0 +1,260 @@
5+Security Overview
6+=================
7+
8+Ensemble is committed to providing a reliable, secure mechanism for
9+deploying services. What follows is an overview of the security
10+mechanisms involved and how they contribute to keeping an Ensemble
11+environment secure.
12+
13+Glossary
14+--------
15+
16+First, a glossary of the terms used in this document.
17+
18+
19+Principal
20+~~~~~~~~~
21+
22+A principal in the context of Ensemble can represent any actor or
23+group of actors within the system. Each principal is authenticated via
24+a user name (principal id) and password. For example, a service unit
25+agent would associate its own unique principal credentials with its
26+connection, and would thus have access to every node whose ACL
27+mapping explicitly grants access to that principal.
28+
29+Token Database
30+~~~~~~~~~~~~~~
31+
32+A mapping of principal id to its ACL identity token. The identity
33+token is an md5 checksum of the username and password, of the form
34+username:identity_scheme:checksum, where identity_scheme is md5 in the
35+case of Ensemble. The mapping is stored in a zookeeper node that is
36+world readable but writable only by the security (ticket granting)
37+agent, which is responsible for creating principals.
38+
39+Zookeeper ACLs
40+~~~~~~~~~~~~~~
41+
42+Ensemble relies on the security facilities provided by zookeeper, its
43+coordination store: zookeeper automatically restricts access to each
44+node based on the ACL permission map set on that node. This ACL
45+facility maps permissions to principal identity tokens. Zookeeper
46+provides permissions for read, write, delete, create, and admin access
47+to each node. Every zookeeper connection can associate principal
48+credentials to its connection, and all access by that connection is
49+validated against the per node ACL mapping.
50+
51+
52+Additional documentation is available at:
53+http://zookeeper.apache.org/doc/trunk/zookeeperProgrammers.html#sc_ZooKeeperAccessControl
54+
55+
56+Security Agent
57+~~~~~~~~~~~~~~
58+
59+A zookeeper-connected actor responsible for creating principals
60+and maintaining an up-to-date token database.
61+
62+The security agent manages the token database (defined above),
63+creates new principals, and hands out their identity tokens to
64+inquiring parties.
65+
66+Security Policy
67+~~~~~~~~~~~~~~~
68+
69+Each actor uses a security policy to determine the ACL map for any
70+node path that the actor may create. The policy takes the path of
71+the node to be created and returns the ACL map to set on that node.
72+
73+
74+Identity
75+--------
76+
77+How the system passes credentials to an actor is critical to managing
78+principals securely. Every actor in the system needs its own unique
79+principal to provide an authentication identity. The credentials for
80+a principal are known only to the actor using them and, transiently,
81+to the security agent when they are created.
82+
83+Instead of passing principal credentials directly over insecure
84+channels, an actor creating another actor also establishes a principal
85+creation token via the security agent. The principal creation token is
86+a one-time-use string that can be used to create a principal and its
87+password, and to update the token database.
88+
89+The security agent has a simple policy governing principal names and
90+which actors can create them, e.g., a provisioning agent can create
91+machine principals but not service unit principals.
92+
93+Compared with passing credentials directly, a one-time token shortens
94+the window during which a third party can intercept and use it: once
95+the token has been consumed, it is worthless. Moreover, any invalid
96+use of a token can be logged as forensic information.
97+
98+One question that arises from using a separate agent to create
99+identities is how the agents needed for bootstrap receive their credentials.
100+
101+ - The bootstrap can use a specialized OTP interface with a pre-created
102+   known value, which it can use to initialize the tree.
103+
104+Transport level security
105+------------------------
106+
107+As zookeeper does not currently support SSL/TLS transport-level
108+security, Ensemble uses SSH port forwarding to encrypt communications
109+with zookeeper. One significant shortcoming of this approach is that
110+any process on any of the Ensemble machines can connect to zookeeper
111+and attempt to brute force principal passwords.
112+
113+See also `Node Encryption`_ under Alternatives.
114+
115+Privileged Data
116+---------------
117+
118+Certain data stored within zookeeper is by its nature privileged and
119+should only be shared with agents requiring it for their function. For
120+example, the Ensemble provider credentials should only be exposed to
121+the provisioning agent, as it requires them to function; any
122+additional access to that data would be regarded as a data escalation
123+vulnerability.
124+
125+Additionally, services use relations to communicate with each other.
126+Every service unit of the services participating in a relation gets
127+write access only to its own node within the relation, and read
128+access to the relation settings of all service units. A service unit
129+of an unrelated service is not allowed to read any settings from the
130+relation.
131+
132+
133+Data Security
134+-------------
135+
136+A pub key/priv key can be associated to the OTP to
137+
138+
139+OTP Security
140+------------
141+
142+The OTP is not secure without additional enforcement, as it does
143+not exist as a native capability of the zk interface. An additional
144+actor responsible for creating identities and processing OTP tokens
145+would be an alternative (See futures).
146+
147+
148+Relations attacks
149+-----------------
150+
151+Ensemble is composed of a number of actors connecting to and
152+communicating via a shared store. When two services enter into a
153+relation, a private bidirectional channel is created for them to
154+exchange data.
155+
156+Ensemble ensures that the zookeeper nodes used for this communication
157+are subject to the proper ACL constraints such that unrelated services
158+are unable to access them.
159+
160+However, these relations represent ad hoc, formula-defined
161+communication between machines. A malicious agent could abuse one of
162+these protocols to compromise additional agents. Unlike other attack
163+vectors, this is one where Ensemble can make only minimal safety
164+guarantees, beyond perhaps validating relation data (currently
165+treated as a binary blob) against schemas associated with the
166+relation type.
167+
168+Formulas executed by the unit agent run user-supplied code inside an
169+lxc container (with root privileges). LXC provides limited protection
170+against root within a container, so a container compromise can
171+escalate to a machine-level compromise, and from there to the other
172+units on that machine.
173+
174+
175+Privilege Escalation Scenarios
176+------------------------------
177+
178+There are several levels of escalation within Ensemble that need to
179+be considered for malicious code.
180+
181+container escalation
182+++++++++++++++++++++
183+
184+All formula hooks are executed within an lxc container to provide a
185+minimally isolated environment. This container can be exploited rather
186+trivially to gain root access on the machine, as formulas execute as
187+root within the container and lxc currently provides minimal security
188+guarantees. This leads to the next escalation level.
189+
190+Future work is needed to provide better security around lxc
191+integration, perhaps via AppArmor and ongoing lxc
192+isolation work.
193+
194+Machine escalation
195+++++++++++++++++++
196+
197+A machine is considered compromised if malicious code has root access
198+on the machine. If this occurs, all service units colocated on the
199+machine are also considered compromised.
200+
201+Agent Escalation
202+~~~~~~~~~~~~~~~~
203+
204+An agent is considered compromised if malicious code has an open zookeeper
205+connection with a valid actor principal identity. The malicious code
206+has access to all data exposed via ACL to the compromised identity.
207+
208+Beyond these generic scenarios we have particular escalations which
209+are effectively fatal, as they entail access to sensitive data that
210+spans the ensemble environment or machine provider.
211+
212+A compromise of the bootstrap machine that allows disk access could
213+be considered fatal, as the Ensemble shared state (zookeeper) data is
214+resident on disk.
215+
216+Compromising the identity of certain agents, such as the provisioning
217+agent, would allow malicious code to use the machine provider credentials.
218+
219+
220+Access to Deployed services
221+----------------------------
222+
223+A plan for controlled public access to deployed services is provided
224+separately by the expose-services specification.
225+
226+Currently, all internal access within a machine provider environment
227+such as EC2 is unfiltered.
228+
229+In the future, we should have machine-level firewalling that allows
230+access between services based on their relations.
231+
232+Alternatives
233+------------
234+
235+
236+Node Encryption
237+~~~~~~~~~~~~~~~
238+
239+Principal Agent
240+~~~~~~~~~~~~~~~
241+
242+A security agent responsible for transport security via node
243+encryption.
244+
245+Next Steps
246+----------
247+
248+SSH Host Identity Checks
249+
250+We should store each machine's SSH host key in zk, so that connections
251+to a given machine can be verified against the environment's known keys.
252+
253+Formula Storage URLs
254+
255+Currently, formula storage is referenced by a storage key that is
256+retrieved via the machine provider storage interface. This requires
257+machine agents to have access to the machine provider credentials in
258+order to use formula storage, which they shouldn't need.
259+
260+- Security Agent & Token Database
261+- Security Policy (Path Based ACL generator)
262+- Connections w/ Principal
263+
264+
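For the "SSH Host Identity Checks" item in Next Steps, the sketch below shows how a client could compare a machine's presented host key with keys published in zk. It computes the OpenSSH-style `SHA256:` fingerprint: the base64 encoding of the SHA-256 of the decoded key blob, with padding stripped. The zk node holding the known fingerprints is hypothetical and is represented here by a plain set. The helper names are illustrative.

```python
import base64
import hashlib


def ssh_fingerprint(pubkey_line: str) -> str:
    """OpenSSH-style SHA256 fingerprint of a 'ssh-ed25519 AAAA... comment' line."""
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def host_key_is_known(pubkey_line: str, known_fingerprints: set) -> bool:
    # `known_fingerprints` would be read from a (hypothetical) zk node where
    # each machine publishes its host key; here it is just a set of strings.
    return ssh_fingerprint(pubkey_line) in known_fingerprints
```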
