This is going in a very good direction.

Here are some comments on a first read:

[1]

47	+to each node. Every zookeeper connection can associate principal
48	+credentials to its connection, and all access by that connection is
49	+validated against the per node ACL mapping.

This would be a good place to link to some upstream ZooKeeper documentation.

[2]

Just in terms of style, in the big reorganization I standardized the headings
on three styles:

  1) ===
  2) ---
  3) ~~~

I suggest we try to keep this convention across all documents, otherwise it's
hard to follow what's a sub-header or not (e.g. I wouldn't expect === as a
sub-header of ---)
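
A minimal sketch of how this convention could be checked mechanically. The
function name and the reported messages are hypothetical, and the parsing is
deliberately simplified (underlined titles only, no overlines):

```python
import re

# Assumed convention from the list above: === then --- then ~~~.
LEVELS = {"=": 1, "-": 2, "~": 3}

def check_headings(text):
    """Return (title line number, title, problem) for headings off-convention."""
    problems = []
    lines = text.splitlines()
    prev_level = 0
    for i in range(1, len(lines)):
        title, underline = lines[i - 1].rstrip(), lines[i].rstrip()
        # An underline is a run of 3+ copies of one punctuation character,
        # at least as long as the title above it.
        if not title.strip() or not re.fullmatch(r"([^\w\s])\1{2,}", underline):
            continue
        if len(underline) < len(title):
            continue
        level = LEVELS.get(underline[0])
        if level is None:
            problems.append((i, title.strip(), "unknown underline %r" % underline[0]))
            continue
        if level > prev_level + 1:
            problems.append((i, title.strip(), "skips a level"))
        prev_level = level
    return problems
```

Run over a document, it would flag a `+++` heading as an unknown underline
and a `~~~` heading placed directly under a `===` one as skipping a level.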

[3]

55	+An additional zookeeper connected actor responsible for creating principals
56	+and providing an up to date token database.

Do we need to manage an external agent for that?

[4]

65	+Each actor employs a security policy, to determine the ACL map for a given
66	+node path that may create. The policy simply takes the path to the node
67	+to be created, and returns back an ACL map that can be set on the node.

This wasn't very clear on a first read.  Some ideas:

- "employs" could mean different things in that context, so I suggest replacing
  it with a less ambiguous synonym ("uses"?).

- "that may create" feels like it's missing the subject ("that the actor may
  create"?).

- At least the first comma should be dropped.

- s/can be set/must be set/?
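
For illustration, the quoted paragraph can be read as describing a plain
function from node path to ACL map. A minimal sketch under that reading; the
function name, principal names, path prefix and permission names are all
hypothetical and not taken from the spec:

```python
# Hypothetical sketch: every name and rule here is an illustrative
# assumption, not the spec's actual policy.
ALL = frozenset({"read", "write", "create", "delete", "admin"})
READ = frozenset({"read"})

def acl_for_new_node(actor, path):
    """Return the ACL map (principal -> permissions) for a node `actor` creates."""
    acl = {actor: ALL}  # the creating actor keeps full access
    if path.startswith("/machines/"):
        # Assumed rule: a provisioning agent may read machine nodes.
        acl["provisioning-agent"] = READ
    return acl
```

If that reading is right, spelling it out this way in the text would also
settle whether the returned map "can" or "must" be set on the node.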

[5]

73 Every actor in the system needs its own
74	+unique principal, to provide an auth identity, the credentials for a
75	+principal are known only to the actor utilizing them and transiently
76	+the security agent when they are created.

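This sentence doesn't read well: it is three clauses joined by commas, and a
"to" seems to be missing before "the security agent". I'd split it in two.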

[6]

78	+Instead of passing principals credentials directly via insecure
79	+channels, an actor creating another actor also establishes a principal
80	+creation token via the security agent. The principal creation token is
81	+a one time use string which can be used to create a principal and its
82	+password, and update the token database. 
83	+
84	+The security agent has a simple policy in place regarding principal
85	+names and which actors can create them, ie. a provisioning agent can
86	+create machine principals, but not service unit principals.

It's not clear from the document how that's going to work, and it also feels a
little bit like delegating permissions could add more complexity than is
necessary (e.g. what about the agent itself creating these identities instead
of providing a one-time token?).  This area needs some debate.

[7]

88	+If a malicious user intercepts the token and uses it, compared with
89	+passing credentials directly it minimizes the time that a third party
90	+has to perform such an interception. Moreover invalid use of a token
91	+can be logged as foresenic information.

This assumes knowledge about how such interceptions would take place, and
about why the token minimizes the time, that isn't available in the document
up to that point.

Also, we should design the security system in a way that such interceptions
can't take place, rather than minimizing the time an interception can be
made (ssh vs. telnet).

[8]

118	+Additionally services utilize relations to communicate with each
119	+other, every service unit of the services participating within a
120	+relation gets write access only to its own node within the relation,
121	+and has read access to all service unit relation settings. An
122	+unrelated service unit from a different service, is not allowed to
123	+read any settings from the relation.

Well written, just a few punctuation issues:

s/Additionally/Additionally,/
s/other, every/other. Every/
s/service, is/service is/


[9]

136	+are unable to access them.
137	+
138	+But these relations represent adhoc inter machine communication, which

s/them.\n\nBut/them. But/


[10]

146	+The formulas executed by the unit agent provide for user executed code
147	+done within an lxc container (with root privileges). LXC provides
148	+limited support for security against root in a container, so a
149	+container compromise can escalate to a machine level compromise and
150	+those of the other units on a machine.

This is sensible info, but it's unrelated to relation attacks.


[11]

s/lxc/LXC/ across the whole document.

[12]

153	+Privilege Escalation Scenarios
154	+------------------------------
159	+container escalation
160	+++++++++++++++++++++
198	+Access to Deployed services
199	+----------------------------

The document should use title case for headings consistently (e.g.
"Container Escalation", "Access to Deployed Services").

[13]

+like ec2 is unfiltered.

s/ec2/EC2/

[14]

+In future we should have machine level firewalling to allow access

s/future/the future/

[15]

+Next Steps

This section needs further love.  We can leave it unfinished, but it needs
to be at least polished enough to look basically ok (proper headers,
sentences starting with capitals, etc).

Also, I think we should possibly split that section into a separate document
which is specific to the implementation details of the high-level spec, and
keep this text as a pleasant read for someone who wants to understand it
from a high-level perspective without diving into details.


Good work, thanks!