Merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk
Proposed by: Cory Johns
Status: Merged
Merged at revision: 42
Proposed branch: lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme
Merge into: lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk
Diff against target: 278 lines (+110/-110), 3 files modified: README.dev.md (+72/-0), README.md (+36/-108), resources.yaml (+2/-2)
To merge this branch: bzr merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme
Related bugs: none
Reviewer: amir sanjar (community), Approve
Review via email: mp+252617@code.launchpad.net
Commit message
Description of the change
New READMEs and minor relation cleanups
- 42. By Cory Johns: Disambiguate the interface in README.dev
Preview Diff
1 | === added file 'README.dev.md' |
2 | --- README.dev.md 1970-01-01 00:00:00 +0000 |
3 | +++ README.dev.md 2015-03-11 18:02:10 +0000 |
4 | @@ -0,0 +1,72 @@ |
5 | +## Overview |
6 | + |
7 | +This charm provides computation and storage resources for an Apache Hadoop |
8 | +deployment, and is intended to be used only as a part of that deployment. |
9 | +This document describes how this charm connects to and interacts with the |
10 | +other components of the deployment. |
11 | + |
12 | + |
13 | +## Provided Relations |
14 | + |
15 | +### datanode (interface: dfs-slave) |
16 | + |
17 | +This relation connects this charm to the apache-hadoop-hdfs-master charm. |
18 | +It is a bi-directional interface, with the following keys being exchanged: |
19 | + |
20 | +* Sent to hdfs-master: |
21 | + |
22 | + * `private-address`: Address of this unit, to be registered as a DataNode |
23 | + |
24 | +* Received from hdfs-master: |
25 | + |
26 | + * `private-address`: Address of the HDFS master unit, to provide the NameNode |
27 | + * `ready`: A flag indicating that HDFS is ready to register DataNodes |
28 | + |
29 | +Ports will soon be added to both of these. |
30 | + |
31 | + |
32 | +### nodemanager (interface: mapred-slave) |
33 | + |
34 | +This relation connects this charm to the apache-hadoop-yarn-master charm. |
35 | +It is a bi-directional interface, with the following keys being exchanged: |
36 | + |
37 | +* Sent to yarn-master: |
38 | + |
39 | + * `private-address`: Address of this unit, to be registered as a NodeManager |
40 | + |
41 | +* Received from yarn-master: |
42 | + |
43 | + * `private-address`: Address of the YARN master unit, to provide the ResourceManager |
44 | + * `ready`: A flag indicating that YARN is ready to register NodeManagers |
45 | + |
46 | +Ports will soon be added to both of these. |
47 | + |
48 | + |
49 | +## Required Relations |
50 | + |
51 | +*There are no required relations for this charm.* |
52 | + |
53 | + |
54 | +## Manual Deployment |
55 | + |
56 | +The easiest way to deploy the core Apache Hadoop platform is to use one of |
57 | +the [apache-core-batch-processing-* bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle). |
58 | +However, to manually deploy the base Apache Hadoop platform without using one of the |
59 | +bundles, you can use the following: |
60 | + |
61 | + juju deploy apache-hadoop-hdfs-master hdfs-master |
62 | + juju deploy apache-hadoop-hdfs-secondary secondary-namenode |
63 | + juju deploy apache-hadoop-yarn-master yarn-master |
64 | + juju deploy apache-hadoop-compute-slave compute-slave -n3 |
65 | + juju deploy apache-hadoop-client client |
66 | + juju add-relation yarn-master hdfs-master |
67 | + juju add-relation secondary-namenode hdfs-master |
68 | + juju add-relation compute-slave yarn-master |
69 | + juju add-relation compute-slave hdfs-master |
70 | + juju add-relation client yarn-master |
71 | + juju add-relation client hdfs-master |
72 | + |
73 | +This will create a scalable deployment with separate nodes for each master, |
74 | +and a three unit compute slave (NodeManager and DataNode) cluster. The master |
75 | +charms also support co-locating using the `--to` option to `juju deploy` for |
76 | +more dense deployments. |
77 | |
78 | === modified file 'README.md' |
79 | --- README.md 2015-02-13 22:34:21 +0000 |
80 | +++ README.md 2015-03-11 18:02:10 +0000 |
81 | @@ -1,118 +1,39 @@ |
82 | ## Overview |
83 | |
84 | -This charm is a component of the Apache Hadoop platform. It is intended |
85 | -to be deployed with the other components using the bundle: |
86 | -`bundle:~bigdata-charmers/apache-hadoop` |
87 | - |
88 | -**What is Apache Hadoop?** |
89 | - |
90 | The Apache Hadoop software library is a framework that allows for the |
91 | distributed processing of large data sets across clusters of computers |
92 | using a simple programming model. |
93 | |
94 | -It is designed to scale up from single servers to thousands of machines, |
95 | -each offering local computation and storage. Rather than rely on hardware |
96 | -to deliver high-avaiability, the library itself is designed to detect |
97 | -and handle failures at the application layer, so delivering a |
98 | -highly-availabile service on top of a cluster of computers, each of |
99 | -which may be prone to failures. |
100 | - |
101 | -Apache Hadoop 2.4.1 consists of significant improvements over the previous stable |
102 | -release (hadoop-1.x). |
103 | - |
104 | -Here is a short overview of the improvments to both HDFS and MapReduce. |
105 | - |
106 | - - **HDFS Federation** |
107 | - In order to scale the name service horizontally, federation uses multiple |
108 | - independent Namenodes/Namespaces. The Namenodes are federated, that is, the |
109 | - Namenodes are independent and don't require coordination with each other. |
110 | - The datanodes are used as common storage for blocks by all the Namenodes. |
111 | - Each datanode registers with all the Namenodes in the cluster. Datanodes |
112 | - send periodic heartbeats and block reports and handles commands from the |
113 | - Namenodes. |
114 | - |
115 | - More details are available in the HDFS Federation document: |
116 | - <http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/Federation.html> |
117 | - |
118 | - - **MapReduce NextGen aka YARN aka MRv2** |
119 | - The new architecture introduced in hadoop-0.23, divides the two major functions of the |
120 | - JobTracker: resource management and job life-cycle management into separate components. |
121 | - The new ResourceManager manages the global assignment of compute resources to |
122 | -applications and the per-application ApplicationMaster manages the application's |
123 | - scheduling and coordination. |
124 | - An application is either a single job in the sense of classic MapReduce jobs or a DAG of |
125 | - such jobs. |
126 | - |
127 | - The ResourceManager and per-machine NodeManager daemon, which manages the user |
128 | - processes on that machine, form the computation fabric. |
129 | - |
130 | - The per-application ApplicationMaster is, in effect, a framework specific |
131 | - library and is tasked with negotiating resources from the ResourceManager and |
132 | - working with the NodeManager(s) to execute and monitor the tasks. |
133 | - |
134 | - More details are available in the YARN document: |
135 | - <http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html> |
136 | +This charm deploys a compute / slave node running the NodeManager |
137 | +and DataNode components of |
138 | +[Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/), |
139 | +which provides computation and storage resources to the platform. |
140 | |
141 | ## Usage |
142 | |
143 | -This charm manages the compute slave nodes, which include the DataNode and |
144 | -NodeManager components. It is intended to be used with `apache-hadoop-hdfs-master` |
145 | -and `apache-hadoop-yarn-master`. |
146 | - |
147 | -### Simple Usage: Single YARN / HDFS master deployment |
148 | - |
149 | -In this configuration, the YARN and HDFS master components run on the same |
150 | -machine. This is useful for lower-resource deployments:: |
151 | - |
152 | - juju deploy apache-hadoop-hdfs-master hdfs-master |
153 | - juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode --to 1 |
154 | - juju deploy apache-hadoop-yarn-master yarn-master --to 1 |
155 | - juju deploy apache-hadoop-compute-slave compute-slave |
156 | - juju deploy apache-hadoop-client client |
157 | - juju add-relation yarn-master hdfs-master |
158 | - juju add-relation secondary-namenode hdfs-master |
159 | - juju add-relation compute-slave yarn-master |
160 | - juju add-relation compute-slave hdfs-master |
161 | - juju add-relation client yarn-master |
162 | - juju add-relation client hdfs-master |
163 | - |
164 | -Note that the machine number (`--to 1`) should match the machine number |
165 | -for the `hdfs-master` charm. If you previously deployed other services |
166 | -in your environment, you may need to adjust the machine number appropriately. |
167 | - |
168 | - |
169 | -### Scale Out Usage: Separate HDFS, YARN, and compute nodes |
170 | - |
171 | -In this configuration the HDFS and YARN deployments operate on |
172 | -different service units as separate services:: |
173 | - |
174 | - juju deploy apache-hadoop-hdfs-master hdfs-master |
175 | - juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode |
176 | - juju deploy apache-hadoop-yarn-master yarn-master |
177 | - juju deploy apache-hadoop-compute-slave compute-slave -n 3 |
178 | - juju deploy apache-hadoop-client client |
179 | - juju add-relation yarn-master hdfs-master |
180 | - juju add-relation secondary-namenode hdfs-master |
181 | - juju add-relation compute-slave yarn-master |
182 | - juju add-relation compute-slave hdfs-master |
183 | - juju add-relation client yarn-master |
184 | - juju add-relation client hdfs-master |
185 | - |
186 | -The `-n 3` option can be adjusted according to the number of compute nodes |
187 | -you need. You can also add additional compute nodes later:: |
188 | - |
189 | - juju add-unit compute-slave -n 2 |
190 | - |
191 | - |
192 | -### To deploy a Hadoop service with elasticsearch service:: |
193 | - # deploy ElasticSearch locally: |
194 | - **juju deploy elasticsearch elasticsearch** |
195 | - # elasticsearch-hadoop.jar file will be added to LIBJARS path |
196 | - # Recommanded to use hadoop -libjars option to included elk jar file |
197 | - **juju add-unit -n elasticsearch** |
198 | - # deploy hive service by any senarios mentioned above |
199 | - # associate Hive with elasticsearch |
200 | - **juju add-relation {hadoop master}:elasticsearch elasticsearch:client** |
201 | +This charm is intended to be deployed via one of the |
202 | +[bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle). |
203 | +For example: |
204 | + |
205 | + juju quickstart u/bigdata-dev/apache-analytics-sql |
206 | + |
207 | +This will deploy the Apache Hadoop platform with Apache Hive available to |
208 | +perform SQL-like queries against your data. |
209 | + |
210 | +You can also manually load and run map-reduce jobs via the client: |
211 | + |
212 | + juju scp my-job.jar client/0: |
213 | + juju ssh client/0 |
214 | + hadoop jar my-job.jar |
215 | + |
216 | + |
217 | +### Scaling |
218 | + |
219 | +The compute-slave node is the "workhorse" of the Apache Hadoop platform. |
220 | +To scale your deployment's performance, you can simply add more compute-slave |
221 | +units. For example, to add three more units: |
222 | + |
223 | + juju add-unit compute-slave -n 3 |
224 | |
225 | |
226 | ## Deploying in Network-Restricted Environments |
227 | @@ -121,12 +42,14 @@ |
228 | access. To deploy in this environment, you will need a local mirror to serve |
229 | the packages and resources required by these charms. |
230 | |
231 | + |
232 | ### Mirroring Packages |
233 | |
234 | You can setup a local mirror for apt packages using squid-deb-proxy. |
235 | For instructions on configuring juju to use this, see the |
236 | [Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html). |
237 | |
238 | + |
239 | ### Mirroring Resources |
240 | |
241 | In addition to apt packages, the Apache Hadoop charms require a few binary |
242 | @@ -144,14 +67,19 @@ |
243 | |
244 | You can fetch the resources for all of the Apache Hadoop charms |
245 | (`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`, |
246 | -`apache-hadoop-compute-slave`, `apache-hadoop-client`, etc) into a single |
247 | +`apache-hadoop-hdfs-secondary`, `apache-hadoop-client`, etc) into a single |
248 | directory and serve them all with a single `juju resources serve` instance. |
249 | |
250 | |
251 | ## Contact Information |
252 | -amir sanjar <amir.sanjar@canonical.com> |
253 | + |
254 | +* Amir Sanjar <amir.sanjar@canonical.com> |
255 | +* Cory Johns <cory.johns@canonical.com> |
256 | +* Kevin Monroe <kevin.monroe@canonical.com> |
257 | + |
258 | |
259 | ## Hadoop |
260 | + |
261 | - [Apache Hadoop](http://hadoop.apache.org/) home page |
262 | - [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html) |
263 | - [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html) |
264 | |
265 | === modified file 'resources.yaml' |
266 | --- resources.yaml 2015-03-06 22:28:48 +0000 |
267 | +++ resources.yaml 2015-03-11 18:02:10 +0000 |
268 | @@ -8,8 +8,8 @@ |
269 | six: |
270 | pypi: six |
271 | charmhelpers: |
272 | - pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150306222841-0ibvrfaungfdtkn8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz |
273 | - hash: 787fc1cc70fc89e653b08dd192ec702d844709e450ff67577b7c5e99b6bbf39b |
274 | + pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz |
275 | + hash: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c |
276 | hash_type: sha256 |
277 | optional_resources: |
278 | hadoop-aarch64: |
We will also need to return the hostname of compute nodes in the future.
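The key exchange that README.dev describes for the `datanode` and `nodemanager` relations reduces to a joined hook advertising this unit's address and a changed hook gating on the master's `ready` flag. Below is a minimal bash sketch of that flow, not the charm's actual hook code: Juju's `relation-set`, `relation-get`, and `unit-get` hook tools are stubbed out (with an assumed address and a pretend `ready=true` response) so the sketch runs standalone.

```shell
#!/bin/bash
# Stubs standing in for Juju's hook tools, so this runs outside a charm.
# In a real hook environment these are provided by Juju itself.
relation-set() { echo "relation-set $*"; }   # would publish keys on the relation
relation-get() { echo "true"; }              # pretend hdfs-master sent ready=true
unit-get()     { echo "10.0.0.5"; }          # assumed unit address for illustration

# datanode-relation-joined: register this unit's address with hdfs-master
relation-set private-address="$(unit-get private-address)"

# datanode-relation-changed: only proceed once HDFS signals it is ready
if [ "$(relation-get ready)" = "true" ]; then
    echo "HDFS ready; DataNode can register"
fi
```

The same shape applies to the `nodemanager` relation against yarn-master, with the `ready` flag coming from the ResourceManager side instead.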