Merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme into lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk
- Trusty Tahr (14.04)
Proposed by: Cory Johns
Status: | Merged |
---|---|
Merged at revision: | 42 |
Proposed branch: | lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme |
Merge into: | lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/trunk |
Diff against target: |
278 lines (+110/-110) 3 files modified
README.dev.md (+72/-0) README.md (+36/-108) resources.yaml (+2/-2) |
To merge this branch: | bzr merge lp:~bigdata-dev/charms/trusty/apache-hadoop-compute-slave/readme |
Related bugs: |
Reviewer | Review Type | Date Requested | Status |
---|---|---|---|
amir sanjar (community) | Approve | ||
Review via email: mp+252617@code.launchpad.net |
Commit message
Description of the change
New READMEs and minor relation cleanups
- 42. By Cory Johns: Disambiguate the interface in README.dev
Preview Diff
1 | === added file 'README.dev.md' | |||
2 | --- README.dev.md 1970-01-01 00:00:00 +0000 | |||
3 | +++ README.dev.md 2015-03-11 18:02:10 +0000 | |||
4 | @@ -0,0 +1,72 @@ | |||
5 | 1 | ## Overview | ||
6 | 2 | |||
7 | 3 | This charm provides computation and storage resources for an Apache Hadoop | ||
8 | 4 | deployment, and is intended to be used only as a part of that deployment. | ||
9 | 5 | This document describes how this charm connects to and interacts with the | ||
10 | 6 | other components of the deployment. | ||
11 | 7 | |||
12 | 8 | |||
13 | 9 | ## Provided Relations | ||
14 | 10 | |||
15 | 11 | ### datanode (interface: dfs-slave) | ||
16 | 12 | |||
17 | 13 | This relation connects this charm to the apache-hadoop-hdfs-master charm. | ||
18 | 14 | It is a bi-directional interface, with the following keys being exchanged: | ||
19 | 15 | |||
20 | 16 | * Sent to hdfs-master: | ||
21 | 17 | |||
22 | 18 | * `private-address`: Address of this unit, to be registered as a DataNode | ||
23 | 19 | |||
24 | 20 | * Received from hdfs-master: | ||
25 | 21 | |||
26 | 22 | * `private-address`: Address of the HDFS master unit, to provide the NameNode | ||
27 | 23 | * `ready`: A flag indicating that HDFS is ready to register DataNodes | ||
28 | 24 | |||
29 | 25 | Ports will soon be added to both of these. | ||
30 | 26 | |||
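The datanode handshake described above (and the matching nodemanager handshake below) can be sketched in plain Python. This is an illustrative sketch only, not the charm's actual hook code: the `should_register` helper is hypothetical, and the dict stands in for data a hook would read with the `relation-get` hook tool.

```python
# Illustrative sketch of the dfs-slave handshake described above.
# The dict stands in for values a charm hook would read via the
# `relation-get` hook tool; `should_register` is a hypothetical
# helper, not part of the charm.

def should_register(relation_data):
    """Register as a DataNode only once hdfs-master reports readiness."""
    return bool(relation_data.get("ready")) and "private-address" in relation_data

# hdfs-master has not yet set the `ready` flag: wait.
print(should_register({"private-address": "10.0.0.5"}))                    # False
# `ready` flag present: safe to point the DataNode at the NameNode.
print(should_register({"private-address": "10.0.0.5", "ready": "true"}))   # True
```

The nodemanager (mapred-slave) relation follows the same pattern, with yarn-master's `ready` flag gating NodeManager registration.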
31 | 27 | |||
32 | 28 | ### nodemanager (interface: mapred-slave) | ||
33 | 29 | |||
34 | 30 | This relation connects this charm to the apache-hadoop-yarn-master charm. | ||
35 | 31 | It is a bi-directional interface, with the following keys being exchanged: | ||
36 | 32 | |||
37 | 33 | * Sent to yarn-master: | ||
38 | 34 | |||
39 | 35 | * `private-address`: Address of this unit, to be registered as a NodeManager | ||
40 | 36 | |||
41 | 37 | * Received from yarn-master: | ||
42 | 38 | |||
43 | 39 | * `private-address`: Address of the YARN master unit, to provide the ResourceManager | ||
44 | 40 | * `ready`: A flag indicating that YARN is ready to register NodeManagers | ||
45 | 41 | |||
46 | 42 | Ports will soon be added to both of these. | ||
47 | 43 | |||
48 | 44 | |||
49 | 45 | ## Required Relations | ||
50 | 46 | |||
51 | 47 | *There are no required relations for this charm.* | ||
52 | 48 | |||
53 | 49 | |||
54 | 50 | ## Manual Deployment | ||
55 | 51 | |||
56 | 52 | The easiest way to deploy the core Apache Hadoop platform is to use one of | ||
57 | 53 | the [apache-core-batch-processing-* bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle). | ||
58 | 54 | However, to manually deploy the base Apache Hadoop platform without using one of the | ||
59 | 55 | bundles, you can use the following: | ||
60 | 56 | |||
61 | 57 | juju deploy apache-hadoop-hdfs-master hdfs-master | ||
62 | 58 | juju deploy apache-hadoop-hdfs-secondary secondary-namenode | ||
63 | 59 | juju deploy apache-hadoop-yarn-master yarn-master | ||
64 | 60 | juju deploy apache-hadoop-compute-slave compute-slave -n3 | ||
65 | 61 | juju deploy apache-hadoop-client client | ||
66 | 62 | juju add-relation yarn-master hdfs-master | ||
67 | 63 | juju add-relation secondary-namenode hdfs-master | ||
68 | 64 | juju add-relation compute-slave yarn-master | ||
69 | 65 | juju add-relation compute-slave hdfs-master | ||
70 | 66 | juju add-relation client yarn-master | ||
71 | 67 | juju add-relation client hdfs-master | ||
72 | 68 | |||
73 | 69 | This will create a scalable deployment with separate nodes for each master, | ||
74 | 70 | and a three unit compute slave (NodeManager and DataNode) cluster. The master | ||
75 | 71 | charms also support co-locating using the `--to` option to `juju deploy` for | ||
76 | 72 | more dense deployments. | ||
77 | 0 | 73 | ||
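The manual deployment steps above are just a topology: five services and six relations. As a sketch, that topology can be expressed as data and the `juju` commands generated from it. The `juju_commands` helper is hypothetical; the charm and service names come from the README text above.

```python
# Sketch: generate the deploy/add-relation commands shown above from
# a simple topology description. Purely illustrative; the helper is
# hypothetical, the names are from the README.

services = {
    "hdfs-master": ("apache-hadoop-hdfs-master", 1),
    "secondary-namenode": ("apache-hadoop-hdfs-secondary", 1),
    "yarn-master": ("apache-hadoop-yarn-master", 1),
    "compute-slave": ("apache-hadoop-compute-slave", 3),
    "client": ("apache-hadoop-client", 1),
}
relations = [
    ("yarn-master", "hdfs-master"),
    ("secondary-namenode", "hdfs-master"),
    ("compute-slave", "yarn-master"),
    ("compute-slave", "hdfs-master"),
    ("client", "yarn-master"),
    ("client", "hdfs-master"),
]

def juju_commands(services, relations):
    cmds = []
    for name, (charm, units) in services.items():
        scale = " -n{}".format(units) if units > 1 else ""
        cmds.append("juju deploy {} {}{}".format(charm, name, scale))
    for a, b in relations:
        cmds.append("juju add-relation {} {}".format(a, b))
    return cmds

for cmd in juju_commands(services, relations):
    print(cmd)
```

This makes the shape of the deployment explicit: every slave relates to both masters, and the client relates to both masters as well.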
78 | === modified file 'README.md' | |||
79 | --- README.md 2015-02-13 22:34:21 +0000 | |||
80 | +++ README.md 2015-03-11 18:02:10 +0000 | |||
81 | @@ -1,118 +1,39 @@ | |||
82 | 1 | ## Overview | 1 | ## Overview |
83 | 2 | 2 | ||
84 | 3 | This charm is a component of the Apache Hadoop platform. It is intended | ||
85 | 4 | to be deployed with the other components using the bundle: | ||
86 | 5 | `bundle:~bigdata-charmers/apache-hadoop` | ||
87 | 6 | |||
88 | 7 | **What is Apache Hadoop?** | ||
89 | 8 | |||
90 | 9 | The Apache Hadoop software library is a framework that allows for the | 3 | The Apache Hadoop software library is a framework that allows for the |
91 | 10 | distributed processing of large data sets across clusters of computers | 4 | distributed processing of large data sets across clusters of computers |
92 | 11 | using a simple programming model. | 5 | using a simple programming model. |
93 | 12 | 6 | ||
136 | 13 | It is designed to scale up from single servers to thousands of machines, | 7 | This charm deploys a compute / slave node running the NodeManager |
137 | 14 | each offering local computation and storage. Rather than rely on hardware | 8 | and DataNode components of |
138 | 15 | to deliver high-availability, the library itself is designed to detect | 9 | [Apache Hadoop 2.4.1](http://hadoop.apache.org/docs/r2.4.1/), |
139 | 16 | and handle failures at the application layer, so delivering a | 10 | which provides computation and storage resources to the platform. |
98 | 17 | highly-available service on top of a cluster of computers, each of | ||
99 | 18 | which may be prone to failures. | ||
100 | 19 | |||
101 | 20 | Apache Hadoop 2.4.1 consists of significant improvements over the previous stable | ||
102 | 21 | release (hadoop-1.x). | ||
103 | 22 | |||
104 | 23 | Here is a short overview of the improvements to both HDFS and MapReduce. | ||
105 | 24 | |||
106 | 25 | - **HDFS Federation** | ||
107 | 26 | In order to scale the name service horizontally, federation uses multiple | ||
108 | 27 | independent Namenodes/Namespaces. The Namenodes are federated, that is, the | ||
109 | 28 | Namenodes are independent and don't require coordination with each other. | ||
110 | 29 | The datanodes are used as common storage for blocks by all the Namenodes. | ||
111 | 30 | Each datanode registers with all the Namenodes in the cluster. Datanodes | ||
112 | 31 | send periodic heartbeats and block reports and handle commands from the | ||
113 | 32 | Namenodes. | ||
114 | 33 | |||
115 | 34 | More details are available in the HDFS Federation document: | ||
116 | 35 | <http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/Federation.html> | ||
117 | 36 | |||
118 | 37 | - **MapReduce NextGen aka YARN aka MRv2** | ||
119 | 38 | The new architecture introduced in hadoop-0.23, divides the two major functions of the | ||
120 | 39 | JobTracker: resource management and job life-cycle management into separate components. | ||
121 | 40 | The new ResourceManager manages the global assignment of compute resources to | ||
122 | 41 | applications and the per-application ApplicationMaster manages the application's | ||
123 | 42 | scheduling and coordination. | ||
124 | 43 | An application is either a single job in the sense of classic MapReduce jobs or a DAG of | ||
125 | 44 | such jobs. | ||
126 | 45 | |||
127 | 46 | The ResourceManager and per-machine NodeManager daemon, which manages the user | ||
128 | 47 | processes on that machine, form the computation fabric. | ||
129 | 48 | |||
130 | 49 | The per-application ApplicationMaster is, in effect, a framework specific | ||
131 | 50 | library and is tasked with negotiating resources from the ResourceManager and | ||
132 | 51 | working with the NodeManager(s) to execute and monitor the tasks. | ||
133 | 52 | |||
134 | 53 | More details are available in the YARN document: | ||
135 | 54 | <http://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/YARN.html> | ||
140 | 55 | 11 | ||
141 | 56 | ## Usage | 12 | ## Usage |
142 | 57 | 13 | ||
201 | 58 | This charm manages the compute slave nodes, which include the DataNode and | 14 | This charm is intended to be deployed via one of the |
202 | 59 | NodeManager components. It is intended to be used with `apache-hadoop-hdfs-master` | 15 | [bundles](https://jujucharms.com/q/bigdata-dev/apache?type=bundle). |
203 | 60 | and `apache-hadoop-yarn-master`. | 16 | For example: |
204 | 61 | 17 | ||
205 | 62 | ### Simple Usage: Single YARN / HDFS master deployment | 18 | juju quickstart u/bigdata-dev/apache-analytics-sql |
206 | 63 | 19 | ||
207 | 64 | In this configuration, the YARN and HDFS master components run on the same | 20 | This will deploy the Apache Hadoop platform with Apache Hive available to |
208 | 65 | machine. This is useful for lower-resource deployments:: | 21 | perform SQL-like queries against your data. |
209 | 66 | 22 | ||
210 | 67 | juju deploy apache-hadoop-hdfs-master hdfs-master | 23 | You can also manually load and run map-reduce jobs via the client: |
211 | 68 | juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode --to 1 | 24 | |
212 | 69 | juju deploy apache-hadoop-yarn-master yarn-master --to 1 | 25 | juju scp my-job.jar client/0: |
213 | 70 | juju deploy apache-hadoop-compute-slave compute-slave | 26 | juju ssh client/0 |
214 | 71 | juju deploy apache-hadoop-client client | 27 | hadoop jar my-job.jar |
215 | 72 | juju add-relation yarn-master hdfs-master | 28 | |
216 | 73 | juju add-relation secondary-namenode hdfs-master | 29 | |
217 | 74 | juju add-relation compute-slave yarn-master | 30 | ### Scaling |
218 | 75 | juju add-relation compute-slave hdfs-master | 31 | |
219 | 76 | juju add-relation client yarn-master | 32 | The compute-slave node is the "workhorse" of the Apache Hadoop platform. |
220 | 77 | juju add-relation client hdfs-master | 33 | To scale your deployment's performance, you can simply add more compute-slave |
221 | 78 | 34 | units. For example, to add three more units: | 35 | |
222 | 79 | Note that the machine number (`--to 1`) should match the machine number | 35 | |
223 | 80 | for the `hdfs-master` charm. If you previously deployed other services | 36 | juju add-unit compute-slave -n 3 |
166 | 81 | in your environment, you may need to adjust the machine number appropriately. | ||
167 | 82 | |||
168 | 83 | |||
169 | 84 | ### Scale Out Usage: Separate HDFS, YARN, and compute nodes | ||
170 | 85 | |||
171 | 86 | In this configuration the HDFS and YARN deployments operate on | ||
172 | 87 | different service units as separate services:: | ||
173 | 88 | |||
174 | 89 | juju deploy apache-hadoop-hdfs-master hdfs-master | ||
175 | 90 | juju deploy apache-hadoop-hdfs-checkpoint secondary-namenode | ||
176 | 91 | juju deploy apache-hadoop-yarn-master yarn-master | ||
177 | 92 | juju deploy apache-hadoop-compute-slave compute-slave -n 3 | ||
178 | 93 | juju deploy apache-hadoop-client client | ||
179 | 94 | juju add-relation yarn-master hdfs-master | ||
180 | 95 | juju add-relation secondary-namenode hdfs-master | ||
181 | 96 | juju add-relation compute-slave yarn-master | ||
182 | 97 | juju add-relation compute-slave hdfs-master | ||
183 | 98 | juju add-relation client yarn-master | ||
184 | 99 | juju add-relation client hdfs-master | ||
185 | 100 | |||
186 | 101 | The `-n 3` option can be adjusted according to the number of compute nodes | ||
187 | 102 | you need. You can also add additional compute nodes later:: | ||
188 | 103 | |||
189 | 104 | juju add-unit compute-slave -n 2 | ||
190 | 105 | |||
191 | 106 | |||
192 | 107 | ### To deploy a Hadoop service with elasticsearch service:: | ||
193 | 108 | # deploy ElasticSearch locally: | ||
194 | 109 | **juju deploy elasticsearch elasticsearch** | ||
195 | 110 | # elasticsearch-hadoop.jar file will be added to LIBJARS path | ||
196 | 111 | # Recommended to use the hadoop -libjars option to include the elk jar file | ||
197 | 112 | **juju add-unit -n elasticsearch** | ||
198 | 113 | # deploy the hive service by any of the scenarios mentioned above | ||
199 | 114 | # associate Hive with elasticsearch | ||
200 | 115 | **juju add-relation {hadoop master}:elasticsearch elasticsearch:client** | ||
224 | 116 | 37 | ||
225 | 117 | 38 | ||
226 | 118 | ## Deploying in Network-Restricted Environments | 39 | ## Deploying in Network-Restricted Environments |
227 | @@ -121,12 +42,14 @@ | |||
228 | 121 | access. To deploy in this environment, you will need a local mirror to serve | 42 | access. To deploy in this environment, you will need a local mirror to serve |
229 | 122 | the packages and resources required by these charms. | 43 | the packages and resources required by these charms. |
230 | 123 | 44 | ||
231 | 45 | |||
232 | 124 | ### Mirroring Packages | 46 | ### Mirroring Packages |
233 | 125 | 47 | ||
234 | 126 | You can set up a local mirror for apt packages using squid-deb-proxy. | 48 | You can set up a local mirror for apt packages using squid-deb-proxy. |
235 | 127 | For instructions on configuring juju to use this, see the | 49 | For instructions on configuring juju to use this, see the |
236 | 128 | [Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html). | 50 | [Juju Proxy Documentation](https://juju.ubuntu.com/docs/howto-proxies.html). |
237 | 129 | 51 | ||
238 | 52 | |||
239 | 130 | ### Mirroring Resources | 53 | ### Mirroring Resources |
240 | 131 | 54 | ||
241 | 132 | In addition to apt packages, the Apache Hadoop charms require a few binary | 55 | In addition to apt packages, the Apache Hadoop charms require a few binary |
242 | @@ -144,14 +67,19 @@ | |||
243 | 144 | 67 | ||
244 | 145 | You can fetch the resources for all of the Apache Hadoop charms | 68 | You can fetch the resources for all of the Apache Hadoop charms |
245 | 146 | (`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`, | 69 | (`apache-hadoop-hdfs-master`, `apache-hadoop-yarn-master`, |
247 | 147 | `apache-hadoop-compute-slave`, `apache-hadoop-client`, etc) into a single | 70 | `apache-hadoop-hdfs-secondary`, `apache-hadoop-client`, etc) into a single |
248 | 148 | directory and serve them all with a single `juju resources serve` instance. | 71 | directory and serve them all with a single `juju resources serve` instance. |
249 | 149 | 72 | ||
250 | 150 | 73 | ||
251 | 151 | ## Contact Information | 74 | ## Contact Information |
253 | 152 | amir sanjar <amir.sanjar@canonical.com> | 75 | |
254 | 76 | * Amir Sanjar <amir.sanjar@canonical.com> | ||
255 | 77 | * Cory Johns <cory.johns@canonical.com> | ||
256 | 78 | * Kevin Monroe <kevin.monroe@canonical.com> | ||
257 | 79 | |||
258 | 153 | 80 | ||
259 | 154 | ## Hadoop | 81 | ## Hadoop |
260 | 82 | |||
261 | 155 | - [Apache Hadoop](http://hadoop.apache.org/) home page | 83 | - [Apache Hadoop](http://hadoop.apache.org/) home page |
262 | 156 | - [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html) | 84 | - [Apache Hadoop bug trackers](http://hadoop.apache.org/issue_tracking.html) |
263 | 157 | - [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html) | 85 | - [Apache Hadoop mailing lists](http://hadoop.apache.org/mailing_lists.html) |
264 | 158 | 86 | ||
265 | === modified file 'resources.yaml' | |||
266 | --- resources.yaml 2015-03-06 22:28:48 +0000 | |||
267 | +++ resources.yaml 2015-03-11 18:02:10 +0000 | |||
268 | @@ -8,8 +8,8 @@ | |||
269 | 8 | six: | 8 | six: |
270 | 9 | pypi: six | 9 | pypi: six |
271 | 10 | charmhelpers: | 10 | charmhelpers: |
274 | 11 | pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150306222841-0ibvrfaungfdtkn8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz | 11 | pypi: http://bazaar.launchpad.net/~bigdata-dev/bigdata-data/trunk/download/cory.johns%40canonical.com-20150310214330-f2bk32gk92iinrx8/charmhelpers0.2.2.ta-20150304033309-4fa7ewnosqavnwms-1/charmhelpers-0.2.2.tar.gz |
275 | 12 | hash: 787fc1cc70fc89e653b08dd192ec702d844709e450ff67577b7c5e99b6bbf39b | 12 | hash: a1cafa5e315d3a33db15a8e18f56b4c64d47c2c1c6fcbdba81e42bb00642971c |
276 | 13 | hash_type: sha256 | 13 | hash_type: sha256 |
277 | 14 | optional_resources: | 14 | optional_resources: |
278 | 15 | hadoop-aarch64: | 15 | hadoop-aarch64: |
We will also need to return the hostname of the compute nodes in the future.