Merge lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes into lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk

Proposed by Kevin W Monroe
Status: Needs review
Proposed branch: lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes
Merge into: lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk
Diff against target: 104 lines (+23/-6)
3 files modified
README.md (+1/-1)
hooks/bdutils.py (+10/-1)
hooks/hdp-hadoop-common.py (+12/-4)
To merge this branch: bzr merge lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes
Reviewer: Juju Big Data Development (status: Pending)
Review via email: mp+248978@code.launchpad.net

Description of the change

We test whether services are running with 'is_jvm_service_active', which uses jps. That command was being run as root on our units, but root doesn't see the process names we need to parse to tell whether a service is really active. For example, here's jps output run as root on a yarn-hdfs master:

root@juju-canonistack-machine-20:~# jps
27783 Jps
22062 -- process information unavailable
23542 -- process information unavailable

That's less than helpful. So much so that we get relation failures because the charms try to fire up services that are already running. With this MP, we run jps as the appropriate user for a given service (usually either hdfs or yarn).

This yields goodness:

ubuntu@juju-canonistack-machine-20:~$ sudo su - hdfs -c jps
22062 NameNode
27825 Jps

ubuntu@juju-canonistack-machine-20:~$ sudo su - yarn -c jps
23542 ResourceManager
27839 Jps
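
In essence, the updated check in hooks/bdutils.py picks the owning user for the requested daemon and runs jps as that user. The full change is in the diff below; this is just a condensed restatement, with YARN_USER/HDFS_USER coming from the charm's Hadoop environment:

import os
import shlex
import subprocess

def is_jvm_service_active(processname):
    # Daemons owned by the yarn user; everything else defaults to the hdfs user.
    processusers = {
        "JobHistoryServer": os.environ['YARN_USER'],
        "ResourceManager": os.environ['YARN_USER'],
        "NodeManager": os.environ['YARN_USER'],
        "DataNode": os.environ['HDFS_USER'],
        "NameNode": os.environ['HDFS_USER'],
    }
    username = processusers.get(processname, os.environ['HDFS_USER'])
    # Run jps as the owning user so the process name is actually visible.
    cmd = shlex.split("su {u} -c jps".format(u=username))
    out, err = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()
    return err is None and str(out).find(processname) != -1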

I'm not a big fan of having a dict with hard-coded strings, but the alternative is to pass a username in with every call to is_jvm_service_active. I'll go that route if the herd wants, but this way was less typing for me.
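
For the record, that alternative would look roughly like this (a hypothetical sketch, not what this MP does; every caller would have to know the owning user itself):

import os
import shlex
import subprocess

def is_jvm_service_active(processname, username):
    # Caller supplies the owning user explicitly instead of looking it up in a dict.
    cmd = shlex.split("su {u} -c jps".format(u=username))
    out, err = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()
    return err is None and str(out).find(processname) != -1

# ...and every call site changes with it, e.g.:
# is_jvm_service_active("ResourceManager", os.environ['YARN_USER'])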

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Oh, I also snuck in a readme change like a boss. It fixes the suggested deployment so the extra units get added to compute-node instead of yarn-hdfs-master.

Unmerged revisions

41. By Kevin W Monroe

update readme with sane deployment. need to run 'jps' as the correct user in is_jvm_service_active. stop services if they're already running during certain hooks.

Preview Diff

=== modified file 'README.md'
--- README.md 2014-12-10 23:31:55 +0000
+++ README.md 2015-02-06 22:41:08 +0000
@@ -57,7 +57,7 @@
 service units as HDFS namenode and the HDFS datanodes also run YARN NodeManager::
 juju deploy hdp-hadoop yarn-hdfs-master
 juju deploy hdp-hadoop compute-node
- juju add-unit -n 2 yarn-hdfs-master
+ juju add-unit -n 2 compute-node
 juju add-relation yarn-hdfs-master:namenode compute-node:datanode
 juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager

=== modified file 'hooks/bdutils.py'
--- hooks/bdutils.py 2014-12-26 13:50:41 +0000
+++ hooks/bdutils.py 2015-02-06 22:41:08 +0000
@@ -128,7 +128,16 @@
         os.environ[ll[0]] = ll[1].strip().strip(';').strip("\"").strip()

 def is_jvm_service_active(processname):
-    cmd=["jps"]
+    processusers = {
+        "JobHistoryServer": os.environ['YARN_USER'],
+        "ResourceManager": os.environ['YARN_USER'],
+        "NodeManager": os.environ['YARN_USER'],
+        "DataNode": os.environ['HDFS_USER'],
+        "NameNode": os.environ['HDFS_USER'],
+    }
+    # set user based on given process, defaulting to hdfs user
+    username = processusers.get(processname, os.environ['HDFS_USER'])
+    cmd = shlex.split("su {u} -c jps".format(u=username))
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
     out, err = p.communicate()
     if err == None and str(out).find(processname) != -1:

=== modified file 'hooks/hdp-hadoop-common.py'
--- hooks/hdp-hadoop-common.py 2014-12-26 13:50:41 +0000
+++ hooks/hdp-hadoop-common.py 2015-02-06 22:41:08 +0000
@@ -434,6 +434,7 @@
 @hooks.hook('resourcemanager-relation-joined')
 def resourcemanager_relation_joined():
     log ("==> resourcemanager-relation-joined","INFO")
+    setHadoopEnvVar()
     if is_jvm_service_active("ResourceManager"):
         relation_set(resourceManagerReady=True)
         relation_set(resourceManager_hostname=get_unit_hostname())
@@ -443,12 +444,12 @@
         sys.exit(0)
     shutil.copy(os.path.join(os.path.sep, os.environ['CHARM_DIR'],\
         'files', 'scripts', "terasort.sh"), home)
-    setHadoopEnvVar()
     relation_set(resourceManager_ip=unit_get('private-address'))
     relation_set(resourceManager_hostname=get_unit_hostname())
     configureYarn(unit_get('private-address'))
     start_resourcemanager(os.environ["YARN_USER"])
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8025)
     open_port(8050)
     open_port(8020)
@@ -475,6 +476,9 @@
     # nodemanager requires data node daemon
     if not is_jvm_service_active("DataNode"):
         start_datanode(os.environ['HDFS_USER'])
+    # TODO: (kwm) start_nm fails if nm is running. is it ok to stop first?
+    if is_jvm_service_active("NodeManager"):
+        stop_nodemanager(os.environ["YARN_USER"])
     start_nodemanager(os.environ["YARN_USER"])
     open_port(8025)
     open_port(8030)
@@ -506,11 +510,11 @@
 def namenode_relation_joined():
     log("Configuring namenode - joined phase", "INFO")

+    setHadoopEnvVar()
     if is_jvm_service_active("NameNode"):
         relation_set(nameNodeReady=True)
         relation_set(namenode_hostname=get_unit_hostname())
         return
-    setHadoopEnvVar()
     setDirPermission(os.environ['DFS_NAME_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0755)
     relation_set(namenode_hostname=get_unit_hostname())
     configureHDFS(unit_get('private-address'))
@@ -523,7 +527,8 @@
     HDFS_command("dfs -mkdir -p /user/ubuntu")
     HDFS_command("dfs -chown ubuntu /user/ubuntu")
     HDFS_command("dfs -chmod -R 755 /user/ubuntu")
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8020)
     open_port(8010)
     open_port(50070)
@@ -550,6 +555,9 @@
     fileSetKV(hosts_path, nodename_ip+' ', nodename_hostname)
     configureHDFS(nodename_ip)
     setDirPermission(os.environ['DFS_DATA_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0750)
+    # TODO: (kwm) start_dn fails if dn is running. is it ok to stop first?
+    if is_jvm_service_active("DataNode"):
+        stop_datanode(os.environ["HDFS_USER"])
     start_datanode(os.environ["HDFS_USER"])
     if not is_jvm_service_active("DataNode"):
         log("error ==> DataNode failed to start")
