Merge lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes into lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk

Proposed by Kevin W Monroe
Status: Needs review
Proposed branch: lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes
Merge into: lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk
Diff against target: 104 lines (+23/-6)
3 files modified
README.md (+1/-1)
hooks/bdutils.py (+10/-1)
hooks/hdp-hadoop-common.py (+12/-4)
To merge this branch: bzr merge lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes
Reviewer: Juju Big Data Development (status: Pending)
Review via email: mp+248978@code.launchpad.net

Description of the change

We test whether services are running with 'is_jvm_service_active', which uses jps. That command was being run as root on our units, but root doesn't see the process names we need to parse to tell whether a service is really active. For example, here's jps output run as root on a yarn-hdfs master:

root@juju-canonistack-machine-20:~# jps
27783 Jps
22062 -- process information unavailable
23542 -- process information unavailable

That's less than helpful. So much so that we get relation failures because the charms try to fire up services that are already running. With this MP, we run jps as the appropriate user for a given service (usually either hdfs or yarn).

This yields goodness:

ubuntu@juju-canonistack-machine-20:~$ sudo su - hdfs -c jps
22062 NameNode
27825 Jps

ubuntu@juju-canonistack-machine-20:~$ sudo su - yarn -c jps
23542 ResourceManager
27839 Jps
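
In essence, the updated check in hooks/bdutils.py picks the owning user for the requested daemon and runs jps as that user. The full change is in the diff below; this is just a condensed restatement, with YARN_USER/HDFS_USER coming from the charm's Hadoop environment:

import os
import shlex
import subprocess

def is_jvm_service_active(processname):
    # Daemons owned by the yarn user; everything else defaults to the hdfs user.
    processusers = {
        "JobHistoryServer": os.environ['YARN_USER'],
        "ResourceManager": os.environ['YARN_USER'],
        "NodeManager": os.environ['YARN_USER'],
        "DataNode": os.environ['HDFS_USER'],
        "NameNode": os.environ['HDFS_USER'],
    }
    username = processusers.get(processname, os.environ['HDFS_USER'])
    # Run jps as the owning user so the process name is actually visible.
    cmd = shlex.split("su {u} -c jps".format(u=username))
    out, err = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()
    return err is None and str(out).find(processname) != -1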

I'm not a big fan of having a dict with hard-coded strings, but the alternative is to pass a username in with every call to is_jvm_service_active. I'll go that route if the herd wants, but this way was less typing for me.
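
For the record, that alternative would look roughly like this (a hypothetical sketch, not what this MP does; every caller would have to know the owning user itself):

import os
import shlex
import subprocess

def is_jvm_service_active(processname, username):
    # Caller supplies the owning user explicitly instead of looking it up in a dict.
    cmd = shlex.split("su {u} -c jps".format(u=username))
    out, err = subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()
    return err is None and str(out).find(processname) != -1

# ...and every call site changes with it, e.g.:
# is_jvm_service_active("ResourceManager", os.environ['YARN_USER'])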

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Oh, I also snuck in a readme change like a boss. It fixes the suggested deployment so the extra units get added to compute-node instead of yarn-hdfs-master.

Unmerged revisions

41. By Kevin W Monroe

update readme with sane deployment. need to run 'jps' as the correct user in is_jvm_service_active. stop services if they're already running during certain hooks.

Preview Diff

=== modified file 'README.md'
--- README.md 2014-12-10 23:31:55 +0000
+++ README.md 2015-02-06 22:41:08 +0000
@@ -57,7 +57,7 @@
 service units as HDFS namenode and the HDFS datanodes also run YARN NodeManager::
 juju deploy hdp-hadoop yarn-hdfs-master
 juju deploy hdp-hadoop compute-node
- juju add-unit -n 2 yarn-hdfs-master
+ juju add-unit -n 2 compute-node
 juju add-relation yarn-hdfs-master:namenode compute-node:datanode
 juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager

=== modified file 'hooks/bdutils.py'
--- hooks/bdutils.py 2014-12-26 13:50:41 +0000
+++ hooks/bdutils.py 2015-02-06 22:41:08 +0000
@@ -128,7 +128,16 @@
         os.environ[ll[0]] = ll[1].strip().strip(';').strip("\"").strip()

 def is_jvm_service_active(processname):
-    cmd=["jps"]
+    processusers = {
+        "JobHistoryServer": os.environ['YARN_USER'],
+        "ResourceManager": os.environ['YARN_USER'],
+        "NodeManager": os.environ['YARN_USER'],
+        "DataNode": os.environ['HDFS_USER'],
+        "NameNode": os.environ['HDFS_USER'],
+    }
+    # set user based on given process, defaulting to hdfs user
+    username = processusers.get(processname, os.environ['HDFS_USER'])
+    cmd = shlex.split("su {u} -c jps".format(u=username))
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
     out, err = p.communicate()
     if err == None and str(out).find(processname) != -1:

=== modified file 'hooks/hdp-hadoop-common.py'
--- hooks/hdp-hadoop-common.py 2014-12-26 13:50:41 +0000
+++ hooks/hdp-hadoop-common.py 2015-02-06 22:41:08 +0000
@@ -434,6 +434,7 @@
 @hooks.hook('resourcemanager-relation-joined')
 def resourcemanager_relation_joined():
     log ("==> resourcemanager-relation-joined","INFO")
+    setHadoopEnvVar()
     if is_jvm_service_active("ResourceManager"):
         relation_set(resourceManagerReady=True)
         relation_set(resourceManager_hostname=get_unit_hostname())
@@ -443,12 +444,12 @@
         sys.exit(0)
     shutil.copy(os.path.join(os.path.sep, os.environ['CHARM_DIR'],\
         'files', 'scripts', "terasort.sh"), home)
-    setHadoopEnvVar()
     relation_set(resourceManager_ip=unit_get('private-address'))
     relation_set(resourceManager_hostname=get_unit_hostname())
     configureYarn(unit_get('private-address'))
     start_resourcemanager(os.environ["YARN_USER"])
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8025)
     open_port(8050)
     open_port(8020)
@@ -475,6 +476,9 @@
     # nodemanager requires data node daemon
     if not is_jvm_service_active("DataNode"):
         start_datanode(os.environ['HDFS_USER'])
+    # TODO: (kwm) start_nm fails if nm is running. is it ok to stop first?
+    if is_jvm_service_active("NodeManager"):
+        stop_nodemanager(os.environ["YARN_USER"])
     start_nodemanager(os.environ["YARN_USER"])
     open_port(8025)
     open_port(8030)
@@ -506,11 +510,11 @@
 def namenode_relation_joined():
     log("Configuring namenode - joined phase", "INFO")

+    setHadoopEnvVar()
     if is_jvm_service_active("NameNode"):
         relation_set(nameNodeReady=True)
         relation_set(namenode_hostname=get_unit_hostname())
         return
-    setHadoopEnvVar()
     setDirPermission(os.environ['DFS_NAME_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0755)
     relation_set(namenode_hostname=get_unit_hostname())
     configureHDFS(unit_get('private-address'))
@@ -523,7 +527,8 @@
     HDFS_command("dfs -mkdir -p /user/ubuntu")
     HDFS_command("dfs -chown ubuntu /user/ubuntu")
     HDFS_command("dfs -chmod -R 755 /user/ubuntu")
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8020)
     open_port(8010)
     open_port(50070)
@@ -550,6 +555,9 @@
     fileSetKV(hosts_path, nodename_ip+' ', nodename_hostname)
     configureHDFS(nodename_ip)
     setDirPermission(os.environ['DFS_DATA_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0750)
+    # TODO: (kwm) start_dn fails if dn is running. is it ok to stop first?
+    if is_jvm_service_active("DataNode"):
+        stop_datanode(os.environ["HDFS_USER"])
     start_datanode(os.environ["HDFS_USER"])
     if not is_jvm_service_active("DataNode"):
         log("error ==> DataNode failed to start")
