Merge lp:~maddevelopers/mg5amcnlo/PY8_parallelization into lp:~maddevelopers/mg5amcnlo/2.5.1

Proposed by Valentin Hirschi
Status: Merged
Merge reported by: Olivier Mattelaer
Merged at revision: not available
Proposed branch: lp:~maddevelopers/mg5amcnlo/PY8_parallelization
Merge into: lp:~maddevelopers/mg5amcnlo/2.5.1
Diff against target: 210 lines (+60/-46)
4 files modified
madgraph/interface/common_run_interface.py (+1/-1)
madgraph/interface/madevent_interface.py (+13/-12)
madgraph/various/lhe_parser.py (+41/-33)
madgraph/various/systematics.py (+5/-0)
To merge this branch: bzr merge lp:~maddevelopers/mg5amcnlo/PY8_parallelization
Reviewer             Review Type    Date Requested    Status
Olivier Mattelaer                                     Needs Fixing
Review via email: mp+304844@code.launchpad.net

This proposal supersedes a proposal from 2016-09-03.

Description of the change

This branch implements PY8 LO parallelization.

It is fully functional and has been tested for a few cases (not exhaustive yet).

The remaining issues are:

1) [Not related to this branch] Systematics parallelization crashes when using a cluster_temp_directory.

2) [Not related to this branch] The PY8 HTML is screwed up past the first run/tag.

3) The .lhe splitting is slow. It would be nice to have a dedicated function for this in lhe_parser.py that bypasses the (full) parsing of the event files.
Obtaining the number of events in the event file is also slow. Again, an optimized static method that bypasses the full parsing would be nice.

4) The merging of the split HEPMC files is done in a very efficient way in this branch (very important given their size). However, it uses two os.system calls which are not safe against injection. They need to be made secure.

5) More testing must be done. In particular, the results of parallel and sequential runs must be compared for the merged_x_secs, HwU plots and hepmc event files, so as to guarantee the correctness of the implementation.

6) The new bits of code in do_pythia8() could be refactored a bit more. In particular, the two parts of the code related to parallel submission and merging of split results could be factored out into dedicated functions.
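
As an illustration of point 3, here is a minimal sketch of counting events without building event objects. The function name count_lhe_events is hypothetical and not part of lhe_parser.py; the sketch assumes Python 3 and that every event opens with an <event> tag at the start of a line, as in standard LHE files.

```python
import gzip


def count_lhe_events(path):
    """Count events in an LHE file by scanning for opening <event> tags.

    Hypothetical helper (not part of lhe_parser.py). No event object is
    built, so the cost is a single pass over the lines of the file.
    """
    # Support both plain and gzipped event files.
    opener = gzip.open if path.endswith(".gz") else open
    nb_event = 0
    with opener(path, "rt") as stream:
        for line in stream:
            stripped = line.lstrip()
            # Accept '<event>' and '<event attr="...">' but not other tags
            # that merely share the prefix.
            if stripped.startswith("<event>") or stripped.startswith("<event "):
                nb_event += 1
    return nb_event
```

The same line scan could drive the splitting itself, since writing an event to a sub-file only needs its raw text.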

Olivier, could you review this and fix what you can already?
If you manage to clean it all up, then don't hesitate to merge this into 2.5.1 (or even 2.5.0, since it is a nice feature and introduces some important bug fixes).
If there is still something that needs to be discussed with me, then we will only be able to do this on Tuesday, since I will be mostly unavailable until then.

Thanks,

Valentin Hirschi (valentin-hirschi) wrote : Posted in a previous version of this proposal

To test the above on a condor cluster, one must re-install MG5aMC_PY8_interface, because I modified its installation to link *statically* against HEPMC2, so that HEPMC2 does not have to be found on the worker nodes at run time.

Olivier Mattelaer (olivier-mattelaer) wrote :

Hi Valentin,

1) I do not think that I will fix that.

2) this is now fixed in 2.5.0

3) I have improved this point. However, you should avoid using the len() function.
Bypassing the parsing of the events is indeed much faster: from 50s to s

4) we can discuss that.

5) I only did one simple test, but it did not work:

Pythia8 shower jobs: 0 Idle, 1 Running, 7 Done [49m33s]
Pythia8 shower jobs: 0 Idle, 0 Running, 8 Done [51m10s]
Merging results from the split PY8 runs...
Fail to produce a pythia8 output. More info in
     /Users/omatt/Documents/eclipse/PY8_parallelization/PROC_sm_1/Events/run_01/tag_1_pythia8.log

The timing seems to indicate that the splitting did not occur at all...

Cheers,

Olivier

review: Needs Fixing
301. By Olivier Mattelaer

faster parsing for splitting event/get number of events

302. By Olivier Mattelaer

also apply the bypass of parsing for systematics

303. By Valentin Hirschi

1. Fixed the sanity check of PY8 log file which was not ok with the parallelization.

304. By Valentin Hirschi

1. fixed an issue with the warning about failing PY8 log. (needed to close the log stream).
2. Sandboxed the HEPMC merging syscalls.

305. By Valentin Hirschi

1. Merged with latest version of 2.5.1
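
For reference, the sanity check touched by revisions 303 and 304 now looks for 'cross section' in the last 20 lines of the log (the slice [-20:]), whereas the old check searched for 'PYTHIA Abort' in everything except the last 20 lines (the slice [:-20]). A minimal sketch of the new check follows; the helper name log_reports_cross_section is hypothetical and the default values are taken from the diff.

```python
def log_reports_cross_section(log_path, marker="cross section", n_tail=20):
    """Return True if `marker` appears in the last `n_tail` lines of the log.

    Hypothetical helper mirroring the check in the diff. The with-block
    closes the file handle, which the inline open(...).readlines() form
    leaves to the interpreter.
    """
    with open(log_path, "r") as stream:
        tail = stream.readlines()[-n_tail:]
    return any(marker in line for line in tail)
```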

Valentin Hirschi (valentin-hirschi) wrote :

Olivier,

I addressed your remaining points. HEPMC concatenation is now secure, although the event numbers are no longer unique or sequential. This appears to be irrelevant and can easily be fixed in the future if necessary (I'd rather not do it unless needed, as I want to avoid parsing the HEPMC).
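
In the branch, the sed and cat commands are passed to misc.call as argument lists, so no shell interprets the file names. As an alternative illustration only, the same merge can be sketched in pure Python without any external command. The function merge_split_files is hypothetical; it assumes all parts share the same header, and it may well be slower than sed/cat on very large files.

```python
from collections import deque


def merge_split_files(parts, output, n_head, n_tail=1):
    """Concatenate split files, keeping one header and one trailer.

    Hypothetical pure-Python stand-in for the sed/cat calls in the diff:
    the first `n_head` and last `n_tail` lines of every part are dropped,
    except that the header of the first part and the trailer of the last
    part are written once.
    """
    last_index = len(parts) - 1
    with open(output, "w") as out:
        for index, part in enumerate(parts):
            held_back = deque()  # most recent lines, not yet written
            with open(part, "r") as stream:
                for line_no, line in enumerate(stream):
                    if line_no < n_head:
                        # Header lines: keep only those of the first part.
                        if index == 0:
                            out.write(line)
                        continue
                    held_back.append(line)
                    if len(held_back) > n_tail:
                        out.write(held_back.popleft())
            # Trailer lines: keep only those of the last part.
            if index == last_index:
                out.writelines(held_back)
```

Like the branch, this streams the files and never parses events, so event numbers are left as they are.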

Concerning timing, I tested 10k events for p p > t t~ with the 'simplepy8' shower and '/dev/null' for the HEPMC output. The outcome is (for the shower alone):

2.5.0, single core:

INFO: Pythia8 shower finished after 2m15s.

PY8_parallelization branch on 6 cores:

INFO: Pythia8 shower finished after 57 seconds.

The speed-up (about 2.4x on 6 cores) is not linear because of the remaining initialization overhead, but it clearly shows that parallelization occurs.
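
To put a rough number on that overhead, the two timings above can be fed into Amdahl's law, S = 1 / (s + (1 - s) / N), and solved for the serial fraction s. This is only an illustrative estimate: it assumes the law applies, that the two runs are otherwise comparable, and that the parallel part scales perfectly.

```python
def amdahl_serial_fraction(t_single, t_parallel, n_cores):
    """Serial fraction s implied by Amdahl's law, S = 1 / (s + (1 - s) / N)."""
    speedup = float(t_single) / t_parallel
    return (n_cores / speedup - 1.0) / (n_cores - 1.0)


t_single = 2 * 60 + 15   # 2m15s on a single core (2.5.0)
t_parallel = 57          # 57 seconds on 6 cores (this branch)
n_cores = 6

speedup = float(t_single) / t_parallel
serial = amdahl_serial_fraction(t_single, t_parallel, n_cores)
print("speed-up: %.2f" % speedup)
print("implied serial fraction: %.2f" % serial)
print("implied serial time: %.0f s" % (serial * t_single))
```

With these inputs the script reports a speed-up of 2.37 and a serial fraction of 0.31, i.e. about 41 s of the single-core run that does not parallelize under this model.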

This branch should therefore be ready to be merged in 2.5.1.

Preview Diff

=== modified file 'madgraph/interface/common_run_interface.py'
--- madgraph/interface/common_run_interface.py 2016-09-08 09:15:22 +0000
+++ madgraph/interface/common_run_interface.py 2016-09-08 21:48:50 +0000
@@ -3896,7 +3896,7 @@
 'lhc' : 'syntax: set lhc VALUE:\n Set for a proton-proton collision with that given center of mass energy (in TeV)',
 'lep' : 'syntax: set lep VALUE:\n Set for a electron-positron collision with that given center of mass energy (in GeV)',
 'fixed_scale' : 'syntax: set fixed_scale VALUE:\n Set all scales to the give value (in GeV)',
- 'simplepy8' : 'Turn off non-perturbative slow features of Pythia8.'
+ 'simplepy8' : 'Turn off non-perturbative slow features of Pythia8.',
 'mpi' : 'syntax: set mpi value: allow to turn mpi in Pythia8 on/off'
 }


=== modified file 'madgraph/interface/madevent_interface.py'
--- madgraph/interface/madevent_interface.py 2016-09-08 09:15:22 +0000
+++ madgraph/interface/madevent_interface.py 2016-09-08 21:48:50 +0000
@@ -4024,7 +4024,6 @@

 # Now merge logs
 pythia_log_file = open(pythia_log,'w')
-
 n_added = 0
 for split_dir in split_dirs:
 log_file = pjoin(split_dir,'PY8_log.txt')
@@ -4049,6 +4048,8 @@
 PY8_extracted_information['Ntry'] = Ntry
 else:
 PY8_extracted_information['Ntry'] += Ntry
+ pythia_log_file.close()
+
 # Normalize the values added
 if n_added>0:
 PY8_extracted_information['sigma_m'] /= float(n_added)
@@ -4129,7 +4130,7 @@
 break
 header.close()
 tail = open(pjoin(tmp_dir,'tail.hepmc'),'w')
- n_tail = 0
+ n_tail = 0
 for line in misc.BackRead(all_hepmc_files[-1]):
 if line.startswith('HepMC::'):
 n_tail += 1
@@ -4139,16 +4140,16 @@
 tail.close()
 if n_tail>1:
 raise MadGraph5Error,'HEPMC files should only have one trailing command.'
- ######################################################################
- # This is the most efficient way of putting together HEPMC's, *BUT* #
- # WARNING: NEED TO RENDER THE CODE BELOW SAFE TOWARDS INJECTION #
- ######################################################################
+
 for hepmc_file in all_hepmc_files:
 # Remove in an efficient way the starting and trailing HEPMC tags
- os.system(' '.join(['sed','-i',"''","'%s;$d'"%
- (';'.join('%id'%(i+1) for i in range(n_head))),hepmc_file]))
- os.system(' '.join(['cat',pjoin(tmp_dir,'header.hepmc')]+all_hepmc_files+
- [pjoin(tmp_dir,'tail.hepmc'),'>',hepmc_output]))
+ misc.call(['sed','-i',"''",
+ "%s;$d"%(';'.join('%id'%(i+1) for i in range(n_head))),
+ hepmc_file],cwd=parallelization_dir)
+ misc.call(['cat',pjoin(tmp_dir,'header.hepmc')]+all_hepmc_files+
+ [pjoin(tmp_dir,'tail.hepmc')],
+ stdout=open(hepmc_output,'w'),
+ cwd=os.path.dirname(hepmc_output))

 # We are done with the parallelization directory. Clean it.
 if os.path.isdir(parallelization_dir):
@@ -4163,9 +4164,9 @@
 if os.path.isfile(pt_output):
 shutil.move(pt_output, pjoin(self.me_dir,'Events',
 self.run_name, '%s_pts.dat' % tag))
-
+
 if not os.path.isfile(pythia_log) or \
- 'PYTHIA Abort' in '\n'.join(open(pythia_log,'r').readlines()[:-20]):
+ 'cross section' not in '\n'.join(open(pythia_log,'r').readlines()[-20:]):
 logger.warning('Fail to produce a pythia8 output. More info in \n %s'%pythia_log)
 return


=== modified file 'madgraph/various/lhe_parser.py'
--- madgraph/various/lhe_parser.py 2016-09-08 09:15:22 +0000
+++ madgraph/various/lhe_parser.py 2016-09-08 21:48:50 +0000
@@ -153,6 +153,8 @@
 class EventFile(object):
 """A class to allow to read both gzip and not gzip file"""

+ parse_event= True #specify if the parsing of the events has to be done
+
 def __new__(self, path, mode='r', *args, **opt):

 if not path.endswith(".gz"):
@@ -183,6 +185,7 @@
 raise

 self.banner = ''
+
 if mode == 'r':
 line = ''
 while '</init>' not in line.lower():
@@ -227,14 +230,15 @@
 if hasattr(self,"len"):
 return self.len

- init_pos = self.tell()
- self.seek(0)
- nb_event=0
- for _ in self:
- nb_event +=1
- self.len = nb_event
- self.seek(init_pos)
- return self.len
+ with misc.TMP_variable(self, 'parse_event', False):
+ init_pos = self.tell()
+ self.seek(0)
+ nb_event=0
+ for _ in self:
+ nb_event +=1
+ self.len = nb_event
+ self.seek(init_pos)
+ return self.len

 def next(self):
 """get next event"""
@@ -248,8 +252,11 @@
 text = ''
 if mode:
 text += line
- return Event(text)
-
+ if self.parse_event:
+ return Event(text)
+ else:
+ return text
+
 def initialize_unweighting(self, get_wgt, trunc_error):
 """ scan once the file to return
 - the list of the hightest weight (of size trunc_error*NB_EVENT
@@ -516,29 +523,30 @@
 def split(self, nb_event=0, partition=None, cwd=os.path.curdir, zip=False):
 """split the file in multiple file. Do not change the weight!"""

- nb_file = -1
- for i, event in enumerate(self):
- if (not (partition is None) and i==sum(partition[:nb_file+1])) or \
- (partition is None and i % nb_event == 0):
- if i:
- #close previous file
- current.write('</LesHouchesEvent>\n')
- current.close()
- # create the new file
- nb_file +=1
- # If end of partition then finish writing events here.
- if not partition is None and (nb_file+1>len(partition)):
- return nb_file+1
- if zip:
- current = EventFile(pjoin(cwd,'%s_%s.lhe.gz' % (self.name, nb_file)),'w')
- else:
- current = open(pjoin(cwd,'%s_%s.lhe' % (self.name, nb_file)),'w')
- current.write(self.banner)
- current.write(str(event))
- if i!=0:
- current.write('</LesHouchesEvent>\n')
- current.close()
- return nb_file +1
+ with misc.TMP_variable(self, 'parse_event', False):
+ nb_file = -1
+ for i, event in enumerate(self):
+ if (not (partition is None) and i==sum(partition[:nb_file+1])) or \
+ (partition is None and i % nb_event == 0):
+ if i:
+ #close previous file
+ current.write('</LesHouchesEvent>\n')
+ current.close()
+ # create the new file
+ nb_file +=1
+ # If end of partition then finish writing events here.
+ if not partition is None and (nb_file+1>len(partition)):
+ return nb_file+1
+ if zip:
+ current = EventFile(pjoin(cwd,'%s_%s.lhe.gz' % (self.name, nb_file)),'w')
+ else:
+ current = open(pjoin(cwd,'%s_%s.lhe' % (self.name, nb_file)),'w')
+ current.write(self.banner)
+ current.write(str(event))
+ if i!=0:
+ current.write('</LesHouchesEvent>\n')
+ current.close()
+ return nb_file +1

 def update_HwU(self, hwu, fct, name='lhe', keep_wgt=False, maxevents=sys.maxint):
 """take a HwU and add this event file for the function fct"""

=== modified file 'madgraph/various/systematics.py'
--- madgraph/various/systematics.py 2016-09-07 23:55:03 +0000
+++ madgraph/various/systematics.py 2016-09-08 21:48:50 +0000
@@ -218,8 +218,13 @@
 ids = [lowest_id+i for i in range(len(self.args)-1)]
 all_cross = [0 for i in range(len(self.args))]

+ if self.start_event !=0:
+ self.input.parse_event = False
+
 for nb_event,event in enumerate(self.input):
 if nb_event < self.start_event:
+ if nb_event == self.start_event-1:
+ self.input.parse_event = True
 continue
 elif nb_event >= self.stop_event:
 if self.force_write_banner:
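
The lhe_parser.py and systematics.py hunks in this diff toggle a parse_event flag so that iteration returns raw event text instead of parsed Event objects. A minimal, self-contained sketch of that pattern is given below. Both temporary_attribute and ToyEventFile are hypothetical names introduced for illustration; temporary_attribute merely stands in for misc.TMP_variable, whose implementation is not shown in this diff.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attribute(obj, name, value):
    """Set obj.<name> to value inside the with-block, then restore it.

    Hypothetical stand-in for misc.TMP_variable.
    """
    old_value = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore the previous value even if the block raises.
        setattr(obj, name, old_value)


class ToyEventFile(object):
    """Toy reader: yields parsed records, or raw text when parsing is off."""

    parse_event = True  # class-level default, as in the diff

    def __init__(self, records):
        self.records = records

    def __iter__(self):
        for text in self.records:
            # The costly step (here just a split) is skipped when the flag is off.
            yield text.split() if self.parse_event else text


events = ToyEventFile(["1 2 3", "4 5 6"])
with temporary_attribute(events, "parse_event", False):
    raw = list(events)      # raw strings, no parsing cost
parsed = list(events)       # flag restored, records are parsed again
```

Restoring the flag in a finally clause keeps the reader usable even if counting or splitting fails half-way.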
