Argument list too long in condor for 3 jet multiplicity

Bug #1107603 reported by Sanjay Padhi
This bug affects 1 person
Affects: MadGraph5_aMC@NLO
Status: Fix Released
Importance: Undecided
Assigned to: marco zaro
Milestone: (none)

Bug Description

Hi

I was trying to run ttbar + 3 jets in MadGraph in cluster mode.

It seems there is a limit on the number of arguments (which is hit because a large number of jobs are submitted).

I got the following error:

Start waiting for update on filesystem. (more info in debug mode)
[Errno 7] Argument list too long
Command "generate_events " interrupted in sub-command:
"generate_events" with error:
TypeError : ask() got an unexpected keyword argument 'answers'
Please report this bug on https://bugs.launchpad.net/madgraph5

-Sanjay
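
The `[Errno 7] Argument list too long` message is the operating system refusing to start a command whose arguments are too large. A minimal sketch of how one might estimate whether an argument list is close to that limit is given below. The helper names (`arg_max`, `command_bytes`, `fits`) are hypothetical and not part of MadGraph, and the fallback value and safety factor are assumptions. The environment also counts toward the limit, and some systems cap the size of a single argument as well, hence the margin.

```python
import os


def arg_max(default=131072):
    """Return the system limit on exec argument bytes, or `default` if unknown.

    `default` is an assumed fallback, not a value taken from any system.
    """
    try:
        limit = os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or this name is unavailable on the platform.
        return default
    return limit if limit > 0 else default


def command_bytes(args):
    """Rough size of an argument list: each argument plus a terminating NUL."""
    return sum(len(a.encode("utf-8")) + 1 for a in args)


def fits(args, limit=None, safety=0.5):
    """True if the argument list stays below a fraction of the limit."""
    if limit is None:
        limit = arg_max()
    return command_bytes(args) <= limit * safety
```

For example, `fits(["condor_q"] + job_ids)` gives a rough indication of whether a single query over all job ids is likely to be accepted.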

Revision history for this message
Olivier Mattelaer (olivier-mattelaer) wrote :

Hi Sanjay,

Thanks for reporting this. It looks like two (or three) combined bugs. I will discuss this with Marco and we will fix it.

Thanks again,

Olivier

Olivier Mattelaer (olivier-mattelaer) wrote :

Hi Sanjay,

Just to be sure, is this an LO run or an NLO run?

Cheers,

Olivier

Sanjay Padhi (sanjay-padhi) wrote :

Yes, this is LO (not NLO)

Another issue, which only happens in batch mode, shows up for ttbar + 2 jets:

 Idle: 0 Running: 1 Finish: 28
INFO: All jobs finished
Combining runs
finish refine
combine_events
Combining Events
Fail to read the number of unweighted events in the combine.log file
cat: events.lhe: No such file or directory
cat: unweighted_events.lhe: No such file or directory
  === Results Summary for run: run_01 tag: tag_1 ===

     Cross-section : 12.35 +- 0.004119 pb
     Nb of events : 0

If you wish, I can open another bug report, but this is with MadGraph5_v1_5_7.

Thanks, Sanjay
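
When the summary reports `Nb of events : 0` together with missing-file messages, an independent check of the event files can help tell a bookkeeping problem from output that is really absent. The following is a minimal sketch, assuming the files are uncompressed LHE text in which each event opens with an `<event>` tag. `count_lhe_events` is a hypothetical helper, not part of MadGraph.

```python
import os


def count_lhe_events(path):
    """Count <event> blocks in an LHE file; return None if the file is missing."""
    if not os.path.isfile(path):
        return None
    count = 0
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            # Match the opening tag only, with or without attributes.
            if stripped.startswith("<event>") or stripped.startswith("<event "):
                count += 1
    return count
```

Calling it on `events.lhe` and `unweighted_events.lhe` in the run directory would return `None` for the case shown in the log above, since neither file exists.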

Olivier Mattelaer (olivier-mattelaer) wrote :

No, one bug report is fine, since this is related to condor as well.

I fixed the ask() error from the first bug report. (When a problem is detected, the code offers to remove all jobs present in the queue; that question was not configured correctly.)

I also fixed the problem with [Errno 7] Argument list too long.

For this one, you need to edit the file cluster.py
(madgraph/various/cluster.py or bin/internal/cluster.py)
and replace the routine control (around line 358),
which looks like:
>>>>>>>
    @check_interupt()
    @multiple_try(nb_try=10, sleep=10)
    def control(self, me_dir):
        """ control the status of a single job with it's cluster id """

        if not self.submitted_ids:
            return 0, 0, 0, 0

        cmd = "condor_q " + ' '.join(self.submitted_ids) + " -format \'%-2s \\n\' \'ifThenElse(JobStatus==0,\"U\",ifThenElse(JobStatus==1,\"I\",ifThenElse(JobStatus==2,\"R\",ifThenElse(JobStatus==3,\"X\",ifThenElse(JobStatus==4,\"C\",ifThenElse(JobStatus==5,\"H\",ifThenElse(JobStatus==6,\"E\",string(JobStatus))))))))\'"

        status = misc.Popen([cmd], shell=True, stdout=subprocess.PIPE,
                                                             stderr=subprocess.PIPE)
        error = status.stderr.read()
        if status.returncode or error:
            raise ClusterManagmentError, 'condor_q returns error: %s' % error

        idle, run, fail = 0, 0, 0
        for line in status.stdout:
            status = line.strip()
            if status in ['I','U']:
                idle += 1
            elif status == 'R':
                run += 1
            elif status != 'C':
                fail += 1

        return idle, run, self.submitted - (idle+run+fail), fail

<<<<<<<
by:

    @check_interupt()
    @multiple_try(nb_try=10, sleep=10)
    def control(self, me_dir):
        """ control the status of a single job with it's cluster id """

        if not self.submitted_ids:
            return 0, 0, 0, 0

        packet = 15000
        # Initialise the counters once, before the loop, so that the totals
        # accumulate over all packets instead of being reset for each one.
        idle, run, fail = 0, 0, 0
        for i in range(1+(len(self.submitted_ids)-1)//packet):
            start = i * packet
            stop = (i+1) * packet
            cmd = "condor_q " + ' '.join(self.submitted_ids[start:stop]) + " -format \'%-2s \\n\' \'ifThenElse(JobStatus==0,\"U\",ifThenElse(JobStatus==1,\"I\",ifThenElse(JobStatus==2,\"R\",ifThenElse(JobStatus==3,\"X\",ifThenElse(JobStatus==4,\"C\",ifThenElse(JobStatus==5,\"H\",ifThenElse(JobStatus==6,\"E\",string(JobStatus))))))))\'"

            status = misc.Popen([cmd], shell=True, stdout=subprocess.PIPE,
                                                             stderr=subprocess.PIPE)
            error = status.stderr.read()
            if status.returncode or error:
                raise ClusterManagmentError, 'condor_q returns error: %s' % error

            for line in status.stdout:
                status = line.strip()
                if status in ['I','U']:
                    idle += 1
                elif status == 'R':
             ...


Sanjay Padhi (sanjay-padhi) wrote :

Many thanks for the fix.

The SubProcesses/combine.log is actually empty.

-Sanjay

Olivier Mattelaer (olivier-mattelaer) wrote :

Hi Marco,

I've assigned you to this bug, since this is exactly what we discussed by Skype last time.

Cheers,

Olivier

Changed in madgraph5:
assignee: nobody → marco zaro (marco-zaro)
Changed in madgraph5:
status: New → Confirmed
Revision history for this message
Olivier Mattelaer (olivier-mattelaer) wrote :

Marco, any news?

Changed in madgraph5:
status: Confirmed → Fix Released