Large samples and batch

Hi everyone,
I am currently developing an analysis for my bachelor thesis and have so far tested it on 1 or 5 million events by setting the --nevents option. For the final step I think I will need to run the job over the whole sample (100 million events), or at least over much larger statistics.
I don't fully understand the correct procedure for this. From what I gathered, my analysis script should begin something like this:

#List of processes
processList = {
    'p8_ee_Ztautau_ecm91':{'fraction': 0.5, 'chunks': 20}
}

procDict = "FCCee_procDict_winter2023_IDEA.json"

#output directory
outputDir = "/afs/cern.ch/user/s/scattola/FCCAnalyses/Ztautau/analysis/treemaker/optimal/"
outputName = "p8_ee_Ztautau_ecm91"
#input directory
inputDir = "/eos/experiment/fcc/ee/generation/DelphesEvents/winter2023/IDEA/"

#Optional
nCPUS = 4
runBatch = True
batchQueue = "longlunch"
compGroup = "group_u_FCC.local_gen"

With these settings, if I understand correctly, the job will run over 50% of the statistics, split into 20 jobs (giving 20 final .root files). My questions are:

  • What would be the right ratio of chunks to total events? Say I want to run over 50 million events: are 50 chunks enough?
  • If I then use a histmaker.py to create the needed histograms, will it store everything in one final ROOT file, or does it also need chunks?
  • What are the options for batchQueue, and which compGroup should I use? I should be part of the computing group zp (gid=1307), which I imagine corresponds to the ATLAS group.

Thanks in advance and apologies for the long post.

Hi @scattola,

  • The requirement is that there is at least one input file per output chunk. I would aim for at least a few input files per chunk. The exact number depends on several factors: your filter efficiency, the number of output variables, the size of the output objects, …
    The goal is to get output files of manageable size.

  • The histmaker outputs histograms, which should be saved into one ROOT file per sample (process).

  • You can use the HTCondor accounting group dedicated to the FCC: group_u_FCC.local_gen. To be included in it, you need to join the fcc-experiments-comp e-group [link] (admin approval required).
    The batchQueue options correspond to the HTCondor job flavours; see the Submit Files section of the Batch Docs.
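
To make the first and last points concrete, here is a rough sketch in plain Python. It is not part of FCCAnalyses: suggest_chunks and shortest_flavour are hypothetical helper names, and the input file count used in the example (1000) is made up, so check the real number for your sample. The flavour names and wall-time limits are the standard CERN HTCondor job flavours. The sketch also assumes that 'fraction' translates roughly into the same fraction of input files, which may not hold exactly.

```python
import math

# CERN HTCondor job flavours and their maximum wall time in seconds.
JOB_FLAVOURS = {
    "espresso": 20 * 60,             # 20 minutes
    "microcentury": 60 * 60,         # 1 hour
    "longlunch": 2 * 60 * 60,        # 2 hours
    "workday": 8 * 60 * 60,          # 8 hours
    "tomorrow": 24 * 60 * 60,        # 1 day
    "testmatch": 3 * 24 * 60 * 60,   # 3 days
    "nextweek": 7 * 24 * 60 * 60,    # 1 week
}


def suggest_chunks(n_input_files, fraction, files_per_chunk):
    """Hypothetical helper: pick a chunk count so that every chunk reads
    about `files_per_chunk` input files and never fewer than one."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if files_per_chunk < 1:
        raise ValueError("files_per_chunk must be at least 1")
    # Assumption: the fraction of events maps onto the same fraction of files.
    n_used = max(1, math.floor(n_input_files * fraction))
    return max(1, n_used // files_per_chunk)


def shortest_flavour(expected_seconds):
    """Hypothetical helper: shortest job flavour whose limit covers the
    expected run time of a single chunk."""
    for name, limit in sorted(JOB_FLAVOURS.items(), key=lambda kv: kv[1]):
        if expected_seconds <= limit:
            return name
    raise ValueError("job is longer than the longest flavour (nextweek)")


if __name__ == "__main__":
    # Made-up sample: 1000 input files, half the statistics, 10 files per chunk.
    chunks = suggest_chunks(n_input_files=1000, fraction=0.5, files_per_chunk=10)
    print("chunks:", chunks)
    # A chunk expected to take about 90 minutes.
    print("batchQueue:", shortest_flavour(90 * 60))
```

The resulting numbers would go into 'chunks' in processList and into batchQueue. Timing a single chunk locally with --nevents first gives a reasonable estimate for the expected run time.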

Best,
Juraj

Understood, thank you.
So to actually submit the job, should I follow the quick start guide at link?