Hi everyone,
I am currently developing an analysis for my bachelor thesis and have so far tested it on 1 or 5 million events by just specifying the --nevents option. For the final step I think I will need to run the job over the whole sample (100 million events), or at least over larger statistics in general.
I don’t fully understand the correct procedure for this. From what I gathered, my analysis script should begin something like this:
#List of processes
processList = {
    'p8_ee_Ztautau_ecm91': {'fraction': 0.5, 'chunks': 20}
}
procDict = "FCCee_procDict_winter2023_IDEA.json"
#output directory
outputDir = "/afs/cern.ch/user/s/scattola/FCCAnalyses/Ztautau/analysis/treemaker/optimal/"
outputName = "p8_ee_Ztautau_ecm91"
#input directory
inputDir = "/eos/experiment/fcc/ee/generation/DelphesEvents/winter2023/IDEA/"
#Optional
nCPUS = 4
runBatch = True
batchQueue = "longlunch"
compGroup = "group_u_FCC.local_gen"
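For reference, here is a back-of-the-envelope sketch of the per-job load these settings would imply, assuming (my guess, not taken from the docs) that fraction selects that share of the total events and chunks splits the selection evenly across jobs:

```python
# Rough per-job event count implied by the processList settings above.
# The 100-million total is an assumption about the full sample size.
total_events = 100_000_000
fraction = 0.5   # share of the sample to process
chunks = 20      # number of batch jobs / output files

events_selected = int(total_events * fraction)
events_per_chunk = events_selected // chunks
print(f"{events_selected:,} events across {chunks} jobs "
      f"-> ~{events_per_chunk:,} events per job")
# -> 50,000,000 events across 20 jobs -> ~2,500,000 events per job
```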
If I understand correctly, with these settings I would run over 50% of the statistics, split into 20 jobs (and thus 20 final .root files). My questions are:
- What would be a sensible ratio of chunks to total events? Say I want to run over 50 million events: are 50 chunks enough?
- If I then use a histmaker.py script to create the needed histograms, will it store everything in a single final root file, or does it also need to be chunked?
- What are the available options for batchQueue, and which compGroup should I use? I should be part of the comp group zp (gid=1307), which I imagine corresponds to the ATLAS group.
Thanks in advance and apologies for the long post.