Hi everyone,
I am currently developing an analysis for my bachelor thesis and have so far tested it on 1 or 5 million events by just specifying the --nevents option. For the final step I think I will need to run the job over the whole sample (100 million events), or at least over larger statistics in general.
I don’t fully understand the correct procedure for this. From what I gathered, my analysis script should begin something like this:
#List of processes
processList = {
    'p8_ee_Ztautau_ecm91': {'fraction': 0.5, 'chunks': 20}
}
procDict = "FCCee_procDict_winter2023_IDEA.json"
#output directory
outputDir = "/afs/cern.ch/user/s/scattola/FCCAnalyses/Ztautau/analysis/treemaker/optimal/"
outputName = "p8_ee_Ztautau_ecm91"
#input directory
inputDir = "/eos/experiment/fcc/ee/generation/DelphesEvents/winter2023/IDEA/"
#Optional
nCPUS = 4
runBatch = True
batchQueue = "longlunch"
compGroup = "group_u_FCC.local_gen"
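For reference, here is a back-of-the-envelope sketch of the per-job load these settings would imply, assuming (my guess, not taken from the docs) that fraction selects that share of the total events and chunks splits the selection evenly across jobs:

```python
# Rough per-job event count implied by the processList settings above.
# The 100-million total is an assumption about the full sample size.
total_events = 100_000_000
fraction = 0.5   # share of the sample to process
chunks = 20      # number of batch jobs / output files

events_selected = int(total_events * fraction)
events_per_chunk = events_selected // chunks
print(f"{events_selected:,} events across {chunks} jobs "
      f"-> ~{events_per_chunk:,} events per job")
# -> 50,000,000 events across 20 jobs -> ~2,500,000 events per job
```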
If I understand correctly, with these settings I would run over 50% of the statistics, split into 20 jobs (and thus 20 final .root files). My questions are:
- What would be a sensible ratio of chunks to total events? Say I want to run over 50 million events: are 50 chunks enough?
- If I then use a histmaker.py script to create the needed histograms, will it store everything in a single final root file, or does it also need to be chunked?
- What are the available options for batchQueue, and which compGroup should I use? I should be part of the comp group zp (gid=1307), which I imagine corresponds to the ATLAS group.
Thanks in advance and apologies for the long post.