Large Sample Analysis with new framework

Hello,

I tried running over the large samples using the changes made to the repository over the last two weeks. I managed to process a fraction of the large samples provided on EOS and scp the graphs to my home directory.
My question is how to use EOS when analyzing larger fractions of the samples. Would setting the output directory with something like:

outputDir = "root://eosuser.cern.ch//eos/user/h/hshaddix"

be needed in every consecutive step, i.e. the first and second selection stages as well as the provided plot file? By default, each stage simply creates a new directory that the later stages reference. If one wanted to use EOS or CERNBox to run the analysis over a larger fraction of the samples, how would one change the input and output of each stage to use that storage space?

Thanks,
Hayden Shaddix

Hello @hshaddix,

When processing your first-stage analysis, i.e. the one that runs on batch over the large samples typically located in /eos/experiment/fcc/ee/generation/DelphesEvents/spring2021/IDEA/, I suggest storing the output on EOS using the following:

outputDirEos = "/eos/user/h/hshaddix/YOURDIR"
eosType = "eosuser"

This should run the jobs on the local machine and copy the outputs to EOS. It is bad practice to write the output directly to EOS while the job runs, as there is a non-negligible probability that the open slot dies before the job finishes.
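
The "run locally, then copy" pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the framework's actual implementation: the function name `run_then_stage_out` and its `produce` callback are hypothetical, and `shutil.copy2` assumes the destination is reachable as an ordinary path (for example a mounted /eos directory). A transfer to a root:// URL would need an XRootD client instead.

```python
import shutil
import tempfile
from pathlib import Path


def run_then_stage_out(produce, filename, dest_dir):
    """Write a file in local scratch space, then copy it to dest_dir.

    `produce` is a hypothetical callback that receives the local path
    and writes the job output there.
    """
    dest_dir = Path(dest_dir)
    with tempfile.TemporaryDirectory() as scratch:
        local_file = Path(scratch) / filename
        # The job itself only ever touches the local scratch file.
        produce(local_file)
        # The remote storage is used once, for a single copy at the end.
        dest_dir.mkdir(parents=True, exist_ok=True)
        final_file = dest_dir / filename
        shutil.copy2(local_file, final_file)
    return final_file
```

With this layout, a storage hiccup during the job cannot corrupt a half-written output; only the final copy depends on the remote storage, and a failed copy can be retried without rerunning the job.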

Once you have run the first analysis stage, I think this trick is no longer needed, as the processing time is reduced.
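
For the later stages, the directories can then be chained so that each stage reads what the previous one wrote. The sketch below shows one possible way to derive them from a single base path. The helper `stage_dirs` and the subdirectory names are hypothetical, and `inputDir` is assumed to be the counterpart of `outputDir` in the later-stage configuration files, which this thread does not confirm.

```python
from pathlib import PurePosixPath


def stage_dirs(base, stages=("stage1", "stage2", "final", "plots")):
    """Return {stage: (input_dir, output_dir)} for a chain of stages.

    Each stage reads from the previous stage's output directory.
    The first stage has no input here (None), since it reads the
    centrally produced samples instead.
    """
    base = PurePosixPath(base)
    dirs = {}
    previous = None
    for name in stages:
        output = str(base / name)
        dirs[name] = (previous, output)
        previous = output
    return dirs


# Example: values one might paste into the second-stage configuration.
dirs = stage_dirs("/eos/user/h/hshaddix/YOURDIR")
inputDir, outputDir = dirs["stage2"]
```

Deriving every path from one base keeps the stages consistent: moving the whole analysis to a different storage area means changing a single string.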
Clement

@clement.helsens ,

When I make these changes and try to run over a larger fraction of the events, I get what I believe is an issue with the nightlies build. The message is of the form:

Traceback (most recent call last):
  File "/cvmfs/sw-nightlies.hsf.org/spackages5/fccanalyses/commit.aa8bb5630550885091256d4389c9d6a3310b1b46/x86_64-centos7-gcc11.2.0-opt/kyb5y/bin/fccanalysis", line 28, in <module>
    run(parser)
  File "/cvmfs/sw-nightlies.hsf.org/spackages5/fccanalyses/commit.aa8bb5630550885091256d4389c9d6a3310b1b46/x86_64-centos7-gcc11.2.0-opt/kyb5y/python/config/FCCAnalysisRun.py", line 889, in run
    if args.command == "run": runStages(args, rdfModule, args.preprocess)
  File "/cvmfs/sw-nightlies.hsf.org/spackages5/fccanalyses/commit.aa8bb5630550885091256d4389c9d6a3310b1b46/x86_64-centos7-gcc11.2.0-opt/kyb5y/python/config/FCCAnalysisRun.py", line 600, in runStages
    runLocal(rdfModule, chunkList[ch], args)
  File "/cvmfs/sw-nightlies.hsf.org/spackages5/fccanalyses/commit.aa8bb5630550885091256d4389c9d6a3310b1b46/x86_64-centos7-gcc11.2.0-opt/kyb5y/python/config/FCCAnalysisRun.py", line 476, in runLocal
    outf = ROOT.TFile( outFile, "update" )
  File "/cvmfs/sw-nightlies.hsf.org/spackages5/root/6.26.02/x86_64-centos7-gcc11.2.0-opt/62nsk/lib/ROOT/_pythonization/_tfile.py", line 55, in _TFileConstructor
    raise OSError('Failed to open file {}'.format(args[0]))
OSError: Failed to open file p8_ee_ZZ_ecm240.root

Hello @hshaddix,

I would need more details on the configuration file and the command you run.
Ideally, I would need all the details to be able to reproduce the problem you see.

Clement

@clement.helsens ,

I have since figured out the issue: the output settings you suggested worked once I changed outputDirEos to outputDir.

Thanks,
Hayden Shaddix

Good, I will close this topic.