preSel.py on local files: how to?

clement.helsens · April 30, 2021, 5:37am

(email from patricia.rebello.teles@cern.ch 30/04/2021)
Dear expert

I am trying to run preSel.py from local files.

I have changed the basedir accordingly, but FCCAnalyses/config/runDataFrame.py (line 41, in run), at
with open(yamlfile) as ftmp:

is searching for a merge.yaml file.

How is this merge.yaml file produced?

May I kindly ask how to overcome this? The instructions on GitHub are not clear.

Thank you

Regards Patricia

clement.helsens · April 30, 2021, 5:46am

Dear @prebello,

preSel.py runs over “datasets”, so the yaml file basically describes the dataset structure. These datasets are created by the production system. Such a merge.yaml looks like this:

merge:
  nbad: 0
  ndone: 1
  nevents: 1000
  outdir: /eos/experiment/fcc/ee/generation/DelphesEvents/spring2021/IDEA/p8_ee_Zuds_ecm91/
  outfiles:
  - - events_092725256.root
    - 1000
  outfilesbad: []
  process: p8_ee_Zuds_ecm91
  size: 5444721
  sumofweights: 1000.0

where the dataset and thus process name is p8_ee_Zuds_ecm91.
So preSel.py is designed to run on this structure only and is connected to a database; see this production for example:

http://fcc-physics-events.web.cern.ch/fcc-physics-events/Delphesevents_fccee_tmp_v03.php
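
As a rough illustration of how the merge.yaml structure shown above can be consumed, here is a minimal sketch. It assumes the file has already been loaded into a Python dict (for instance with PyYAML's yaml.safe_load); the dict literal simply copies the example above, and the helper name dataset_files is hypothetical, not part of FCCAnalyses.

```python
import os

# Content of the merge.yaml example above, as a YAML loader would return it.
merge_info = {
    "merge": {
        "nbad": 0,
        "ndone": 1,
        "nevents": 1000,
        "outdir": "/eos/experiment/fcc/ee/generation/DelphesEvents/spring2021/IDEA/p8_ee_Zuds_ecm91/",
        "outfiles": [["events_092725256.root", 1000]],
        "outfilesbad": [],
        "process": "p8_ee_Zuds_ecm91",
        "size": 5444721,
        "sumofweights": 1000.0,
    }
}


def dataset_files(info):
    """Return (full_path, n_events) pairs for every good file of a dataset.

    Hypothetical helper for illustration only.
    """
    merge = info["merge"]
    return [(os.path.join(merge["outdir"], name), n) for name, n in merge["outfiles"]]


files = dataset_files(merge_info)
total_events = sum(n for _, n in files)
print(merge_info["merge"]["process"], len(files), "file(s),", total_events, "events")
```

The per-file event counts in outfiles should add up to nevents, which is a cheap consistency check when inspecting such a file by hand.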

To run on your own local files, it is better to use analysis.py directly.

Change the __main__ block as done here:

HEP-FCC/FCCAnalyses/blob/awkward/examples/FCCee/flavour/Bc2TauNu/analysis_stage1.py#L369#L439


# python examples/FCCee/flavour/Bc2TauNu/analysis_stage1.py p8_ee_Zbb_Bc2TauNu_stage1.root /eos/experiment/fcc/ee/generation/DelphesEvents/fcc_tmp_v03/p8_ee_Zbb_ecm91_EvtGen_Bc2TauNuTAUHADNU/events_003834121.root
# python examples/FCCee/flavour/Bc2TauNu/analysis_stage1.py p8_ee_Zbb_Bu2TauNu_stage1.root /eos/experiment/fcc/ee/generation/DelphesEvents/fcc_tmp_v03/p8_ee_Zbb_ecm91_EvtGen_Bu2TauNuTAUHADNU/events_026079857.root
# python examples/FCCee/flavour/Bc2TauNu/analysis_stage1.py p8_ee_Zbb_Bc2TauNu_stage1.root "/eos/experiment/fcc/ee/generation/DelphesEvents/fcc_tmp_v03/p8_ee_Zbb_ecm91_EvtGen_Bc2TauNuTAUHADNU/events_*"
# python examples/FCCee/flavour/Bc2TauNu/analysis_stage1.py p8_ee_Zbb_stage1.root  /eos/experiment/fcc/ee/generation/DelphesEvents/fcc_tmp_v03/p8_ee_Zbb_ecm91/events_026734131.root



if __name__ == "__main__":

    if len(sys.argv)<3:
        print ("usage:")
        print ("python ",sys.argv[0]," output.root input.root")
        print ("python ",sys.argv[0]," output.root \"inputdir/*.root\"")
        print ("python ",sys.argv[0]," output.root file1.root file2.root file3.root <nevents>")
        sys.exit(3)


    print ("Create dataframe object from ", )

to accept all your files, and run it like

python analysis.py output.root "inputpath/*.root"
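
For reference, the argument handling described by the usage message above could be sketched roughly as follows. This is an assumption-laden illustration rather than the actual code in analysis_stage1.py: the function name parse_args is hypothetical, and it assumes a quoted wildcard is expanded inside the script and that a trailing integer is the optional event count.

```python
import glob


def parse_args(argv):
    """Split argv into (output, input_files, n_events).

    n_events is None unless the last of several inputs is an integer.
    Hypothetical helper, not taken from FCCAnalyses.
    """
    if len(argv) < 3:
        raise SystemExit(
            "usage: python %s output.root input.root [more.root ...] [nevents]" % argv[0]
        )

    output = argv[1]
    inputs = list(argv[2:])
    n_events = None

    # Optional trailing event count, as in "file1.root file2.root <nevents>".
    if len(inputs) > 1 and inputs[-1].isdigit():
        n_events = int(inputs.pop())

    files = []
    for item in inputs:
        if "*" in item:
            # Quoted wildcard: expand it here instead of relying on the shell.
            files.extend(sorted(glob.glob(item)))
        else:
            files.append(item)
    return output, files, n_events


# Example call with an explicit argument list; in a script, pass sys.argv instead.
output, files, n_events = parse_args(
    ["analysis.py", "output.root", "file1.root", "file2.root", "500"]
)
print(output, files, n_events)
```

Quoting the wildcard on the command line, as in the command above, keeps the shell from expanding it, so the script receives the pattern as a single argument.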

Best, Clement

prebello · May 17, 2021, 12:56pm

Hi Clement, thank you.
I could run on my private LHE sample using

DelphesPythia8_EDM4HEP $DELPHES_DIR/cards/delphes_card_IDEAtrkCov.tcl edm4hep_output_config.tcl Pythia_LHE.cmd outputFile.root

and then analysis.py (unchanged) as
python FCCAnalyses/examples/FCCee/higgs/eeh/analysis.py outputFile.root

Nevertheless, the next step (finalsel.py, according to the instructions)
python FCCAnalyses/examples/FCCee/higgs/eeh/finalsel.py

needs some kind of json file and fails with: Permission denied: ‘/afs/cern.ch/work/h/helsens/public/FCCDicts/FCCee_procDict_fcc_tmp.json’

It seems that the whole chain (from preSel.py to plots.py) depends on your files (inputs, outputs, dictionaries).

Is there any way to work without these dictionaries?

Thank you

Regards Pat

clement.helsens · May 17, 2021, 3:35pm

Hello @prebello,

If you provide your CERN user name, I can grant you read access.
But if you want to run over your privately produced samples, you need to link your own dictionary
in the finalSel.py file: just replace the procDict pointing to my directory with

procDict={
   "yourfirstsamplename": {"numberOfEvents": 11550000, "sumOfWeights": 11550000, "crossSection": 1.0, "kfactor": 1.0, "matchingEfficiency": 1.0},
   "yoursecondsamplename": {"numberOfEvents": 12800000, "sumOfWeights": 12800000, "crossSection": 1.0, "kfactor": 1.0, "matchingEfficiency": 1.0}
}

where yourfirstsamplename and yoursecondsamplename are the names of the output files from the previous step. If you want the histograms to be properly normalised, set the number of events to the number you ran over, i.e. the events in the original files before any cuts.
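
To make the role of these fields concrete, here is a minimal sketch of building such entries and turning one into a per-event normalisation weight. The formula used (cross section × k-factor × matching efficiency × integrated luminosity / sum of weights) is the conventional one and is an assumption here, not something confirmed from the FCCAnalyses code; the helper names, sample names and numbers are all made up for illustration.

```python
def make_proc_entry(n_events, cross_section=1.0, kfactor=1.0,
                    matching_efficiency=1.0, sum_of_weights=None):
    """Build one procDict entry; sumOfWeights defaults to the number of events.

    Hypothetical helper, not part of FCCAnalyses.
    """
    return {
        "numberOfEvents": n_events,
        "sumOfWeights": n_events if sum_of_weights is None else sum_of_weights,
        "crossSection": cross_section,
        "kfactor": kfactor,
        "matchingEfficiency": matching_efficiency,
    }


def scale_factor(entry, int_lumi):
    """Per-event weight for an integrated luminosity int_lumi.

    Assumes the conventional normalisation; int_lumi must be expressed in the
    inverse of the cross-section unit (e.g. pb^-1 if crossSection is in pb).
    """
    return (entry["crossSection"] * entry["kfactor"]
            * entry["matchingEfficiency"] * int_lumi / entry["sumOfWeights"])


# Made-up sample names and numbers: use the event counts BEFORE any cuts.
procDict = {
    "myfirstsample": make_proc_entry(1000, cross_section=2.0),
    "mysecondsample": make_proc_entry(5000),
}
weight = scale_factor(procDict["myfirstsample"], int_lumi=500.0)
print(weight)
```

With crossSection left at 1.0, as in the dictionary above, the histograms are only normalised relative to the number of generated events; a physical cross section is needed to obtain expected event yields.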

plots.py will not depend on the dictionary, as the histograms are already properly normalised.
But it’s true that the rest depends on it, because we need a central place to store such common information for proper normalisation.

prebello · May 17, 2021, 4:32pm

Thank you very much, @clement.helsens, for this information.

One (maybe :-)) last question: since my analysis is simple, in case I don’t manage to use the FCCAnalyses framework, may I use standalone Delphes to process my private LHE files with the command DelphesPythia8 $DELPHES_DIR/cards/delphes_card_IDEAtrkCov.tcl etc., and then write a private script, without any prejudice with respect to the FCC-ee results? I would not use the EDM4HEP structures in this case.

clement.helsens · May 18, 2021, 7:22am

Dear @prebello,

Given the effort you have already made to work with the common samples in the EDM4Hep format, it would be unfortunate if you couldn’t carry out your analysis. finalSel.py and plots.py are just helper scripts to run over common datasets, but after running analysis.py you have most likely removed the dependence on EDM4Hep, so the output can be manipulated using any custom code you prefer.
Please let us know if you encounter difficulties in processing your analysis, and we can certainly help make sure it can be done within the common format and tools.
Clement