Understanding the stages of FCC analysis SW

Dear experts,
I have a few questions related to the workflow of FCCAnalysis, and in particular how things are handled. If someone could either clarify this or point me to the relevant bits of code, that would be great.

  1. What is the interplay between “chunks” and “fractions” when you set things up in a typical “analysis_stage1.py” file? I understand in principle what this does, but could you explain the logic for how this actually treats the input files? My suspicion is that there is no check ensuring that the combination of chunks and fraction cannot produce an empty frame (which can then give an error?).
  2. When doing analysis_final.py (or an equivalent) you can save a tree, but is there a way to save/store an effective “event weight” for that? I think the weight must be generated to fill the histograms, but it would be good to store as well for those wanting to do n-tuple based analyses on the outputs? Or perhaps there’s a way to have that stored on the output of stage1 in the case that you perform a filter?

Thanks in advance,
Sarah

Hello @williams,

The part of the framework that takes care of the chunks and fractions has been revised in this PR.

  1. The chunks specify that the work should be split into several pieces, which creates a corresponding number of output files.
    The fraction can be used to reduce the number of input events, but it operates on whole files. This means that one can have a situation like this:

    ----> Info: Adding process "p8_ee_ZH_ecm240" with:
                 - fraction:         2e-06
                 - number of files:  100
                 - output stem:      p8_ee_ZH_ecm240_out
                 - number of chunks: 45
    ----> Info: Reducing the input file list by fraction "2e-06" of total events:
                 - total number of events: 10,000,000
                 - targeted number of events: 20
                 - number of events in the resulting file list: 100,000
                 - number of files after reduction: 1
    

    where there is a large difference between the targeted number of events and the actual number of events.
    After the PR there should no longer be a situation where empty frames are created: the number of chunks is limited to be between 1 and the number of input events.

  2. The event weight can be stored in the resulting tree; it just has to be defined in a previous stage. The function for that needs to be provided by the user.
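
To illustrate the logic of point 1, here is a minimal sketch in plain Python. It is not the actual framework code: the function names `reduce_file_list` and `clamp_chunks`, the file names, the rounding of the target, and the choice to always keep at least one file are assumptions made for the example. It only mirrors the behaviour described above, namely that the fraction is applied to whole files and that the number of chunks is limited to be between 1 and the number of input events.

```python
from typing import List, Tuple

# (file name, number of events in that file); hypothetical representation
FileInfo = Tuple[str, int]


def reduce_file_list(files: List[FileInfo], fraction: float) -> List[FileInfo]:
    """Keep whole files until the targeted number of events is reached.

    Assumption: at least one file is kept whenever the input list is non-empty.
    """
    total = sum(n_events for _, n_events in files)
    target = round(total * fraction)  # assumption: target is rounded to an integer

    kept: List[FileInfo] = []
    n_kept = 0
    for name, n_events in files:
        # Stop once the target is reached, but never return an empty list.
        if kept and n_kept >= target:
            break
        kept.append((name, n_events))
        n_kept += n_events
    return kept


def clamp_chunks(requested: int, n_events: int) -> int:
    """Limit the number of chunks to be between 1 and the number of events."""
    return max(1, min(requested, n_events))


# Numbers taken from the log excerpt above: 100 files of 100,000 events each.
files = [(f"events_{i:03d}.root", 100_000) for i in range(100)]
kept = reduce_file_list(files, 2e-06)
n_events = sum(n for _, n in kept)
n_chunks = clamp_chunks(45, n_events)
print(len(kept), n_events, n_chunks)
```

With the numbers from the log, the target is 20 events, but since only whole files are kept the reduced list contains one file with 100,000 events, and the 45 requested chunks are left unchanged because 45 is below the number of events.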

Best,
Juraj