Inflated scaling in stage 2 when using intermediate tools (like uproot)

Hi everyone,
I wanted to share a quick observation in case it helps anyone doing intermediate ML steps between analysis stages.

I was using uproot between stages 1 and 2 to add BDT scores to my trees, and I realized that uproot drops the eventsProcessed metadata from the stage 1 output ROOT file.

When stage 2 runs and can’t find this tag, it seems to fall back to using the number of events in the filtered tree. If you applied any pre-selection cuts in stage 1, this quietly inflates the final cross-section scaling without throwing a warning.
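To put a number on the effect, here is a minimal sketch. It assumes the usual normalization where each event is weighted by cross-section times luminosity divided by the number of processed events; that formula, the function name and all the numbers are illustrative assumptions, not taken from the FCCAnalyses code.

```python
# Illustrative only: assumes per-event weight = sigma * L / N.
def event_weight(cross_section_pb, lumi_pb_inv, n_events):
    """Per-event normalization weight under the assumed formula."""
    return cross_section_pb * lumi_pb_inv / n_events

n_processed = 1_000_000   # hypothetical: events read before stage 1 cuts
n_after_cuts = 250_000    # hypothetical: events left in the filtered tree

# Weight when eventsProcessed is available.
correct = event_weight(1.0, 1000.0, n_processed)

# Weight when the filtered tree's entry count is used instead.
fallback = event_weight(1.0, 1000.0, n_after_cuts)

# Ratio of the two weights, equal to n_processed / n_after_cuts.
inflation = fallback / correct
print(f"scaling inflated by a factor of {inflation:.1f}")
```

Under these assumptions the inflation factor is simply the inverse of the stage 1 pre-selection efficiency, so tighter cuts give a larger error.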

I got around this by just manually copying the TParameter over using PyROOT when recreating my BDT output ROOT file, but figured I’d flag this so others don’t accidentally get inflated results!
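For anyone who wants to do the same, this is roughly what the copy looks like. It is a sketch, assuming the metadata is stored as a TParameter under the key eventsProcessed at the top level of the file; the function name and the file paths in the usage line are hypothetical.

```python
def copy_events_processed(src_path, dst_path, key="eventsProcessed"):
    """Copy the TParameter stored under `key` from src_path into dst_path."""
    # PyROOT is imported here so the module loads even without ROOT installed.
    import ROOT

    src = ROOT.TFile.Open(src_path, "READ")
    param = src.Get(key)
    if not param:
        src.Close()
        raise KeyError(f"{key} not found in {src_path}")
    value = param.GetVal()

    # Open the rewritten file in UPDATE mode so its existing trees are kept.
    dst = ROOT.TFile.Open(dst_path, "UPDATE")
    dst.cd()
    param.Write()  # written under the object's own name, i.e. `key`
    dst.Close()
    src.Close()
    return value
```

Usage would be something like `copy_events_processed("stage1_output.root", "bdt_output.root")`, run after the BDT output file has been written.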

Cheers,
Shreyas

Hi @sbakare,

That is a good point to bring up :slight_smile:

We use several TParameters to carry information between the stages.
The reason we store the information directly in the file is that you can then freely copy the files.
Since there is no external bookkeeping, FCCAnalyses can’t tell whether the files have been preprocessed or not.

Best,
Juraj

Hi @sbakare,

Personally, I use defineList in the final selection to add the BDT score to the TTree, which seems more practical than using uproot to add it manually.

Best,

Tom