Hi,
I’m modernising a previous analysis for some detector studies, and I like the built-in cluster submission and file bookkeeping available in the standard stage1.py-style analysis steps, as in the mH-recoil/mumu examples. Is it possible to take advantage of this in other kinds of analysis steps as well? For example, I use a BDT for some selection, so I have a step where I first pickle some files:
```python
from config import train_var_lists

def run(input_files, vars_list):
    import uproot
    print("input_files: ", input_files)
    for inf in input_files:
        print(inf)
        output_file = inf.replace("/stage1", "/stage1_pickles").replace(".root", ".pkl")
        # uproot >= 4 equivalent:
        # df = uproot.concatenate(inf + ":events", library="pd", how="zip", filter_name=vars_list)
        df = uproot.open(inf).get("events").pandas.df(vars_list)  # uproot3 API
        df.to_pickle(output_file)

def main():
    import argparse
    parser = argparse.ArgumentParser(description="Applies preselection cuts",
                                     formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--input', nargs="+", required=True, help='Select the input file(s).')
    #parser.add_argument('--output', type=str, required=True, help='Select the output file.')
    parser.add_argument('--vars', type=str, required=True,
                        help='Select the variables to keep, e.g. "train_vars_vtx", "train_vars_stage2"')
    parser.add_argument('--decay', type=str, required=True,
                        help='Select the decay key used to look up the variable list.')
    args = parser.parse_args()
    assert args.vars in train_var_lists
    run(args.input, vars_list=train_var_lists[args.vars][args.decay])

if __name__ == '__main__':
    main()
```
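As an aside, the output-path mapping and the pickle round-trip in that step can be sketched standalone, with a toy DataFrame standing in for the branches read from the `events` tree (`pickle_path` is a hypothetical helper name, not part of the framework):

```python
import os
import tempfile

import pandas as pd

def pickle_path(input_path):
    # Map a stage1 ROOT file path to its pickle counterpart,
    # mirroring the replace() calls in run() above.
    return input_path.replace("/stage1", "/stage1_pickles").replace(".root", ".pkl")

# Toy DataFrame standing in for the variables read from the TTree.
df = pd.DataFrame({"Z_mass": [91.2, 90.8], "n_muons": [2, 2]})

with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "sample.pkl")
    df.to_pickle(out)
    restored = pd.read_pickle(out)
    # The pickle round-trip preserves the DataFrame exactly.
    assert restored.equals(df)

print(pickle_path("/eos/user/stage1/sample.root"))
# -> /eos/user/stage1_pickles/sample.pkl
```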
I guess I could edit this to be similar to mH-recoil/mumu/analysis_final.py, but it’s not obvious to me how to control which file gets opened, etc. Is this possible, or would I have to rely on my own bookkeeping and cluster submission?