Finding CollectionID

ymahmoud · April 11, 2024, 5:38pm

Hello FCC community,

I am trying to understand what the collection of references for each object means. For example, Jets and MET have six references, while others like muons have just one. The EDM4hep explanation here (FCCAnalyses/examples/basics at basicexamples · HEP-FCC/FCCAnalyses · GitHub) says that I should work this out from the collectionID by finding which collection it points to in this list: Electron (1), Muon (2), AllMuon (3), EFlowNeutralHadron (4), Particle (5), Photon (6), ReconstructedParticles (7), EFlowPhoton (8), MCRecoAssociations (9), MissingET (10), ParticleIDs (11), Jet (12), EFlowTrack (13), EFlowTrack_1 (14)

It says, for example, that the collectionID for muon#0 is 7, so it belongs to the ReconstructedParticles collection. But when I opened one of the files, I found the muon#0 collectionID to be 15, which is not present in the list.

Does anyone know how this works?

jsmiesko · April 12, 2024, 7:32am

Hi @ymahmoud,

there are two types of collections here. One is a regular collection of type edm4hep::ReconstructedParticleCollection (these are the “Jets”, “MissingET” and “ReconstructedParticles” collections), and the other is a subset collection, which also appears as edm4hep::ReconstructedParticleCollection. In this case the “Muon” collection is a subset collection of the “ReconstructedParticles” collection.

The subset collection needs to refer only to the original collection, so there is only one reference.
The ordinary edm4hep::ReconstructedParticleCollection can refer to other types of collections (six in total). You can find the list of those collections here under “OneToOneRelations” and “OneToManyRelations”

Be aware that there are a few file-level differences between the spring2021, winter2023 and current productions, as EDM4hep/Podio is evolving.
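
To make the lookup concrete, here is a toy sketch in plain Python of how a reference stored in a subset collection could be resolved to its parent collection. It does not use podio or EDM4hep: the ID table is simply the list quoted in the question, while the `resolve` helper and the example references are hypothetical. Real files carry their own ID table, which may differ between productions.

```python
# Toy model: a subset collection stores (index, collectionID) pairs that
# point into a parent collection. The table below is copied from the list
# quoted in the question; it is NOT read from a file.
COLLECTION_IDS = {
    1: "Electron", 2: "Muon", 3: "AllMuon", 4: "EFlowNeutralHadron",
    5: "Particle", 6: "Photon", 7: "ReconstructedParticles",
    8: "EFlowPhoton", 9: "MCRecoAssociations", 10: "MissingET",
    11: "ParticleIDs", 12: "Jet", 13: "EFlowTrack", 14: "EFlowTrack_1",
}


def resolve(object_id, id_table):
    """Return (collection name, index) for an (index, collectionID) pair,
    or None when the collectionID is missing from the table."""
    index, collection_id = object_id
    name = id_table.get(collection_id)
    if name is None:
        return None
    return name, index


# Hypothetical content of a muon reference branch: two muons stored as
# references into the collection with ID 7.
muon_refs = [(3, 7), (12, 7)]
resolved = [resolve(ref, COLLECTION_IDS) for ref in muon_refs]
print(resolved)

# A reference with collectionID 15 cannot be resolved with this table,
# which hints that the table belongs to a different production.
unresolved = resolve((0, 15), COLLECTION_IDS)
print(unresolved)
```

An unresolved ID in this sketch corresponds to the situation in the question: the ID table from the documentation does not match the one in the opened file.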

Best,
Juraj

ymahmoud · April 12, 2024, 1:20pm

Thank you @jsmiesko. I have read the yaml file, but it doesn’t contain information on the jet collection. Is the jet information stored in the ReconstructedParticles collection too?

Best
yehia

jsmiesko · April 12, 2024, 2:21pm

Hi @ymahmoud,

The “Jet” object collection is of type edm4hep::ReconstructedParticleCollection and it is a separate collection from “ReconstructedParticles”. It is not a subset collection.

Best,
Juraj

ymahmoud · April 13, 2024, 12:35pm

Thank you @jsmiesko and sorry for asking too many questions since I am new to FCC.

For the jet collection, how can I know which clustering algorithm was used to obtain this jet collection?

Also, are the particles that were used for the jet clustering also stored in ReconstructedParticles ?

Best
Yehia

jsmiesko · April 16, 2024, 6:52pm

Hi @ymahmoud,

there is no metadata like this in the pre-generated samples (spring2021, winter2023), so you will probably need to find this information in the appropriate Delphes card. For the pre-generated samples, the cards are hosted here (note the different branches for different campaigns). @selvaggi will probably know more details.

I checked the winter2023 campaign, and the objects from the “Jet” collection point to the particles from the “ReconstructedParticles” collection.

Best,
Juraj

jaeyserm · April 17, 2024, 3:10pm

Hi @ymahmoud

In general we do not recommend using the default jet and MET definitions in the samples. These are produced in the default sequences in Delphes and are configured for exclusive jet clustering for 2 jets, using the ee-durham kT algorithm. For MET, it is the missing transverse momentum (used at hadron colliders), whereas at lepton colliders we have access to the full missing energy vector.

The recommendation is that each analysis should perform jet clustering on the fly in FCCAnalyses. A snippet to do the jet clustering can be found here (I will move this example soon to the main FCCAnalyses repository):

jeyserma/FCCAnalyzer/blob/main/analyses/ewk_z/jetclustering.py#L45-L68


      
          # cluster all reconstructed particles
          df = df.Define("RP_px", "FCCAnalyses::ReconstructedParticle::get_px(ReconstructedParticles)")
          df = df.Define("RP_py", "FCCAnalyses::ReconstructedParticle::get_py(ReconstructedParticles)")
          df = df.Define("RP_pz", "FCCAnalyses::ReconstructedParticle::get_pz(ReconstructedParticles)")
          df = df.Define("RP_e",  "FCCAnalyses::ReconstructedParticle::get_e(ReconstructedParticles)")
          df = df.Define("RP_m",  "FCCAnalyses::ReconstructedParticle::get_mass(ReconstructedParticles)")
          df = df.Define("RP_q",  "FCCAnalyses::ReconstructedParticle::get_charge(ReconstructedParticles)")
          df = df.Define("RP_no",  "FCCAnalyses::ReconstructedParticle::get_n(ReconstructedParticles)")
          df = df.Define("pseudo_jets", "FCCAnalyses::JetClusteringUtils::set_pseudoJets(RP_px, RP_py, RP_pz, RP_e)")
          
          
          # more info: https://indico.cern.ch/event/1173562/contributions/4929025/attachments/2470068/4237859/2022-06-FCC-jets.pdf
          # https://github.com/HEP-FCC/FCCAnalyses/blob/master/addons/FastJet/src/JetClustering.cc
          df = df.Define("clustered_jets", "JetClustering::clustering_ee_kt(2, 4, 0, 10)(pseudo_jets)") # 4-jet clustering
          
          
          df = df.Define("jets", "FCCAnalyses::JetClusteringUtils::get_pseudoJets(clustered_jets)")
          df = df.Define("jetconstituents", "FCCAnalyses::JetClusteringUtils::get_constituents(clustered_jets)") # one-to-one mapping to reconstructedparticles
          df = df.Define("jets_e", "FCCAnalyses::JetClusteringUtils::get_e(jets)")
          df = df.Define("jets_px", "FCCAnalyses::JetClusteringUtils::get_px(jets)")

This file has been truncated.

This uses the ee-durham kT algorithm in exclusive mode 2, which clusters each event to exactly N jets (N = 4 in the call above). You can change the parameters to cluster according to your needs. More info can be found here:

HEP-FCC/FCCAnalyses/blob/master/addons/FastJet/JetClustering.h#L106-L110


      
          clustering_ee_kt(int arg_exclusive = 0, float arg_cut = 5., int arg_sorted = 0, int arg_recombination = 0);
          FCCAnalysesJet operator()(const std::vector<fastjet::PseudoJet>& jets);
          
          int _exclusive;  ///< flag for exclusive jet clustering. Possible choices are 0=inclusive clustering, 1=exclusive clustering that would be obtained when running the algorithm with the given dcut, 2=exclusive clustering when the event is clustered (in the exclusive sense) to exactly njets, 3=exclusive clustering when the event is clustered (in the exclusive sense) up to exactly njets, 4=exclusive jets obtained at the given ycut
          float _cut;  ///< pT cut for m_exclusive=0, dcut for m_exclusive=1, N jets for m_exlusive=2, N jets for m_exclusive=3, ycut for m_exclusive=4
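
As a reading aid for the argument order, the following plain-Python sketch builds the expression string that is passed to `df.Define` in the snippet above. The helper `ee_kt_expression` and the `EXCLUSIVE_MODES` table are hypothetical and only summarise the header comments; they are not part of FCCAnalyses.

```python
# Meaning of the first argument, summarised from the header comments above.
EXCLUSIVE_MODES = {
    0: "inclusive clustering (cut = pT cut)",
    1: "exclusive clustering at a given dcut (cut = dcut)",
    2: "exclusive clustering to exactly N jets (cut = N)",
    3: "exclusive clustering up to exactly N jets (cut = N)",
    4: "exclusive clustering at a given ycut (cut = ycut)",
}


def ee_kt_expression(exclusive, cut, sorted_flag=0, recombination=0,
                     pseudo_jets="pseudo_jets"):
    """Hypothetical helper: format the string passed to df.Define."""
    if exclusive not in EXCLUSIVE_MODES:
        raise ValueError(f"unknown exclusive mode: {exclusive}")
    return (
        f"JetClustering::clustering_ee_kt({exclusive}, {cut}, "
        f"{sorted_flag}, {recombination})({pseudo_jets})"
    )


# Reproduces the call used in the snippet above: mode 2 with N = 4 jets.
expr = ee_kt_expression(2, 4, 0, 10)
print(expr)
print(EXCLUSIVE_MODES[2])
```

Changing the second argument to 2 in mode 2 would request exactly two jets; the other modes reinterpret the same argument as a cut value.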

The missing energy can be recomputed knowing the center-of-mass energy of the collisions (usually 240 GeV for Higgs studies):

HEP-FCC/FCCAnalyses/blob/master/examples/FCCee/higgs/mH-recoil/functions.h#L232-L250


      
          Vec_rp missingEnergy(float ecm, Vec_rp in, float p_cutoff = 0.0) {
              float px = 0, py = 0, pz = 0, e = 0;
              for(auto &p : in) {
                  if (std::sqrt(p.momentum.x * p.momentum.x + p.momentum.y*p.momentum.y) < p_cutoff) continue;
                  px += -p.momentum.x;
                  py += -p.momentum.y;
                  pz += -p.momentum.z;
                  e += p.energy;
              }
              
              Vec_rp ret;
              rp res;
              res.momentum.x = px;
              res.momentum.y = py;
              res.momentum.z = pz;
              res.energy = ecm-e;
              ret.emplace_back(res);
              return ret;
          }
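
The arithmetic of the function above can be checked by hand with a small plain-Python mirror. This is a sketch under stated assumptions: the `Particle` dataclass and `missing_energy` are hypothetical stand-ins for the EDM4hep types and the C++ function, and the two input particles are made-up numbers.

```python
import math
from dataclasses import dataclass


@dataclass
class Particle:
    """Hypothetical stand-in for a reconstructed particle (GeV units)."""
    px: float
    py: float
    pz: float
    energy: float


def missing_energy(ecm, particles, p_cutoff=0.0):
    """Mirror of the C++ logic: the missing momentum is minus the summed
    visible momentum, the missing energy is ecm minus the visible energy.
    As in the C++, the cutoff is applied to the transverse momentum."""
    px = py = pz = e = 0.0
    for p in particles:
        if math.hypot(p.px, p.py) < p_cutoff:
            continue
        px -= p.px
        py -= p.py
        pz -= p.pz
        e += p.energy
    return Particle(px, py, pz, ecm - e)


# Two made-up visible particles in a 240 GeV collision.
visible = [Particle(10.0, 0.0, 5.0, 50.0), Particle(-4.0, 3.0, -5.0, 60.0)]
miss = missing_energy(240.0, visible)
print(miss)
```

Here the visible momentum sums to (6, 3, 0) GeV and the visible energy to 110 GeV, so the missing four-vector is (-6, -3, 0) GeV with 130 GeV of missing energy.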

HEP-FCC/FCCAnalyses/blob/master/examples/FCCee/higgs/mH-recoil/histmaker_mumu.py#L153


      
          ### CUT 4: Z momentum
          #########  
          df = df.Filter("zmumu_p > 20 && zmumu_p < 70")
          df = df.Define("cut4", "4")
          results.append(df.Histo1D(("cutFlow", "", *bins_count), "cut4"))
          
          
          #########
          ### CUT 5: cosThetaMiss
          #########  
          df = df.Define("missingEnergy", "FCCAnalyses::ZHfunctions::missingEnergy(240., ReconstructedParticles)")
          #df = df.Define("cosTheta_miss", "FCCAnalyses::get_cosTheta_miss(missingEnergy)")
          df = df.Define("cosTheta_miss", "FCCAnalyses::ZHfunctions::get_cosTheta_miss(MissingET)")
          results.append(df.Histo1D(("cosThetaMiss_cut4", "", *bins_cosThetaMiss), "cosTheta_miss")) # plot it before the cut
          
          df = df.Filter("cosTheta_miss < 0.98")
          df = df.Define("cut5", "5")
          results.append(df.Histo1D(("cutFlow", "", *bins_count), "cut5"))
          
          
          #########

Let us know if you have any other questions.

Best,
Jan

ymahmoud · April 27, 2024, 5:15pm

Thank you @jaeyserm, this was really helpful.

ymahmoud · April 27, 2024, 6:57pm

I tried running the jet clustering algorithm and it ran just fine. But I get an error when using FCCAnalyses::makeLorentzVectors or FCCAnalyses::jetTruthFinder:
it says there is no member function with this name in the FCCAnalyses namespace.

Also, in jetTruthFinder, how can parameters like ReconstructedParticles be passed as arguments without defining them first?
Also, what does jetTruthFinder do?

Best,
Yehia

jaeyserm · April 29, 2024, 6:59am

Hi Yehia,

The jetTruthFinder function is a simple algorithm to determine the original/true flavor of the clustered jet (e.g. b, c, …). You can certainly remove these lines if you’re not interested in the jet flavor. If you do want it, you need to include the C++ function in your analyzer. The corresponding code can be found here: FCCAnalyzer/include/utils.h at main · jeyserma/FCCAnalyzer · GitHub

Best,
Jan

ymahmoud · April 29, 2024, 7:15pm

Thank you @jaeyserm,

How can I make changes to the code locally? I tried changing the code in analyzers/dataframe, but the changes don’t take effect when I do fccanalysis run. I even commented out some important code so that it would fail, but no error appeared.

Right now I am trying to add FCCAnalyses::makeLorentzVectors in analyzers/dataframe, but it doesn’t work.
I added it to the existing file Utils.h, which is in analyzers/dataframe/FCCAnalyses, and added a file Utils.cc in analyzers/dataframe/src.

Here are the contents of both files:

Utils.h

#ifndef  UTILS_ANALYZERS_H
#define  UTILS_ANALYZERS_H
#include <cmath>
#include "defines.h"
#include "TLorentzVector.h"
#include "ROOT/RVec.hxx"
namespace FCCAnalyses {
  namespace Utils {
    template<typename T> inline auto getsize( T& vec){ return vec.size();};
    template<typename T> inline ROOT::VecOps::RVec<ROOT::VecOps::RVec<T> >  as_vector(const ROOT::VecOps::RVec<T>& in){return ROOT::VecOps::RVec<ROOT::VecOps::RVec<T> >(1, in);};
   ROOT::VecOps::RVec<TLorentzVector>  makeLorentzVectors(ROOT::VecOps::RVec<float> jets_px, ROOT::VecOps::RVec<float> jets_py, ROOT::VecOps::RVec<float> jets_pz, ROOT::VecOps::RVec<float> jets_e);
  }
}
#endif

And here is Utils.cc file

#include <cmath>
#include <vector>
#include <math.h>
//#include "TLorentzVector.h"
//#include "ROOT/RVec.hxx"
#include "FCCAnalyses/Utils.h"

namespace FCCAnalyses{
namespace Utils {
ROOT::VecOps::RVec<TLorentzVector>  makeLorentzVectors(ROOT::VecOps::RVec<float> jets_px, ROOT::VecOps::RVec<float> jets_py, ROOT::VecOps::RVec<float> jets_pz, ROOT::VecOps::RVec<float> jets_e){
  ROOT::VecOps::RVec<TLorentzVector> result;
    for(int i=0; i<jets_px.size(); i++) {
        TLorentzVector tlv;
        tlv.SetPxPyPzE(jets_px[i], jets_py[i], jets_pz[i], jets_e[i]);
        result.push_back(tlv);
    }
    return result;

   }
  }
}

Of course now I use FCCAnalyses::Utils::makeLorentzVectors
Best
Yehia

jaeyserm · April 29, 2024, 8:35pm

Hi Yehia,

You can easily include your own custom header file, which is compiled automatically when the analysis is executed. Save the C++ code below to myfunctions.h (in the same directory as your Python analysis script), and then include it in your Python analysis file:

includePaths = ["myfunctions.h"]

The C++ header file is as follows:

#ifndef FCCANALYZER_MYUTILS_H
#define FCCANALYZER_MYUTILS_H

#include <algorithm> // std::find, std::min_element
#include <cmath>
#include <vector>

#include "TLorentzVector.h"
#include "ROOT/RVec.hxx"
#include "edm4hep/ReconstructedParticleData.h"
#include "edm4hep/MCParticleData.h"
#include "edm4hep/ParticleIDData.h"
#include "ReconstructedParticle2MC.h"


namespace FCCAnalyses {

Vec_i jetTruthFinder(std::vector<std::vector<int>> constituents, Vec_rp reco, Vec_mc mc, Vec_i mcind) {
    // jet truth finder: match the gen-level partons (optionally including gluons) with the jet constituents
    // matching by minimizing the sum of dR between the parton and all the jet constituents

    Vec_tlv genQuarks; // Lorentz-vector of potential partons (gen-level)
    Vec_i genQuarks_pdgId; // corresponding PDG ID
    for(size_t i = 0; i < mc.size(); ++i) {
        int pdgid = abs(mc.at(i).PDG);
        if(pdgid > 6) continue; // only quarks 
        //if(pdgid > 6 and pdgid != 21) continue; // only quarks and gluons
        TLorentzVector tlv;
        tlv.SetXYZM(mc.at(i).momentum.x,mc.at(i).momentum.y,mc.at(i).momentum.z,mc.at(i).mass);
        genQuarks.push_back(tlv);
        genQuarks_pdgId.push_back(mc.at(i).PDG);
    }

    Vec_tlv recoParticles; // Lorentz-vector of all reconstructed particles
    for(size_t i = 0; i < reco.size(); ++i) {
        auto & p = reco[i];
        TLorentzVector tlv;
        tlv.SetXYZM(p.momentum.x, p.momentum.y, p.momentum.z, p.mass);
        recoParticles.push_back(tlv);
    }

    Vec_i usedIdx;
    Vec_i result;
    for(size_t iJet = 0; iJet < constituents.size(); ++iJet) {
        Vec_d dr;
        for(size_t iGen = 0; iGen < genQuarks.size(); ++iGen) {
            if(std::find(usedIdx.begin(), usedIdx.end(), iGen) != usedIdx.end()) {
                dr.push_back(1e99); // set infinite dr, skip
                continue;
            }
            dr.push_back(0);
            for(size_t i = 0; i < constituents[iJet].size(); ++i) {
                dr[iGen] += recoParticles[constituents[iJet][i]].DeltaR(genQuarks[iGen]);
            }
        }
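        // pick the not-yet-used parton with the smallest summed dR
        // (assumes the event contains at least one gen-level quark)
        int minDrIdx = std::min_element(dr.begin(),dr.end()) - dr.begin();
        usedIdx.push_back(minDrIdx);
        result.push_back(genQuarks_pdgId[minDrIdx]);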

    }
    return result;
}


}

#endif
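
The matching logic in the header above can be restated in plain Python for readers who want to trace it by hand. This is a minimal sketch, not the FCCAnalyses implementation: the names `jet_truth_finder`, `delta_r` and `eta_phi` are hypothetical, momenta are plain (px, py, pz) tuples with nonzero transverse momentum, the quark list is assumed to be pre-filtered to |PDG| <= 6, and returning 0 for a jet with no remaining quark is an assumption (the C++ version does not handle that case).

```python
import math


def eta_phi(px, py, pz):
    """Pseudorapidity and azimuth of a momentum vector (assumes pT > 0)."""
    pt = math.hypot(px, py)
    return math.asinh(pz / pt), math.atan2(py, px)


def delta_r(a, b):
    """Angular distance sqrt(d_eta^2 + d_phi^2) between two momenta."""
    eta_a, phi_a = eta_phi(*a)
    eta_b, phi_b = eta_phi(*b)
    d_phi = math.remainder(phi_a - phi_b, 2.0 * math.pi)  # wrap to [-pi, pi]
    return math.hypot(eta_a - eta_b, d_phi)


def jet_truth_finder(constituents, reco, quarks):
    """Greedy matching: each jet takes the unused quark with the smallest
    summed dR to its constituents.

    constituents: list of lists of indices into reco
    reco:         list of (px, py, pz) tuples
    quarks:       list of (pdg_id, (px, py, pz)) tuples, already filtered
    """
    used = set()
    result = []
    for jet in constituents:
        best_idx, best_sum = None, math.inf
        for i_gen, (_, momentum) in enumerate(quarks):
            if i_gen in used:
                continue
            dr_sum = sum(delta_r(reco[i], momentum) for i in jet)
            if dr_sum < best_sum:
                best_idx, best_sum = i_gen, dr_sum
        if best_idx is None:
            result.append(0)  # assumption: 0 marks "no quark left to match"
            continue
        used.add(best_idx)
        result.append(quarks[best_idx][0])
    return result


# Made-up event: two back-to-back jets and a b / anti-b quark pair.
reco = [(10.0, 1.0, 0.5), (9.0, -1.0, 0.2), (-10.0, 0.5, 0.1), (-8.0, -1.0, -0.3)]
constituents = [[0, 1], [2, 3]]
quarks = [(-5, (-20.0, 0.0, 0.1)), (5, (20.0, 0.0, 0.1))]
flavours = jet_truth_finder(constituents, reco, quarks)
print(flavours)
```

Because the matching is greedy, the result depends on the jet order: the first jet picks its closest quark, and later jets choose only among the quarks that remain.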

Best,
Jan

ymahmoud · May 2, 2024, 6:59am

Thank you @jaeyserm, that solved my problem.