Event Data Model Discussion

Now is a good time to reflect a bit on the FCC data model used so far. There is an upcoming meeting with other future experiments where we can report our experiences and get some new ideas. Since this really concerns everyone using the software, it will be good to get input from as many people as possible, both on the technical implementation (PODIO) and on the conceptual model (fcc-edm).

I’ll start:

  • Sometimes there is a bit of a trade-off between usability and efficiency. In principle, the “charge” member of the fcc particles is redundant: the charge information is contained in the PdgId of the particle and can be looked up with HepPDT. The code to do that looks like this:
      /// lookup charge in particle properties
      HepPDT::ParticleID particleID(2203);
      std::cout << particleID.charge() << std::endl;

So I feel the charge could actually be removed, provided there is some way to access it that is easy to type in a call to TTree::Draw().

[edit] There is also the issue that the charge is stored as an integer, which means it cannot represent fractional charges, for example those of partons in the output of event generators. I think float would be more appropriate.
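
To illustrate the fractional-charge point, here is a minimal sketch. It assumes a hypothetical helper quarkThreeCharge that hard-codes the quark charges in units of e/3 (in practice the lookup would come from HepPDT); none of these function names exist in fcc-edm. It compares what an integer member and a float member would end up holding:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper, not part of fcc-edm or HepPDT: charge of a quark in
// units of e/3, for PDG IDs 1..6 (d, u, s, c, b, t) and their antiparticles.
int quarkThreeCharge(int pdgId) {
  const int flavour = std::abs(pdgId);
  assert(flavour >= 1 && flavour <= 6);  // only quarks are handled in this sketch
  const int threeCharge = (flavour % 2 == 0) ? 2 : -1;  // up-type: +2/3, down-type: -1/3
  return pdgId > 0 ? threeCharge : -threeCharge;
}

// What an integer charge member would hold: the fraction is truncated to 0.
int chargeAsInt(int pdgId) {
  return static_cast<int>(quarkThreeCharge(pdgId) / 3.0);
}

// What a float charge member would hold: the fractional value survives.
float chargeAsFloat(int pdgId) {
  return quarkThreeCharge(pdgId) / 3.0f;
}
```

For an up quark (PDG ID 2), chargeAsInt gives 0 while chargeAsFloat gives roughly 0.667. Storing the charge as an integer in units of e/3 would be another option if an integer member is preferred.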

  • Vertex-Particle associations: There are a couple of issues with particles and vertices. The MCParticle type consists of a BareParticle and references to two vertices (start and end), but the BareParticle already has a point member vertex (which I assume is the start vertex, but this is not well documented). So for users it might be unclear which possibility to use to save/retrieve start vertex information. Issue here: https://github.com/HEP-FCC/fcc-edm/issues/60
  • FloatValue: This is not really necessary, I think. There is definitely a need to sometimes save information that is not exactly captured in the edm, but when you do that it makes more sense to save it as a plain float instead of a struct FloatValueData {float value;}, at least to me. It’s the fault of the framework that the former is not easily possible, but I tried to correct that with https://github.com/HEP-FCC/FCCSW/pull/346
  • bits: This member is used in quite a few structs: BareParticle, BareHit, BareCluster, BareJet, Track, Vertex. In practice we usually use it only to store simulation MC truth (the Geant4 trackID), which is necessary to calculate some efficiencies. I think the approach is a valid one; we are just missing a mechanism similar to the “readout” in dd4hep that defines this member as a bitfield and says how to read it, something like “trackID:5:IsPrimary:1: …”. Most likely this has to be a long integer to be useful, since the trackID already takes up 32 bits.
  • Track/TrackState: These types are most in need of an overhaul, in my opinion. The principle is ok: a track needs to provide information on the measurements used in it (the references to TrackHit), a description of the track itself (parameters for a given parametrization) and information on the fit quality (chiSquared / number of degrees of freedom). Several sets of parameters may be required, since the track may change during its passage through the detector (energy loss, scattering) and is thus best approximated by a piecewise continuous assembly of several helices. There are, however, some issues:
    • The track holds references to TrackHits, not TrackClusters (we did this because the algorithms providing Clusters were not ready in time).
    • The TrackState is, despite the lengthy description, not really well documented unless one is familiar with ATLAS / ACTS, from which the conventions are borrowed. (Which entry in the covariances corresponds to which track parameter?)
    • The TrackState in ACTS includes two local coordinates on a surface that is specified by a pointer. (Mostly this surface is the beamspot, and the coordinates are z0 and d0.) This could be expressed with a dd4hep volumeID, but instead the TrackState has a “referencePoint”. This raises the question of why the local coordinates are needed at all.
    • It may be worthwhile to also save the parameters as an array or other type that can be used with linear algebra libraries like Eigen, which can probably save a lot of glue code in applications like the Kalman Filter. This may require some convenience methods like float theta() {return parameters[2];}
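
As a rough sketch of the array idea for the TrackState, assuming a purely hypothetical parameter ordering (the enum below is for illustration only and is not the documented fcc-edm or ACTS convention) and a hypothetical struct name TrackStateSketch:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical parameter ordering, chosen for illustration only;
// it is NOT the documented fcc-edm or ACTS convention.
enum ParamIndex : std::size_t {
  kD0 = 0, kZ0 = 1, kTheta = 2, kPhi = 3, kQOverP = 4, kNParams = 5
};

// Sketch of a track state that stores its parameters contiguously.
struct TrackStateSketch {
  std::array<float, kNParams> parameters{};
  // Full covariance matrix, stored row-major (kNParams x kNParams).
  std::array<float, kNParams * kNParams> covariance{};

  // Convenience accessors, so users do not need to remember the indices.
  float d0() const { return parameters[kD0]; }
  float z0() const { return parameters[kZ0]; }
  float theta() const { return parameters[kTheta]; }
  float phi() const { return parameters[kPhi]; }
  float qOverP() const { return parameters[kQOverP]; }

  // Covariance between parameters i and j, addressed by the same indices.
  float cov(std::size_t i, std::size_t j) const {
    assert(i < kNParams && j < kNParams);
    return covariance[i * kNParams + j];
  }
};
```

Using one named index for both the parameters and the covariance also documents which covariance entry belongs to which parameter, e.g. cov(kTheta, kPhi). Since the storage is contiguous, it could in principle be handed to a linear algebra library without copying (with Eigen, for instance, through Eigen::Map).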

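For the bits member discussed above, a descriptor-driven bitfield could look roughly like the following sketch. The comma-separated "name:width" descriptor format, the 64-bit word and all names (BitField, parseDescriptor, getField, setField) are assumptions made for this example; it does not reproduce the actual dd4hep readout syntax.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One named field inside the 64-bit "bits" word.
struct BitField {
  std::string name;
  unsigned offset;  // position of the least significant bit
  unsigned width;   // number of bits
};

// Parse a descriptor such as "trackID:32,isPrimary:1".
// The comma-separated "name:width" format is an assumption of this sketch.
std::vector<BitField> parseDescriptor(const std::string& descriptor) {
  std::vector<BitField> fields;
  std::istringstream stream(descriptor);
  std::string token;
  unsigned offset = 0;
  while (std::getline(stream, token, ',')) {
    const std::size_t colon = token.find(':');
    assert(colon != std::string::npos);
    const unsigned width = static_cast<unsigned>(std::stoul(token.substr(colon + 1)));
    assert(width >= 1 && offset + width <= 64);  // must fit into the 64-bit word
    fields.push_back({token.substr(0, colon), offset, width});
    offset += width;  // fields are packed back to back, lowest bits first
  }
  return fields;
}

// Mask with the lowest `width` bits set (width in 1..64).
std::uint64_t lowMask(unsigned width) {
  return width == 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << width) - 1;
}

// Read the value of one field from the packed word.
std::uint64_t getField(std::uint64_t bits, const BitField& f) {
  return (bits >> f.offset) & lowMask(f.width);
}

// Return a copy of the packed word with one field overwritten.
std::uint64_t setField(std::uint64_t bits, const BitField& f, std::uint64_t value) {
  assert(value <= lowMask(f.width));  // value must fit into the field
  const std::uint64_t mask = lowMask(f.width) << f.offset;
  return (bits & ~mask) | (value << f.offset);
}
```

With the descriptor "trackID:32,isPrimary:1", the trackID occupies bits 0 to 31 and isPrimary sits at bit 32, which is why a 32-bit member would not be enough.
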
Here is a link to the last round of edm discussions: https://indico.cern.ch/event/560008/contributions/2260938/attachments/1317490/1974272/16-07-28-edm-review.pdf