CP Algorithms Concepts

This page gives a brief overview of the central concepts behind the configuration of the CP algorithms. It provides the background needed to follow the specific implementation of the graph-based model introduced on the previous page.

The Component Model

ATLAS analysis and reconstruction jobs are built on the component model of the Gaudi software framework, on which Athena is based. In this model, C++ code for specific tasks is encapsulated in components, which the configuration then chains together. The most important component type is the algorithm: each algorithm has an execute method that is called once per event. Common pieces of code are typically encapsulated in tools, which can be reused in different places. This model also gives a great deal of flexibility in configuring algorithms, as any tool can be provided to an algorithm so long as it implements the interface used in the algorithm’s C++ code. Additionally, algorithms can access services. The distinction between a service and a tool can be a little murky, but a service typically focuses on enabling communication between different algorithms, or between an algorithm and some form of job input/output.
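
As a rough illustration of how algorithms and tools fit together, the sketch below models an algorithm that is called once per event and delegates part of its work to a tool. It is plain Python, not the Gaudi or Athena API: `ICalibrationTool`, `ScaleTool`, `CalibrationAlg`, `run_job` and the dictionary-based events are all hypothetical. The point it shows is that the algorithm depends only on the tool's interface, so the configuration can hand it any tool that implements that interface.

```python
from abc import ABC, abstractmethod


class ICalibrationTool(ABC):
    """Hypothetical tool interface: the only thing the algorithm relies on."""

    @abstractmethod
    def calibrate(self, pt: float) -> float:
        """Return a calibrated transverse momentum."""


class ScaleTool(ICalibrationTool):
    """One possible tool: scales pt by a configurable factor."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def calibrate(self, pt: float) -> float:
        return pt * self.factor


class CalibrationAlg:
    """Hypothetical algorithm: execute() is called once per event."""

    def __init__(self, tool: ICalibrationTool) -> None:
        self.tool = tool  # any object implementing ICalibrationTool can be used

    def execute(self, event: dict) -> None:
        event["calibrated_pt"] = [self.tool.calibrate(pt) for pt in event["pt"]]


def run_job(algorithms: list, events: list) -> list:
    """Toy event loop: every algorithm's execute() runs on every event, in order."""
    for event in events:
        for alg in algorithms:
            alg.execute(event)
    return events


# The configuration decides which tool the algorithm receives.
events = run_job([CalibrationAlg(ScaleTool(2.0))], [{"pt": [10.0, 20.0]}, {"pt": [5.0]}])
```

Swapping `ScaleTool` for another class implementing `calibrate` requires no change to `CalibrationAlg`, which is the flexibility the component model is designed to provide.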

All components expose certain values which can be changed at configuration time. These values are called properties and are the primary way in which the job is controlled. More details can be found in the section on Component Facades.
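
Properties can be pictured as named settings with defaults that the configuration may override before the job starts. The following is a hypothetical Python stand-in, not the real Gaudi property machinery or the Component Facades: the names `Component`, `declared_properties`, `set_property` and `PtCutAlg` are invented for illustration. It shows declared properties with defaults, configuration-time overrides, and the rejection of names the component does not declare.

```python
class Component:
    """Hypothetical base class: subclasses declare their properties with defaults."""

    declared_properties: dict = {}

    def __init__(self, name: str, **overrides) -> None:
        self.name = name
        # Start from the declared defaults...
        self.properties = dict(self.declared_properties)
        # ...then apply whatever the configuration sets.
        for key, value in overrides.items():
            self.set_property(key, value)

    def set_property(self, key: str, value) -> None:
        # Reject unknown names so that a misspelt property fails loudly.
        if key not in self.declared_properties:
            raise AttributeError(f"{self.name} has no property '{key}'")
        self.properties[key] = value


class PtCutAlg(Component):
    """Hypothetical algorithm with two configurable properties."""

    declared_properties = {"minPt": 0.0, "containerName": "Electrons"}


alg = PtCutAlg("MyPtCut", minPt=25000.0)
alg.set_property("containerName", "Muons")
```

Failing on an undeclared name is a deliberate choice in this sketch: a silently ignored typo in a property name would otherwise leave the default in place without any warning.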

The xAOD Data Model

Objects in the input files are stored in the xAOD data format. xAOD objects are usually arranged into containers of multiple objects (the main exception is the EventInfo, which is a single object). Each container is identified in the event store by its name. A container can carry (almost) arbitrary data, called auxiliary data or auxdata, which is stored in parallel vectors with one entry per object in the container. Each aux item is identified by a string name. Some data is core to the container definition: for example, the core auxdata of electron containers includes their pt, eta and phi.
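
To make the layout concrete, here is a toy Python model of a container whose auxdata is held as one list per aux item, indexed in parallel with the objects. It is only an analogy for the real xAOD classes, which are C++: the names `Container`, `Element`, `add_aux`, `event_store` and the `isolated` item are hypothetical.

```python
class Container:
    """Toy container: each aux item is a list with one entry per object."""

    def __init__(self, size: int, core: dict) -> None:
        self.size = size
        self.aux = {}
        for name, values in core.items():
            self.add_aux(name, values)

    def add_aux(self, name: str, values: list) -> None:
        # Every aux item must have exactly one entry per object in the container.
        if len(values) != self.size:
            raise ValueError(f"aux item '{name}' has {len(values)} entries, expected {self.size}")
        self.aux[name] = list(values)

    def __getitem__(self, index: int) -> "Element":
        return Element(self, index)

    def __len__(self) -> int:
        return self.size


class Element:
    """An 'object' here is just an index into its container's aux vectors."""

    def __init__(self, container: Container, index: int) -> None:
        self.container = container
        self.index = index

    def get(self, name: str):
        return self.container.aux[name][self.index]


# A toy event store that maps container names to containers.
event_store = {
    "Electrons": Container(2, {"pt": [30000.0, 12000.0], "eta": [0.5, -1.2], "phi": [0.1, 2.9]}),
}
electrons = event_store["Electrons"]
electrons.add_aux("isolated", [True, False])  # arbitrary extra auxdata
```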

Containers can be copied as a view container, a deep copy or a shallow copy. Deep copies are the simplest: the whole container, including all of its auxdata, is copied wholesale, after which there is no further connection between the copy and its parent. View containers are like a vector of pointers and are mainly used to select only certain elements of a container, though they can also hold elements from several different containers. As only pointers are stored, any modifications to the parent container are reflected in the view.

The most common form of copy, however, is the shallow copy. In a shallow copy, none of the auxdata is duplicated unless needed. When an aux item is read from the copy and is not available in the copy’s own store, the parent’s store is searched instead. If anything attempts to modify an aux item through the shallow copy, the parent’s data for that item is first copied into the child’s store and the modification is then made there. This means that the child and parent containers share every aux item that has not been modified in, or added to, the child container. Most (if not all) of the copies used in the CP algorithms are shallow.
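
The read-through and copy-on-write behaviour of a shallow copy can be sketched in a few lines, with a deep copy alongside for contrast. This is a simplified Python model: the `AuxStore` class and its methods are hypothetical, and real xAOD shallow copies are implemented in C++. It captures the rule described above: reads fall through to the parent, while the first write to an inherited item copies that item into the child's own store.

```python
class AuxStore:
    """Toy aux store; `parent` is set when this store belongs to a shallow copy."""

    def __init__(self, data=None, parent=None) -> None:
        self.own = {name: list(values) for name, values in (data or {}).items()}
        self.parent = parent

    def get(self, name: str) -> list:
        # Reads use the local item if present, otherwise search the parent chain.
        if name in self.own:
            return self.own[name]
        if self.parent is not None:
            return self.parent.get(name)
        raise KeyError(name)

    def set_value(self, name: str, index: int, value) -> None:
        # Writes copy an inherited item into the local store first (copy-on-write),
        # so the parent's data is never changed through the copy.
        if name not in self.own:
            self.own[name] = list(self.get(name))
        self.own[name][index] = value

    def add(self, name: str, values: list) -> None:
        # A new item lives only in this store; the parent never sees it.
        self.own[name] = list(values)

    def names(self) -> set:
        inherited = self.parent.names() if self.parent is not None else set()
        return inherited | set(self.own)

    def shallow_copy(self) -> "AuxStore":
        return AuxStore(parent=self)  # duplicates nothing up front

    def deep_copy(self) -> "AuxStore":
        # Duplicates every visible item; the result has no link to its source.
        return AuxStore({name: self.get(name) for name in self.names()})


parent = AuxStore({"pt": [30000.0, 12000.0], "eta": [0.5, -1.2]})
child = parent.shallow_copy()
child.set_value("pt", 0, 31000.0)  # pt is duplicated into the child, then changed
child.add("passPt", [True, False])  # exists only on the child
```

After these calls, `eta` is still shared (the child reads the parent's list), while `pt` and `passPt` belong to the child alone.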

Data dependencies between nodes in the CP algorithm graph are based on these containers and their data. Each dependency is expressed as a pair of strings: the container name and the aux item name (the second of these may be None if the dependency is on the container as a whole rather than on a specific piece of auxdata). To account for copies, any algorithm that makes a copy of a container reports this. Then, when searching for the provider of an aux item, if the item is not found on the child container, the search continues on its parent. Dependencies are then created linking the node that produces the aux item, the node that creates the copy, and the node that requires the aux item.
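
The parent fallback used when resolving dependencies can be illustrated with a small Python sketch. The data structures and functions here (`producers`, `copies`, `resolve`, `dependency_edges`) and all node names are hypothetical, not the actual classes of the CP algorithm configuration. They only show how a requirement `(container, aux_item)` is traced back through reported copies to the node that produced the item, with the copying nodes collected along the way.

```python
from typing import Optional

# Hypothetical information reported by the nodes of the graph.
# producers: the node that creates each (container, aux item) pair;
# an aux item of None stands for the container as a whole.
producers = {
    ("Electrons", None): "InputReader",
    ("Electrons", "pt"): "InputReader",
    ("Electrons_calib", None): "CalibrationAlg",
    ("Electrons_calib", "isolated"): "IsolationAlg",
}
# copies: child container -> (parent container, node that made the copy).
copies = {"Electrons_calib": ("Electrons", "CalibrationAlg")}


def resolve(container: str, aux_item: Optional[str]) -> list:
    """Nodes that must run before a consumer of (container, aux_item), in order."""
    copiers = []
    while (container, aux_item) not in producers:
        if container not in copies:
            raise LookupError(f"no provider for {aux_item!r} on {container!r}")
        # Not found on the copy: record the copying node and look on the parent.
        container, copier = copies[container]
        copiers.insert(0, copier)
    return [producers[(container, aux_item)]] + copiers


def dependency_edges(consumer: str, container: str, aux_item: Optional[str]) -> list:
    """(before, after) pairs chaining producer -> copy node(s) -> consumer."""
    chain = resolve(container, aux_item) + [consumer]
    return list(zip(chain, chain[1:]))


edges = dependency_edges("ElectronSelector", "Electrons_calib", "pt")
```

Here `pt` is requested on the copy but only produced on the original container, so the resulting edges order the reader before the copying node and the copying node before the consumer.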

Copies also introduce a further complication to the model: each copy needs a distinct name in the event store. The user does not need to know about every individual copy, only the one that holds the final version of the container. However, when the nodes are handed to the scheduler, it is not yet known which node will create this final version: the scheduler is, after all, responsible for determining the order in which the nodes run.
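
The naming problem can be made concrete with Python's standard `graphlib` module (Python 3.9 or later). In the sketch below, the node names, the `copy_name` helper and its suffix scheme, and `assign_copy_names` are all hypothetical; this is an illustration of the problem, not the CP algorithms' solution to it. Every copying node writes to its own uniquely named container, and which name ends up holding the final version is only known once the graph has been sorted.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical graph: each node maps to the set of nodes it must run after.
graph = {
    "InputReader": set(),
    "ElectronCalibration": {"InputReader"},
    "ElectronIsolationCorrection": {"ElectronCalibration"},
    "ElectronSelector": {"ElectronIsolationCorrection"},
}
# Hypothetical: the nodes that each make a shallow copy of the electrons.
copying_nodes = {"ElectronCalibration", "ElectronIsolationCorrection"}


def copy_name(base: str, node: str) -> str:
    """Give each copy a distinct event-store name (a hypothetical scheme)."""
    return f"{base}_{node}"


def assign_copy_names(graph: dict, copiers: set, base: str) -> tuple:
    """Walk the scheduled order; return (input name per copier, final name)."""
    current = base
    inputs = {}
    for node in TopologicalSorter(graph).static_order():
        if node in copiers:
            inputs[node] = current  # each copier copies the latest version so far
            current = copy_name(base, node)
    return inputs, current


inputs, final_name = assign_copy_names(graph, copying_nodes, "Electrons")
```

With the constraints above, the isolation correction runs last, so its copy holds the final version. If the dependency between the two correction nodes were reversed, the final name would change, even though neither node's own configuration did.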

Preselections

Typically, not all elements of a container will be of interest in the final analysis, and running calculations over the uninteresting ones is wasteful. As the job advances through the algorithms, more and more selections become available, so the number of objects that need to be considered shrinks. Most CP algorithms have a preselection property that restricts the set of objects they consider. However, as with copy names, this poses a problem: the order of the algorithms is not known until the scheduler has arranged them, so at configuration time it is not known which selections will already be available when a given algorithm runs.
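
The effect of a preselection, and why it depends on ordering, can be shown with a small Python sketch. It assumes, purely for illustration, that selections are stored as boolean aux items (here `passPt` and `passId`) and that a preselection is a list of such item names that must all be true. The functions `apply_preselection` and `expensive_correction` are hypothetical; this is not the CP algorithms' actual selection mechanism.

```python
def apply_preselection(aux: dict, preselection: list) -> list:
    """Return indices of objects passing every selection named in `preselection`."""
    missing = [name for name in preselection if name not in aux]
    if missing:
        # The selection has not been computed yet: the algorithm that produces
        # it would have to be scheduled before this one.
        raise LookupError(f"preselection uses {missing}, which is not yet available")
    n_objects = len(aux["pt"])
    return [i for i in range(n_objects) if all(aux[name][i] for name in preselection)]


def expensive_correction(aux: dict, preselection: list) -> int:
    """Hypothetical algorithm: processes only preselected objects, returns how many."""
    selected = apply_preselection(aux, preselection)
    aux.setdefault("corrected_pt", [None] * len(aux["pt"]))
    for i in selected:
        aux["corrected_pt"][i] = aux["pt"][i] * 0.98  # placeholder correction
    return len(selected)


electrons = {"pt": [30000.0, 8000.0, 45000.0], "passPt": [True, False, True]}
n_done = expensive_correction(electrons, ["passPt"])  # 2 of the 3 objects are processed
```

Asking for a selection that no earlier algorithm has produced, for example `expensive_correction(electrons, ["passId"])`, raises `LookupError`; whether such a selection exists at that point depends entirely on the order the scheduler chooses.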