cpalgnodes.node module

Implementation of the main Node class

A node represents the smallest unit of the job configuration that the scheduler is allowed to reorder.

cpalgnodes.node.Dependency

Helper class to define data dependencies

The first element of the tuple is the container (nick)name and the second is the auxitem or None if the dependency is on the container alone.

alias of Tuple[str, Optional[str]]
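The two forms of a dependency tuple can be illustrated with a short sketch. The alias is redefined locally so the snippet runs without the package, and the `describe` helper is hypothetical, added only for illustration.

```python
from typing import Optional, Tuple

# Local stand-in mirroring the documented alias Tuple[str, Optional[str]]
Dependency = Tuple[str, Optional[str]]

# Dependency on the 'pt' aux item of the container nicknamed 'electrons'
aux_dep: Dependency = ("electrons", "pt")
# Dependency on the container alone: the second element is None
container_dep: Dependency = ("electrons", None)


def describe(dep: Dependency) -> str:
    """Hypothetical helper: render a dependency as a readable string."""
    container, aux = dep
    return container if aux is None else f"{container}.{aux}"
```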

class cpalgnodes.node.Node

Bases: BaseNode

Node class implementation that will be used by the default scheduler

Data dependencies are represented by (container, aux) 2-tuples, where aux is None if the dependency is on the container alone. Algorithms declare their dependencies through the produces_aux, produces_containers and requires_aux properties. The produces_containers property also provides information about relationships between containers (e.g. where one container is a shallow copy of another). The scheduler uses this information to make sure that aux items declared on the parent container are available on the child container.
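As a rough sketch of what these three declarations might look like, the class below is a hypothetical stand-in that does not derive from the real Node or BaseNode classes. The container nicknames and aux names are invented for illustration; only the property names and return types follow the documentation.

```python
from typing import Dict, Optional, Set


class ElectronCalibrationSketch:
    """Hypothetical stand-in; it does not derive from the real Node class."""

    @property
    def requires_aux(self) -> Dict[str, Set[str]]:
        # Reads the 'pt' aux item from the container nicknamed 'electrons'
        return {"electrons": {"pt"}}

    @property
    def produces_containers(self) -> Dict[str, Optional[str]]:
        # Creates 'calibrated_electrons' with 'electrons' as its parent
        # (for example, a shallow copy)
        return {"calibrated_electrons": "electrons"}

    @property
    def produces_aux(self) -> Dict[str, Set[str]]:
        # Writes a new 'pt' aux item on the created container
        return {"calibrated_electrons": {"pt"}}
```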

Containers are referred to by nicknames, rather than their names in StoreGate. For example, nodes will report affecting the ‘electrons’ container and the scheduler will tell each node the correct name (e.g. Electrons) when building the sequence. Where a container will not be replaced (e.g. ‘EventInfo’) this can be used as the nickname.
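The nickname convention can be pictured as a simple lookup. The mapping and the `resolve_name` function below are assumptions made for illustration; in the real package the scheduler passes names to each node (see the container_info argument of create), and that mechanism is not reproduced here.

```python
from typing import Mapping


def resolve_name(nickname: str, names: Mapping[str, str]) -> str:
    """Hypothetical lookup: return the real container name for a nickname.

    A container that is never replaced has no entry, so its nickname
    doubles as its real name.
    """
    return names.get(nickname, nickname)


# Hypothetical mapping chosen by a scheduler when building the sequence
names = {"electrons": "Electrons"}
```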

Nodes also provide information about preselections which can be applied to the containers that they operate on. This information is used by the scheduler to ensure that algorithms operate on as few objects as possible.

ALL_OBJECTS = TRUE

Special value used to indicate that a required input needs all objects in the container

PRESELECTED_OBJECTS = FALSE

Special value used to indicate that the scheduler has complete freedom to set the preselection on a container

abstract create(container_info: Mapping[str, ContainerInfo]) Algorithm | List[Algorithm] | ConfigFragment

Create the algorithm(s) represented by this node

Parameters:

container_info (Mapping[str, ContainerInfo]) – The ContainerInfo for each input/output container

Returns:

The configured algorithm(s)

Return type:

Union[Algorithm, List[Algorithm], ConfigFragment]

property has_job_output: bool

Whether or not this node is responsible for producing output from the job.

If a node produces output (e.g. a TTree or histogram) it is always required and so the scheduler will add a dependency from it to the output node.

property produces: Set[Tuple[str, str | None]]

All data that this node produces
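The documentation does not spell out how this set is built. One plausible reading, sketched here as an assumption, is that it flattens produces_aux and produces_containers into (container, aux) tuples. The function name `flatten_produces` is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Dependency = Tuple[str, Optional[str]]


def flatten_produces(
    produces_aux: Dict[str, Set[str]],
    produces_containers: Dict[str, Optional[str]],
) -> Set[Dependency]:
    """Hypothetical flattening of the two 'produces' declarations."""
    # Each created container contributes a (container, None) entry
    produced: Set[Dependency] = {(name, None) for name in produces_containers}
    # Each aux item contributes a (container, aux) entry
    for container, aux_names in produces_aux.items():
        produced.update((container, aux) for aux in aux_names)
    return produced
```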

abstract property produces_aux: Dict[str, Set[str]]

The auxdata produced by this node, split by container

Return type:

A dictionary mapping container nicknames to the set of auxnames the node produces

abstract property produces_containers: Dict[str, str | None]

Any containers created by this node

Containers are returned as a dictionary mapping each created container's nickname to the nickname of its parent container, or None if it has no parent. A container has a parent if it is a deep copy, shallow copy or view of another container. Where a node updates an existing container, that container should be listed as its own parent.
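To illustrate how parent links might be followed, the sketch below walks up a parent mapping to collect a container's ancestors. The `ancestors` function and the merged `parents` dictionary are hypothetical; the real scheduler's handling of this information is not shown in this documentation.

```python
from typing import Dict, List, Optional


def ancestors(nickname: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Hypothetical helper: list parent containers, nearest first."""
    chain: List[str] = []
    seen = {nickname}
    current = parents.get(nickname)
    # Stop at a container with no parent, or at one already visited
    # (this also covers a container listed as its own parent)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


# Hypothetical declarations merged from several nodes
parents = {
    "electrons": None,                            # no parent
    "calibrated_electrons": "electrons",          # e.g. a shallow copy
    "signal_electrons": "calibrated_electrons",   # e.g. a view
    "event_info": "event_info",                   # listed as its own parent
}
```

Aux items declared on any container in the returned chain are candidates for being made available on the child container.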

property requires: Set[Tuple[str, str | None]]

All data that this node requires

abstract property requires_aux: Dict[str, Set[str]]

The auxdata required by this node, split by container

Return type:

A dictionary mapping container nicknames to the set of auxnames the node requires

property requires_containers: Set[str]

A set of all containers used by the node

requires_objects(required_output: Dict[Tuple[str, str | None], Expression]) Dict[Tuple[str, str | None], Expression]

The input objects that this algorithm needs

Part of the preselection mechanism. This method lets each node tell the scheduler which objects upstream nodes must have run on for this node to produce the data required by downstream nodes.

For example, in the simplest case (which this default implementation satisfies) the node calculates object-by-object, so the node only needs to see the objects on which its output is required.

Parameters:

required_output (Dict[Dependency, boolean.Expression]) – The selections required downstream of this node on its outputs

Return type:

The selections this node requires on its inputs
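The object-by-object case can be sketched as follows. This is not the package's implementation: selections are modelled as frozensets of selection names read as their logical OR, standing in for boolean.Expression, and the function name and extra `requires_aux` argument are assumptions made so the snippet is self-contained.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

Dependency = Tuple[str, Optional[str]]
# Stand-in for boolean.Expression: a set of selection names, read as their OR
Selection = FrozenSet[str]


def requires_objects_sketch(
    requires_aux: Dict[str, Set[str]],
    required_output: Dict[Dependency, Selection],
) -> Dict[Dependency, Selection]:
    """Hypothetical object-by-object rule for propagating selections."""
    # Combine (OR) every selection requested on outputs of each container
    per_container: Dict[str, Selection] = {}
    for (container, _aux), selection in required_output.items():
        per_container[container] = per_container.get(container, frozenset()) | selection
    # Every input on that container is needed for exactly those objects
    return {
        (container, aux): per_container[container]
        for container, aux_names in requires_aux.items()
        if container in per_container
        for aux in aux_names
    }
```

In this stand-in, a node whose outputs are needed for 'baseline' objects by one consumer and 'signal' objects by another asks for its inputs on the union of the two.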