cpalgnodes.scheduler module

Class to control the scheduling of the job and additional configuration based on the order

class cpalgnodes.scheduler.InputNode

Bases: Node

Node representing the input to the scheduler

This is a dummy node (adds no algorithms) in the graph that tells the scheduler what input information is available

create(container_info)

Create the algorithm(s) represented by this node

Parameters:: container_info (Mapping[str, ContainerInfo]) – The ContainerInfo for each input/output container
Returns:: The configured algorithm(s)
Return type:: Union[Algorithm, List[Algorithm], ConfigFragment]

property name: The name of this node (should be unique within the graph)

property produces_aux

The auxdata produced by this node, split by container

Return type:: A dictionary mapping container nicknames to the set of auxnames the node produces

property produces_containers

Any containers created by this node

Containers are returned as a dictionary mapping each created container's nickname to its parent container (or None if it has no parent). A container has a parent if it is a deep copy, shallow copy or view of another container. Where a node updates a container, that container should have itself as the parent.
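The shape of this mapping can be illustrated with a plain dictionary. This is only a sketch: the container nicknames and the helper root_container are hypothetical and are not part of cpalgnodes.

```python
from typing import Dict, Optional

# Hypothetical value of a node's produces_containers property.
# Keys are nicknames of created containers, values are their parents.
example_containers: Dict[str, Optional[str]] = {
    "calib_jets": "jets",       # shallow copy of "jets"
    "good_jets": "calib_jets",  # view of "calib_jets"
    "muons": "muons",           # updated by the node, so it is its own parent
    "summary": None,            # new container with no parent
}


def root_container(name: str, parents: Dict[str, Optional[str]]) -> str:
    """Follow parent links until a container with no other parent is reached."""
    seen = set()
    while parents.get(name) is not None and parents[name] != name:
        if name in seen:
            raise ValueError(f"Cyclic parent definition involving {name!r}")
        seen.add(name)
        name = parents[name]
    return name
```

Under these assumptions, root_container("good_jets", example_containers) walks through "calib_jets" and stops at "jets", which has no entry of its own.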

property requires_aux

The auxdata required by this node, split by container

Return type:: A dictionary mapping container nicknames to the set of auxnames the node requires

requires_objects(required_output: Dict[Tuple[str, str], Expression]) → Dict[Tuple[str, str], Expression]

The input objects that this algorithm needs

Part of the preselection mechanism. This method allows each node to tell the scheduler which objects upstream nodes need to have run on for this node to produce the data required by downstream nodes.

For example, in the simplest case (which this default implementation satisfies) the node calculates object-by-object, so it only needs to see the objects on which its output is required.

Parameters:: required_output (Dict[Dependency, boolean.Expression]) – The selections required downstream of this node on its outputs
Return type:: The selections this node requires on its inputs
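The object-by-object case can be sketched as follows. This is not the library's implementation: selections are modelled here as frozensets of selection names (read as a logical OR) instead of boolean.Expression objects, the node's inputs are passed explicitly, and the function name is hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Dependency = Tuple[str, str]      # (container nickname, aux name)
SelectionSet = FrozenSet[str]     # stand-in: objects passing ANY of these selections


def requires_objects_sketch(
    required_output: Dict[Dependency, SelectionSet],
    inputs: Iterable[Dependency],
) -> Dict[Dependency, SelectionSet]:
    """Inputs are needed on exactly the objects for which an output is needed."""
    # OR together the selections requested on each container's outputs
    per_container: Dict[str, SelectionSet] = {}
    for (container, _aux), selection in required_output.items():
        per_container[container] = per_container.get(container, frozenset()) | selection
    # Every input on that container is needed for the same objects
    return {
        (container, aux): per_container[container]
        for container, aux in inputs
        if container in per_container
    }


needed = requires_objects_sketch(
    {
        ("jets", "pt_calib"): frozenset({"baseline"}),
        ("jets", "jvt"): frozenset({"signal"}),
    },
    [("jets", "pt"), ("jets", "eta")],
)
```

Here both inputs end up being required for objects passing either "baseline" or "signal", because the node's outputs are needed on both sets of objects.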

class cpalgnodes.scheduler.OutputNode

Bases: Node

Node representing the output of the scheduler

This is a dummy node (adds no algorithms) in the graph that represents the output. It produces no data and has all the required output information as required data dependencies

create(container_info)

Create the algorithm(s) represented by this node

Parameters:: container_info (Mapping[str, ContainerInfo]) – The ContainerInfo for each input/output container
Returns:: The configured algorithm(s)
Return type:: Union[Algorithm, List[Algorithm], ConfigFragment]

property name: str: The name of this node (should be unique within the graph)

property produces_aux

The auxdata produced by this node, split by container

Return type:: A dictionary mapping container nicknames to the set of auxnames the node produces

property produces_containers

Any containers created by this node

Containers are returned as a dictionary mapping each created container's nickname to its parent container (or None if it has no parent). A container has a parent if it is a deep copy, shallow copy or view of another container. Where a node updates a container, that container should have itself as the parent.

property requires_aux

The auxdata required by this node, split by container

Return type:: A dictionary mapping container nicknames to the set of auxnames the node requires

requires_objects(required_output: Dict[Tuple[str, str], Expression]) → Dict[Tuple[str, str], Expression]

The input objects that this algorithm needs

Part of the preselection mechanism. This method allows each node to tell the scheduler which objects upstream nodes need to have run on for this node to produce the data required by downstream nodes.

For example, in the simplest case (which this default implementation satisfies) the node calculates object-by-object, so it only needs to see the objects on which its output is required.

Parameters:: required_output (Dict[Dependency, boolean.Expression]) – The selections required downstream of this node on its outputs
Return type:: The selections this node requires on its inputs

class cpalgnodes.scheduler.Scheduler

Bases: GraphBuilder

Class to build the dependency graph and create all the algorithms

After the graph has been configured and the order determined, the preselections necessary for each node are determined.

add_input(container: str, auxitems: Iterable[str] = ())

Tell the scheduler about a container available on the input

Parameters:

container (str) – The name of the container
auxitems (Iterable[str]) – Any auxdata available on the container

add_node(node: Node)

Add a new node into the node list

If the node already exists this is a no-op

Raises:

NodeAlreadyExistsError – There is a node with the same name that does not compare equal
OutputAlreadyProducedError – There is already a node which produces this output
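The three outcomes described above (no-op, name clash, output clash) can be sketched with a small registry. Everything here is a local stand-in: the exception classes only reuse the documented names, and SketchNode and NodeRegistry are hypothetical types that do not reflect how Scheduler stores its nodes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


class NodeAlreadyExistsError(Exception):
    """Local stand-in for the exception named in the documentation."""


class OutputAlreadyProducedError(Exception):
    """Local stand-in for the exception named in the documentation."""


@dataclass(frozen=True)
class SketchNode:
    name: str
    produces: FrozenSet[str] = frozenset()


class NodeRegistry:
    def __init__(self) -> None:
        self.nodes: Dict[str, SketchNode] = {}
        self.producers: Dict[str, str] = {}  # data item -> producing node name

    def add_node(self, node: SketchNode) -> None:
        existing = self.nodes.get(node.name)
        if existing is not None:
            if existing == node:
                return  # an equal node is already registered: no-op
            raise NodeAlreadyExistsError(node.name)
        # Check every output before registering anything
        for item in node.produces:
            if item in self.producers:
                raise OutputAlreadyProducedError(
                    f"{item} is already produced by {self.producers[item]}"
                )
        self.nodes[node.name] = node
        for item in node.produces:
            self.producers[item] = node.name
```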

add_output(container: str, auxitems: Iterable[str] | Mapping[str, Expression] = ())

Tell the scheduler about a container we need available on the output

Parameters:

container (str) – The name of the container
auxitems (Union[Iterable[str], Mapping[str, boolean.Expression]]) – Any aux items needed on the output. Can either be a list of aux items or a mapping from aux item to the required output selection for that item.

add_selection(container: str, selection: Selection)

Tell the scheduler about a new selection

Parameters:

container (str) – The (nick)name of the container that this operates on
selection (Selection) – The selection

Raises:

KeyError – The container.selection_name pair is already defined

build_graph(ignore_unmet=True, extra_producers={}, extra_requirements={}) → BaseGraph

Create the dependency graph

Parameters:

ignore_unmet (bool) – If False, raise a KeyError if a node depends on data which has no known producer

Return type:

The produced dependency graph

Raises:

KeyError – A node depends on data with no known producer and ignore_unmet is False
CyclicGraphError – The graph contains cycles
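How a dependency graph yields an order, and how the two documented failure modes arise, can be sketched with the standard library's graphlib module (Python 3.9+). This is an illustration under assumptions, not the library's code: build_order and its dictionary arguments are hypothetical, and graphlib.CycleError stands in for CyclicGraphError.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def build_order(
    requires: Dict[str, Set[str]],
    produces: Dict[str, Set[str]],
    ignore_unmet: bool = True,
) -> List[str]:
    """Return node names in an order that respects data dependencies."""
    # Invert 'produces' so that each data item points at its producing node
    producer_of = {item: node for node, items in produces.items() for item in items}
    all_nodes = sorted(set(requires) | set(produces))
    predecessors: Dict[str, Set[str]] = {node: set() for node in all_nodes}
    for node, items in requires.items():
        for item in items:
            if item in producer_of:
                predecessors[node].add(producer_of[item])
            elif not ignore_unmet:
                raise KeyError(f"No known producer for {item!r} (needed by {node!r})")
    # static_order raises graphlib.CycleError if the graph contains a cycle
    return list(TopologicalSorter(predecessors).static_order())


example_requires = {"calib": {"jets.pt"}, "select": {"jets.pt_calib"}}
example_produces = {"calib": {"jets.pt_calib"}, "select": {"jets.baseline"}}
order = build_order(example_requires, example_produces)
```

With ignore_unmet left at True, the unmet "jets.pt" dependency is skipped and "calib" is ordered before "select"; passing ignore_unmet=False to the same call raises KeyError instead.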

create_schedule(inputs: Mapping[str, str], outputs: Mapping[str, str], ignore_unmet: bool = True, prune_unused: bool = True, warn_unused: bool = True, annotate_order: str | None = None) → Tuple[JobConfiguration, BaseGraph, Tuple[Node]]

Create the scheduled list of algorithms

Parameters:

inputs (Mapping[str, str]) – Mapping from container nickname to the real input name in the StoreGate
outputs (Mapping[str, str]) – Mapping from container nickname to the real output name in the StoreGate
ignore_unmet (bool, optional) – If False, raise an exception where a dependency is not met. If True, unmet dependencies are implicitly assumed to be available on the input. By default True
prune_unused (bool, optional) – If True, remove any nodes from the graph whose outputs are not used, by default True
warn_unused (bool, optional) – If True, warn if there are any nodes whose outputs are not used, by default True
annotate_order (str, optional) – If set, add an edge attribute to the graph indicating whether an edge is part of the order

Returns:

JobConfiguration – The full job configuration
BaseGraph – The built dependency graph
Tuple[Node] – Any nodes whose outputs are not used (even if these have been pruned from the graph)
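The effect of prune_unused can be pictured as a backwards walk from the output: any node that is never reached contributes nothing and is reported as unused. The sketch below assumes a graph stored as a mapping from node name to the names of its predecessors; find_unused and the node names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def find_unused(
    predecessors: Dict[str, Set[str]], output_node: str
) -> Tuple[Dict[str, Set[str]], Tuple[str, ...]]:
    """Split the graph into nodes that feed the output and nodes that do not."""
    used: Set[str] = set()
    stack: List[str] = [output_node]
    while stack:
        node = stack.pop()
        if node in used:
            continue
        used.add(node)
        stack.extend(predecessors.get(node, ()))
    # Keep only the used nodes and the edges between them
    pruned = {n: preds & used for n, preds in predecessors.items() if n in used}
    unused = tuple(sorted(n for n in predecessors if n not in used))
    return pruned, unused


example_graph = {
    "output": {"select"},
    "select": {"calib"},
    "calib": {"input"},
    "input": set(),
    "debug_plots": {"calib"},  # nothing downstream consumes this node
}
pruned_graph, unused_nodes = find_unused(example_graph, "output")
```

In this example only "debug_plots" is reported as unused, mirroring the third element of the returned tuple, which lists unused nodes even when they have been removed from the graph.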

flatten_selection(container: str, selection: str, _seen: List[str] | None = None) → List[str]

Expand a selection into a list of selections implicit in this one

Parameters:

container (str) – The nickname of the container that the selection is defined on
selection (str) – The name of the selection
_seen (List[str], optional) – Internal parameter to detect cyclic definitions

Raises:

ValueError – A cyclic definition is detected
KeyError – One of the selections involved is unknown
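The expansion and its two error cases can be sketched as a short recursion. The sketch assumes that selections are stored as a mapping from (container, name) to the names of the selections they are built from, and that the result includes the selection itself; neither assumption is confirmed by the documentation, and flatten_selection_sketch is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

Definitions = Dict[Tuple[str, str], List[str]]


def flatten_selection_sketch(
    definitions: Definitions,
    container: str,
    selection: str,
    _seen: Optional[List[str]] = None,
) -> List[str]:
    """Expand a selection into itself plus every selection it implies."""
    seen = [] if _seen is None else _seen
    if selection in seen:
        chain = " -> ".join(seen + [selection])
        raise ValueError(f"Cyclic selection definition: {chain}")
    # Raises KeyError if the selection is unknown
    components = definitions[(container, selection)]
    flat = [selection]
    for component in components:
        nested = flatten_selection_sketch(
            definitions, container, component, seen + [selection]
        )
        for name in nested:
            if name not in flat:
                flat.append(name)
    return flat


example_definitions: Definitions = {
    ("jets", "kinematic"): [],
    ("jets", "clean"): [],
    ("jets", "baseline"): ["kinematic", "clean"],
    ("jets", "signal"): ["baseline"],
}
flat_signal = flatten_selection_sketch(example_definitions, "jets", "signal")
```

Passing a fresh copy of the visited list into each recursive call means that a selection reused on two separate branches is not mistaken for a cycle.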

get_container_parent(container: str) → str | None

Get the parent of a container

Return None if it has no known parent

get_producer(data, exact: bool = False) → Node

Get the node which produces a particular data item

Parameters:

data – The data being queried
exact (bool) – Some implementations of the builder allow for some data items to alias others. If exact is True it bypasses this mechanism, otherwise it does not. By default False

Raises:

KeyError – No node produces this data

get_selection(container: str, name: str) → Selection

Get the selection object

Parameters:

container (str) – The (nick)name of the container
name (str) – The name of the selection

Raises:

KeyError – No such selection exists

Returns:

The corresponding selection object

Return type:

Selection

remove_node(node: Node)

Remove a node from the graph

Parameters:: node (Union[str, BaseNode]) – The node to remove, can be the node object or its name
Raises:: KeyError – The node does not exist in the graph builder

rm_selection(container: str, selection_name: str)

Remove a selection from the scheduler

Parameters:

container (str) – The container the selection is defined on
selection_name (str) – The name of the selection

Raises:

KeyError – The selection does not exist

set_output_selection(container: str, selection: str | Expression)

Set the selection on the output container

cpalgnodes.scheduler.add_edge_to_dag(graph: DiGraph, source, destination)

Add an edge between two nodes, raising an error if doing so would create a cycle
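The cycle check can be sketched without any graph library: a new edge source -> destination closes a cycle exactly when source is already reachable from destination. The version below uses a plain dictionary of successor sets in place of a DiGraph and raises ValueError; both choices, and the name add_edge_checked, are assumptions made for illustration.

```python
from typing import Dict, Hashable, Set


def add_edge_checked(
    graph: Dict[Hashable, Set[Hashable]], source: Hashable, destination: Hashable
) -> None:
    """Add source -> destination unless destination already reaches source."""
    stack = [destination]
    visited: Set[Hashable] = set()
    while stack:
        node = stack.pop()
        if node == source:
            raise ValueError(
                f"Edge {source!r} -> {destination!r} would create a cycle"
            )
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, ()))
    # Safe to add: register both endpoints and the edge itself
    graph.setdefault(source, set()).add(destination)
    graph.setdefault(destination, set())


dag: Dict[Hashable, Set[Hashable]] = {}
add_edge_checked(dag, "a", "b")
add_edge_checked(dag, "b", "c")
```

Because the check runs before the graph is modified, a rejected edge such as "c" -> "a" leaves the existing graph untouched.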