cpalgnodes.scheduler module

Classes that control the scheduling of the job and any additional configuration that depends on the scheduling order

class cpalgnodes.scheduler.InputNode

Bases: Node

Node representing the input to the scheduler

This is a dummy node (adds no algorithms) in the graph that tells the scheduler what input information is available

create(container_info)

Create the algorithm(s) represented by this node

Parameters:

container_info (Mapping[str, ContainerInfo]) – The ContainerInfo for each input/output container

Returns:

The configured algorithm(s)

Return type:

Union[Algorithm, List[Algorithm], ConfigFragment]

property name

The name of this node (should be unique within the graph)

property produces_aux

The auxdata produced by this node, split by container

Return type:

A dictionary mapping container nicknames to the set of auxnames the node produces

property produces_containers

Any containers created by this node

Containers are returned as a dictionary mapping from the created container nickname to its parent container (or None if it has no parent). A container has a parent if it is a deep copy, shallow copy or view. Where a node updates a container, that container should be listed with itself as the parent.
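
As an illustration of the mapping described above, here is a minimal sketch using plain dictionaries. The container nicknames and the helper root_container are hypothetical and are not part of cpalgnodes; the sketch only shows how parent links of this shape can be followed.

```python
from typing import Dict, Optional

# Hypothetical nicknames, for illustration only.
parents: Dict[str, Optional[str]] = {
    "calib_jets": "jets",       # copy of an existing container
    "good_jets": "calib_jets",  # view of the calibrated copy
    "new_met": None,            # newly created container, no parent
    "muons": "muons",           # updated in place: listed as its own parent
}


def root_container(name: str, parents: Dict[str, Optional[str]]) -> str:
    """Follow parent links until reaching a container with no other parent."""
    visited = set()
    while True:
        parent = parents.get(name)
        # Stop at containers with no parent or that are their own parent.
        if parent is None or parent == name:
            return name
        if name in visited:
            raise ValueError(f"cyclic parent chain involving {name!r}")
        visited.add(name)
        name = parent
```

With these entries, following the chain from good_jets passes through calib_jets and ends at jets, which has no entry of its own.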

property requires_aux

The auxdata required by this node, split by container

Return type:

A dictionary mapping container nicknames to the set of auxnames the node requires

requires_objects(required_output: Dict[Tuple[str, str], Expression]) Dict[Tuple[str, str], Expression]

The input objects that this algorithm needs

Part of the preselection mechanism. This method allows each node to tell the scheduler which objects upstream nodes need to have run on for this node to produce the data required by downstream nodes.

For example, in the simplest case (which this default implementation satisfies) the node calculates object-by-object, so it only needs to see the objects on which its output is required.

Parameters:

required_output (Dict[Dependency, boolean.Expression]) – The selections required downstream of this node on its outputs

Return type:

The selections this node requires on its inputs
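
The object-by-object case can be sketched roughly as follows. This is not the cpalgnodes implementation: sets of selection names stand in for boolean expressions (a set is read as the OR of its members), and the function name and the explicit inputs argument are assumptions made for the example.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Stand-in for the documented Dependency type: (container nickname, aux name).
Dependency = Tuple[str, str]


def object_by_object_inputs(
    required_output: Dict[Dependency, Set[str]],
    inputs: Iterable[Dependency],
) -> Dict[Dependency, FrozenSet[str]]:
    """Require each input on every object for which some output is needed.

    Assumes the node works object-by-object, so an input is only needed
    on objects of the same container that pass any downstream selection.
    """
    wanted: Dict[str, Set[str]] = {}
    for (container, _aux), selections in required_output.items():
        wanted.setdefault(container, set()).update(selections)
    return {dep: frozenset(wanted.get(dep[0], ())) for dep in inputs}
```

In this sketch, an input on a container for which no output is required ends up with an empty set, meaning no objects are needed.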

class cpalgnodes.scheduler.OutputNode

Bases: Node

Node representing the output of the scheduler

This is a dummy node (adds no algorithms) in the graph that represents the output. It produces no data and has all the required output information as required data dependencies

create(container_info)

Create the algorithm(s) represented by this node

Parameters:

container_info (Mapping[str, ContainerInfo]) – The ContainerInfo for each input/output container

Returns:

The configured algorithm(s)

Return type:

Union[Algorithm, List[Algorithm], ConfigFragment]

property name: str

The name of this node (should be unique within the graph)

property produces_aux

The auxdata produced by this node, split by container

Return type:

A dictionary mapping container nicknames to the set of auxnames the node produces

property produces_containers

Any containers created by this node

Containers are returned as a dictionary mapping from the created container nickname to its parent container (or None if it has no parent). A container has a parent if it is a deep copy, shallow copy or view. Where a node updates a container, that container should be listed with itself as the parent.

property requires_aux

The auxdata required by this node, split by container

Return type:

A dictionary mapping container nicknames to the set of auxnames the node requires

requires_objects(required_output: Dict[Tuple[str, str], Expression]) Dict[Tuple[str, str], Expression]

The input objects that this algorithm needs

Part of the preselection mechanism. This method allows each node to tell the scheduler which objects upstream nodes need to have run on for this node to produce the data required by downstream nodes.

For example, in the simplest case (which this default implementation satisfies) the node calculates object-by-object, so it only needs to see the objects on which its output is required.

Parameters:

required_output (Dict[Dependency, boolean.Expression]) – The selections required downstream of this node on its outputs

Return type:

The selections this node requires on its inputs

class cpalgnodes.scheduler.Scheduler

Bases: GraphBuilder

Class to build the dependency graph and create all the algorithms

After the graph has been configured and the order determined, the preselections necessary for each node are determined.

add_input(container: str, auxitems: Iterable[str] = ())

Tell the scheduler about a container available on the input

Parameters:
  • container (str) – The name of the container

  • auxitems (Iterable[str]) – Any auxdata available on the container

add_node(node: Node)

Add a new node into the node list

If the node already exists this is a no-op

Raises:

add_output(container: str, auxitems: Iterable[str] | Mapping[str, Expression] = ())

Tell the scheduler about a container we need available on the output

Parameters:
  • container (str) – The name of the container

  • auxitems (Union[Iterable[str], Mapping[str, boolean.Expression]]) – Any aux items needed on the output. Can be either a list of aux items or a mapping from aux item to the required output selection for that item.
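
The two accepted forms of auxitems could be brought to a common shape along these lines. The helper name normalise_auxitems is hypothetical, and using None to mean "no specific selection requested" is an assumption of this sketch rather than documented cpalgnodes behaviour.

```python
from collections.abc import Mapping


def normalise_auxitems(auxitems=()):
    """Return a dict from aux item name to its requested selection.

    A mapping is copied as-is; a plain iterable of names is expanded
    with None as a placeholder selection (an assumption of this sketch).
    """
    if isinstance(auxitems, Mapping):
        return dict(auxitems)
    return {item: None for item in auxitems}
```

Checking for a mapping first matters because a mapping is itself iterable, so the order of the two branches cannot be swapped.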

add_selection(container: str, selection: Selection)

Tell the scheduler about a new selection

Parameters:
  • container (str) – The (nick)name of the container that this operates on

  • selection (Selection) – The selection

Raises:

KeyError – The container.selection_name pair is already defined
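
The KeyError behaviour documented for add_selection, get_selection and rm_selection resembles a registry keyed on the (container, selection name) pair. The class below is a hypothetical stand-in written for illustration; it is not the Scheduler's actual storage, and it takes the selection name as a separate argument for simplicity.

```python
class SelectionRegistry:
    """Toy registry keyed on (container nickname, selection name)."""

    def __init__(self):
        self._selections = {}

    def add(self, container, name, selection):
        key = (container, name)
        if key in self._selections:
            # Mirrors the documented error for an already defined pair.
            raise KeyError(f"{container}.{name} is already defined")
        self._selections[key] = selection

    def get(self, container, name):
        # Raises KeyError if no such selection exists.
        return self._selections[(container, name)]

    def remove(self, container, name):
        # Raises KeyError if the selection does not exist.
        del self._selections[(container, name)]
```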

build_graph(ignore_unmet=True, extra_producers={}, extra_requirements={}) BaseGraph

Create the dependency graph

Parameters:

ignore_unmet (bool) – If False, raise a KeyError if a node depends on data which has no known producer

Return type:

The produced dependency graph

Raises:
  • KeyError – A node depends on data with no known producer and ignore_unmet is False

  • CyclicGraphError – The graph contains cycles
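
The behaviour described above (linking each node to the producers of its requirements, optionally failing on unmet data, and rejecting cycles) can be sketched with the standard library alone. The function name and its produces/requires arguments are assumptions for this example, and graphlib.CycleError stands in for CyclicGraphError; the real method builds a BaseGraph instead.

```python
from graphlib import TopologicalSorter
from typing import Dict, Hashable, Set


def build_dependency_graph(
    produces: Dict[str, Set[Hashable]],
    requires: Dict[str, Set[Hashable]],
    ignore_unmet: bool = True,
) -> Dict[str, Set[str]]:
    """Return a dict mapping each node to the set of nodes it depends on."""
    producer_of = {}
    for node, items in produces.items():
        for item in items:
            producer_of[item] = node

    graph: Dict[str, Set[str]] = {node: set() for node in [*produces, *requires]}
    for node, items in requires.items():
        for item in items:
            if item not in producer_of:
                if ignore_unmet:
                    continue  # assume the data is available on the input
                raise KeyError(f"{node!r} requires {item!r}, which has no known producer")
            graph[node].add(producer_of[item])

    # Exhausting static_order() raises graphlib.CycleError for a cyclic graph.
    tuple(TopologicalSorter(graph).static_order())
    return graph
```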

create_schedule(inputs: Mapping[str, str], outputs: Mapping[str, str], ignore_unmet: bool = True, prune_unused: bool = True, warn_unused: bool = True, annotate_order: str | None = None) Tuple[JobConfiguration, BaseGraph, Tuple[Node]]

Create the scheduled list of algorithms

Parameters:
  • inputs (Mapping[str, str]) – Mapping from container nickname to the real input name in the StoreGate

  • outputs (Mapping[str, str]) – Mapping from container nickname to the real output name in the StoreGate

  • ignore_unmet (bool, optional) – If False, raise an exception where a dependency is not met. If True, unmet dependencies are implicitly assumed to be available on the input. By default True

  • prune_unused (bool, optional) – If True, remove any nodes from the graph whose outputs are not used, by default True

  • warn_unused (bool, optional) – If True, warn if there are any nodes whose outputs are not used, by default True

  • annotate_order (str, optional) – If set, add an edge attribute to the graph indicating whether each edge is part of the order

Returns:

  • JobConfiguration – The full job configuration

  • BaseGraph – The built dependency graph

  • Tuple[Node] – Any nodes whose outputs are not used (even if these have been pruned from the graph)
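
The pruning and ordering described by prune_unused and the third return value can be illustrated as follows. This is a simplified sketch on a plain dict of node names, assuming a single output node; the function name is hypothetical and no JobConfiguration is produced.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple


def order_and_prune(
    graph: Dict[str, Set[str]],
    output: str,
    prune_unused: bool = True,
) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    """Return (ordered nodes, unused nodes) for a {node: upstream nodes} graph."""
    # Walk upstream from the output to find every node that contributes to it.
    used, stack = set(), [output]
    while stack:
        node = stack.pop()
        if node not in used:
            used.add(node)
            stack.extend(graph.get(node, ()))

    # Unused nodes are reported whether or not they are pruned.
    unused = tuple(node for node in graph if node not in used)
    kept = {
        node: set(upstream)
        for node, upstream in graph.items()
        if not prune_unused or node in used
    }
    return tuple(TopologicalSorter(kept).static_order()), unused
```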

flatten_selection(container: str, selection: str, _seen: List[str] | None = None) List[str]

Expand a selection into a list of selections implicit in this one

Parameters:
  • container (str) – The nickname of the container that the selection is defined on

  • selection (str) – The name of the selection

  • _seen (List[str], optional) – Internal parameter to detect cyclic definitions

Raises:
  • ValueError – A cyclic definition is detected

  • KeyError – One of the selections involved is unknown
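
A recursive expansion with the two documented failure modes might look like the sketch below. The definitions mapping (selection name to the names it directly implies) is a hypothetical data model, and including the starting selection in the result is an assumption; the real method looks selections up on the scheduler by container.

```python
from typing import Dict, List, Optional


def flatten(
    definitions: Dict[str, List[str]],
    selection: str,
    _seen: Optional[List[str]] = None,
) -> List[str]:
    """Expand a selection into itself plus every selection it implies."""
    seen = [] if _seen is None else _seen
    if selection in seen:
        chain = " -> ".join(seen + [selection])
        raise ValueError(f"cyclic selection definition: {chain}")
    implied = definitions[selection]  # KeyError if the selection is unknown
    result = [selection]
    for name in implied:
        # Pass the current path down so cycles are caught at any depth.
        for item in flatten(definitions, name, seen + [selection]):
            if item not in result:
                result.append(item)
    return result
```

Passing a new list down each branch, rather than mutating a shared one, means a selection reached by two different routes is not mistaken for a cycle.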

get_container_parent(container: str) str | None

Get the parent of a container

Return None if it has no known parent

get_producer(data, exact: bool = False) Node

Get the node which produces a particular data item

Parameters:
  • data – The data being queried

  • exact (bool) – Some implementations of the builder allow some data items to alias others. If exact is True this aliasing is bypassed. By default False

Raises:

KeyError – No node produces this data

get_selection(container: str, name: str) Selection

Get the selection object

Parameters:
  • container (str) – The (nick)name of the container

  • name (str) – The name of the selection

Raises:

KeyError – No such selection exists

Returns:

The corresponding selection object

Return type:

Selection

remove_node(node: Node)

Remove a node from the graph

Parameters:

node (Union[str, BaseNode]) – The node to remove, can be the node object or its name

Raises:

KeyError – The node does not exist in the graph builder

rm_selection(container: str, selection_name: str)

Remove a selection from the scheduler

Parameters:
  • container (str) – The container the selection is defined on

  • selection_name (str) – The name of the selection

Raises:

KeyError – The selection does not exist

set_output_selection(container: str, selection: str | Expression)

Set the selection on the output container

cpalgnodes.scheduler.add_edge_to_dag(graph: DiGraph, source, destination)

Add an edge between two nodes, raising an error if doing so would create a cycle
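
The check can be illustrated without any graph library: adding source -> destination closes a cycle exactly when source is already reachable from destination. The sketch below uses a plain dict of successor sets in place of a DiGraph and raises ValueError as a stand-in for whatever error the real function raises; both choices are assumptions.

```python
from typing import Dict, Hashable, Set


def add_edge_checked(
    successors: Dict[Hashable, Set[Hashable]],
    source: Hashable,
    destination: Hashable,
) -> None:
    """Add source -> destination unless it would create a cycle."""
    # Depth-first search from destination; reaching source means a cycle.
    stack, visited = [destination], set()
    while stack:
        node = stack.pop()
        if node == source:
            raise ValueError(f"edge {source!r} -> {destination!r} would create a cycle")
        if node not in visited:
            visited.add(node)
            stack.extend(successors.get(node, ()))
    # Only modify the graph once the edge is known to be safe.
    successors.setdefault(source, set()).add(destination)
    successors.setdefault(destination, set())
```

Because the search runs before any modification, a rejected edge leaves the graph unchanged.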