cpalgnodes.scheduler module
Class to control the scheduling of the job and additional configuration based on the order
- class cpalgnodes.scheduler.InputNode
Bases: Node
Node representing the input to the scheduler
This is a dummy node (adds no algorithms) in the graph that tells the scheduler what input information is available
- create(container_info)
Create the algorithm(s) represented by this node
- Parameters:
container_info (Mapping[str, ContainerInfo]) – The ContainerInfo for each input/output container
- Returns:
The configured algorithm(s)
- Return type:
Union[Algorithm, List[Algorithm], ConfigFragment]
- property name
The name of this node (should be unique within the graph)
- property produces_aux
The auxdata produced by this node, split by container
- Return type:
A dictionary mapping container nicknames to the set of auxnames the node produces
- property produces_containers
Any containers created by this node
Containers are returned as a dictionary mapping each created container's nickname to its parent container (or None if it has no parent). A container has a parent if it is a deep copy, shallow copy or view. Where a node updates a container, that container should have itself as the parent.
- property requires_aux
The auxdata required by this node, split by container
- Return type:
A dictionary mapping container nicknames to the set of auxnames the node requires
- requires_objects(required_output: Dict[Tuple[str, str], Expression]) Dict[Tuple[str, str], Expression]
The input objects that this algorithm needs
Part of the preselection mechanism. This method allows each node to tell the scheduler which objects upstream nodes need to have run on for this node to produce the data required by downstream nodes.
For example, in the simplest case (which this default implementation satisfies) the node calculates object-by-object, so it only needs to see the objects on which its output is required.
- Parameters:
required_output (Dict[Dependency, boolean.Expression]) – The selections required downstream of this node on its outputs
- Return type:
The selections this node requires on its inputs
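The object-by-object case described above can be illustrated with a small standalone sketch. It does not use cpalgnodes and is not the library's implementation: selections are modelled as plain sets of selection names (meaning "any of these") instead of boolean.Expression objects, and the function name `passthrough_requirements` and its `inputs` argument are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Dependency = Tuple[str, str]  # (container nickname, aux name)


def passthrough_requirements(
    inputs: Iterable[Dependency],
    required_output: Dict[Dependency, FrozenSet[str]],
) -> Dict[Dependency, FrozenSet[str]]:
    """Hypothetical object-by-object rule: each input is needed on every
    object for which any output on the same container is required."""
    # Collect, per container, the union of the selections required on outputs
    per_container: Dict[str, FrozenSet[str]] = {}
    for (container, _aux), selections in required_output.items():
        per_container[container] = per_container.get(container, frozenset()) | selections
    # Each input inherits the requirement of its own container; inputs on
    # containers with no required output are not needed at all
    return {dep: per_container[dep[0]] for dep in inputs if dep[0] in per_container}
```

In this sketch an input on a container for which nothing is required downstream simply drops out of the result.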
- class cpalgnodes.scheduler.OutputNode
Bases: Node
Node representing the output of the scheduler
This is a dummy node (adds no algorithms) in the graph that represents the output. It produces no data and has all the required output information as required data dependencies
- create(container_info)
Create the algorithm(s) represented by this node
- Parameters:
container_info (Mapping[str, ContainerInfo]) – The ContainerInfo for each input/output container
- Returns:
The configured algorithm(s)
- Return type:
Union[Algorithm, List[Algorithm], ConfigFragment]
- property name: str
The name of this node (should be unique within the graph)
- property produces_aux
The auxdata produced by this node, split by container
- Return type:
A dictionary mapping container nicknames to the set of auxnames the node produces
- property produces_containers
Any containers created by this node
Containers are returned as a dictionary mapping each created container's nickname to its parent container (or None if it has no parent). A container has a parent if it is a deep copy, shallow copy or view. Where a node updates a container, that container should have itself as the parent.
- property requires_aux
The auxdata required by this node, split by container
- Return type:
A dictionary mapping container nicknames to the set of auxnames the node requires
- requires_objects(required_output: Dict[Tuple[str, str], Expression]) Dict[Tuple[str, str], Expression]
The input objects that this algorithm needs
Part of the preselection mechanism. This method allows each node to tell the scheduler which objects upstream nodes need to have run on for this node to produce the data required by downstream nodes.
For example, in the simplest case (which this default implementation satisfies) the node calculates object-by-object, so it only needs to see the objects on which its output is required.
- Parameters:
required_output (Dict[Dependency, boolean.Expression]) – The selections required downstream of this node on its outputs
- Return type:
The selections this node requires on its inputs
- class cpalgnodes.scheduler.Scheduler
Bases: GraphBuilder
Class to build the dependency graph and create all the algorithms
After the graph has been configured and the order determined, the preselections necessary for each node are determined.
- add_input(container: str, auxitems: Iterable[str] = ())
Tell the scheduler about a container available on the input
- Parameters:
container (str) – The name of the container
auxitems (Iterable[str]) – Any auxdata available on the container
- add_node(node: Node)
Add a new node into the node list
If the node already exists this is a no-op
- Raises:
NodeAlreadyExistsError – There is a node with the same name that does not compare equal
OutputAlreadyProducedError – There is already a node which produces this output
- add_output(container: str, auxitems: Iterable[str] | Mapping[str, Expression] = ())
Tell the scheduler about a container we need available on the output
- Parameters:
container (str) – The name of the container
auxitems (Union[Iterable[str], Mapping[str, boolean.Expression]]) – Any aux items needed on the output. Can be either a list of aux items or a mapping from aux item to the required output selection for that item.
- add_selection(container: str, selection: Selection)
Tell the scheduler about a new selection
- Parameters:
container (str) – The (nick)name of the container that this operates on
selection (Selection) – The selection
- Raises:
KeyError – The container.selection_name pair is already defined
- build_graph(ignore_unmet=True, extra_producers={}, extra_requirements={}) BaseGraph
Create the dependency graph
- Parameters:
ignore_unmet (bool) – If False, raise a KeyError if a node depends on data which has no known producer
- Return type:
The produced dependency graph
- Raises:
KeyError – A node depends on data with no known producer and ignore_unmet is False
CyclicGraphError – The graph contains cycles
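As a rough illustration of the behaviour documented for build_graph, the sketch below resolves producers, optionally rejects unmet dependencies and rejects cyclic graphs. It is a standalone approximation using only the standard library, not the cpalgnodes implementation: the function name `build_order`, the `nodes` layout and the use of `graphlib.CycleError` in place of CyclicGraphError are all assumptions made for the example.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set, Tuple


def build_order(
    nodes: Dict[str, Tuple[Set[str], Set[str]]],
    ignore_unmet: bool = True,
) -> List[str]:
    """nodes maps a node name to (required data items, produced data items)."""
    # Index which node produces each data item
    producers: Dict[str, str] = {}
    for name, (_requires, produces) in nodes.items():
        for item in produces:
            producers[item] = name
    # For each node, collect the nodes that must run before it
    predecessors: Dict[str, Set[str]] = {name: set() for name in nodes}
    for name, (requires, _produces) in nodes.items():
        for item in requires:
            if item in producers:
                predecessors[name].add(producers[item])
            elif not ignore_unmet:
                raise KeyError(f"No known producer for {item!r} (needed by {name!r})")
    # static_order raises graphlib.CycleError if the graph contains a cycle
    return list(TopologicalSorter(predecessors).static_order())
```

With `ignore_unmet=True` an item without a producer is silently treated as available on the input, which mirrors the default described above.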
- create_schedule(inputs: Mapping[str, str], outputs: Mapping[str, str], ignore_unmet: bool = True, prune_unused: bool = True, warn_unused: bool = True, annotate_order: str | None = None) Tuple[JobConfiguration, BaseGraph, Tuple[Node]]
Create the scheduled list of algorithms
- Parameters:
inputs (Mapping[str, str]) – Mapping from container nickname to the real input name in the StoreGate
outputs (Mapping[str, str]) – Mapping from container nickname to the real output name in the StoreGate
ignore_unmet (bool, optional) – If False, raise an exception when a dependency is not met. If True, unmet dependencies are implicitly assumed to be available on the input. By default True
prune_unused (bool, optional) – If True, remove any nodes from the graph whose outputs are not used, by default True
warn_unused (bool, optional) – If True, warn if there are any nodes whose outputs are not used, by default True
annotate_order (str, optional) – If set, add an edge attribute to the graph saying if an edge is part of the order
- Returns:
JobConfiguration – The full job configuration
BaseGraph – The built dependency graph
Tuple[Node] – Any nodes whose outputs are not used (even if these have been pruned from the graph)
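The idea behind prune_unused and the returned tuple of unused nodes can be sketched as a backwards walk from the requested outputs. This is only an illustration under assumptions: `find_unused`, the `nodes` layout and the dotted data item names are hypothetical, and the real scheduler may determine unused nodes differently.

```python
from typing import Dict, List, Set, Tuple


def find_unused(
    nodes: Dict[str, Tuple[Set[str], Set[str]]],
    wanted: Set[str],
) -> List[str]:
    """Return the names of nodes that contribute to none of the wanted items.

    nodes maps a node name to (required data items, produced data items).
    """
    producers = {item: name for name, (_req, prod) in nodes.items() for item in prod}
    used: Set[str] = set()
    pending = list(wanted)
    while pending:
        item = pending.pop()
        name = producers.get(item)
        # Items without a producer are assumed to come from the input
        if name is None or name in used:
            continue
        used.add(name)
        # Everything this node requires is now needed as well
        pending.extend(nodes[name][0])
    return sorted(name for name in nodes if name not in used)
```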
- flatten_selection(container: str, selection: str, _seen: List[str] | None = None) List[str]
Expand a selection into a list of selections implicit in this one
- Parameters:
container (str) – The nickname of the container that the selection is defined on
selection (str) – The name of the selection
_seen (Set[str], optional) – Internal parameter to detect cyclic definitions
- Raises:
ValueError – A cyclic definition is detected
KeyError – One of the selections involved is unknown
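A minimal sketch of this kind of expansion, with the documented ValueError and KeyError behaviour, is shown below. It is not the cpalgnodes code: the `definitions` registry and the function name `flatten` are hypothetical, and whether the real method includes the starting selection in its result is not stated in this page (the sketch does).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: (container, selection name) -> names of the
# selections it is built from (empty list for a primitive selection)
Definitions = Dict[Tuple[str, str], List[str]]


def flatten(
    definitions: Definitions,
    container: str,
    selection: str,
    _seen: Optional[List[str]] = None,
) -> List[str]:
    """Expand a selection into itself plus every selection it implies."""
    seen = [] if _seen is None else _seen
    if selection in seen:
        chain = " -> ".join(seen + [selection])
        raise ValueError(f"Cyclic selection definition: {chain}")
    # Raises KeyError if the selection is unknown
    components = definitions[(container, selection)]
    flat = [selection]
    for component in components:
        for name in flatten(definitions, container, component, seen + [selection]):
            if name not in flat:
                flat.append(name)
    return flat
```

Passing a fresh `seen + [selection]` list into each recursive call keeps the cycle check local to the current expansion path, so a selection reused in two branches is not mistaken for a cycle.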
- get_container_parent(container: str) str | None
Get the parent of a container
Return None if it has no known parent
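To show how parent lookups of this kind combine with the produces_containers convention (a parent nickname, None, or the container itself when it is updated in place), here is a small standalone sketch that follows the parent links of a container. The `parents` dictionary and the function name `ancestry` are hypothetical and are not part of the cpalgnodes API.

```python
from typing import Dict, List, Optional


def ancestry(parents: Dict[str, Optional[str]], container: str) -> List[str]:
    """List the container followed by its parents, nearest first."""
    chain = [container]
    current = container
    while True:
        parent = parents.get(current)
        # Stop at a container with no known parent, or one that is its own
        # parent (the convention for a container updated in place)
        if parent is None or parent == current:
            return chain
        if parent in chain:
            raise ValueError(f"Cyclic parent definition involving {parent!r}")
        chain.append(parent)
        current = parent
```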
- get_producer(data, exact: bool = False) Node
Get the node which produces a particular data item
- Parameters:
data – The data being queried
exact (bool) – Some implementations of the builder allow for some data items to alias others. If exact is True it bypasses this mechanism, otherwise it does not. By default False
- Raises:
KeyError – No node produces this data
- get_selection(container: str, name: str) Selection
Get the selection object
- Parameters:
container (str) – The (nick)name of the container
name (str) – The name of the selection
- Raises:
KeyError – No such selection exists
- Returns:
The corresponding selection object
- Return type:
Selection
- remove_node(node: Node)
Remove a node from the graph
- Parameters:
node (Union[str, BaseNode]) – The node to remove, can be the node object or its name
- Raises:
KeyError – The node does not exist in the graph builder
- rm_selection(container: str, selection_name: str)
Remove a selection from the scheduler
- Parameters:
container (str) – The container the selection is defined on
selection_name (str) – The name of the selection
- Raises:
KeyError – The selection does not exist
- set_output_selection(container: str, selection: str | Expression)
Set the selection on the output container
- cpalgnodes.scheduler.add_edge_to_dag(graph: DiGraph, source, destination)
Add an edge between two nodes, raising an error if doing so would create a cycle
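The check described here can be sketched without any graph library: adding source -> destination creates a cycle exactly when destination can already reach source. The version below is an illustration only. It uses a plain dictionary of successor sets in place of the DiGraph argument, the name `add_edge_checked` is hypothetical, and ValueError stands in for whatever error the real function raises.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # node -> set of successor nodes


def add_edge_checked(graph: Graph, source: Hashable, destination: Hashable) -> None:
    """Add source -> destination unless destination already reaches source."""
    # Depth-first search from destination, looking for source
    stack = [destination]
    visited = set()
    while stack:
        node = stack.pop()
        if node == source:
            raise ValueError(f"Edge {source!r} -> {destination!r} would create a cycle")
        if node in visited:
            continue
        visited.add(node)
        stack.extend(graph.get(node, ()))
    # Safe to add: register both endpoints and the new edge
    graph.setdefault(source, set()).add(destination)
    graph.setdefault(destination, set())
```

Because the search runs before the graph is modified, a rejected edge leaves the graph unchanged.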