Launchers

Launchers, together with the Connector, specify how a task should be launched. Two types of launchers currently exist: a direct launcher, which starts a new process, and a Slurm launcher, which submits jobs to Slurm.

Priority

All launchers have a priority property (default: 0) that determines their preference when used with a DynamicLauncher. Higher priority values indicate more preferred launchers.

from experimaestro.launchers.slurm import SlurmLauncher, SlurmOptions

# Create launchers with different priorities
fast_launcher = SlurmLauncher(options=SlurmOptions(partition="fast"), priority=10)
slow_launcher = SlurmLauncher(options=SlurmOptions(partition="slow"), priority=1)

Types

Direct

By default, jobs are launched directly by the scheduler using Python scripts.

DirectLauncher - Launcher that runs tasks directly as local processes without any job scheduler.

Slurm

The Slurm workload manager is supported. Different settings can be used for different jobs through the config method of the launcher:

from experimaestro.launchers.slurm import SlurmLauncher

launcher = SlurmLauncher(nodes=1)
gpulauncher = launcher.config(gpus_per_node=1)

with experiment(launcher=launcher):
    # Default
    mytask().submit()

    # If needed, options can be used
    mytask().submit(launcher=gpulauncher)

SlurmOptions - Configuration options for SLURM jobs (nodes, time, partition, GPUs, etc.).

SlurmLauncher - Launcher that submits tasks to the SLURM workload manager.

Dynamic Launcher

The DynamicLauncher allows dynamic selection from a list of launchers based on their priorities. This is useful when you have multiple execution options (e.g., different SLURM partitions or clusters) and want to select the best one at runtime.

from experimaestro.launchers import DynamicLauncher
from experimaestro.launchers.slurm import SlurmLauncher, SlurmOptions

# Create launchers with different priorities
fast_partition = SlurmLauncher(
    options=SlurmOptions(partition="fast", time="1:00:00"),
    priority=10
)
slow_partition = SlurmLauncher(
    options=SlurmOptions(partition="slow", time="24:00:00"),
    priority=1
)

# By default, selects highest priority (fast_partition)
# If priorities tie, samples uniformly among tied launchers
dynamic = DynamicLauncher([fast_partition, slow_partition])

# With sample=True, samples proportionally to priority
# fast_partition has 10/(10+1) ≈ 91% chance of being selected
dynamic_sampled = DynamicLauncher(
    [fast_partition, slow_partition],
    sample=True
)

with experiment(launcher=dynamic):
    mytask().submit()

Selection Modes

Default mode (sample=False): Selects the launcher with the highest priority. If multiple launchers share the highest priority, one is chosen uniformly at random.
Sampling mode (sample=True): Samples a launcher with probability proportional to its priority. All priorities must be positive in this mode.
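
The two modes can be illustrated with a small standard-library sketch. It is not the DynamicLauncher implementation: FakeLauncher and select are hypothetical names introduced here, and only the selection rules described above are reproduced.

```python
import random
from dataclasses import dataclass


@dataclass
class FakeLauncher:
    """Hypothetical stand-in for a launcher: only what selection needs."""

    name: str
    priority: float = 0


def select(launchers, sample=False, rng=random):
    """Pick one launcher following the two selection modes."""
    if not launchers:
        raise ValueError("no launcher to select from")

    if sample:
        # Sampling mode: probability proportional to priority,
        # so every priority must be strictly positive.
        if any(launcher.priority <= 0 for launcher in launchers):
            raise ValueError("sampling mode requires positive priorities")
        weights = [launcher.priority for launcher in launchers]
        return rng.choices(launchers, weights=weights, k=1)[0]

    # Default mode: keep the launchers tied for the highest priority,
    # then choose uniformly among them.
    best = max(launcher.priority for launcher in launchers)
    tied = [launcher for launcher in launchers if launcher.priority == best]
    return rng.choice(tied)


fast = FakeLauncher("fast", priority=10)
slow = FakeLauncher("slow", priority=1)

chosen = select([fast, slow])  # default mode: highest priority wins
sampled = select([fast, slow], sample=True)  # usually "fast", sometimes "slow"
```

Passing a seeded random.Random instance as rng makes the sampling reproducible, which helps when testing a custom selection policy.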

Extending DynamicLauncher

You can subclass DynamicLauncher and override the update() method to refresh the launcher list before each job submission. This is useful for checking cluster availability or queue status:

class ClusterAwareLauncher(DynamicLauncher):
    def update(self):
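        # Check cluster availability and update priorities.
        # is_cluster_available is a user-supplied helper (not shown here).
        # A priority of 0 is only valid in the default (sample=False) mode.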
        for launcher in self._launchers:
            if is_cluster_available(launcher):
                launcher.priority = 10
            else:
                launcher.priority = 0

DynamicLauncher - Launcher that dynamically selects from a list of launchers based on priority.

Launcher file

The launchers.py file determines how a given requirement (e.g., 2 CPUs with 64 GB of memory) is mapped to a Launcher configuration.

Requirements

HostRequirement - Abstract base representing a disjunction of host requirements (alternatives).
HostSimpleRequirement - A single host requirement specifying CPU, GPU, and duration constraints.
AcceleratorSpecification - Generic accelerator (GPU) specification that matches any accelerator type.
CudaSpecification - Specifies NVIDIA CUDA GPU requirements (dedicated memory).
MPSSpecification - Specifies Apple Metal Performance Shaders requirements (unified memory).
CPUSpecification - Specifies CPU requirements (cores, memory).

Accelerator Types

The launcher finder supports multiple accelerator (GPU) types:

CUDA (cuda()): NVIDIA GPUs with dedicated memory. Use when you specifically need CUDA support.
MPS (mps()): Apple Silicon GPUs with unified memory (shared with CPU). Use for macOS Metal support.
Generic (gpu()): Matches any accelerator type. Use for cross-platform compatibility.

Tip

If your code runs on any accelerator (e.g., PyTorch code that works on both CUDA and MPS backends), prefer the generic gpu(...) over cuda(...): the same requirement will then match an NVIDIA GPU on a Linux cluster and an Apple Silicon GPU on a laptop. Reserve cuda(...) for code that genuinely requires CUDA (e.g., custom CUDA kernels or CUDA-only libraries).
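
The matching rule behind this tip can be sketched as follows. The accelerator_matches function is a hypothetical helper written for illustration, not part of the launcher finder; it only encodes the rule that the generic kind accepts any accelerator while cuda and mps accept only their own kind.

```python
def accelerator_matches(requested: str, available: str) -> bool:
    """Return True when a host accelerator satisfies the requested kind.

    "gpu" is the generic request and accepts any accelerator; "cuda" and
    "mps" only accept a host accelerator of the same kind.
    """
    if requested == "gpu":
        return True
    return requested == available


# Compare a generic request with a CUDA-only request on both host kinds
for host_kind in ("cuda", "mps"):
    print(
        host_kind,
        accelerator_matches("gpu", host_kind),
        accelerator_matches("cuda", host_kind),
    )
```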

Note

MPS uses unified memory - the GPU shares RAM with the CPU. When matching MPS requirements, the combined CPU + GPU memory request must not exceed the total system memory.
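
A minimal sketch of this memory rule, assuming a simplified host description: the Host class and memory_fits function are hypothetical and only illustrate the difference between dedicated and unified memory.

```python
from dataclasses import dataclass

GIB = 1024**3


@dataclass
class Host:
    """Hypothetical, simplified host description."""

    system_memory: int  # total RAM in bytes
    gpu_memory: int  # dedicated accelerator memory in bytes (unused if unified)
    unified: bool = False  # True for MPS: the GPU shares RAM with the CPU


def memory_fits(host: Host, cpu_request: int, gpu_request: int) -> bool:
    """Check whether CPU and GPU memory requests fit on the host."""
    if host.unified:
        # Unified memory: both requests are served from the same RAM
        return cpu_request + gpu_request <= host.system_memory
    # Dedicated memory: each request is checked against its own pool
    return cpu_request <= host.system_memory and gpu_request <= host.gpu_memory


laptop = Host(system_memory=32 * GIB, gpu_memory=0, unified=True)
server = Host(system_memory=128 * GIB, gpu_memory=24 * GIB)

fits_on_laptop = memory_fits(laptop, cpu_request=24 * GIB, gpu_request=16 * GIB)
fits_on_server = memory_fits(server, cpu_request=24 * GIB, gpu_request=16 * GIB)
```

The same request (24 GiB of CPU memory plus 16 GiB of GPU memory) is rejected on the 32 GiB unified-memory laptop but accepted on the server, where the two pools are separate.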

Parsing requirements

parse() - Parses a requirement specification string into a HostRequirement object.

Syntax elements:

duration=<N><unit>: Job duration (units: h/hours, d/days, m/mins)
cpu(mem=<size>, cores=<N>): CPU requirements
cuda(mem=<size>) * <N>: NVIDIA CUDA GPU requirements (memory and count)
mps(mem=<size>) * <N>: Apple MPS GPU requirements (unified memory)
gpu(mem=<size>) * <N>: Generic GPU requirements (matches any accelerator)
Memory sizes: <N>G, <N>GiB, <N>M, <N>MiB

Note

Memory sizes require an explicit unit: cuda(mem=32) is rejected with a ValueError (use cuda(mem=32G)).
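
The unit requirement can be illustrated with a small stand-alone parser. This is a sketch, not the parser used by the library: parse_memory is a hypothetical name, and the multipliers (decimal for G and M, binary for GiB and MiB) are an assumption that should be checked against the library before relying on exact byte counts.

```python
import re

# Assumed multipliers: decimal for G/M, binary for GiB/MiB
_UNITS = {"G": 10**9, "GiB": 1024**3, "M": 10**6, "MiB": 1024**2}
_SIZE_RE = re.compile(r"^(\d+)\s*(GiB|MiB|G|M)$")


def parse_memory(text: str) -> int:
    """Convert a size such as '32G' or '400MiB' to bytes.

    A bare number is rejected rather than read as a byte count.
    """
    match = _SIZE_RE.match(text.strip())
    if match is None:
        raise ValueError(
            f"invalid memory size {text!r}: expected <N>G, <N>GiB, <N>M or <N>MiB"
        )
    return int(match.group(1)) * _UNITS[match.group(2)]


size = parse_memory("32GiB")  # 32 * 1024**3 bytes

try:
    parse_memory("32")  # no unit: rejected
except ValueError as error:
    print(error)
```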

Examples:

from experimaestro.launcherfinder.parser import parse

# Request 8 NVIDIA GPUs with 32GB each
req = parse("duration=40h & cpu(mem=700GiB) & cuda(mem=32GiB) * 8")

# Cross-platform code: generic GPU requirement (matches any accelerator)
req = parse("duration=2h & gpu(mem=4GiB)")

# Explicit alternatives with |: CUDA on Linux/Windows OR MPS on macOS
req = parse("duration=4h & cuda(mem=8GiB) | duration=4h & mps(mem=8GiB)")

Requirements can be manipulated:

The duration can be multiplied by a coefficient using req.multiply_duration. For instance, req.multiply_duration(2) multiplies every duration in the requirement by 2.
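
The effect on a requirement made of several alternatives can be sketched with plain dataclasses. SimpleRequirement and Alternatives are hypothetical stand-ins, and returning a new object (rather than modifying the requirement in place) is an assumption of this sketch, not a statement about the library.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class SimpleRequirement:
    """Hypothetical single alternative: a duration in seconds and a GPU count."""

    duration: int
    gpus: int = 0


@dataclass(frozen=True)
class Alternatives:
    """Hypothetical disjunction of alternatives (any one of them may match)."""

    requirements: List[SimpleRequirement]

    def multiply_duration(self, coefficient: float) -> "Alternatives":
        # Scale the duration of every alternative, leaving the rest untouched
        return Alternatives(
            [
                replace(req, duration=int(req.duration * coefficient))
                for req in self.requirements
            ]
        )


HOUR = 3600
req = Alternatives(
    [SimpleRequirement(4 * HOUR, gpus=1), SimpleRequirement(8 * HOUR)]
)
longer = req.multiply_duration(2)  # 8 hours with one GPU, or 16 hours without
```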

Example

To construct launchers from a specification, define a find_launcher function in a launchers.py file within the configuration directory.

from typing import Set
from experimaestro.launcherfinder import (
    HostRequirement,
    HostSpecification,
    AcceleratorSpecification,
    CudaSpecification,
    MPSSpecification,
    CPUSpecification,
)
from experimaestro.launchers.slurm import SlurmLauncher, SlurmOptions
from experimaestro.launchers.direct import DirectLauncher
from experimaestro.connectors.local import LocalConnector


def find_launcher(requirements: HostRequirement, tags: Set[str] = set()):
    """Find a launcher"""

    if match := requirements.match(HostSpecification(accelerators=[])):
        # No GPU: run directly
        return DirectLauncher(connector=LocalConnector.instance())

    # CUDA cluster with SLURM
    if match := requirements.match(
        HostSpecification(
            max_duration=100 * 3600,
            cpu=CPUSpecification(cores=32, memory=129 * (1024**3)),
            accelerators=[CudaSpecification(memory=24 * (1024**3)) for _ in range(8)],
        )
    ):
        if len(match.requirement.accelerators) > 0:
            return SlurmLauncher(
                connector=LocalConnector.instance(),
                options=SlurmOptions(gpus_per_node=len(match.requirement.accelerators)),
            )

    # Apple Silicon with MPS (unified memory)
    if match := requirements.match(
        HostSpecification(
            cpu=CPUSpecification(cores=8, memory=32 * (1024**3)),
            accelerators=[MPSSpecification(memory=32 * (1024**3))],
        )
    ):
        return DirectLauncher(connector=LocalConnector.instance())

    # Could not find a host
    return None

Note

The cuda= parameter is still supported for backwards compatibility but accelerators= is preferred for new code as it supports all accelerator types.

Tags

Tags can be used to filter out some launchers:

from experimaestro.launcherfinder import find_launcher

find_launcher("""duration=4 days & cuda(mem=4G) * 2 & cpu(mem=400M, cores=4)""", tags=["slurm"])

This call searches for a launcher that has the tag slurm. The tags are passed to the find_launcher function of launchers.py (see the example above).

Debugging the launcher file

You can query the launcher finder directly — without running an experiment — to check which launcher a given requirement resolves to.

From the command line

The find-launchers command loads launchers.py from the configuration directory (~/.config/experimaestro by default, or the directory given with --config) and prints the launcher returned for a requirement specification:

experimaestro find-launchers 'duration=1h & cuda(mem=32G) * 1 & cpu(cores=8)'

# With an explicit configuration directory
experimaestro find-launchers --config /path/to/config-dir 'duration=1h & cuda(mem=32G) * 1'

From Python

The same query can be run from a REPL, which is convenient for inspecting intermediate objects:

from pathlib import Path
from experimaestro.launcherfinder import LauncherRegistry, find_launcher

# Optional: use a non-default configuration directory
LauncherRegistry.set_config_dir(Path("/path/to/config-dir"))

launcher = find_launcher("duration=1h & cuda(mem=32G) * 1 & cpu(cores=8)")
print(launcher)

find_launcher raises a LauncherNotFoundError when no launcher matches; the lower-level LauncherRegistry.instance().find(...) returns None instead.

Checking how a specification is parsed

If the returned launcher is not the one you expect, first check that the specification string is parsed the way you intend:

from experimaestro.launcherfinder.parser import parse

req = parse("duration=1h & cuda(mem=32G) * 1 & cpu(cores=8)")
for alternative in req.requirements:
    print(alternative)

Note that memory sizes require an explicit unit: cuda(mem=32) raises a ValueError instead of being silently interpreted as 32 bytes. Use cuda(mem=32G) instead.

Tracing the matching process

Since launchers.py is plain Python, you can add logging.debug calls in your find_launcher function (e.g., to print every matching host specification and its score) and enable debug logging when querying:

import logging
logging.basicConfig(level=logging.DEBUG)
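
As an illustration of this kind of tracing, here is a self-contained sketch that uses only the standard library. The host table and the pick_host function are hypothetical and stand in for the matching logic of a real find_launcher function; only the logging pattern is the point.

```python
import logging

logger = logging.getLogger("launchers")
GIB = 1024**3

# Hypothetical host table: name -> (number of GPUs, memory per GPU in bytes)
HOSTS = {
    "workstation": (1, 12 * GIB),
    "cluster": (8, 24 * GIB),
}


def pick_host(gpu_count: int, gpu_memory: int):
    """Return the first host that satisfies the request, logging each test."""
    for name, (max_gpus, memory) in HOSTS.items():
        matched = gpu_count <= max_gpus and gpu_memory <= memory
        logger.debug(
            "host=%s gpus=%d/%d memory=%d/%d GiB matched=%s",
            name, gpu_count, max_gpus, gpu_memory // GIB, memory // GIB, matched,
        )
        if matched:
            return name
    logger.debug("no host matched the request")
    return None


# Enable debug output, then run a query to see one line per tested host
logging.basicConfig(level=logging.DEBUG)
host = pick_host(gpu_count=2, gpu_memory=16 * GIB)
```

Logging one line per candidate, including the values that were compared, makes it easy to see which constraint rejected a host.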