# Configurations

In Experimaestro, a configuration object ({py:class}`~experimaestro.Config`) is a fundamental concept used to specify parameters and settings for tasks and experiments:

1. **Parameter Definition**: The configuration object defines the parameters
   needed for a task or experiment. These parameters can include data file
   paths, numerical values, strings, list and dictionaries. Configuration can be
   nested for more flexibility.
2. **Configuration identifier**: Different configurations yield different
   identifiers. This ensures that each folder name is associated with a unique
   configuration.
3. **Parameter Validation**: A dynamic type checkers ensure configuration values
   are compatible with their types. The configuration object can include
   validation rules to ensure that the parameters provided are in the correct
   format and within expected ranges.
4. **Documentation**: The configuration object can include documentation for
   each parameter, explaining its purpose and how it should be used. This
   documentation can be output (e.g., [experimaestro IR learning
   configurations](https://experimaestro-ir.readthedocs.io/en/latest/learning/index.html)).
5. **Flexibility and Extensibility**: Configuration objects are designed to be
   flexible and extensible, allowing users to add new parameters or modify
   existing ones as the requirements of the task evolve. In particular,
   *default* values can be introduced.


(configuration-identifiers)=
## Configuration identifiers

A configuration identifier in the context of systems like Experimaestro is a
unique identifier associated with a specific configuration object.
This identifier plays a crucial role in managing and referencing configurations,
especially in complex systems where multiple configurations are used. Here's a
detailed description:

1. **Uniqueness**: A configuration identifier (MD5 hash) is unique for each set
   of distinct experimental parameters.

2. **Run-Once Guarantee**: The unique identifiers ensure that each task is
   executed only once. This is particularly important in avoiding redundant
   computations and ensuring the efficiency of the workflow.


## Defining a configuration

A configuration is defined whenever an object derives from {py:class}`~experimaestro.Config`.

When an identifier is not given, it is computed as `__module__.__qualname__`. In that case,
it is possible to shorten the definition using the `Config` class as a base class.

:::{admonition} Example
:class: example

```python
from experimaestro import Param, Config

class MyModel(Config):
    __xpmid__ = "my.model"

    gamma: Param[float]
```

Here, {py:data}`~experimaestro.Param` is used to declare a parameter.
:::

defines a configuration with name `my.model` and one argument `gamma` that has the type `float`.

`__xpmid__` can also be a class method to generate dynamic ids for all descendant configurations
When `__xpmid__` is missing, the qualified name is used.

## Object hierarchy

When deriving `B` from {py:class}`~experimaestro.Config`, experimaestro creates a **configuration
object** `A.XPMConfig` from {py:class}`~experimaestro.core.objects.ConfigMixin` and `A`. When calling the
configuration constructor `A.C(...)` or (`B.C(...)`), the returned object is of
type `A.XPMConfig` or `B.XPMConfig`, which extends the original object with a
configuration specific behavior.

![object hierarchy](../img/xpm-objects.svg)


(composition-operator)=
## Composition operator

The `@` operator provides a concise syntax for composing configurations. When
a configuration has a parameter that accepts another configuration type, you
can use `@` instead of explicitly naming the parameter.

:::{admonition} Basic composition
:class: example

```python
from experimaestro import Config, Param

class Inner(Config):
    x: Param[int]

class Outer(Config):
    inner: Param[Inner]

# These two are equivalent:
outer1 = Outer.C(inner=Inner.C(x=42))
outer2 = Outer.C() @ Inner.C(x=42)
```
:::

The operator finds the unique parameter in the outer configuration that can
accept the inner configuration's type. If there are multiple matching
parameters or none, a {py:exc}`ValueError` is raised.

### Chaining compositions

When chaining multiple `@` operations, each configuration is added to the
**same** outer configuration (left-associative behavior):

```python
from experimaestro import Config, Param

class TypeA(Config):
    pass

class TypeB(Config):
    pass

class Multi(Config):
    a: Param[TypeA]
    b: Param[TypeB]

# Adds both TypeA and TypeB to Multi
result = Multi.C() @ TypeA.C() @ TypeB.C()
```

For **nested** structures, use parentheses to compose from inside out:

```python
from experimaestro import Config, Param

class Inner(Config):
    x: Param[int]

class Middle(Config):
    inner: Param[Inner]

class Outer(Config):
    middle: Param[Middle]

# Creates Outer(middle=Middle(inner=Inner(x=1)))
result = Outer.C() @ (Middle.C() @ Inner.C(x=1))
```

### Ambiguity and errors

The composition operator raises {py:exc}`ValueError` in two cases:

1. **No matching parameter**: The outer configuration has no parameter that
   accepts the inner type
2. **Ambiguous**: Multiple parameters can accept the inner type

```python
from experimaestro import Config, Param

class Inner(Config):
    x: Param[int]

class Ambiguous(Config):
    a1: Param[Inner]
    a2: Param[Inner]  # Same type as a1

# Raises ValueError: ambiguous - both a1 and a2 accept Inner
Ambiguous.C() @ Inner.C(x=1)
```


(deprecating-a-configuration-or-attributes)=
## Deprecating a configuration or attributes

When a configuration is moved (or equivalently its `__xpmid__` changed), its signature
changes, and thus the same tasks can be run twice. To avoid this, use the {py:func}`~experimaestro.deprecate`
decorator.

### Simple deprecation (legacy pattern)

For simple cases where the old and new configurations have the same parameters, use
{py:func}`~experimaestro.deprecate` with inheritance:

:::{admonition} Example
:class: example

```python
from experimaestro import Param, Config, deprecate

class NewConfiguration(Config):
    pass

@deprecate
class OldConfiguration(NewConfiguration):
    # Only pass is allowed here
    pass
```
:::

### Deprecation with conversion

For cases where the deprecated configuration has different parameters and needs to
be converted to the new format, use `@deprecate(TargetConfig)` with a `__convert__`
method:

:::{admonition} Example
:class: example

```python
from experimaestro import Param, Config, deprecate

class NewConfig(Config):
    """New configuration with a list of values."""
    values: Param[list[int]]

@deprecate(NewConfig)
class OldConfig(Config):
    """Old configuration with a single value."""
    value: Param[int]

    def __convert__(self):
        # Convert old single value to new list format
        return NewConfig(values=[self.value])
```
:::

The `__convert__` method should return an equivalent instance of the target
configuration. The identifier is computed from the converted configuration,
ensuring that equivalent old and new configurations produce the same job
identifier.

This also supports chained deprecation for multiple version migrations:

:::{admonition} Example
:class: example

```python
class ConfigV2(Config):
    values: Param[list[int]]

@deprecate(ConfigV2)
class ConfigV1(Config):
    value: Param[int]

    def __convert__(self):
        return ConfigV2(values=[self.value])

@deprecate(ConfigV1)
class ConfigV0(Config):
    val: Param[int]

    def __convert__(self):
        return ConfigV1(value=self.val)
```
:::

### Immediate replacement with replace=True

In some cases, you want the deprecated configuration to be immediately replaced
by the new one during creation. Use `replace=True` for this behavior:

:::{admonition} Example
:class: example

```python
from experimaestro import Param, Config, deprecate

class NewConfig(Config):
    values: Param[list[int]]

@deprecate(NewConfig, replace=True)
class OldConfig(Config):
    value: Param[int]

    def __convert__(self):
        return NewConfig(values=[self.value])

# Creating OldConfig actually returns a NewConfig instance
result = OldConfig.C(value=42)
print(type(result).__name__)  # "NewConfig.XPMConfig"
print(result.values)          # [42]
```
:::

With `replace=True`:

- Creating the deprecated configuration immediately calls `__convert__` and returns
  the new configuration type
- The original deprecated identifier is still preserved for `fix_deprecated` tool
  to create symlinks between old and new job directories
- If code tries to set an attribute that existed on the deprecated config but not
  on the new one, a warning is logged and the value is discarded

### Deprecating a parameter

It is possible to deprecate a parameter or option:

:::{admonition} Example
:class: example

```python
from typing import List
from experimaestro import Param, Config, deprecate

class Learning(Config):
    losses: Param[List[Loss]] = []

    @deprecate
    def loss(self, value):
        # Checking that the new param is not used
        assert len(self.losses) == 0
        # We allow several losses to be defined now
        self.losses.append(value)
```
:::

**Warning** the signature will change when deprecating attributes


To fix the identifiers, one can use the `deprecated` command. This
will create symbolic links so that old jobs are preserved and
re-used.

```sh
experimaestro deprecated list WORKDIR
```


## Object life cycle

### Initialisation

During [task](./task.md) execution, the objects are constructed following
these steps:

- The object is constructed using `self.__init__()`
- The attributes are set (e.g. `gamma` in the example above)
- `self.__post_init__()` is called (if the method exists)
- Pre-tasks are ran (if any, see below)

Sometimes, it is necessary to postpone a part of the initialization of a configuration
object because it depends on an external processing. In this case, the {py:func}`~experimaestro.initializer` decorator can
be used:

```python
from experimaestro import Config, initializer

class MyConfig(Config):
    # The decorator ensures the initializer can only be called once
    @initializer
    def initialize(self):
        # Do whatever is needed
        pass
```

## Types

Possible types are:

- basic Python types (`str`, `int`, `float`, `bool`) and paths `pathlib.Path`
- lists, using `list[T]`
- sets, using `set[T]` (see [Sets](#sets) below)
- enumerations, using `Enum` from the `enum` package
- dictionaries (support for basic types in keys only) with `dict[U, V]`
- Other configurations

(sets)=
### Sets

Sets (`set[T]`) are supported as parameter types. Since sets are unordered,
experimaestro sorts elements deterministically before computing the identifier,
ensuring that `{A, B}` and `{B, A}` always produce the same identifier.

For **primitive types** (int, str, float, enum), sets work directly:

```python
from experimaestro import Config, Param

class MyConfig(Config):
    tags: Param[set[str]]

# Order does not matter for identifiers
c1 = MyConfig.C(tags={"a", "b", "c"})
c2 = MyConfig.C(tags={"c", "a", "b"})
# c1 and c2 have the same identifier
```

For **Config objects**, elements must be **sealed** before being placed in a set,
because Config objects are only hashable once their identifier has been computed
and cached. Use {py:func}`~experimaestro.sealed_set` to seal and collect elements
in one step:

```python
from experimaestro import Config, Param, sealed_set

class Model(Config):
    lr: Param[float]

class Ensemble(Config):
    models: Param[set[Model]]

m1 = Model.C(lr=0.01)
m2 = Model.C(lr=0.02)

# sealed_set seals each Config element and returns a set
ensemble = Ensemble.C(models=sealed_set(m1, m2))
```

Alternatively, you can seal configs individually with {py:meth}`~experimaestro.Config.seal`:

```python
m1 = Model.C(lr=0.01).seal()
m2 = Model.C(lr=0.02).seal()
ensemble = Ensemble.C(models={m1, m2})
```

:::{warning}
Attempting to put an unsealed Config object in a `set[T]` parameter will raise
a `TypeError`. Always use `sealed_set()` or call `.seal()` on each element first.
:::

(parameters)=
## Parameters

```python
from experimaestro import Config, Param, field

class MyConfig(Config):
    """My configuration

    Long description of the configuration.

    Attributes:
        x: The parameter x
        y: The parameter y
    """
    # Default value ignored in identifier computation (backwards-compatible behavior)
    # If value == 1, it won't be included in identifier
    x: Param[int] = field(default=1, ignore_default=True)

    # Default value always included in identifier computation
    # Even if value == 1, it will be included in identifier
    w: Param[int] = field(default=1)

    # Factory default always included in identifier
    v: Param[SomeConfig] = field(default_factory=SomeConfig.C)

    # Factory default ignored when value == default
    u: Param[SomeConfig] = field(default_factory=SomeConfig.C, ignore_default=True)

    # Without default value
    y: Param[type]

    # Using a docstring
    z: Param[int]
    """Most important parameter of the model"""
```

- `name` defines the name of the argument, which can be retrieved by the instance `self` (class) or passed as an argument (function)
- `type` is the type of the argument (more details below)
- Default values can be specified using {py:class}`~experimaestro.field`:
    - `field(default=value)`: The default is `value`, always included in the signature computation.
    - `field(default=value, ignore_default=True)`: The default is `value`, and if the actual value equals the default, it won't be included in the signature computation. This allows adding new parameters without changing past experiment signatures.
    - `field(default_factory=callable)`: A callable that produces the default value, always included in the signature computation (like `default`).
    - `field(default_factory=callable, ignore_default=True)`: Factory default excluded from identifier when value equals the default.

:::{warning} Bare default values are deprecated
Using bare default values like `x: Param[int] = 23` is deprecated. This syntax is
ambiguous because it's unclear whether the default should be ignored in identifier
computation or not.

Use `field(default=23, ignore_default=True)` to keep the backwards-compatible behavior
(default ignored in identifier) or `field(default=23)` to always include the value in
identifier.

Run `experimaestro refactor default-values` to automatically convert bare defaults.
:::

:::{warning} `field(ignore_default=<value>)` syntax is deprecated
The old syntax `field(ignore_default=23)` (passing a value directly) is deprecated.
Use `field(default=23, ignore_default=True)` instead. The old syntax still works
but will emit a `DeprecationWarning`.
:::

### Default Values

:::{warning}
When changing an `ignore_default` value, the identifier of configurations
**might** change. The reason is explained below.
:::

Adding a new parameter to a `Config` with `field(default=..., ignore_default=True)` will not change the original `id`.

**Why?** The motivation is that with this behavior, you can add experimental parameters
that were previously hard-coded.

For instance, if the original class is:

```python
from experimaestro import Config, Param

class MyConfig(Config):
    a: Param[int]

obj = MyConfig.C(a = 2)
id_old = obj.__identifier__()
```

Then when using `field(default=..., ignore_default=True)` for parameter b will yield an object with the
same identifier when using the default value:

```python
from experimaestro import Config, Param, field

class MyConfig(Config):
    a: Param[int]
    b: Param[int] = field(default=4, ignore_default=True)


# When not setting `b`, the identifier is the same
obj = MyConfig.C(a = 2)
new_id = obj.__identifier__()
assert new_id == old_id

# When setting `b` to the default value, still the same
obj = MyConfig.C(a = 2, b = 4)
new_id = obj.__identifier__()
assert new_id == old_id
```

:::{warning}
The identifier can be different if only the ignore_default value is changed. In particular,
if the default value is 2 (and not 4)

```python
from experimaestro import Config, Param, field

class MyConfig(Config):
    a: Param[int]
    b: Param[int] = field(default=2, ignore_default=True)

# Here, `b` is not the default value
obj = MyConfig.C(a = 4, b = 4)
new_id = obj.__identifier__()
assert new_id != old_id
```
:::

### field(default=...) vs field(default=..., ignore_default=True)

The key difference between `field(default=...)` and `field(default=..., ignore_default=True)`:

| Feature | `field(default=X)` | `field(default=X, ignore_default=True)` |
|---------|-------------------|----------------------------------------|
| Default value | X | X |
| Included in identifier when value==X | **Yes** | **No** |
| Use case | When you want the default value to be part of the task signature | When adding new parameters to existing configs without breaking old identifiers |

The same applies to `default_factory`:

| Feature | `field(default_factory=F)` | `field(default_factory=F, ignore_default=True)` |
|---------|--------------------------|------------------------------------------------|
| Default value | F() | F() |
| Included in identifier when value==F() | **Yes** | **No** |

```python
from experimaestro import Config, Param, field

class ConfigA(Config):
    x: Param[int] = field(default=1)      # Always in identifier

class ConfigB(Config):
    x: Param[int] = field(default=1, ignore_default=True)  # Ignored if x==1

# These will have DIFFERENT identifiers
a = ConfigA.C(x=1)
b = ConfigB.C(x=1)
assert a.__identifier__() != b.__identifier__()

# But these will have the SAME identifier
a2 = ConfigA.C(x=2)
b2 = ConfigB.C(x=2)
# Both include x=2 in their identifiers (since 2 != default value)
```


### Overriding parameters

When a subclass redefines a parameter from a parent class, experimaestro issues
a warning to alert you about the potential unintended override. To intentionally
override a parent parameter, use `field(overrides=True)`:

```python
from experimaestro import Param, Config, field

class Parent(Config):
    value: Param[int]

class Child(Parent):
    # This will produce a warning about overriding 'value'
    value: Param[int]

class ChildWithOverride(Parent):
    # This explicitly marks the override as intentional - no warning
    value: Param[int] = field(overrides=True)
```

#### Type compatibility

When overriding a parameter, the new type must be compatible with the parent type:

- For **Config types**: The child type must be a subtype of the parent type (covariant)
- For **primitive types**: The types must match exactly

```python
from experimaestro import Param, Config, field

class BaseModel(Config):
    pass

class AdvancedModel(BaseModel):
    pass

class Parent(Config):
    model: Param[BaseModel]

# OK - AdvancedModel is a subtype of BaseModel
class Child(Parent):
    model: Param[AdvancedModel] = field(overrides=True)

# ERROR - str is not compatible with int (raises TypeError)
class BadChild(Parent):
    value: Param[str] = field(overrides=True)
```

### Constants

Constants ({py:data}`~experimaestro.Constant`) are special parameters that cannot be modified. They are useful to note that the
behavior of a configuration/task has changed, and thus that the signature should not be the
same (as the result of the processing will differ).

```python
from experimaestro import Config, Constant

class MyConfig(Config):
    # Constant
    version: Constant[str] = "2.1"
```

### Metadata

Metadata are parameters which are ignored during the signature computation. For
instance, the human readable name of a model would be a metadata. They are
declared as parameters, but using the {py:data}`~experimaestro.Meta` type hint.

Example

```python
from experimaestro import Config, Meta

class MyConfig(Config):
    """
    Attributes:
        count: The number of documents in the collection
    """
    count: Meta[type]
```

It is also possible to dynamically change the type of an argument using the {py:func}`~experimaestro.setmeta` function:

```python
from experimaestro import setmeta

# Forces the parameter to be a meta-parameter
a = setmeta(A(), True)

# Forces the parameter to be a meta-parameter
a = setmeta(A(), False)

```

### Path option

It is possible to define special options that will be set
to paths relative to the task directory using {py:class}`~experimaestro.PathGenerator`. For instance,

```python
from experimaestro import Config, Meta, PathGenerator, field
from pathlib import Path

class MyConfig(Config):
    output: Meta[Path] = field(default_factory=PathGenerator("output.txt"))
```

defines the instance variable `path` as a path `.../output.txt` within the task
directory. To ensure there are no conflicts, paths are defined by following the
config/task path, i.e. if the executed task has a parameter `model`, `model` has
a parameter `optimization`, and optimization a path parameter `loss.txt`, then
the file will be `./out/model/optimization/loss.txt`.

(partial-identifiers)=
### Partial Identifiers

Sometimes you want to share directories (like checkpoints) across tasks that
differ only in certain parameters. For example, when training a model with
different numbers of iterations but the same learning rate, you might want
all runs to share the same checkpoint directory.

**Partial identifiers** (using {py:func}`~experimaestro.partial` and {py:func}`~experimaestro.param_group`) allow you to define parameter subsets that compute partial
identifiers by excluding certain parameter groups. This enables:

- Sharing checkpoint directories across training runs with different iteration counts
- Resuming training from checkpoints saved by different configurations
- Organizing experiment outputs by logical parameter groups

#### Defining Parameter Groups

First, define parameter groups at module level:

```python
from experimaestro import param_group

# Create parameter groups
iter_group = param_group("iter")
model_group = param_group("model")
```

#### Using Partial Identifiers in Tasks

Define partial identifiers as class attributes and assign parameters to groups:

```python
from experimaestro import Task, Param, Meta, field, PathGenerator, partial, param_group
from pathlib import Path

iter_group = param_group("iter")

class Learn(Task):
    # Define a partial identifier that excludes iteration-related parameters
    checkpoints = partial(exclude_groups=[iter_group])

    # This parameter is in the iter group - excluded from partial identifier
    max_iter: Param[int] = field(groups=[iter_group])

    # This parameter has no group - included in partial identifier
    learning_rate: Param[float]

    # Path generated using the partial identifier
    checkpoints_path: Meta[Path] = field(
        default_factory=PathGenerator("checkpoints", partial=checkpoints)
    )

    def execute(self):
        # self.checkpoints_path will be in:
        # WORKSPACE/partials/TASK_ID/checkpoints/PARTIAL_ID/checkpoints/
        save_checkpoint(self.checkpoints_path / "model.pt")
```

#### How Partial Identifiers Work

- Tasks with the same values for **non-excluded** parameters will have the
  same partial identifier
- Tasks can have different values for **excluded** parameters and still
  share the same partial directory

```python
# (Assuming Learn is defined as above)

# These have different full identifiers but the SAME partial identifier
task1 = Learn.C(max_iter=100, learning_rate=0.1)
task2 = Learn.C(max_iter=200, learning_rate=0.1)

# This has a DIFFERENT partial identifier (learning_rate differs)
task3 = Learn.C(max_iter=100, learning_rate=0.2)
```

#### Partial Identifier Options

The `partial()` function supports several options:

| Option | Description |
|--------|-------------|
| `exclude_groups` | List of parameter groups to exclude from partial identifier |
| `include_groups` | List of groups to always include (overrides exclusion) |
| `exclude_all` | If True, exclude all parameters by default |
| `exclude_no_group` | If True, exclude parameters with no group assigned |

```python
from experimaestro import partial, param_group

iter_group = param_group("iter")
model_group = param_group("model")

# Exclude specific groups
checkpoints = partial(exclude_groups=[iter_group])

# Include only specific groups (exclude everything else)
model_params = partial(exclude_all=True, include_groups=[model_group])

# Exclude ungrouped parameters
grouped_only = partial(exclude_no_group=True)
```

#### Parameters in Multiple Groups

A parameter can belong to multiple groups:

```python
from experimaestro import Task, Param, field, partial, param_group

iter_group = param_group("iter")
model_group = param_group("model")

class MyTask(Task):
    checkpoints = partial(exclude_groups=[iter_group])

    # This parameter is in both groups
    x: Param[int] = field(groups=[iter_group, model_group])
```

If **any** of a parameter's groups is excluded, the parameter is excluded
from the partial identifier (unless overridden by `include_groups`).

## Validation

If a configuration has a `__validate__` method, it is called to validate
the values before a task is submitted. This allows to fail fast when parameters
are not valid.

```python
from experimaestro import Param, Config

class ModelLearn(Config):
    batch_size: Param[int] = 100
    micro_batch_size: Param[int] = 100

    def __validate__(self):
        assert self.batch_size % self.micro_batch_size == 0
```

(value-classes)=
## Value classes

By default, the configuration class itself is used to create instances. However,
you may want to use a different class for the runtime instance, especially when:

- You want to avoid importing heavy dependencies (like PyTorch) during configuration
- The runtime class needs to inherit from external classes (like `nn.Module`)
- You want to separate configuration logic from implementation logic

The `@Config.value_class()` decorator allows registering an external value class:

```python
from experimaestro import Config, Param

class Model(Config):
    hidden_size: Param[int]
    num_layers: Param[int] = 3

@Model.value_class()
class TorchModel(Model):
    """The actual PyTorch implementation"""

    def __post_init__(self):
        import torch.nn as nn
        # Now we can safely import PyTorch
        self.layers = nn.ModuleList([
            nn.Linear(self.hidden_size, self.hidden_size)
            for _ in range(self.num_layers)
        ])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
```

### Value class requirements

The value class must:

1. **Be a subclass of the configuration class**: This ensures type compatibility
2. **Inherit from parent value classes**: If the parent configuration has a value class,
   the child value class must inherit from it

```python
from experimaestro import Config, Param

class BaseModel(Config):
    base_param: Param[int]

@BaseModel.value_class()
class BaseModelImpl(BaseModel):
    def base_method(self):
        return self.base_param * 2

class ChildModel(BaseModel):
    child_param: Param[int]

# Must inherit from BOTH ChildModel AND BaseModelImpl
@ChildModel.value_class()
class ChildModelImpl(ChildModel, BaseModelImpl):
    def child_method(self):
        return self.base_method() + self.child_param
```

### Accessing the value class

You can access the value class through the `XPMValue` property:

```python
# (Assuming Model is defined as above)

# Returns TorchModel if registered, or Model itself otherwise
Model.XPMValue

# Creating instances uses the value class automatically
config = Model.C(hidden_size=256)
instance = config.instance()  # Returns a TorchModel instance
```

### Skipping intermediate classes

If an intermediate class in the hierarchy doesn't have a value class,
child classes can still define their own:

```python
from experimaestro import Config, Param

class Base(Config):
    x: Param[int]

@Base.value_class()
class BaseImpl(Base):
    pass

class Middle(Base):  # No value class defined
    y: Param[int]

class Leaf(Middle):
    z: Param[int]

# LeafImpl must inherit from BaseImpl (skipping Middle which has no impl)
@Leaf.value_class()
class LeafImpl(Leaf, BaseImpl):
    pass
```

(accessing-configuration)=
## Accessing configuration from instances

When a configuration is instantiated (via `config.instance()`), it is often
useful to access the original configuration object from the resulting instance.
This is made possible by the `xpmconfig` property.

By default, `config.instance()` tracks the original configuration. This
behavior can be controlled with the `keep` parameter (which defaults to `True`).

```python
config = MyModel.C(gamma=0.1)
instance = config.instance()

# Access the original configuration
assert instance.xpmconfig is config
```

### Recursive tracking

Configuration tracking is recursive. If a configuration contains other
configuration objects as parameters, those sub-configurations are also
instantiated, and their instances will also have their `xpmconfig` property
set to the corresponding original sub-configuration.

```python
sub_config = SubModel.C(x=1)
main_config = MainModel.C(sub=sub_config)
instance = main_config.instance()

# Recursive tracking
assert instance.sub.xpmconfig is sub_config
```

### Unified API

The `xpmconfig` property is available on both configuration objects and
their instances. When called on a configuration object, it returns
the object itself. This allows configuration and instances to be used
interchangeably in many contexts.

```python
def process(obj):
    # Works whether obj is a Config or an instantiated value
    config = obj.xpmconfig
    print(f"Processing with config: {config}")
```

(instance-based-configurations)=
## Instance-based configurations

By default, two {py:class}`~experimaestro.Config` instances with identical parameters will have the same identifier. This is the desired behavior in most cases, as it ensures task deduplication and caching. However, in some scenarios, you need to distinguish between different instances even when their parameters are identical.

This is where {py:class}`~experimaestro.InstanceConfig` comes in. When a class derives from {py:class}`~experimaestro.InstanceConfig` instead of {py:class}`~experimaestro.Config`, each instance will have a unique identifier based on the order it appears during identifier computation.

### When to use InstanceConfig

Use {py:class}`~experimaestro.InstanceConfig` when:

- **Shared vs. Separate Resources**: You need to distinguish between shared and separate instances of the same configuration (e.g., shared model weights vs. separate model instances)
- **Multiple Identical Configurations**: The same configuration appears multiple times in a workflow, and each occurrence should be treated as distinct

:::{admonition} Shared vs Separate Model Instances
:class: example

```python
from experimaestro import Param, Config, InstanceConfig

class SubModel(InstanceConfig):  # Use InstanceConfig instead of Config
    """A model component that can be shared or separate"""
    hidden_size: Param[int] = 128

class Ensemble(Config):
    """An ensemble using multiple models"""
    model1: Param[SubModel]
    model2: Param[SubModel]

# Create two instances with identical parameters
sm1 = SubModel.C(hidden_size=128)
sm2 = SubModel.C(hidden_size=128)

# Case 1: Shared instance - the same SubModel is used for both parameters
# This means model1 and model2 share weights/state
shared_ensemble = Ensemble.C(model1=sm1, model2=sm1)

# Case 2: Separate instances - different SubModel instances for each parameter
# This means model1 and model2 have independent weights/state
separate_ensemble = Ensemble.C(model1=sm1, model2=sm2)

# The Ensemble configurations will have DIFFERENT identifiers
# Even though both use SubModel instances with hidden_size=128
assert shared_ensemble.__identifier__() != separate_ensemble.__identifier__()

# This distinction is important: with regular Config, both would have
# the same identifier since the parameters are identical. With InstanceConfig,
# the framework can distinguish between shared and separate instances.
```
:::

### Backwards compatibility

{py:class}`~experimaestro.InstanceConfig` is designed to be backwards compatible with existing experiments. The first occurrence of an {py:class}`~experimaestro.InstanceConfig` instance (with a given set of parameters) will have the same identifier as a regular {py:class}`~experimaestro.Config` would have. Only when a second instance with identical parameters is encountered does the instance order marker get added to the identifier.

This means you can migrate existing configurations to {py:class}`~experimaestro.InstanceConfig` without invalidating previous experiments, as long as you were only using a single instance of each configuration.

:::{warning}
Be careful when migrating to {py:class}`~experimaestro.InstanceConfig` if your workflow previously created multiple instances with the same parameters. The identifiers will change for the second and subsequent instances.
:::

### How it works

During identifier computation, Experimaestro tracks `InstanceConfig` instances by their base identifier (computed from parameters). When the same base identifier is encountered multiple times (but with different Python object instances), each occurrence after the first gets a unique instance order marker added to its identifier.

The instance order is deterministic and based on the traversal order during identifier computation, ensuring reproducibility across runs.


## Prepare configurations (data preparation)

A {py:class}`~experimaestro.Prepare` is a `Config` that declares an in-process
preparation step — typically downloading a dataset, fetching credentials, or
populating a local cache — that should run *before* any task that depends on
it. Library authors return a `Prepare` instance from helper functions like
`prepare_dataset(...)`; experimaestro discovers them automatically in any
submitted task's parameters and invokes `prepare()` exactly once per
identifier, in the driver Python process.

```python
from experimaestro import Prepare, Task, Param

class DatasetPrep(Prepare):
    name: Param[str]

    def prepare(self) -> None:
        # Idempotent: the underlying tool should no-op if the cache is warm.
        actually_fetch_huggingface(self.name)

class Train(Task):
    dataset: Param[DatasetPrep]

    def execute(self):
        ...

# Submitting the task auto-attaches a dependency on the Prepare:
Train.C(dataset=DatasetPrep.C(name="hf:foo")).submit()
```

Key properties:

- **No on-disk footprint.** A `Prepare` is a {py:class}`Resource
  <experimaestro.scheduler.dependencies.Resource>`, not a `Job`: there is no
  workdir under `jobs/`, no `.done` marker, no `params.json`. Idempotence is
  the responsibility of `prepare()` itself.
- **Dedup by identifier.** Two `Prepare` instances with the same parameters
  share a single execution. Many tasks referencing the same Prepare trigger
  at most one `prepare()` call per Python process.
- **Concurrent.** Distinct Prepares run in parallel — each task starts as
  soon as *its* prepares have completed.
- **`RunMode.PREPARE`.** Setting `--run-mode prepare` on
  `experimaestro run-experiment` runs every discovered `Prepare` referenced
  by submitted tasks while skipping the tasks themselves. This is useful to
  pre-warm a cache before submitting jobs to an offline cluster.

```bash
# Pre-warm all downloads referenced by the experiment, then run tasks
# (the second invocation can be offline).
experimaestro run-experiment --run-mode prepare my_experiment.py
experimaestro run-experiment my_experiment.py
```

### Where do results land in each run mode?

| Run mode      | `workspace/jobs/...`             | Cache populated by `prepare()` |
|---------------|----------------------------------|--------------------------------|
| `NORMAL`      | One folder per task (logs, outputs, `.done` / `.failed`) | Yes (prep runs before each task) |
| `PREPARE`     | **Nothing**                      | Yes (only effect on disk) |
| `GENERATE_ONLY` | `params.json` per task (no execution) | No |
| `DRY_RUN`     | Nothing                          | No |

The cache location is owned by whatever `prepare()` calls — usually
`~/.cache/datamaestro/` for datasets resolved via
[datamaestro](https://datamaestro.readthedocs.io), or
`~/.cache/huggingface/` for direct HF Hub downloads. Experimaestro itself
writes nothing for a `Prepare`.

`Prepare.prepare()` runs in the driver process via `asyncio.to_thread`, so
blocking I/O does not stall the scheduler loop.

**See also:** [How do I pre-download datasets or resources before running on an offline cluster?](../faq.md#how-do-i-pre-download-datasets-or-resources-before-running-on-an-offline-cluster) in the FAQ; the [MNIST demo](https://github.com/experimaestro/experimaestro-demo) exercises the end-to-end flow.


## How is a configuration identifier computed?

The principale is the following. Any value can be associated with a unique byte
string: the byte string is obtained by outputting the type of the value (e.g.
string, `ir.adhoc.dataset`) and the value itself as a binary string. A special
handling of configurations and tasks (objects) is performed by sorting keys in
ascending lexicographic order, thus ensuring the uniqueness of the
representation.

 Moreover:

- **Ignored default values** are removed when the value matches the default
  (e.g. `k1` when set to 0.9 with `field(default=0.9, ignore_default=True)`). This
  allows to handle the situation where one adds a new experimental parameter (e.g. a
  new loss component). In that case, using `field(default=..., ignore_default=True)`
  allows to add this parameter without invalidating all the previously ran experiments.
- **Regular default values** using `field(default=...)` are always included in
  the identifier computation, even when the value matches the default.
- **Ignored values** are removed (e.g. the number of threads when
  indexing, the path where the index is stored)