Serialization

This page discusses how to save and load configuration objects:

  1. Saving configuration

  2. How to specify files/directories to be serialized

  3. HuggingFace integration

Saving/Loading objects with configurations

Configuration objects can be loaded and saved. You can even embed them within any standard Python structure (i.e. dictionary, list, tuple).

from experimaestro import load, save

# Saves the object
save([obj1, obj2, {key: obj3, key2: obj4}], "/my/folder")

# Load the object
[obj1, obj2, obj_dict] = load("/my/folder")
  • load() - Load configuration objects from a folder.

  • save() - Save configuration objects to a folder.

You can use serialization methods to include init_tasks in the deserialize process. This makes it easier to distribute configurations that need to be initialized in a special way.

A task configuration/instance can be loaded with from_task_dir().

The serialization context is controlled by SerializationContext.

If you need more control over saved data, you can use state_dict and from_state_dict that respectively returns Python data structures and loads from them.

Saving/Loading from running experiment

To ease saving/loading configuration from experiments, one can use methods from the experiment objects as follows:

from experimaestro import experiment, Param, Config

class MyConfig(Config):
    a: Param[int]

if __name__ == "__main__":
    # Saving configurations
    with experiment("/tmp/load_save", "xp1", port=-1) as xp:
        cfg = MyConfig.C(a=1)
        xp.save([cfg])


    # Loading configurations
    with experiment("/tmp/load_save", "xp2", port=-1) as xp:
        # Loads MyConfig(a=1)
        cfg, = xp.load("xp1")

Specifying paths to be serialized

Configurations can be serialized with the data necessary to restore their state. This can be useful to share a model (e.g. with HuggingFace hub).

Use DataPath to annotate fields whose file or directory content should be copied into the save directory during serialization:

from experimaestro import Config, DataPath

class MyConfig(Config):
    to_serialize: DataPath
    """This path will be serialized alongside the configuration"""

When saving, each DataPath field is copied (using hard links when possible) into the save directory under a relative path derived from the field name. On loading, the relative path is resolved back to an absolute path.

DataPath fields are ignored in identifier computation — changing the data path does not change the task identifier.

Custom data serialization

For more control over which files are serialized and where they are stored, override the __xpm_serialize__ method on your Config subclass. This method receives a SerializationContext and returns a dict mapping names to SerializedPath objects.

By default, it serializes all DataPath fields. You can override it to change destination paths, add extra data files, or skip certain fields:

from pathlib import Path
from experimaestro import Config, Param, DataPath
from experimaestro.core.context import SerializationContext, SerializedPath

class MyModel(Config):
    name: Param[str]
    weights: DataPath

    def __xpm_serialize__(self, context: SerializationContext) -> dict[str, SerializedPath]:
        # Call super() for default DataPath handling
        result = super().__xpm_serialize__(context)

        # Or customize: serialize weights under a different name
        result["weights"] = context.serialize(
            context.var_path + ["model_weights.bin"],
            self.weights,
            self,
        )

        # Add extra files not declared as DataPath
        vocab_path = self.weights.parent / "vocab.txt"
        if vocab_path.exists():
            result["vocab"] = context.serialize(
                context.var_path + ["vocab.txt"],
                vocab_path,
                self,
            )

        return result

The SerializationContext can also be subclassed to customize the serialization process globally (e.g. to change how files are copied or where they are stored). The serialize method receives the config object, enabling per-config path logic.

Paths in params.json

Each submitted task writes its parameters to params.json in the job directory. Starting with the v3 schema (experimaestro 2.4), Path values are encoded with a "base" field that records what the value is relative to:

Encoding

Stored as

Used when

Job-relative

{"type": "path", "value": "out/file.txt", "base": "job"}

the path is inside the current job directory (typical for PathGenerator outputs)

Workspace-relative

{"type": "path", "value": "jobs/<id>/<hash>/file", "base": "workspace"}

the path points into another job in the same workspace (cross-job dependencies)

Absolute

{"type": "path", "value": "/abs/path"}

the path lies outside the workspace, or was written by an older version

Resolution at load time uses Env.taskpath for "job" and Env.wspath for "workspace"; these are set automatically by run.py, from_task_dir, and tools.jobs.load_job. Absolute paths and entries without a "base" key load unchanged for backward compatibility.

The top-level "version" field records the schema version (PARAMS_JSON_VERSION on ConfigInformation). A loader refuses to run a task whose params.json was written by a newer experimaestro — upgrade rather than risk a silently wrong path resolution.

HuggingFace integration

# ExperimaestroHFHub implements the interface from ModelHubMixin
# https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.ModelHubMixin
from experimaestro.huggingface import ExperimaestroHFHub

# True if the object should be an instance (and not a configuration)
as_instance = False

# Save and load a configuration
ExperimaestroHFHub(config).push_to_hub(hf_id)
ExperimaestroHFHub.from_pretrained(hf_id_or_folder, as_instance=as_instance)

# Save and load a configuration (with a variant)
ExperimaestroHFHub(config).push_to_hub(hf_id, variant)
ExperimaestroHFHub.from_pretrained(hf_id_or_folder, variant=variant, as_instance=as_instance)

ExperimaestroHFHub - HuggingFace Hub integration for experimaestro configurations. Key methods: from_pretrained(), push_to_hub().

Customizing HuggingFace serialization

Subclass ExperimaestroHFHub to customize the definition filename or the serialization context:

from experimaestro.huggingface import ExperimaestroHFHub
from experimaestro.core.context import SerializationContext

class MyHFHub(ExperimaestroHFHub):
    # Use a custom filename for the definition JSON
    definition_filename = "my_model.json"

    # Use a custom SerializationContext subclass
    serialization_context_class = MySerializationContext
  • definition_filename: The JSON file storing the configuration definition (default: "experimaestro.json", with fallback to "definition.json" on load)

  • serialization_context_class: The SerializationContext class used during serialization (default: SerializationContext)