Serialization
This page discusses how to save and load configuration objects:
Saving configuration
How to specify files/directories to be serialized
HuggingFace integration
Saving/Loading objects with configurations
Configuration objects can be loaded and saved. You can even embed them within any standard Python structure (e.g. a dictionary, list, or tuple).
The serialization methods below can also include init_tasks
in the deserialization process. This makes it easier to distribute
configurations that need to be initialized in a special way.
serialize() - Serialize configurations with init tasks.
deserialize() - Deserialize configurations with init tasks.
A task configuration/instance can be loaded with from_task_dir().
The serialization context is controlled by SerializationContext.
If you need more control over saved data, you can use state_dict
and from_state_dict, which respectively convert configurations to Python
data structures and load configurations from them.
state_dict() - Convert configurations to Python data structures.
from_state_dict() - Load configurations from Python data structures.
Saving/Loading from running experiment
To make saving and loading configurations from experiments easier, you can use the methods of the experiment object as follows:
```python
from experimaestro import experiment, Param, Config


class MyConfig(Config):
    a: Param[int]


if __name__ == "__main__":
    # Saving configurations
    with experiment("/tmp/load_save", "xp1", port=-1) as xp:
        cfg = MyConfig.C(a=1)
        xp.save([cfg])

    # Loading configurations
    with experiment("/tmp/load_save", "xp2", port=-1) as xp:
        # Loads MyConfig(a=1)
        cfg, = xp.load("xp1")
```
Specifying paths to be serialized
Configurations can be serialized with the data necessary to restore their state. This can be useful to share a model (e.g. on the HuggingFace Hub).
Use DataPath to annotate fields whose file or directory content
should be copied into the save directory during serialization.
When saving, each DataPath field is copied (using hard links when
possible) into the save directory under a relative path derived from
the field name. On loading, the relative path is resolved back to an
absolute path.
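The copy step can be pictured with the standard library alone. This is a minimal sketch, not experimaestro's implementation: save_data_path and load_data_path are hypothetical helpers, and using the bare field name as the relative path is an assumption (the real naming scheme may differ). Each file is hard-linked with os.link, falling back to shutil.copy2 when hard links are not possible (for instance across file systems).

```python
import os
import shutil
from pathlib import Path


def _link_or_copy(src: Path, dst: Path) -> None:
    """Hard-link src to dst, falling back to a regular copy."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(src, dst)
    except OSError:
        # Hard links can fail across devices or on some file systems
        shutil.copy2(src, dst)


def save_data_path(field_name: str, source: Path, save_dir: Path) -> str:
    """Copy a file or directory under save_dir and return its relative path.

    Hypothetical helper: the relative path is simply the field name here.
    """
    relative = Path(field_name)
    target = save_dir / relative
    if source.is_dir():
        # Only files are linked/copied; empty directories are skipped
        for item in source.rglob("*"):
            if item.is_file():
                _link_or_copy(item, target / item.relative_to(source))
    else:
        _link_or_copy(source, target)
    return relative.as_posix()


def load_data_path(relative: str, save_dir: Path) -> Path:
    """Hypothetical helper: resolve a stored relative path to an absolute one."""
    return (save_dir / relative).resolve()
```

Because only the relative path is stored, the save directory can be moved or uploaded as a whole and still be resolved on the other side.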
DataPath fields are ignored when computing identifiers: changing
the data path does not change the task identifier.
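To see why changing a DataPath leaves the identifier untouched, here is a toy model of identifier computation. It is a hypothetical sketch, not experimaestro's actual hashing scheme: task_identifier hashes a canonical JSON dump of the parameters after dropping the fields declared as data paths.

```python
import hashlib
import json
from typing import Any, Iterable


def task_identifier(params: dict[str, Any], data_fields: Iterable[str]) -> str:
    """Toy identifier (hypothetical): SHA-256 of params minus data-path fields."""
    skipped = set(data_fields)
    kept = {k: v for k, v in params.items() if k not in skipped}
    canonical = json.dumps(kept, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = task_identifier({"name": "bert", "weights": "/data/v1.bin"}, ["weights"])
b = task_identifier({"name": "bert", "weights": "/data/v2.bin"}, ["weights"])
c = task_identifier({"name": "t5", "weights": "/data/v1.bin"}, ["weights"])
# a == b: only the data path changed; a != c: a regular parameter changed
```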
Custom data serialization
For more control over which files are serialized and where they are
stored, override the __xpm_serialize__ method on your Config subclass.
This method receives a SerializationContext and
returns a dict mapping names to
SerializedPath objects.
By default, it serializes all DataPath fields. You can override it to
change destination paths, add extra data files, or skip certain fields:
```python
from pathlib import Path

from experimaestro import Config, Param, DataPath
from experimaestro.core.context import SerializationContext, SerializedPath


class MyModel(Config):
    name: Param[str]
    weights: DataPath

    def __xpm_serialize__(self, context: SerializationContext) -> dict[str, SerializedPath]:
        # Call super() for default DataPath handling
        result = super().__xpm_serialize__(context)

        # Or customize: serialize weights under a different name
        result["weights"] = context.serialize(
            context.var_path + ["model_weights.bin"],
            self.weights,
            self,
        )

        # Add extra files not declared as DataPath
        vocab_path = self.weights.parent / "vocab.txt"
        if vocab_path.exists():
            result["vocab"] = context.serialize(
                context.var_path + ["vocab.txt"],
                vocab_path,
                self,
            )

        return result
```
The SerializationContext can also be subclassed
to customize the serialization process globally (e.g. to change how files
are copied or where they are stored). The serialize method receives the
config object, enabling per-config path logic.
Paths in params.json
Each submitted task writes its parameters to params.json in the job directory.
Starting with the v3 schema (experimaestro 2.4), Path values are encoded with
a "base" field that records what the value is relative to:
| Encoding | Stored as | Used when |
|---|---|---|
| Job-relative | "base": "job", with a path relative to the job directory | the path is inside the current job directory (typical for …) |
| Workspace-relative | "base": "workspace", with a path relative to the workspace | the path points into another job in the same workspace (cross-job dependencies) |
| Absolute | absolute path | the path lies outside the workspace, or was written by an older version |
Resolution at load time uses Env.taskpath for "job" and Env.wspath for
"workspace"; these are set automatically by run.py, from_task_dir, and
tools.jobs.load_job. Absolute paths and entries without a "base" key load
unchanged for backward compatibility.
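The encoding and resolution rules can be sketched as a pair of functions. This is an illustration only: encode_path and resolve_path are hypothetical, the "path" key stored next to "base" is an assumed layout, and the taskpath/wspath arguments stand in for Env.taskpath and Env.wspath.

```python
from pathlib import Path
from typing import Union

Entry = Union[str, dict]


def encode_path(path: Path, job_dir: Path, workspace: Path) -> Entry:
    """Hypothetical encoder: pick the most specific base for an absolute path."""
    # The job directory lives inside the workspace, so test it first
    if path.is_relative_to(job_dir):
        return {"base": "job", "path": path.relative_to(job_dir).as_posix()}
    if path.is_relative_to(workspace):
        return {"base": "workspace", "path": path.relative_to(workspace).as_posix()}
    return str(path)  # outside the workspace: stored as an absolute path


def resolve_path(entry: Entry, taskpath: Path, wspath: Path) -> Path:
    """Hypothetical resolver: turn a stored entry back into an absolute path."""
    if isinstance(entry, dict) and "base" in entry:
        roots = {"job": taskpath, "workspace": wspath}
        return roots[entry["base"]] / entry["path"]
    # Absolute paths and entries without "base" load unchanged
    # (assumed layout for legacy dict entries: a single "path" key)
    return Path(entry["path"] if isinstance(entry, dict) else entry)
```

In this sketch, moving a workspace only changes the roots passed to resolve_path; the stored entries stay valid.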
The top-level "version" field records the schema version (PARAMS_JSON_VERSION
on ConfigInformation). A loader refuses to run a task whose params.json was
written by a newer version of experimaestro; upgrade rather than risk silently
wrong path resolution.
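A minimal sketch of such a version guard, assuming PARAMS_JSON_VERSION is an integer (3 for the v3 schema described above) and that a missing "version" field denotes an older file; check_params_version is a hypothetical name, not part of experimaestro's API.

```python
from typing import Any

PARAMS_JSON_VERSION = 3  # assumed value for the v3 schema


def check_params_version(params: dict[str, Any]) -> int:
    """Hypothetical guard: return the schema version, refusing newer files."""
    version = params.get("version", 0)  # assumption: absent means legacy
    if version > PARAMS_JSON_VERSION:
        raise RuntimeError(
            f"params.json uses schema version {version}, but this loader "
            f"supports up to {PARAMS_JSON_VERSION}; upgrade experimaestro"
        )
    return version
```

Failing loudly here is the point of the check: an older loader cannot know how a newer schema encodes paths.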
HuggingFace integration
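```python
# ExperimaestroHFHub implements the interface from ModelHubMixin
# https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.ModelHubMixin
from experimaestro import Config, Param
from experimaestro.huggingface import ExperimaestroHFHub

class MyConfig(Config):
    a: Param[int]

# Placeholders: substitute your own configuration, repository and variant
config = MyConfig.C(a=1)
hf_id = "<user>/<repo>"  # Hub repository to push to
hf_id_or_folder = hf_id  # Hub repository or local folder to load from
variant = "<variant>"
# True if the object should be an instance (and not a configuration)
as_instance = False

# Save and load a configuration
ExperimaestroHFHub(config).push_to_hub(hf_id)
ExperimaestroHFHub.from_pretrained(hf_id_or_folder, as_instance=as_instance)
# Save and load a configuration (with a variant)
ExperimaestroHFHub(config).push_to_hub(hf_id, variant)
ExperimaestroHFHub.from_pretrained(hf_id_or_folder, variant=variant, as_instance=as_instance)
```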
ExperimaestroHFHub - HuggingFace Hub integration for experimaestro configurations. Key methods: from_pretrained(), push_to_hub().
Customizing HuggingFace serialization
Subclass ExperimaestroHFHub to customize the definition filename
or the serialization context:
```python
from experimaestro.huggingface import ExperimaestroHFHub
from experimaestro.core.context import SerializationContext


class MySerializationContext(SerializationContext):
    """Placeholder subclass: override methods here to customize serialization."""


class MyHFHub(ExperimaestroHFHub):
    # Use a custom filename for the definition JSON
    definition_filename = "my_model.json"
    # Use a custom SerializationContext subclass
    serialization_context_class = MySerializationContext
```
definition_filename: The JSON file storing the configuration definition (default: "experimaestro.json", with fallback to "definition.json" on load).
serialization_context_class: The SerializationContext class used during serialization (default: SerializationContext).