Configurations
Experiments are defined using configurations (configs) and tasks. Tasks are configurations that can be executed.
Defining a configuration
A configuration is defined by any class that derives from Config.
When an identifier is not given, it is computed as __module__.__qualname__. In that case,
the definition reduces to a plain subclass of Config.
Example
from experimaestro import Param, Config

class MyModel(Config):
    __xpmid__ = "my.model"

    gamma: Param[float]
defines a configuration with name my.model and one argument gamma that has the type float.
__xpmid__ can also be a class method, which makes it possible to generate dynamic ids for all descendant configurations.
When __xpmid__ is missing, the qualified name is used.
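The identifier rule can be summarised with a small stand-alone sketch. It does not use experimaestro: config_id is a hypothetical helper that only mirrors the rule stated above (an explicit __xpmid__ string wins, otherwise __module__.__qualname__ is used). It ignores the class-method variant and assumes that a plain string id is not inherited by subclasses.

```python
def config_id(cls: type) -> str:
    """Hypothetical helper: return the identifier a class would get."""
    # Assumption: only the class's own attributes are inspected,
    # so a subclass does not inherit its parent's string id
    explicit = vars(cls).get("__xpmid__")
    if isinstance(explicit, str):
        return explicit
    # Fallback described in the text: module name + qualified class name
    return f"{cls.__module__}.{cls.__qualname__}"


class MyModel:
    __xpmid__ = "my.model"


class OtherModel(MyModel):
    pass


explicit_id = config_id(MyModel)
default_id = config_id(OtherModel)
```

Here explicit_id is "my.model", while default_id is built from the module and class name of OtherModel.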
Deprecating a configuration or attributes
When a configuration is moved (or, equivalently, its __xpmid__ is changed), its signature
changes, and thus the same task can be run twice. To avoid this, use the @deprecate
decorator.
Example
from experimaestro import Param, Config, deprecate

class NewConfiguration(Config):
    pass

@deprecate
class OldConfiguration(NewConfiguration):
    # Only pass is allowed here
    pass
It is possible to deprecate a parameter or option:
Example
from typing import List
from experimaestro import Param, Config, deprecate

# Loss is assumed to be a Config defined elsewhere
class Learning(Config):
    losses: Param[List[Loss]] = []

    @deprecate
    def loss(self, value):
        # Checking that the new param is not used
        assert len(self.losses) == 0
        # We allow several losses to be defined now
        self.losses.append(value)
Warning: the signature will change when deprecating attributes.
To fix the identifiers, one can use the deprecated command. This
will create symbolic links so that old jobs are preserved and
re-used.
experimaestro deprecated list WORKDIR
Object life cycle
Initialisation
During task execution, the objects are constructed following these steps:
- The object is constructed using self.__init__()
- The attributes are set (e.g. gamma in the example above)
- self.__post_init__() is called (if the method exists)
- Pre-tasks are run (if any, see below)
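The first three steps can be sketched without the library. The build function below is a hypothetical helper, not experimaestro's code: it only reproduces the order listed above and leaves out pre-tasks.

```python
from typing import Any, Dict


def build(cls: type, params: Dict[str, Any]) -> Any:
    """Hypothetical helper mirroring the construction order listed above."""
    obj = cls()  # 1. __init__ is called without arguments
    for name, value in params.items():
        setattr(obj, name, value)  # 2. the attributes are set
    post_init = getattr(obj, "__post_init__", None)
    if callable(post_init):
        post_init()  # 3. __post_init__ runs if it exists
    return obj


class MyModel:
    def __init__(self):
        self.log = ["init"]

    def __post_init__(self):
        # gamma has already been set when this method runs
        self.log.append(f"post_init gamma={self.gamma}")


model = build(MyModel, {"gamma": 0.5})
```

The practical consequence is that parameter values are not available inside __init__, only from __post_init__ onwards.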
Sometimes, part of the initialization of a configuration
object must be postponed because it depends on external processing. In this case, the initializer decorator can
be used:
from experimaestro import Config, initializer

class MyConfig(Config):
    # The decorator ensures the initializer can only be called once
    @initializer
    def initialize(self, ...):
        # Do whatever is needed
        pass
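To make the "called once" guarantee concrete, here is a stand-alone sketch of a run-once decorator. The name once and the choice to silently ignore later calls are assumptions made for illustration; the library's initializer decorator may behave differently (for instance by raising an error).

```python
import functools


def once(method):
    """Hypothetical stand-in for a run-once decorator (not the library's code)."""
    flag = f"_{method.__name__}_done"

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        if getattr(self, flag, False):
            return None  # assumption: later calls are silently ignored
        setattr(self, flag, True)
        return method(self, *args, **kwargs)

    return wrapper


class MyConfig:
    calls = 0

    @once
    def initialize(self, vocabulary_size: int):
        self.calls += 1
        self.vocabulary_size = vocabulary_size


config = MyConfig()
config.initialize(1000)
config.initialize(5)  # ignored: the object is already initialized
```

The flag is stored on the instance, so each object gets its own one-time initialization.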
Initialization tasks
Sometimes, it is necessary to restore an object state from disk, and we want
to separate the loading mechanism from the configuration logic; in that case,
LightweightTask (a Config which must be subclassed) can be used.
Pre-tasks (deprecated)
from experimaestro import Config, LightweightTask, Param

class Model(Config):
    ...

class ModelLoader(LightweightTask):
    model: Param[Model]

    def execute(self):
        # Parameters are accessed as attributes (here, self.model)
        self.model.initialized = True
Lightweight tasks are executed automatically once they have been attached to a configuration object with its add_pretasks
method.
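from experimaestro import Param, Task, copyconfig

class ModelLearner(Task):
    model: Param[Model]

    def task_outputs(self, dep):
        # The pre-task is attached to a copy of the model, which is returned
        model = copyconfig(self.model)
        return model.add_pretasks(dep(ModelLoader(model=model)))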
To initialize a single Config, use SerializationLWTask,
a child class of LightweightTask that has a parameter value (of type Config).
The typical use case is when the state can be recovered from disk. In that case,
PathSerializationLWTask can be used: it is a lightweight task configuration
object with two fields (value and path).
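import torch
# Assumption: PathSerializationLWTask is importable from the top-level package
from experimaestro import Config, PathSerializationLWTask

class Model(Config):
    ...

class SerializedModel(PathSerializationLWTask):
    def execute(self):
        # Loads the model state from disk into the configuration held in value
        data = torch.load(self.path)
        self.value.load_state_dict(data)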
It is possible to copy pre-tasks from one configuration to another by using
add_pretasks_from. For instance
config2.add_pretasks_from(config1)
copies the pre-tasks of config1 to config2.
Initialization tasks
Initialization tasks can only be used when submitting a task. Unlike pre-tasks, they are not associated with any configuration or task, so their use is more explicit (and leads to fewer errors and bugs).
Taking the model learner task as an example, it would only return a model loader:
class ModelLearner(Task):
    model: Param[Model]

    def task_outputs(self, dep):
        return dep(ModelLoader(model=self.model))
When using the model:
model_loader = learner.submit()
Evaluate(model=model).submit(init_tasks=[model_loader])
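The effect of init_tasks can be pictured with a toy runner. Everything below is a stand-in written for illustration (plain classes, a hypothetical run_task function), and it assumes that initialization tasks are executed, in order, before the body of the submitted task.

```python
from typing import Callable, Sequence


class Model:
    """Plain stand-in for a model configuration."""

    def __init__(self):
        self.initialized = False


class ModelLoader:
    """Stand-in for a lightweight task that restores the model state."""

    def __init__(self, model: Model):
        self.model = model

    def execute(self) -> None:
        self.model.initialized = True


def run_task(body: Callable[[], bool], init_tasks: Sequence[ModelLoader] = ()) -> bool:
    """Hypothetical runner: execute every init task, then the task body."""
    for init_task in init_tasks:
        init_task.execute()
    return body()


model = Model()
# The evaluation body only checks that the model was initialized beforehand
with_loader = run_task(lambda: model.initialized, init_tasks=[ModelLoader(model)])

other = Model()
without_loader = run_task(lambda: other.initialized)
```

Because the loader is passed explicitly at submission time, it is visible at the call site which tasks receive an initialized model.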
Types
Possible types are:
- basic Python types (str, int, float, bool) and paths (pathlib.Path)
- lists, using typing.List[T]
- enumerations, using Enum from the enum package
- dictionaries (support for basic types in keys only) with typing.Dict[U, V]
- Other configurations
Parameters
class MyConfig(Config):
    """My configuration

    Long description of the configuration.

    Attributes:
        x: The parameter x
        y: The parameter y
    """

    # With default value
    x: Param[type] = value

    # Alternative syntax, useful to avoid class properties
    x: Annotated[type, default(value)]

    # Without default value
    y: Param[type]

    # Using a docstring
    z: Param[int]
    """Most important parameter of the model"""
- name defines the name of the argument, which can be retrieved from the instance self (class) or passed as an argument (function)
- type is the type of the argument (see the Types section)
- value is the default value of the argument (if any). If the value equals the default, the argument is not included in the signature computation. This makes it possible to add new parameters without changing the signature of past experiments (provided the configuration with the default value is equivalent to the previous one; otherwise, do not use a default value!).
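The rule about default values can be illustrated with a toy signature function. This is a sketch under stated assumptions: the signature helper, the JSON encoding and the SHA-256 hash are choices made for the example and are not the library's actual signature algorithm.

```python
import hashlib
import json
from typing import Any, Dict


def signature(config_id: str, values: Dict[str, Any], defaults: Dict[str, Any]) -> str:
    """Hypothetical signature: hash of the id and of the non-default values."""
    kept = {
        name: value
        for name, value in values.items()
        if not (name in defaults and defaults[name] == value)
    }
    payload = json.dumps({"id": config_id, "params": kept}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Experiment run before the parameter "dropout" existed
old = signature("my.model", {"gamma": 0.5}, defaults={})
# Same experiment after adding dropout with a default of 0.0
new = signature("my.model", {"gamma": 0.5, "dropout": 0.0}, defaults={"dropout": 0.0})
# A non-default value yields a different signature
changed = signature("my.model", {"gamma": 0.5, "dropout": 0.1}, defaults={"dropout": 0.0})
```

In this sketch old and new are identical, so past results would be reused, while changed identifies a new experiment.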
Constants
Constants are special parameters that cannot be modified. They are useful to note that the behavior of a configuration/task has changed, and thus that the signature should not be the same (as the result of the processing will differ).
class MyConfig(Config):
    # Constant
    version: Constant[str] = "2.1"
Metadata
Metadata are parameters that are ignored during the signature computation. For instance, the human-readable name of a model would be metadata. They are declared like parameters, but using the Meta type hint:
class MyConfig(Config):
    """
    Attributes:
        count: The number of documents in the collection
    """

    count: Meta[type]
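The contrast between constants and metadata can be shown with a toy hash. The helper name toy_signature and the hashing scheme are assumptions for illustration only: fields listed as metadata are skipped, whereas a constant such as version is hashed like any other value.

```python
import hashlib
import json
from typing import Any, Dict, Set


def toy_signature(values: Dict[str, Any], meta: Set[str]) -> str:
    """Hypothetical signature that skips the fields named in `meta`."""
    kept = {name: value for name, value in values.items() if name not in meta}
    encoded = json.dumps(kept, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Changing a metadata field (count) keeps the signature
base = toy_signature({"version": "2.1", "count": 10}, meta={"count"})
other_count = toy_signature({"version": "2.1", "count": 99}, meta={"count"})
# Bumping a constant (version) changes it
other_version = toy_signature({"version": "2.2", "count": 10}, meta={"count"})
```

Here base equals other_count but differs from other_version.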
It is also possible to dynamically change the type of an argument using the setmeta method:
from experimaestro import setmeta

# Forces the parameter to be a meta-parameter
a = setmeta(A(), True)

# Forces the parameter not to be a meta-parameter
a = setmeta(A(), False)
Path option
It is possible to define special options that will be set to paths relative to the task directory. For instance,
class MyConfig(Config):
    output: Annotated[Path, pathgenerator("output.txt")]
defines the instance variable output as a path .../output.txt within
the task directory. To ensure there are no conflicts, paths
are defined by following the config/task path, i.e. if the executed
task has a parameter model, model has a parameter optimization,
and optimization has a path parameter generated as loss.txt, then the file will be
./out/model/optimization/loss.txt.
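The nesting rule can be sketched with pathlib alone. The generated_path helper, the /tmp/task directory and the fixed out folder are assumptions taken from the example above, not the library's implementation.

```python
from pathlib import PurePosixPath
from typing import Sequence


def generated_path(
    task_dir: PurePosixPath, parameter_chain: Sequence[str], filename: str
) -> PurePosixPath:
    """Hypothetical helper: nest the file under the chain of parameter names."""
    # Assumption: generated files live under an "out" folder, as in the example
    return task_dir.joinpath("out", *parameter_chain, filename)


# task -> model -> optimization -> loss.txt
path = generated_path(PurePosixPath("/tmp/task"), ["model", "optimization"], "loss.txt")
```

Two configurations that both generate loss.txt therefore end up in different folders as long as they are reached through different parameters.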
Validation
If a configuration has a __validate__ method, it is called to validate
the values before a task is submitted. This makes it possible to fail fast when parameters
are not valid.
class ModelLearn(Config):
    batch_size: Param[int] = 100
    micro_batch_size: Param[int] = 100
    parameters: Annotated[Path, pathgenerator("parameters.pth")]

    def __validate__(self):
        assert self.batch_size % self.micro_batch_size == 0
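The fail-fast behaviour can be reproduced with plain Python. In this sketch ModelLearn is an ordinary class and submit is a hypothetical function that only checks the configuration before pretending to schedule it; the real submission logic is not shown.

```python
class ModelLearn:
    def __init__(self, batch_size: int = 100, micro_batch_size: int = 100):
        self.batch_size = batch_size
        self.micro_batch_size = micro_batch_size

    def __validate__(self):
        assert self.batch_size % self.micro_batch_size == 0


def submit(config) -> str:
    """Hypothetical submit: validate first, only then schedule the job."""
    validate = getattr(config, "__validate__", None)
    if callable(validate):
        validate()  # raises before any job is scheduled
    return "scheduled"


ok = submit(ModelLearn(batch_size=128, micro_batch_size=32))

try:
    submit(ModelLearn(batch_size=100, micro_batch_size=32))
    rejected = False
except AssertionError:
    rejected = True
```

The invalid configuration is rejected at submission time, before any computing resources are used.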