Services
Services can be used to add some useful functionalities when running experiments. For instance, below is an example of tensorboard service
import logging
import threading
from pathlib import Path
from experimaestro import Config, tagspath
from experimaestro.scheduler.services import WebService, ServiceState
def cleanupdir(path: Path):
"""Remove directory contents"""
if path.exists():
for child in path.iterdir():
child.unlink()
class TensorboardService(WebService):
id = "tensorboard"
def __init__(self, path: Path):
super().__init__()
self.path = path
self.server = None
self.program = None
logging.info("You can monitor learning with:")
logging.info("tensorboard --logdir=%s", self.path)
self.url = None
def set_experiment(self, xp):
"""Called when added to an experiment - access xp.workdir etc."""
self.xp = xp
if xp.run_mode == RunMode.NORMAL_RUN:
self.path.mkdir(exist_ok=True, parents=True)
cleanupdir(self.path)
def state_dict(self) -> dict:
"""Return constructor arguments for service recreation."""
return {"path": self.path} # Path is automatically handled
def add(self, config: Config, path: Path):
if self.xp.run_mode == RunMode.NORMAL_RUN:
(self.path / tagspath(config)).symlink_to(path)
def description(self):
return "Tensorboard service"
def close(self):
if self.server:
self.state = ServiceState.STOPPING
self.server.shutdown()
self.state = ServiceState.STOPPED
def _serve(self, running: threading.Event):
import tensorboard as tb
try:
self.state = ServiceState.STARTING
self.program = tb.program.TensorBoard()
self.program.configure(
host="localhost",
logdir=str(self.path.absolute()),
path_prefix=f"/services/{self.id}",
port=0,
)
self.server = self.program._make_server()
self.url = self.server.get_url()
running.set()
self.state = ServiceState.RUNNING
self.server.serve_forever()
except Exception:
logging.exception("Error while starting tensorboard")
running.set()
Service State Serialization
For services to be visible in monitoring, they must implement state_dict() to
return constructor arguments needed to recreate the service. The base
Service.state_dict() returns an empty dict, so subclasses with constructor
parameters must override it.
Path Handling
Path values in state_dict() are automatically handled:
When storing: Paths are serialized to a special format
When recreating locally: Paths are restored as
PathobjectsWhen recreating remotely (SSH): Paths are translated to local cache and synced via rsync
Just return constructor arguments - paths are handled automatically:
def state_dict(self):
return {"log_dir": self.log_dir} # Path is auto-handled
Adding a service to an experiment
Services are added to an experiment using the
add_service() method. The method returns
the same service instance, allowing you to use it immediately.
from experimaestro import experiment
with experiment("/path/to/workdir", "my_experiment", port=12345) as xp:
# Add the tensorboard service to the experiment
# set_experiment(xp) is called automatically by add_service
tb = xp.add_service(TensorboardService(xp.workdir / "runs"))
# Submit a task
learner = Learner.C(...).submit()
# Register the task's log directory with tensorboard
# This creates a symlink so tensorboard can monitor the run
tb.add(learner, learner.logpath)
# Wait for completion
xp.wait()
When using the web interface (enabled by the port parameter), services
are accessible through the services menu. For WebService subclasses like
TensorboardService, the service URL is automatically proxied through the
experimaestro web interface.
TUI Service Management
When using the TUI monitor (experimaestro experiments monitor), services can
be managed using the following keyboard shortcuts:
Shortcut |
Action |
Description |
|---|---|---|
|
Start |
Start the selected service |
|
Stop |
Stop the selected service |
These shortcuts work both in:
Experiment Services tab: Shows services for the selected experiment
Main Services tab: Shows all running services across all experiments