We recommend using uv to manage the required dependencies. Please refer to the uv documentation for installation instructions.
To run this workflow, first clone this repository. Inside the repository, run:
uv sync
This will create a virtual environment in the .venv folder with all the required dependencies. To activate the virtual environment, run:
source .venv/bin/activate
For developers who want to make changes to the code, please install the pre-commit hooks for code formatting and linting. You can do this by running:
pre-commit install
After this, every time you make a commit, the code will be automatically formatted and linted according to the rules defined in the .pre-commit-config.yaml file.
The model takes daily SST (or similar) data in video format, x ∈ ℝ^{B × 1 × T × H × W}, and a daily_mask indicating missing pixels. It also takes a
land_mask_patch indicating land regions in the output. The model performs the
following tasks:
- Combines video encoder, temporal attention, spatial transformer, and decoder
- Encodes 3D data (space, time) into spatio-temporal patches
- Aggregates temporal information per spatial patch
- Mixes spatial features across patches
- Decodes back to original spatial resolution
The architecture consists of the following steps:
# 1. Patch embedding:
X (VideoEncoder)---------> X_patch
# 2. Add temporal encoding +
# 3. Temporal aggregation:
X_patch + PE (TemporalAttentionAggregator)---------> X_temp_agg
# 4. Add spatial encoding +
# 5. Spatial transformer:
X_temp_agg + PE (SpatialTransformer) ---------> X_mixed
# 6. Decode to original resolution:
X_mixed (MonthlyConvDecoder)---------> Output
We explain the model architecture in more detail in the code and math description document.
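To make the six steps above more concrete, here is a minimal sketch that only traces how tensor shapes might change from stage to stage. It does not reproduce the actual implementation. The `PatchConfig` class, the `trace_shapes` function, the patch sizes, the embedding dimension, the intermediate layouts, and the output shape (B, 1, H, W) are all assumptions made for illustration; only the input shape (B, 1, T, H, W) and the stage names come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchConfig:
    # All values are hypothetical; the real model's settings may differ.
    patch_t: int = 1      # temporal patch length (days)
    patch_h: int = 4      # spatial patch height (pixels)
    patch_w: int = 4      # spatial patch width (pixels)
    embed_dim: int = 128  # feature size per patch


def trace_shapes(B, T, H, W, cfg=PatchConfig()):
    """Return the assumed tensor shape after each stage of the pipeline."""
    if T % cfg.patch_t or H % cfg.patch_h or W % cfg.patch_w:
        raise ValueError("T, H and W must be divisible by the patch sizes")

    t_tokens = T // cfg.patch_t                             # temporal tokens
    n_patches = (H // cfg.patch_h) * (W // cfg.patch_w)     # spatial patches

    return {
        # Daily input video, as stated in the description.
        "input": (B, 1, T, H, W),
        # Steps 1-2: patch embedding (assumed layout: time, patch, feature).
        "X_patch": (B, t_tokens, n_patches, cfg.embed_dim),
        # Step 3: temporal aggregation collapses the time axis per patch.
        "X_temp_agg": (B, n_patches, cfg.embed_dim),
        # Steps 4-5: spatial mixing keeps the number of patches unchanged.
        "X_mixed": (B, n_patches, cfg.embed_dim),
        # Step 6: decoding back to the original spatial resolution (assumed).
        "output": (B, 1, H, W),
    }


if __name__ == "__main__":
    for name, shape in trace_shapes(B=2, T=30, H=64, W=64).items():
        print(f"{name:>11}: {shape}")
```

The key point the sketch illustrates is that the time axis disappears at the temporal aggregation step, so the spatial transformer only ever sees one feature vector per spatial patch.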
