Compute API Overview

An introduction to the Descartes Labs Compute API and its key concepts, classes, and methods.

Overview

The Descartes Labs Compute API, or Batch Compute, is a highly scalable cloud computing service designed to parallelize computations, particularly those leveraging our Catalog API and its datasets. This service packages up your Python code and executes it on nodes hosted by Descartes Labs.

Please visit the Compute Guide for a more in-depth primer on the Batch Compute service. 

This document gives a high-level introduction to the Compute API. For more detailed, hands-on tutorial notebooks, please reference the Example Notebooks on GitHub or install them by running the following command:

git clone https://github.com/descarteslabs/example-notebooks.git

Functions and Jobs

The core object within Batch Compute is the Function, a serverless, user-configurable cloud function. A Function can be treated like any other Python function: it accepts arbitrary input arguments and performs whatever task it is written to do, such as writing a new object back to Catalog or returning a calculated statistic. The Batch Compute service was designed to simplify complex cloud infrastructure and lower the barrier to scaling beyond the limitations of your local computer.

Each invocation of a Function, that is, each new set of arguments passed to it, triggers a single Job, which can be tracked in real time using the Compute UI. Each user can run up to 1000 concurrent Jobs.

Creating a Function

To create a Function, start by iterating on a Python function in your local environment. At its simplest, the function accepts input parameters, performs some processing, and optionally returns some output.

def hello_world(arg):
    print(arg)
    return arg

We can then instantiate our Function by passing a few key parameters:

  • Our locally defined Python function
  • A Function name
  • Image, which is always "pythonX.Y:latest", where X corresponds to your major Python version and Y the minor version, such as "python3.10:latest" for a local Python 3.10 distribution.
  • Number of CPUs per Job, up to 16 vCPUs
  • Memory allotted per Job, up to 120GB
  • Maximum Concurrency, i.e. how many active Jobs can run in parallel
  • Timeout before canceling a Job, in seconds
  • Retry Count if a Job fails
  • Optional requirements for dependencies not installed by default

from descarteslabs.compute import Function

async_func = Function(
    hello_world,
    name="my-compute-hello",
    image="python3.10:latest",
    cpus=1,
    memory=2048,
    maximum_concurrency=10,
    timeout=600,
    retry_count=3,
)

async_func.save()

Only certain combinations of CPUs and memory are available. Visit the Documentation for more details.

Function Lifecycles and Managing Active Jobs

Once a Function is saved, it can be tracked by both its ID and name in the Compute Monitor UI. The general lifecycle of a Function starts in the pending state, while the initial Docker image is built. If a Function fails to build, access the Build Log, through either the UI or the Function itself, to determine the point of failure. Once a Function is successfully built, it begins scheduling its pending Jobs. You can submit new Jobs at any time, even while the Function is building.
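
A saved Function can also be retrieved later by its ID to check its state or inspect its build log programmatically. The snippet below is a minimal sketch; the names Function.get(), status, and build_log() are assumed from the current client and should be confirmed against the API documentation:

from descarteslabs.compute import Function

# Retrieve a previously saved Function by its ID (shown in the Compute Monitor UI)
func = Function.get(async_func.id)

# Check the Function's current state, e.g. whether the image is still building
print(func.status)

# If the image failed to build, inspect the build log for the point of failure
func.build_log()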

Submit a new Job to your Function by calling it the same way as a local Python function, or submit Jobs in bulk by passing an iterable to the .map() method:

job = async_func("Hello World")
jobs = async_func.map(["H", "e", "l", "l", "o"])

Now that the Function has Jobs to work through, track their progress either through the API directly or interactively through the Compute Monitor UI.

This interface allows the user to:

  • Modify a Function's maximum concurrency, number of CPUs, and memory
  • Stop or delete a running Function
  • Access the Function's Build Logs, in case of a failure to build
  • Access each individual Job's input arguments, runtime, status, logs, and outputs
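
To follow progress through the API directly, one simple pattern is to block until the Function has worked through its queue and then inspect individual Jobs. This is a sketch assuming the wait_for_completion() and status members behave as in the current client:

# Block until all currently submitted Jobs have completed (or failed)
async_func.wait_for_completion()

# Check the state of an individual Job returned from an earlier invocation
print(job.status)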

Retrieving Results of a Function

Typically, pipelines built on the Descartes Labs Platform are significantly more complex than print("Hello World"). In practice, these Functions can return any number of results, such as complex time-series statistics, which must be managed in bulk and are not suited for individual inspection in the UI.

Each Job's result can be accessed through either the Job object itself or through Blobs that are created with each Job. The best way to access all of the results of a Function's Jobs is to construct a search filter to retrieve a list of Job results. This requires both the current user's namespace, which can be accessed through the Auth module, and the Function's unique ID:

from descarteslabs.auth import Auth
from descarteslabs.catalog import Blob, StorageType, properties as p

auth = Auth.get_default_auth()
namespace = f"{auth.payload['org']}:{auth.namespace}"

for b in (
    Blob.search()
    .filter(p.namespace == namespace)
    .filter(p.name.startswith(f"{async_func.id}/"))
    .filter(p.storage_type == StorageType.COMPUTE)
):
    print(f"ID: {b.id}")
    print(b.data())
    print("\n")

Often, Functions aren't designed to return any meaningful information, such as when they are tasked with writing new imagery to personal Catalog Products. In those cases, there is no need to search for and reference each Job's result.

Compute Best Practices

The Compute API is best utilized when paired with high-throughput access to raster data through the Catalog API. Typical use cases include generating dense time-series statistics, such as daily weather conditions over large areas, and scaled training and inference of AI models at high spatial resolution.

Catalog Limits

Since each user is allotted up to 1000 concurrent Jobs, it is necessary to be aware of the Quotas and Limits that pertain to the various creation and retrieval methods of the Catalog API.

If your Jobs are failing due to a "Maximum retries" error, this is most likely due to a limit on the Catalog end!

Scaling Processing Efficiently

Once familiar with the Catalog limitations, it is also beneficial to plan how large spatiotemporal extents are divided up and submitted to the Compute service itself. In general, avoid duplicating spatial and temporal searches and retrievals of data: think in a "read-once and store-only-your-results" paradigm. In this spirit, it is best to choose a tiling grid, such as DLTiles, that strikes a balance between per-Job runtime and the overall number of tiles, as sketched below. In some cases, especially with coarser imagery such as weather data, it may be wiser to submit Image IDs as input arguments instead of the DLTile ID approach that is generally used.
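
For example, an area of interest can be split into DLTiles whose keys are submitted as Job arguments, so that each Job reads only the imagery covering its own tile. A sketch of that pattern, where aoi_geometry and the resolution and tile size values are placeholders to be tuned for your own data:

from descarteslabs.geo import DLTile

# Split a hypothetical area of interest into fixed-size tiles
tiles = DLTile.from_shape(aoi_geometry, resolution=10.0, tilesize=1024, pad=0)

# Submit one Job per tile key; each Job can re-create its tile with
# DLTile.from_key(key) and read only the data covering that area
jobs = async_func.map([tile.key for tile in tiles])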

Contact support@descarteslabs.com with any questions on the overall efficiency and performance of your processing!