odc.stac.load

odc.stac.load(items, bands=None, *, groupby='time', resampling=None, dtype=None, chunks=None, pool=None, crs=<odc.geo.types.Unset object>, resolution=None, anchor=None, geobox=None, bbox=None, lon=None, lat=None, x=None, y=None, like=None, geopolygon=None, progress=None, fail_on_error=True, stac_cfg=None, patch_url=None, preserve_original_order=False, driver=None, **kw)[source]

STAC Item to xarray.Dataset.

Load several STAC Item objects (from the same or similar collections) as an xarray.Dataset.

This method can load pixel data directly on a local machine or construct a Dask graph that can be processed on a remote cluster.

catalog = pystac.Client.open(...)
query = catalog.search(...)
xx = odc.stac.load(
    query.get_items(),
    bands=["red", "green", "blue"],
)
xx.red.plot.imshow(col="time")
Parameters:

Common Options

Parameters:
  • groupby (Union[str, Callable[[Item, ParsedItem, int], Any], None]) –

    Controls what items get placed in to the same pixel plane.

    Following have special meaning:

    • ”time” items with exactly the same timestamp are grouped together

    • ”solar_day” items captured on the same day adjusted for solar time

    • ”id” every item is loaded separately

    Any other string is assumed to be a key in Item’s properties dictionary.

    You can also supply custom key function, it should take 3 arguments (pystac.Item, ParsedItem, index:int) -> Any

  • preserve_original_order (bool) – By default items are sorted by time, id within each group to make pixel fusing order deterministic. Setting this flag to True will instead keep items within each group in the same order as supplied, so that one can implement arbitrary priority for pixel overlap cases.

  • resampling (Union[str, Dict[str, str], None]) – Controls resampling strategy, can be specified per band

  • dtype (Union[dtype[Any], None, type[Any], _SupportsDType[dtype[Any]], str, tuple[Any, int], tuple[Any, Union[SupportsIndex, Sequence[SupportsIndex]]], list[Any], _DTypeDict, tuple[Any, Any], Dict[str, Union[dtype[Any], None, type[Any], _SupportsDType[dtype[Any]], str, tuple[Any, int], tuple[Any, Union[SupportsIndex, Sequence[SupportsIndex]]], list[Any], _DTypeDict, tuple[Any, Any]]]]) – Force output dtype, can be specified per band

  • chunks (Optional[Dict[str, Union[int, Literal['auto']]]]) – Rather than loading pixel data directly, construct Dask backed arrays. chunks={'x': 2048, 'y': 2048}

  • progress (Optional[Any]) – Pass in tqdm progress bar or similar, only used in non-Dask load.

  • fail_on_error (bool) – Set this to False to skip over load failures.

  • pool (Union[ThreadPoolExecutor, int, None]) – Use thread pool to perform load locally, only used in non-Dask load.

Control Pixel Grid of Output

There are many ways to control footprint and resolution of returned data. The most precise way is to use GeoBox, geobox=GeoBox(..). Similarly one can use like=xx to match pixel grid to previously loaded data (xx = odc.stac.load(...)).

Other common way is to configure crs and resolution only

xx = odc.stac.load(...
    crs="EPSG:3857",
    resolution=10)

# resolution units must match CRS
# here we assume 1 degree == 111km to load at roughly
# the same 10m resolution as statement above.
yy = odc.stac.load(...
    crs="EPSG:4326",
    resolution=0.00009009)

By default odc.stac.load() loads all available pixels in the requested projection and resolution. To limit extent of loaded data you have to supply bounds via either geobox= or like= parameters (these also select projection and resolution). Alternatively use a pair of x, y or lon, lat parameters. x, y allows you to specify bounds in the output projection, while lon, lat operate in degrees. You can also use bbox which is equivalent to lon, lat.

It should be noted that returned data is likely to reach outside of the requested bounds by fraction of a pixel when using bbox, x, y or lon, lat mechanisms. This is due to pixel grid “snapping”. Pixel edges will still start at N*pixel_size where N is int regardless of the requested bounding box.

Parameters:
  • crs (Union[str, int, CRS, CRS, Dict[str, Any], Unset, None]) – Load data in a given CRS. Special name of "utm" is also understood, in which case an appropriate UTM projection will be picked based on the output bounding box.

  • resolution (Union[float, int, Resolution, None]) –

    Set resolution of output in units of the output CRS. This can be a single float, in which case pixels are assumed to be square with Y axis flipped. To specify non-square or non-flipped pixels use odc.geo.resxy_() or odc.geo.resyx_(). resolution=10 is equivalent to resolution=odc.geo.resxy_(10, -10).

    Resolution must be supplied in the units of the output CRS. Units are commonly meters for Projected and degrees for Geographic CRSs.

  • bbox (Optional[Tuple[float, float, float, float]]) – Specify bounding box in Lon/Lat. [min(lon), min(lat), max(lon), max(lat)]

  • lon (Optional[Tuple[float, float]]) – Define output bounds in Lon/Lat

  • lat (Optional[Tuple[float, float]]) – Define output bounds in Lon/Lat

  • x (Optional[Tuple[float, float]]) – Define output bounds in output projection coordinate units

  • y (Optional[Tuple[float, float]]) – Define output bounds in output projection coordinate units

  • anchor (Union[AnchorEnum, XY[float], float, Literal['center'], Literal['centre'], Literal['edge'], Literal['floating'], Literal['default'], None]) – Controls pixel snapping, default is to align pixel grid to X/Y axis such that pixel edges align with x=0, y=0. Other common option is to align pixel centers to 0,0 rather than edges.

  • geobox (Optional[GeoBox]) – Allows to specify exact region/resolution/projection using GeoBox object

  • like (Optional[Any]) – Match output grid to the data loaded previously.

  • geopolygon (Optional[Any]) – Limit returned result to a bounding box of a given geometry. This could be an instance of Geometry, GeoJSON dictionary, GeoPandas DataFrame, or any object implementing __geo_interface__. We assume EPSG:4326 projection for dictionary and Shapely inputs. CRS information available on GeoPandas inputs should be understood correctly.

STAC Related Options

Parameters:
Return type:

Dataset

Returns:

xarray.Dataset with requested bands populated

Complete Example Code

import planetary_computer as pc
from pystac_client import Client

from odc import stac

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
query = catalog.search(
    collections=["sentinel-2-l2a"],
    datetime="2019-06-06",
    query={"s2:mgrs_tile": dict(eq="06VVN")},
)

xx = stac.load(
    query.get_items(),
    bands=["red", "green", "blue"],
    resolution=100,  # 1/10 of the native 10m resolution
    patch_url=pc.sign,
)
xx.red.plot.imshow(col="time", size=8, aspect=1)

Example Optional Configuration

Sometimes data source might be missing some optional STAC extensions. With stac_cfg= parameter one can supply that information at load time. Configuration is per collection per asset. You can provide information like pixel data type, nodata value used, unit attribute and band aliases you would like to use.

Sample stac_cfg={..} parameter:

sentinel-2-l2a:  # < name of the collection, i.e. ``.collection_id``
  assets:
    "*":  # Band named "*" contains band info for "most" bands
      data_type: uint16
      nodata: 0
      unit: "1"
    SCL:  # Those bands that are different than "most"
      data_type: uint8
      nodata: 0
      unit: "1"
  aliases:  #< unique alias -> canonical map
    rededge: B05
    rededge1: B05
    rededge2: B06
    rededge3: B07

some-other-collection:
  assets:
  #...

"*": # Applies to all collections if not defined on a collection
  warnings: ignore  # ignore|all (default all)