Access Sentinel 2 Data on Planetary Computer

Binder

Setup Instructions

This notebook is meant to run on Planetary Computer lab hub.

[1]:
import dask.distributed
import dask.utils
import numpy as np
import planetary_computer as pc
import xarray as xr
from IPython.display import display
from pystac_client import Client

from odc.stac import configure_rio, stac_load

Configuration

For now we need to manually supply band dtype and nodata information for each band in the collection. Use band named * as a wildcard.

[2]:
cfg = {
    "sentinel-2-l2a": {
        "assets": {
            "*": {"data_type": "uint16", "nodata": 0},
            "SCL": {"data_type": "uint8", "nodata": 0},
            "visual": {"data_type": "uint8", "nodata": 0},
        },
    },
    "*": {"warnings": "ignore"},
}

Start Dask Client

This step is optional, but it does improve load speed significantly. You don’t have to use Dask, as you can load data directly into memory of the notebook.

[3]:
client = dask.distributed.Client()
configure_rio(cloud_defaults=True, client=client)
display(client)

Client

Client-cf90ae1a-caaa-11ed-8749-6045bde8e932

Connection method: Cluster object Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status

Cluster Info

Query STAC API

Here we are looking for datasets in sentinel-2-l2a collection from June 2019 over MGRS tile 06VVN.

[4]:
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

query = catalog.search(
    collections=["sentinel-2-l2a"],
    datetime="2019-06",
    query={"s2:mgrs_tile": dict(eq="06VVN")},
)

items = list(query.get_items())
print(f"Found: {len(items):d} datasets")
Found: 23 datasets

Lazy load all the bands

We won’t use all the bands but it doesn’t matter as bands that we won’t use won’t be loaded. We are “loading” data with Dask, which means that at this point no reads will be happening just yet.

If you were to skip warnings: ignore in the configuration, you’ll see a warning about rededge common name being used on several bands. Basically we can only work with common names that uniquely identify some band. In this case EO extension defines common name rededge for bands 5, 6 and 7.

[5]:
resolution = 10
SHRINK = 4
if client.cluster.workers[0].memory_limit < dask.utils.parse_bytes("4G"):
    SHRINK = 8  # running on Binder with 2Gb RAM

if SHRINK > 1:
    resolution = resolution * SHRINK

xx = stac_load(
    items,
    chunks={"x": 2048, "y": 2048},
    stac_cfg=cfg,
    patch_url=pc.sign,
    resolution=resolution,
)

print(f"Bands: {','.join(list(xx.data_vars))}")
display(xx)
/tmp/binder_env/lib/python3.10/site-packages/distributed/worker_memory.py:492: FutureWarning: The `Nanny.memory_limit` attribute has been moved to `Nanny.memory_manager.memory_limit
  warnings.warn(
Bands: AOT,B01,B02,B03,B04,B05,B06,B07,B08,B09,B11,B12,B8A,SCL,WVP,visual
<xarray.Dataset>
Dimensions:      (y: 1373, x: 1373, time: 23)
Coordinates:
  * y            (y) float64 6.8e+06 6.8e+06 6.8e+06 ... 6.69e+06 6.69e+06
  * x            (x) float64 4e+05 4e+05 4.001e+05 ... 5.096e+05 5.097e+05
    spatial_ref  int32 32606
  * time         (time) datetime64[ns] 2019-06-01T21:15:21.024000 ... 2019-06...
Data variables: (12/16)
    AOT          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B01          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B02          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B03          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B04          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B05          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    ...           ...
    B11          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B12          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B8A          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    SCL          (time, y, x) uint8 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    WVP          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    visual       (time, y, x) uint8 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>

By default stac_load will return all the data bands using canonical asset names. But we can also request a subset of bands, by supplying bands= parameter. When going this route you can also use “common name” to refer to a band.

In this case we request red,green,blue,nir bands which are common names for bands B04,B03,B02,B08 and SCL band which is a canonical name.

[6]:
xx = stac_load(
    items,
    bands=["red", "green", "blue", "nir", "SCL"],
    resolution=resolution,
    chunks={"x": 2048, "y": 2048},
    stac_cfg=cfg,
    patch_url=pc.sign,
)

print(f"Bands: {','.join(list(xx.data_vars))}")
display(xx)
Bands: red,green,blue,nir,SCL
<xarray.Dataset>
Dimensions:      (y: 1373, x: 1373, time: 23)
Coordinates:
  * y            (y) float64 6.8e+06 6.8e+06 6.8e+06 ... 6.69e+06 6.69e+06
  * x            (x) float64 4e+05 4e+05 4.001e+05 ... 5.096e+05 5.097e+05
    spatial_ref  int32 32606
  * time         (time) datetime64[ns] 2019-06-01T21:15:21.024000 ... 2019-06...
Data variables:
    red          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    green        (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    blue         (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    nir          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    SCL          (time, y, x) uint8 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>

Do some math with bands

[7]:
def to_float(xx):
    _xx = xx.astype("float32")
    nodata = _xx.attrs.pop("nodata", None)
    if nodata is None:
        return _xx
    return _xx.where(xx != nodata)


def colorize(xx, colormap):
    return xr.DataArray(colormap[xx.data], coords=xx.coords, dims=(*xx.dims, "band"))
[8]:
# like .astype(float32) but taking care of nodata->NaN mapping
nir = to_float(xx.nir)
red = to_float(xx.red)
ndvi = (nir - red) / (
    nir + red
)  # < This is still a lazy Dask computation (no data loaded yet)

# Get the 5-th time slice `load->compute->plot`
_ = ndvi.isel(time=4).compute().plot.imshow(size=7, aspect=1.2, interpolation="bicubic")
../_images/notebooks_stac-load-S2-ms_15_0.png

For sample purposes work with first 6 observations only

[9]:
xx = xx.isel(time=np.s_[:6])
[10]:
# fmt: off
scl_colormap = np.array(
    [
        [255,   0, 255, 255],  # 0  - NODATA
        [255,   0,   4, 255],  # 1  - Saturated or Defective
        [0  ,   0,   0, 255],  # 2  - Dark Areas
        [97 ,  97,  97, 255],  # 3  - Cloud Shadow
        [3  , 139,  80, 255],  # 4  - Vegetation
        [192, 132,  12, 255],  # 5  - Bare Ground
        [21 , 103, 141, 255],  # 6  - Water
        [117,   0,  27, 255],  # 7  - Unclassified
        [208, 208, 208, 255],  # 8  - Cloud
        [244, 244, 244, 255],  # 9  - Definitely Cloud
        [195, 231, 240, 255],  # 10 - Thin Cloud
        [222, 157, 204, 255],  # 11 - Snow or Ice
    ],
    dtype="uint8",
)
# fmt: on

# Load SCL band, then convert to RGB using color scheme above
scl_rgba = colorize(xx.SCL.compute(), scl_colormap)

# Check we still have geo-registration
scl_rgba.odc.geobox
[10]:

GeoBox

Dimensions
1,373x1,373
EPSG
32606
Resolution
80m
Cell
200px
WKT
PROJCRS["WGS 84 / UTM zone 6N",
    BASEGEOGCRS["WGS 84",
        ENSEMBLE["World Geodetic System 1984 ensemble",
            MEMBER["World Geodetic System 1984 (Transit)"],
            MEMBER["World Geodetic System 1984 (G730)"],
            MEMBER["World Geodetic System 1984 (G873)"],
            MEMBER["World Geodetic System 1984 (G1150)"],
            MEMBER["World Geodetic System 1984 (G1674)"],
            MEMBER["World Geodetic System 1984 (G1762)"],
            MEMBER["World Geodetic System 1984 (G2139)"],
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ENSEMBLEACCURACY[2.0]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 6N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-147,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Engineering survey, topographic mapping."],
        AREA["Between 150°W and 144°W, northern hemisphere between equator and 84°N, onshore and offshore. United States (USA) - Alaska (AK)."],
        BBOX[0,-150,84,-144]],
    ID["EPSG",32606]]
[11]:
_ = scl_rgba.plot.imshow(col="time", col_wrap=3, size=3, interpolation="antialiased")
../_images/notebooks_stac-load-S2-ms_19_0.png

Let’s save image dated 2019-06-04 to a cloud optimized geotiff file.

[12]:
to_save = scl_rgba.isel(time=3)
fname = f"SCL-{to_save.time.dt.strftime('%Y%m%d').item()}.tif"
print(f"Saving to: '{fname}'")
Saving to: 'SCL-20190604.tif'
[13]:
scl_rgba.isel(time=3).odc.write_cog(
    fname,
    overwrite=True,
    compress="webp",
    webp_quality=90,
)
[13]:
PosixPath('SCL-20190604.tif')

Check the file with rio info.

[14]:
!ls -lh {fname}
!rio info {fname} | jq .
-rw-r--r-- 1 runner docker 444K Mar 25 01:17 SCL-20190604.tif
{
  "blockxsize": 512,
  "blockysize": 512,
  "bounds": [
    399920,
    6690240,
    509760,
    6800080
  ],
  "colorinterp": [
    "red",
    "green",
    "blue",
    "alpha"
  ],
  "compress": "webp",
  "count": 4,
  "crs": "EPSG:32606",
  "descriptions": [
    null,
    null,
    null,
    null
  ],
  "driver": "GTiff",
  "dtype": "uint8",
  "height": 1373,
  "indexes": [
    1,
    2,
    3,
    4
  ],
  "interleave": "pixel",
  "lnglat": [
    -147.83078743077945,
    60.83917440326895
  ],
  "mask_flags": [
    [
      "per_dataset",
      "alpha"
    ],
    [
      "per_dataset",
      "alpha"
    ],
    [
      "per_dataset",
      "alpha"
    ],
    [
      "all_valid"
    ]
  ],
  "nodata": null,
  "res": [
    80,
    80
  ],
  "shape": [
    1373,
    1373
  ],
  "tiled": true,
  "transform": [
    80,
    0,
    399920,
    0,
    -80,
    6800080,
    0,
    0,
    1
  ],
  "units": [
    null,
    null,
    null,
    null
  ],
  "width": 1373
}