Access Sentinel 2 Data on Planetary Computer¶
Setup Instructions¶
This notebook is meant to run on the Planetary Computer lab hub.
[1]:
import dask.distributed
import dask.utils
import numpy as np
import planetary_computer as pc
import xarray as xr
from IPython.display import display
from pystac_client import Client
from odc.stac import configure_rio, stac_load
Configuration¶
For now we need to manually supply the data type and nodata value for each band in the collection. Use the band name * as a wildcard.
[2]:
cfg = {
    "sentinel-2-l2a": {
        "assets": {
            "*": {"data_type": "uint16", "nodata": 0},
            "SCL": {"data_type": "uint8", "nodata": 0},
            "visual": {"data_type": "uint8", "nodata": 0},
        },
    },
    "*": {"warnings": "ignore"},
}
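To illustrate how the `*` wildcard is meant to behave, here is a minimal sketch of a per-band config lookup with fallback. The `band_cfg` helper and its exact precedence are assumptions for illustration, not odc-stac internals:

```python
# Illustrative sketch: exact band name wins, otherwise fall back to "*".
# band_cfg is a hypothetical helper, not part of odc-stac.
def band_cfg(cfg, collection, band):
    assets = cfg.get(collection, {}).get("assets", {})
    return assets.get(band, assets.get("*"))

cfg = {
    "sentinel-2-l2a": {
        "assets": {
            "*": {"data_type": "uint16", "nodata": 0},
            "SCL": {"data_type": "uint8", "nodata": 0},
        },
    },
}

print(band_cfg(cfg, "sentinel-2-l2a", "SCL"))  # exact match -> uint8
print(band_cfg(cfg, "sentinel-2-l2a", "B04"))  # no entry -> wildcard uint16
```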
Start Dask Client¶
This step is optional, but it improves load speed significantly. You don't have to use Dask; you can instead load data directly into the notebook's memory.
[3]:
client = dask.distributed.Client()
configure_rio(cloud_defaults=True, client=client)
display(client)
Client: Client-cf90ae1a-caaa-11ed-8749-6045bde8e932
Connection method: Cluster object | Cluster type: distributed.LocalCluster
Dashboard: http://127.0.0.1:8787/status
LocalCluster: 2 workers | 2 threads total | 6.78 GiB memory | Status: running
Query STAC API¶
Here we are looking for datasets in the sentinel-2-l2a collection from June 2019 over MGRS tile 06VVN.
[4]:
catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
query = catalog.search(
    collections=["sentinel-2-l2a"],
    datetime="2019-06",
    query={"s2:mgrs_tile": dict(eq="06VVN")},
)
items = list(query.get_items())
print(f"Found: {len(items):d} datasets")
Found: 23 datasets
Lazy load all the bands¶
We won’t use all the bands, but that doesn’t matter: bands we don’t use won’t be loaded. We are “loading” data with Dask, which means that at this point no reads are happening just yet.
If you were to skip warnings: ignore in the configuration, you would see a warning about the rededge common name being used for several bands. We can only work with common names that uniquely identify a band; in this case the EO extension defines the common name rededge for bands 5, 6 and 7.
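The ambiguity can be sketched with a plain dictionary lookup. The `COMMON_NAMES` mapping mirrors the EO extension for Sentinel-2; the `resolve` helper is illustrative, not the odc-stac implementation:

```python
# Band -> EO common name for a subset of Sentinel-2 bands
COMMON_NAMES = {
    "B02": "blue",
    "B03": "green",
    "B04": "red",
    "B05": "rededge",
    "B06": "rededge",
    "B07": "rededge",
    "B08": "nir",
}

def resolve(common_name):
    # A common name is only usable when it maps to exactly one band
    matches = [band for band, name in COMMON_NAMES.items() if name == common_name]
    if len(matches) != 1:
        raise ValueError(f"'{common_name}' matches {len(matches)} bands: {matches}")
    return matches[0]

print(resolve("red"))  # -> B04, unique
# resolve("rededge")   # raises: matches B05, B06 and B07
```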
[5]:
resolution = 10
SHRINK = 4
if client.cluster.workers[0].memory_limit < dask.utils.parse_bytes("4G"):
    SHRINK = 8  # running on Binder with 2 GiB of RAM
if SHRINK > 1:
    resolution = resolution * SHRINK
xx = stac_load(
    items,
    chunks={"x": 2048, "y": 2048},
    stac_cfg=cfg,
    patch_url=pc.sign,
    resolution=resolution,
)
print(f"Bands: {','.join(list(xx.data_vars))}")
display(xx)
Bands: AOT,B01,B02,B03,B04,B05,B06,B07,B08,B09,B11,B12,B8A,SCL,WVP,visual
<xarray.Dataset>
Dimensions:      (y: 1373, x: 1373, time: 23)
Coordinates:
  * y            (y) float64 6.8e+06 6.8e+06 6.8e+06 ... 6.69e+06 6.69e+06
  * x            (x) float64 4e+05 4e+05 4.001e+05 ... 5.096e+05 5.097e+05
    spatial_ref  int32 32606
  * time         (time) datetime64[ns] 2019-06-01T21:15:21.024000 ... 2019-06...
Data variables: (12/16)
    AOT          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B01          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B02          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B03          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B04          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B05          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    ...           ...
    B11          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B12          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    B8A          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    SCL          (time, y, x) uint8 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    WVP          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    visual       (time, y, x) uint8 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
By default stac_load returns all data bands using canonical asset names, but we can also request a subset of bands by supplying the bands= parameter. When going this route you can also use a “common name” to refer to a band. In this case we request the red,green,blue,nir bands, which are the common names for bands B04,B03,B02,B08, and the SCL band, which is a canonical name.
[6]:
xx = stac_load(
    items,
    bands=["red", "green", "blue", "nir", "SCL"],
    resolution=resolution,
    chunks={"x": 2048, "y": 2048},
    stac_cfg=cfg,
    patch_url=pc.sign,
)
print(f"Bands: {','.join(list(xx.data_vars))}")
display(xx)
Bands: red,green,blue,nir,SCL
<xarray.Dataset>
Dimensions:      (y: 1373, x: 1373, time: 23)
Coordinates:
  * y            (y) float64 6.8e+06 6.8e+06 6.8e+06 ... 6.69e+06 6.69e+06
  * x            (x) float64 4e+05 4e+05 4.001e+05 ... 5.096e+05 5.097e+05
    spatial_ref  int32 32606
  * time         (time) datetime64[ns] 2019-06-01T21:15:21.024000 ... 2019-06...
Data variables:
    red          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    green        (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    blue         (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    nir          (time, y, x) uint16 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
    SCL          (time, y, x) uint8 dask.array<chunksize=(1, 1373, 1373), meta=np.ndarray>
Do some math with bands¶
[7]:
def to_float(xx):
    _xx = xx.astype("float32")
    nodata = _xx.attrs.pop("nodata", None)
    if nodata is None:
        return _xx
    return _xx.where(xx != nodata)


def colorize(xx, colormap):
    return xr.DataArray(colormap[xx.data], coords=xx.coords, dims=(*xx.dims, "band"))
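The nodata-to-NaN idea behind to_float can be shown with plain NumPy on a tiny synthetic array (not the notebook’s data):

```python
import numpy as np

# uint16 pixels where 0 marks nodata, matching the Sentinel-2 config above
pixels = np.array([[0, 1200], [3400, 0]], dtype="uint16")
nodata = 0

# Cast to float32, then replace nodata pixels with NaN so they drop
# out of subsequent band math instead of skewing it
as_float = pixels.astype("float32")
as_float[pixels == nodata] = np.nan

print(as_float)
```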
[8]:
# like .astype(float32) but taking care of nodata->NaN mapping
nir = to_float(xx.nir)
red = to_float(xx.red)
ndvi = (nir - red) / (nir + red)  # still a lazy Dask computation (no data loaded yet)
# Get the 5th time slice: load->compute->plot
_ = ndvi.isel(time=4).compute().plot.imshow(size=7, aspect=1.2, interpolation="bicubic")

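The NDVI band math above can be checked on tiny synthetic arrays; NaN pixels (from the nodata mapping) propagate through the arithmetic automatically:

```python
import numpy as np

# Tiny synthetic red/nir reflectance arrays; NaN marks a nodata pixel
nir = np.array([0.6, 0.5, np.nan], dtype="float32")
red = np.array([0.1, 0.5, 0.2], dtype="float32")

ndvi = (nir - red) / (nir + red)
print(ndvi)  # ~0.714 for healthy vegetation, 0 for nir==red, NaN propagates
```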
For demonstration purposes, work with the first 6 observations only.
[9]:
xx = xx.isel(time=np.s_[:6])
[10]:
# fmt: off
scl_colormap = np.array(
    [
        [255, 0, 255, 255],  # 0 - NODATA
        [255, 0, 4, 255],  # 1 - Saturated or Defective
        [0, 0, 0, 255],  # 2 - Dark Areas
        [97, 97, 97, 255],  # 3 - Cloud Shadow
        [3, 139, 80, 255],  # 4 - Vegetation
        [192, 132, 12, 255],  # 5 - Bare Ground
        [21, 103, 141, 255],  # 6 - Water
        [117, 0, 27, 255],  # 7 - Unclassified
        [208, 208, 208, 255],  # 8 - Cloud
        [244, 244, 244, 255],  # 9 - Definitely Cloud
        [195, 231, 240, 255],  # 10 - Thin Cloud
        [222, 157, 204, 255],  # 11 - Snow or Ice
    ],
    dtype="uint8",
)
# fmt: on
# Load SCL band, then convert to RGB using color scheme above
scl_rgba = colorize(xx.SCL.compute(), scl_colormap)
# Check we still have geo-registration
scl_rgba.odc.geobox
[10]:
GeoBox
WKT
BASEGEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]],
CONVERSION["UTM zone 6N",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",0,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",-147,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",0.9996,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",500000,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",0,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Engineering survey, topographic mapping."],
AREA["Between 150°W and 144°W, northern hemisphere between equator and 84°N, onshore and offshore. United States (USA) - Alaska (AK)."],
BBOX[0,-150,84,-144]],
ID["EPSG",32606]]
[11]:
_ = scl_rgba.plot.imshow(col="time", col_wrap=3, size=3, interpolation="antialiased")

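The colorize step above boils down to NumPy fancy indexing: integer class labels index rows of the (n_classes, 4) RGBA colormap, adding a trailing band axis. A tiny sketch with a two-entry colormap (synthetic labels, not the notebook’s data):

```python
import numpy as np

# Two-row RGBA colormap: row index == class label
colormap = np.array(
    [
        [255, 0, 255, 255],  # label 0
        [3, 139, 80, 255],  # label 1
    ],
    dtype="uint8",
)
labels = np.array([[0, 1], [1, 0]], dtype="uint8")

# Fancy indexing maps each label to its RGBA row: (y, x) -> (y, x, band)
rgba = colormap[labels]
print(rgba.shape)  # (2, 2, 4)
```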
Let’s save the image dated 2019-06-04 to a Cloud Optimized GeoTIFF file.
[12]:
to_save = scl_rgba.isel(time=3)
fname = f"SCL-{to_save.time.dt.strftime('%Y%m%d').item()}.tif"
print(f"Saving to: '{fname}'")
Saving to: 'SCL-20190604.tif'
[13]:
scl_rgba.isel(time=3).odc.write_cog(
    fname,
    overwrite=True,
    compress="webp",
    webp_quality=90,
)
[13]:
PosixPath('SCL-20190604.tif')
Check the file with rio info.
[14]:
!ls -lh {fname}
!rio info {fname} | jq .
-rw-r--r-- 1 runner docker 444K Mar 25 01:17 SCL-20190604.tif
{
"blockxsize": 512,
"blockysize": 512,
"bounds": [
399920,
6690240,
509760,
6800080
],
"colorinterp": [
"red",
"green",
"blue",
"alpha"
],
"compress": "webp",
"count": 4,
"crs": "EPSG:32606",
"descriptions": [
null,
null,
null,
null
],
"driver": "GTiff",
"dtype": "uint8",
"height": 1373,
"indexes": [
1,
2,
3,
4
],
"interleave": "pixel",
"lnglat": [
-147.83078743077945,
60.83917440326895
],
"mask_flags": [
[
"per_dataset",
"alpha"
],
[
"per_dataset",
"alpha"
],
[
"per_dataset",
"alpha"
],
[
"all_valid"
]
],
"nodata": null,
"res": [
80,
80
],
"shape": [
1373,
1373
],
"tiled": true,
"transform": [
80,
0,
399920,
0,
-80,
6800080,
0,
0,
1
],
"units": [
null,
null,
null,
null
],
"width": 1373
}