BioSCape Data Access

The Bioscape data has undergone reprocessing, and Version 2 is now available. This data is stored in an S3 bucket associated with the SMCE environment. You can access the data through various methods:

1. Intake Catalog

The simplest and fastest method of access is through the BioSCape intake catalog. This method offers the quickest read times, with entire scenes being loaded in around 20 to 40 seconds. The catalog is optimized with reference files using the virtualizarr library, which significantly enhances read performance. You can access reflectance, radiance, and observation (obs) data through this method.

Support for additional datasets, such as PRISM or LLIS data, is under development

Dependencies

Intake is currently undergoing significant changes. To ensure compatibility, please pin the following versions in your conda environment:

  • intake=2.0.7

  • intake-xarray=2.0.0

  • xarray=2024.11.0

  • zarr=2.18.4

  • fsspec=2024.12.0

  • dask=2024.12.1

  • s3fs=2024.12.0

[1]:
import intake
import xarray as xr
import numpy as np

# Load the catalog
catalog = intake.open_catalog('s3://bioscape-data/bioscape_avng_v2.yaml')

# Each flightline is divided into smaller scenes. Each scene has a reflectance, radiance and observation file associated with it
data = [catalog.ang20231022t092801.ang20231022t092801_005_RFL, catalog.ang20231022t092801.ang20231022t092801_005_RDN, catalog.ang20231022t092801.ang20231022t092801_005_OBS]

# Use read_chunked() or to_dask() to access the data via xarray. Other methods might load the entire scene into memory
# The crs should already be encoded and can be accessed via ds.rio.crs
ds = data[0].read_chunked()
ds
[1]:
<xarray.Dataset> Size: 3GB
Dimensions:                    (y: 1795, x: 906, wavelength: 425)
Coordinates:
    transverse_mercator        object 8B ...
  * wavelength                 (wavelength) float32 2kB 377.2 ... 2.501e+03
  * x                          (x) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
  * y                          (y) float64 14kB 6.297e+06 ... 6.286e+06
Data variables:
    aerosol_optical_thickness  (y, x) float32 7MB dask.array<chunksize=(256, 256), meta=np.ndarray>
    fwhm                       (wavelength) float32 2kB dask.array<chunksize=(425,), meta=np.ndarray>
    reflectance                (wavelength, y, x) float32 3GB dask.array<chunksize=(10, 256, 256), meta=np.ndarray>
    water_vapor                (y, x) float32 7MB dask.array<chunksize=(256, 256), meta=np.ndarray>
Attributes: (12/23)
    Conventions:                       CF-1.6
    creator_name:                      Jet Propulsion Laboratory/California I...
    creator_url:                       aviris.jpl.nasa.gov
    date_created:                      2024-11-25T19:57:23Z
    flight_line:                       ang20231022t092801
    identifier_product_doi_authority:  https://doi.org
    ...                                ...
    sensor:                            Airborne Visible / Infrared Imaging Sp...
    software_build_version:            002
    summary:                           The Airborne Visible / Infrared Imagin...
    time_coverage_end:                 2023-10-22T09:33:34Z
    time_coverage_start:               2023-10-22T09:33:34Z
    title:                             AVIRIS-NG L2A Surface reflectance (fli...
[2]:
%%time
# read the data into memory
ds.reflectance.compute()
CPU times: user 20.8 s, sys: 3.04 s, total: 23.8 s
Wall time: 26.8 s
[2]:
<xarray.DataArray 'reflectance' (wavelength: 425, y: 1795, x: 906)> Size: 3GB
array([[[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
...
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]],

       [[nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        ...,
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan],
        [nan, nan, nan, ..., nan, nan, nan]]],
      shape=(425, 1795, 906), dtype=float32)
Coordinates:
    transverse_mercator  object 8B '0.0'
  * wavelength           (wavelength) float32 2kB 377.2 382.2 ... 2.501e+03
  * x                    (x) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
  * y                    (y) float64 14kB 6.297e+06 6.297e+06 ... 6.286e+06
Attributes:
    _QuantizeBitGroomNumberOfSignificantDigits:  5
    long_name:                                   Surface hemispherical direct...
    orthorectified:                              True

The observation and radiance data are not orthorectified, but the GLT tables are provided. The following code can be used to generate an orthorectified data array, if required.

[5]:
%%time
# 2D example

# OBS data
ds = data[2].read_chunked()

# This assumes nan is the nodata value
line = ds.line.where(~ds.line.isnull(), -9999).astype(int)
sample = ds.sample.where(~ds.sample.isnull(), -9999).astype(int)

# Generate a nodata mask
mask = (line != -9999) & (sample != -9999)

# The tables have negative values where a nearest neighbor value was inserted, We need to switch these to posivitive in order to perform the broadcasting operation
line = xr.where((line < 0) & (line != -9999), -line, line)
sample = xr.where((sample < 0) & (sample != -9999), -sample, sample)

valid_glt = mask.values

# Create an output dataset, since this is a 2D array the shape of line or sample will be the shape of our output
out_ds = np.zeros((line.shape[0], line.shape[1]), dtype=np.float32)  + np.nan

# load the data we want to orthorectify into memory
ds_array = ds.elev.values

# Peform the broadcasting operation. The larger chunk_size is the faster the operation will go, but more memory will be required.
chunk_size = 500
for x in range(0, valid_glt.shape[0], chunk_size):
    x = slice(x, min(x + chunk_size, valid_glt.shape[0]))
    y =  valid_glt[x,:]
    out_ds[x][y] = ds_array[line.values[x][y] -1, sample.values[x][y] -1]


# Use the outputs to create an Xarray dataset
coords = {
    'northing': ds.northing,
    'easting': ds.easting,
    'transverse_mercator': ds.transverse_mercator
}

data_vars = {
    'elev': (['northing', 'easting'], out_ds),
}

ds_out = xr.Dataset(data_vars, coords=coords, attrs=ds.attrs)
ds_out.rio.write_crs(ds_out.transverse_mercator.attrs['crs_wkt'], inplace=True)
ds_out
CPU times: user 1.37 s, sys: 151 ms, total: 1.52 s
Wall time: 3.04 s
[5]:
<xarray.Dataset> Size: 7MB
Dimensions:              (northing: 1795, easting: 906)
Coordinates:
  * northing             (northing) float64 14kB 6.297e+06 ... 6.286e+06
    transverse_mercator  object 8B ...
  * easting              (easting) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
    spatial_ref          int64 8B 0
Data variables:
    elev                 (northing, easting) float32 7MB nan nan nan ... nan nan
Attributes: (12/23)
    Conventions:                       CF-1.6
    creator_name:                      Jet Propulsion Laboratory/California I...
    creator_url:                       aviris.jpl.nasa.gov
    date_created:                      2024-11-25T19:33:58Z
    flight_line:                       ang20231022t092801
    identifier_product_doi_authority:  https://doi.org
    ...                                ...
    sensor:                            Airborne Visible / Infrared Imaging Sp...
    software_build_version:            002
    summary:                           The Airborne Visible / Infrared Imagin...
    time_coverage_end:                 2023-10-22T09:34:38Z
    time_coverage_start:               2023-10-22T09:33:34Z
    title:                             AVIRIS-NG L1B Observation Parameters (...
[30]:
%%time
# 3D example

# Radiance data
ds = data[1].read_chunked()

# This assumes nan is the nodata value
line = ds.line.where(~ds.line.isnull(), -9999).astype(int)
sample = ds.sample.where(~ds.sample.isnull(), -9999).astype(int)

# Generate a nodata mask
mask = (line != -9999) & (sample != -9999)

# The tables have negative values where a nearest neighbor value was inserted, We need to switch these to posivitive in order to perform the broadcasting operation
line = xr.where((line < 0) & (line != -9999), -line, line)
sample = xr.where((sample < 0) & (sample != -9999), -sample, sample)

valid_glt = mask.values

# Create an output dataset, since this is for radiance, we want 425 bands and then the shape of line or sample
out_ds = np.zeros((425, line.shape[0], line.shape[1]), dtype=np.float32)  + np.nan

# load the data we want to orthorectify into memory
ds_array = ds.radiance.values

# Peform the broadcasting operation. The larger chunk_size is the faster the oepration will go, but more memory will be required.
chunk_size = 500
for x in range(0, valid_glt.shape[0], chunk_size):
    x = slice(x, min(x + chunk_size, valid_glt.shape[0]))
    y =  valid_glt[x,:]
    valid_y = np.where(valid_glt[slice(0, 10), :])[1]
    out_ds[:, x][:, y] = ds_array[:, line.values[x][y] -1, sample.values[x][y] -1]


# Use the outputs to create an Xarray dataset
coords = {
    'wavelength': ds.wavelength,
    'northing': ds.northing,
    'easting': ds.easting,
    'transverse_mercator': ds.transverse_mercator
}

data_vars = {
    'radiance': (['wavelength', 'northing', 'easting'], out_ds),
    'fwhm': ds.fwhm
}

ds_out = xr.Dataset(data_vars, coords=coords, attrs=ds.attrs)
ds_out.rio.write_crs(ds_out.transverse_mercator.attrs['crs_wkt'], inplace=True)
ds_out
CPU times: user 9.92 s, sys: 2.58 s, total: 12.5 s
Wall time: 16.7 s
[30]:
<xarray.Dataset> Size: 3GB
Dimensions:              (wavelength: 425, northing: 1795, easting: 906)
Coordinates:
    transverse_mercator  object 8B ...
  * wavelength           (wavelength) float32 2kB 377.2 382.2 ... 2.501e+03
  * northing             (northing) float64 14kB 6.297e+06 ... 6.286e+06
  * easting              (easting) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
    spatial_ref          int64 8B 0
Data variables:
    radiance             (wavelength, northing, easting) float32 3GB nan ... nan
    fwhm                 (wavelength) float32 2kB dask.array<chunksize=(425,), meta=np.ndarray>
Attributes: (12/23)
    Conventions:                       CF-1.6
    creator_name:                      Jet Propulsion Laboratory/California I...
    creator_url:                       aviris.jpl.nasa.gov
    date_created:                      2024-11-25T19:40:38Z
    flight_line:                       ang20231022t092801
    identifier_product_doi_authority:  https://doi.org
    ...                                ...
    sensor:                            Airborne Visible / Infrared Imaging Sp...
    software_build_version:            002
    summary:                           The Airborne Visible / Infrared Imagin...
    time_coverage_end:                 2023-10-22T09:34:38Z
    time_coverage_start:               2023-10-22T09:33:34Z
    title:                             AVIRIS-NG L1B Calibrated Radiance (fli...

2. BioSCape Cropping Web Application

For reflectance data only

Users can perform the following actions with BioSCAPE or EMIT data:

  1. Submit a GeoJSON: This request returns the overlapping flightlines.

  2. Retrieve Data Cropped: This request returns cropped data in NetCDF format. Provide a flightline, subsection number, a GeoJSON, and an output file name.

Check it out at crop.bioscape.io.

Note: A BioSCape SMCE username and password are required.

This application is in beta phase. The current user interface is basic and will be improved. Please report any issues via GitHub or Slack.

For more detailed information, visit the User Guide

3. BioSCape Tools Python Library

For reflectance data only

The BioSCape Tools library allows users to perform the following actions with BioSCAPE or EMIT data:

  1. Submit a GeoJSON: This request returns the overlapping flightlines.

  2. Retrieve Data Cropped: This request returns cropped data in NetCDF format. Provide a flightline, subsection number, a GeoJSON, and an output file name.

The BioSCape Tools library can be used outside of the SMCE. A BioSCape SMCE username and password are required.

Installation

The library can be installed via pip:

pip install bioscape_tools

It can also be installed via the Conda Store:

  1. Select and edit your desired environment.

  2. Choose YAML editing mode.

  3. Add the following lines:

- pip
  - bioscape-tools
  1. Build Note: It will not show up in the Conda Store UI, but it will still be installed

Please report any bugs via github issues or via Slack

[27]:
from bioscape_tools import Bioscape, Emit

OUTPATH = 'test.nc'
GEOJSON_FILE = "path_to_your_geojson"

Use your BioSCape SMCE username and password to get credentials.

[7]:
b = Bioscape(persist=True)

Find overlapping data.

[8]:
flightlines = b.get_overlap(GEOJSON_FILE)
flightlines
[8]:
geometry flightline subsection
0 POLYGON ((18.75585 -32.97929, 18.75674 -32.944... ang20231022t092801 000
1 POLYGON ((18.78096 -33.00205, 18.78218 -32.953... ang20231022t094938 035
2 POLYGON ((18.77505 -32.96264, 18.77627 -32.913... ang20231022t094938 036
3 POLYGON ((18.71476 -32.98757, 18.71623 -32.930... ang20231029t120919 045
4 POLYGON ((18.73772 -32.9587, 18.73861 -32.9237... ang20231029t123011 001
5 POLYGON ((18.74498 -32.98879, 18.74588 -32.953... ang20231029t123011 002

Crop and retrieve the data.

[5]:
bioscape_data = b.crop_flightline(flightline="ang20231022t092801", subsection=000, geojson=GEOJSON_FILE, output_path=None, mask_and_scale=True)
bioscape_data
[5]:
<xarray.Dataset> Size: 500MB
Dimensions:      (wavelength: 425, x: 479, y: 307)
Coordinates:
  * wavelength   (wavelength) float32 2kB 377.2 382.2 ... 2.496e+03 2.501e+03
  * x            (x) float64 4kB 2.909e+05 2.909e+05 ... 2.939e+05 2.939e+05
  * y            (y) float64 2kB 6.352e+06 6.352e+06 ... 6.35e+06 6.35e+06
    spatial_ref  int64 8B ...
Data variables:
    reflectance  (y, wavelength, x) float32 250MB ...
    uncertainty  (y, wavelength, x) float32 250MB ...
Attributes: (12/19)
    description:          L2A Analytyical per-pixel surface retrieval
    samples:              719
    lines:                615
    bands:                425
    header offset:        0
    file type:            ENVI Standard
    ...                   ...
    band names:           ['channel_0', 'channel_1', 'channel_2', 'channel_3'...
    masked pixel noise:   2.753511428833008
    ang pge input files:  bad_element_file=/scratch/achlus/airborne_sds/ang_l...
    ang pge run command:  /scratch/achlus/airborne_sds/ang_l1b_radiance/emit-...
    bbl:                  ['0', '1', '1', '1', '1', '1', '1', '1', '1', '1', ...
    data ignore value:    -9999

Optionally, you can download the data by providing an output path.

[9]:
b.crop_flightline(flightline="ang20231022t092801", subsection=000, geojson=GEOJSON_FILE, output_path=OUTPATH, mask_and_scale=True)

The same operations can be preformed on EMIT data.

[ ]:
e = Emit(persist=True)
[19]:
emit_data = e.get_overlap(GEOJSON_FILE, temporal_range=("2024-01-01", "2024-10-01"), cloud_cover=(0,10))
emit_data[0]
[19]:
[20]:
e.crop_scene(geojson=GEOJSON_FILE, granule_ur=emit_data[0].granule_ur, out_path=None, mask_and_scale=True)
[20]:
<xarray.Dataset> Size: 2MB
Dimensions:           (latitude: 33, longitude: 61, wavelengths: 285)
Coordinates:
  * wavelengths       (wavelengths) float32 1kB 381.0 388.4 ... 2.493e+03
    fwhm              (wavelengths) float32 1kB ...
    good_wavelengths  (wavelengths) float64 2kB ...
  * latitude          (latitude) float64 264B -32.95 -32.95 ... -32.97 -32.97
  * longitude         (longitude) float64 488B 18.76 18.76 18.76 ... 18.79 18.8
    elev              (latitude, longitude) float32 8kB ...
    spatial_ref       int64 8B ...
Data variables:
    reflectance       (latitude, longitude, wavelengths) float32 2MB ...
Attributes: (12/41)
    ncei_template_version:             NCEI_NetCDF_Swath_Template_v2.0
    summary:                           The Earth Surface Mineral Dust Source ...
    keywords:                          Imaging Spectroscopy, minerals, EMIT, ...
    Conventions:                       CF-1.63
    sensor:                            EMIT (Earth Surface Mineral Dust Sourc...
    instrument:                        EMIT
    ...                                ...
    geotransform:                      [ 1.87625957e+01  5.42232520e-04  0.00...
    day_night_flag:                    Day
    title:                             EMIT L2A Estimated Surface Reflectance...
    granule_id:                        EMIT_L2A_RFL_001_20240223T132851_24054...
    subset_downtrack_range:            [840 877]
    subset_crosstrack_range:           [414 450]
[23]:
OUTPATH = 'test.nc'
e.crop_scene(geojson=GEOJSON_FILE, granule_ur=emit_data[0].granule_ur, output_path=OUTPATH, mask_and_scale=True)

4. Direct S3 Access

All data can be accessed via direct S3 access, but please note that read times may be significantly slower, especially for larger files. Therefore, this method is not recommended for working with large datasets.

[1]:
import rioxarray as rxr
import os
import s3fs
[2]:
s3 = s3fs.S3FileSystem(anon=False)
files = s3.ls('bioscape-data/')
files
[2]:
['bioscape-data/AVNG',
 'bioscape-data/AVNG_V2',
 'bioscape-data/BioSCapeVegPolys2023_10_18',
 'bioscape-data/BioSCapeVegPolys2023_10_18.geoparquet',
 'bioscape-data/LVIS',
 'bioscape-data/PRISM',
 'bioscape-data/bioscape_avng.yaml',
 'bioscape-data/bioscape_avng_v2.yaml']
[15]:
rxr.open_rasterio(os.path.join('s3://', 'bioscape-data/PRISM/L2/prm20231022t141344_rfl_ort'))
[15]:
<xarray.DataArray (band: 246, y: 6449, x: 918)> Size: 6GB
[1456364772 values with dtype=float32]
Coordinates:
    wavelength   (band) float64 2kB 350.6 353.4 356.2 ... 1.043e+03 1.046e+03
    fwhm         (band) float64 2kB 3.332 3.332 3.332 ... 3.327 3.314 3.326
  * band         (band) int64 2kB 1 2 3 4 5 6 7 ... 240 241 242 243 244 245 246
    xc           (y, x) float64 47MB 3.306e+05 3.306e+05 ... 3.261e+05 3.261e+05
    yc           (y, x) float64 47MB 6.24e+06 6.24e+06 ... 6.208e+06 6.208e+06
    spatial_ref  int64 8B 0
Dimensions without coordinates: y, x
Attributes: (12/263)
    wavelength_units:   Nanometers
    Band_1:             350.5548293 Nanometers
    Band_2:             353.3850859 Nanometers
    Band_3:             356.21539889999997 Nanometers
    Band_4:             359.045768 Nanometers
    Band_5:             361.8761934 Nanometers
    ...                 ...
    file_type:          ENVI
    data_type:          4
    interleave:         bil
    byte_order:         0
    smoothing_factors:   1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 ...
    data_ignore_value:  -9999
[ ]: