BioSCape Data Access
The Bioscape data has undergone reprocessing, and Version 2 is now available. This data is stored in an S3 bucket associated with the SMCE environment. You can access the data through various methods:
1. Intake Catalog
The simplest and fastest method of access is through the BioSCape intake catalog. This method offers the quickest read times, with entire scenes being loaded in around 20 to 40 seconds. The catalog is optimized with reference files using the virtualizarr library, which significantly enhances read performance. You can access reflectance, radiance, and observation (obs) data through this method.
Support for additional datasets, such as PRISM or LLIS data, is under development
Dependencies
Intake is currently undergoing significant changes. To ensure compatibility, please pin the following versions in your conda environment:
intake=2.0.7intake-xarray=2.0.0xarray=2024.11.0zarr=2.18.4fsspec=2024.12.0dask=2024.12.1s3fs=2024.12.0
[1]:
import intake
import xarray as xr
import numpy as np
# Load the catalog
catalog = intake.open_catalog('s3://bioscape-data/bioscape_avng_v2.yaml')
# Each flightline is divided into smaller scenes. Each scene has a reflectance, radiance and observation file associated with it
data = [catalog.ang20231022t092801.ang20231022t092801_005_RFL, catalog.ang20231022t092801.ang20231022t092801_005_RDN, catalog.ang20231022t092801.ang20231022t092801_005_OBS]
# Use read_chunked() or to_dask() to access the data via xarray. Other methods might load the entire scene into memory
# The crs should already be encoded and can be accessed via ds.rio.crs
ds = data[0].read_chunked()
ds
[1]:
<xarray.Dataset> Size: 3GB
Dimensions: (y: 1795, x: 906, wavelength: 425)
Coordinates:
transverse_mercator object 8B ...
* wavelength (wavelength) float32 2kB 377.2 ... 2.501e+03
* x (x) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
* y (y) float64 14kB 6.297e+06 ... 6.286e+06
Data variables:
aerosol_optical_thickness (y, x) float32 7MB dask.array<chunksize=(256, 256), meta=np.ndarray>
fwhm (wavelength) float32 2kB dask.array<chunksize=(425,), meta=np.ndarray>
reflectance (wavelength, y, x) float32 3GB dask.array<chunksize=(10, 256, 256), meta=np.ndarray>
water_vapor (y, x) float32 7MB dask.array<chunksize=(256, 256), meta=np.ndarray>
Attributes: (12/23)
Conventions: CF-1.6
creator_name: Jet Propulsion Laboratory/California I...
creator_url: aviris.jpl.nasa.gov
date_created: 2024-11-25T19:57:23Z
flight_line: ang20231022t092801
identifier_product_doi_authority: https://doi.org
... ...
sensor: Airborne Visible / Infrared Imaging Sp...
software_build_version: 002
summary: The Airborne Visible / Infrared Imagin...
time_coverage_end: 2023-10-22T09:33:34Z
time_coverage_start: 2023-10-22T09:33:34Z
title: AVIRIS-NG L2A Surface reflectance (fli...[2]:
%%time
# read the data into memory
ds.reflectance.compute()
CPU times: user 20.8 s, sys: 3.04 s, total: 23.8 s
Wall time: 26.8 s
[2]:
<xarray.DataArray 'reflectance' (wavelength: 425, y: 1795, x: 906)> Size: 3GB
array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
...
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
...,
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]],
shape=(425, 1795, 906), dtype=float32)
Coordinates:
transverse_mercator object 8B '0.0'
* wavelength (wavelength) float32 2kB 377.2 382.2 ... 2.501e+03
* x (x) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
* y (y) float64 14kB 6.297e+06 6.297e+06 ... 6.286e+06
Attributes:
_QuantizeBitGroomNumberOfSignificantDigits: 5
long_name: Surface hemispherical direct...
orthorectified: TrueThe observation and radiance data are not orthorectified, but the GLT tables are provided. The following code can be used to generate an orthorectified data array, if required.
[5]:
%%time
# 2D example
# OBS data
ds = data[2].read_chunked()
# This assumes nan is the nodata value
line = ds.line.where(~ds.line.isnull(), -9999).astype(int)
sample = ds.sample.where(~ds.sample.isnull(), -9999).astype(int)
# Generate a nodata mask
mask = (line != -9999) & (sample != -9999)
# The tables have negative values where a nearest neighbor value was inserted, We need to switch these to posivitive in order to perform the broadcasting operation
line = xr.where((line < 0) & (line != -9999), -line, line)
sample = xr.where((sample < 0) & (sample != -9999), -sample, sample)
valid_glt = mask.values
# Create an output dataset, since this is a 2D array the shape of line or sample will be the shape of our output
out_ds = np.zeros((line.shape[0], line.shape[1]), dtype=np.float32) + np.nan
# load the data we want to orthorectify into memory
ds_array = ds.elev.values
# Peform the broadcasting operation. The larger chunk_size is the faster the operation will go, but more memory will be required.
chunk_size = 500
for x in range(0, valid_glt.shape[0], chunk_size):
x = slice(x, min(x + chunk_size, valid_glt.shape[0]))
y = valid_glt[x,:]
out_ds[x][y] = ds_array[line.values[x][y] -1, sample.values[x][y] -1]
# Use the outputs to create an Xarray dataset
coords = {
'northing': ds.northing,
'easting': ds.easting,
'transverse_mercator': ds.transverse_mercator
}
data_vars = {
'elev': (['northing', 'easting'], out_ds),
}
ds_out = xr.Dataset(data_vars, coords=coords, attrs=ds.attrs)
ds_out.rio.write_crs(ds_out.transverse_mercator.attrs['crs_wkt'], inplace=True)
ds_out
CPU times: user 1.37 s, sys: 151 ms, total: 1.52 s
Wall time: 3.04 s
[5]:
<xarray.Dataset> Size: 7MB
Dimensions: (northing: 1795, easting: 906)
Coordinates:
* northing (northing) float64 14kB 6.297e+06 ... 6.286e+06
transverse_mercator object 8B ...
* easting (easting) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
spatial_ref int64 8B 0
Data variables:
elev (northing, easting) float32 7MB nan nan nan ... nan nan
Attributes: (12/23)
Conventions: CF-1.6
creator_name: Jet Propulsion Laboratory/California I...
creator_url: aviris.jpl.nasa.gov
date_created: 2024-11-25T19:33:58Z
flight_line: ang20231022t092801
identifier_product_doi_authority: https://doi.org
... ...
sensor: Airborne Visible / Infrared Imaging Sp...
software_build_version: 002
summary: The Airborne Visible / Infrared Imagin...
time_coverage_end: 2023-10-22T09:34:38Z
time_coverage_start: 2023-10-22T09:33:34Z
title: AVIRIS-NG L1B Observation Parameters (...[30]:
%%time
# 3D example
# Radiance data
ds = data[1].read_chunked()
# This assumes nan is the nodata value
line = ds.line.where(~ds.line.isnull(), -9999).astype(int)
sample = ds.sample.where(~ds.sample.isnull(), -9999).astype(int)
# Generate a nodata mask
mask = (line != -9999) & (sample != -9999)
# The tables have negative values where a nearest neighbor value was inserted, We need to switch these to posivitive in order to perform the broadcasting operation
line = xr.where((line < 0) & (line != -9999), -line, line)
sample = xr.where((sample < 0) & (sample != -9999), -sample, sample)
valid_glt = mask.values
# Create an output dataset, since this is for radiance, we want 425 bands and then the shape of line or sample
out_ds = np.zeros((425, line.shape[0], line.shape[1]), dtype=np.float32) + np.nan
# load the data we want to orthorectify into memory
ds_array = ds.radiance.values
# Peform the broadcasting operation. The larger chunk_size is the faster the oepration will go, but more memory will be required.
chunk_size = 500
for x in range(0, valid_glt.shape[0], chunk_size):
x = slice(x, min(x + chunk_size, valid_glt.shape[0]))
y = valid_glt[x,:]
valid_y = np.where(valid_glt[slice(0, 10), :])[1]
out_ds[:, x][:, y] = ds_array[:, line.values[x][y] -1, sample.values[x][y] -1]
# Use the outputs to create an Xarray dataset
coords = {
'wavelength': ds.wavelength,
'northing': ds.northing,
'easting': ds.easting,
'transverse_mercator': ds.transverse_mercator
}
data_vars = {
'radiance': (['wavelength', 'northing', 'easting'], out_ds),
'fwhm': ds.fwhm
}
ds_out = xr.Dataset(data_vars, coords=coords, attrs=ds.attrs)
ds_out.rio.write_crs(ds_out.transverse_mercator.attrs['crs_wkt'], inplace=True)
ds_out
CPU times: user 9.92 s, sys: 2.58 s, total: 12.5 s
Wall time: 16.7 s
[30]:
<xarray.Dataset> Size: 3GB
Dimensions: (wavelength: 425, northing: 1795, easting: 906)
Coordinates:
transverse_mercator object 8B ...
* wavelength (wavelength) float32 2kB 377.2 382.2 ... 2.501e+03
* northing (northing) float64 14kB 6.297e+06 ... 6.286e+06
* easting (easting) float64 7kB 2.992e+05 2.992e+05 ... 3.049e+05
spatial_ref int64 8B 0
Data variables:
radiance (wavelength, northing, easting) float32 3GB nan ... nan
fwhm (wavelength) float32 2kB dask.array<chunksize=(425,), meta=np.ndarray>
Attributes: (12/23)
Conventions: CF-1.6
creator_name: Jet Propulsion Laboratory/California I...
creator_url: aviris.jpl.nasa.gov
date_created: 2024-11-25T19:40:38Z
flight_line: ang20231022t092801
identifier_product_doi_authority: https://doi.org
... ...
sensor: Airborne Visible / Infrared Imaging Sp...
software_build_version: 002
summary: The Airborne Visible / Infrared Imagin...
time_coverage_end: 2023-10-22T09:34:38Z
time_coverage_start: 2023-10-22T09:33:34Z
title: AVIRIS-NG L1B Calibrated Radiance (fli...2. BioSCape Cropping Web Application
For reflectance data only
Users can perform the following actions with BioSCAPE or EMIT data:
Submit a GeoJSON: This request returns the overlapping flightlines.
Retrieve Data Cropped: This request returns cropped data in NetCDF format. Provide a flightline, subsection number, a GeoJSON, and an output file name.
Check it out at crop.bioscape.io.
Note: A BioSCape SMCE username and password are required.
This application is in beta phase. The current user interface is basic and will be improved. Please report any issues via GitHub or Slack.
For more detailed information, visit the User Guide
3. BioSCape Tools Python Library
For reflectance data only
The BioSCape Tools library allows users to perform the following actions with BioSCAPE or EMIT data:
Submit a GeoJSON: This request returns the overlapping flightlines.
Retrieve Data Cropped: This request returns cropped data in NetCDF format. Provide a flightline, subsection number, a GeoJSON, and an output file name.
The BioSCape Tools library can be used outside of the SMCE. A BioSCape SMCE username and password are required.
Installation
The library can be installed via pip:
pip install bioscape_tools
It can also be installed via the Conda Store:
Select and edit your desired environment.
Choose YAML editing mode.
Add the following lines:
- pip
- bioscape-tools
Build Note: It will not show up in the Conda Store UI, but it will still be installed
Please report any bugs via github issues or via Slack
[27]:
from bioscape_tools import Bioscape, Emit
OUTPATH = 'test.nc'
GEOJSON_FILE = "path_to_your_geojson"
Use your BioSCape SMCE username and password to get credentials.
[7]:
b = Bioscape(persist=True)
Find overlapping data.
[8]:
flightlines = b.get_overlap(GEOJSON_FILE)
flightlines
[8]:
| geometry | flightline | subsection | |
|---|---|---|---|
| 0 | POLYGON ((18.75585 -32.97929, 18.75674 -32.944... | ang20231022t092801 | 000 |
| 1 | POLYGON ((18.78096 -33.00205, 18.78218 -32.953... | ang20231022t094938 | 035 |
| 2 | POLYGON ((18.77505 -32.96264, 18.77627 -32.913... | ang20231022t094938 | 036 |
| 3 | POLYGON ((18.71476 -32.98757, 18.71623 -32.930... | ang20231029t120919 | 045 |
| 4 | POLYGON ((18.73772 -32.9587, 18.73861 -32.9237... | ang20231029t123011 | 001 |
| 5 | POLYGON ((18.74498 -32.98879, 18.74588 -32.953... | ang20231029t123011 | 002 |
Crop and retrieve the data.
[5]:
bioscape_data = b.crop_flightline(flightline="ang20231022t092801", subsection=000, geojson=GEOJSON_FILE, output_path=None, mask_and_scale=True)
bioscape_data
[5]:
<xarray.Dataset> Size: 500MB
Dimensions: (wavelength: 425, x: 479, y: 307)
Coordinates:
* wavelength (wavelength) float32 2kB 377.2 382.2 ... 2.496e+03 2.501e+03
* x (x) float64 4kB 2.909e+05 2.909e+05 ... 2.939e+05 2.939e+05
* y (y) float64 2kB 6.352e+06 6.352e+06 ... 6.35e+06 6.35e+06
spatial_ref int64 8B ...
Data variables:
reflectance (y, wavelength, x) float32 250MB ...
uncertainty (y, wavelength, x) float32 250MB ...
Attributes: (12/19)
description: L2A Analytyical per-pixel surface retrieval
samples: 719
lines: 615
bands: 425
header offset: 0
file type: ENVI Standard
... ...
band names: ['channel_0', 'channel_1', 'channel_2', 'channel_3'...
masked pixel noise: 2.753511428833008
ang pge input files: bad_element_file=/scratch/achlus/airborne_sds/ang_l...
ang pge run command: /scratch/achlus/airborne_sds/ang_l1b_radiance/emit-...
bbl: ['0', '1', '1', '1', '1', '1', '1', '1', '1', '1', ...
data ignore value: -9999Optionally, you can download the data by providing an output path.
[9]:
b.crop_flightline(flightline="ang20231022t092801", subsection=000, geojson=GEOJSON_FILE, output_path=OUTPATH, mask_and_scale=True)
The same operations can be preformed on EMIT data.
[ ]:
e = Emit(persist=True)
[19]:
emit_data = e.get_overlap(GEOJSON_FILE, temporal_range=("2024-01-01", "2024-10-01"), cloud_cover=(0,10))
emit_data[0]
[19]:
Data: EMIT_L2A_RFL_001_20240223T132851_2405409_006.ncEMIT_L2A_RFLUNCERT_001_20240223T132851_2405409_006.ncEMIT_L2A_MASK_001_20240223T132851_2405409_006.nc
Size: 3580.37 MB
Cloud Hosted: False
[20]:
e.crop_scene(geojson=GEOJSON_FILE, granule_ur=emit_data[0].granule_ur, out_path=None, mask_and_scale=True)
[20]:
<xarray.Dataset> Size: 2MB
Dimensions: (latitude: 33, longitude: 61, wavelengths: 285)
Coordinates:
* wavelengths (wavelengths) float32 1kB 381.0 388.4 ... 2.493e+03
fwhm (wavelengths) float32 1kB ...
good_wavelengths (wavelengths) float64 2kB ...
* latitude (latitude) float64 264B -32.95 -32.95 ... -32.97 -32.97
* longitude (longitude) float64 488B 18.76 18.76 18.76 ... 18.79 18.8
elev (latitude, longitude) float32 8kB ...
spatial_ref int64 8B ...
Data variables:
reflectance (latitude, longitude, wavelengths) float32 2MB ...
Attributes: (12/41)
ncei_template_version: NCEI_NetCDF_Swath_Template_v2.0
summary: The Earth Surface Mineral Dust Source ...
keywords: Imaging Spectroscopy, minerals, EMIT, ...
Conventions: CF-1.63
sensor: EMIT (Earth Surface Mineral Dust Sourc...
instrument: EMIT
... ...
geotransform: [ 1.87625957e+01 5.42232520e-04 0.00...
day_night_flag: Day
title: EMIT L2A Estimated Surface Reflectance...
granule_id: EMIT_L2A_RFL_001_20240223T132851_24054...
subset_downtrack_range: [840 877]
subset_crosstrack_range: [414 450][23]:
OUTPATH = 'test.nc'
e.crop_scene(geojson=GEOJSON_FILE, granule_ur=emit_data[0].granule_ur, output_path=OUTPATH, mask_and_scale=True)
4. Direct S3 Access
All data can be accessed via direct S3 access, but please note that read times may be significantly slower, especially for larger files. Therefore, this method is not recommended for working with large datasets.
[1]:
import rioxarray as rxr
import os
import s3fs
[2]:
s3 = s3fs.S3FileSystem(anon=False)
files = s3.ls('bioscape-data/')
files
[2]:
['bioscape-data/AVNG',
'bioscape-data/AVNG_V2',
'bioscape-data/BioSCapeVegPolys2023_10_18',
'bioscape-data/BioSCapeVegPolys2023_10_18.geoparquet',
'bioscape-data/LVIS',
'bioscape-data/PRISM',
'bioscape-data/bioscape_avng.yaml',
'bioscape-data/bioscape_avng_v2.yaml']
[15]:
rxr.open_rasterio(os.path.join('s3://', 'bioscape-data/PRISM/L2/prm20231022t141344_rfl_ort'))
[15]:
<xarray.DataArray (band: 246, y: 6449, x: 918)> Size: 6GB
[1456364772 values with dtype=float32]
Coordinates:
wavelength (band) float64 2kB 350.6 353.4 356.2 ... 1.043e+03 1.046e+03
fwhm (band) float64 2kB 3.332 3.332 3.332 ... 3.327 3.314 3.326
* band (band) int64 2kB 1 2 3 4 5 6 7 ... 240 241 242 243 244 245 246
xc (y, x) float64 47MB 3.306e+05 3.306e+05 ... 3.261e+05 3.261e+05
yc (y, x) float64 47MB 6.24e+06 6.24e+06 ... 6.208e+06 6.208e+06
spatial_ref int64 8B 0
Dimensions without coordinates: y, x
Attributes: (12/263)
wavelength_units: Nanometers
Band_1: 350.5548293 Nanometers
Band_2: 353.3850859 Nanometers
Band_3: 356.21539889999997 Nanometers
Band_4: 359.045768 Nanometers
Band_5: 361.8761934 Nanometers
... ...
file_type: ENVI
data_type: 4
interleave: bil
byte_order: 0
smoothing_factors: 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 , 1.0 ...
data_ignore_value: -9999[ ]:
