Streaming data from NASA's Earth Surface Mineral Dust Source Investigation (EMIT)¶
This is a proof-of-concept notebook demonstrating how earthaccess can facilitate the use of cloud-hosted NASA data with xarray and holoviews. For a formal, in-depth tutorial on EMIT, please visit the official repository: EMIT Science Tutorial
Prerequisites
- NASA EDL credentials
- Openscapes Conda environment installed
- For direct access this notebook should run in AWS
IMPORTANT: This notebook can run outside of AWS, but that is not recommended because streaming HDF5 data out of region is slow
from pprint import pprint
import earthaccess
import xarray as xr
print(f"using earthaccess version {earthaccess.__version__}")
auth = earthaccess.login()
using earthaccess version 0.12.0
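By default, earthaccess.login() tries every credential strategy it knows about. As a hedged aside (the strategy names and the EARTHDATA_USERNAME / EARTHDATA_PASSWORD environment variables are the ones described in the earthaccess docs; adjust if your version differs), you can also be explicit:
# read EDL credentials from ~/.netrc only
# auth = earthaccess.login(strategy="netrc")
# or from the EARTHDATA_USERNAME / EARTHDATA_PASSWORD environment variables
# auth = earthaccess.login(strategy="environment")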
Searching for the dataset with .search_datasets()¶
Note: API docs can be found at earthaccess
results = earthaccess.search_datasets(short_name="EMITL2ARFL", cloud_hosted=True)
# Let's print our datasets
for dataset in results:
pprint(dataset.summary())
{'cloud-info': {'Region': 'us-west-2',
                'S3BucketAndObjectPrefixNames': ['s3://lp-prod-protected/EMITL2ARFL.001',
                                                 's3://lp-prod-public/EMITL2ARFL.001'],
                'S3CredentialsAPIDocumentationURL': 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentialsREADME',
                'S3CredentialsAPIEndpoint': 'https://data.lpdaac.earthdatacloud.nasa.gov/s3credentials'},
 'concept-id': 'C2408750690-LPCLOUD',
 'file-type': "[{'FormatType': 'Native', 'AverageFileSize': 1.8, 'Format': 'netCDF-4', "
              "'TotalCollectionFileSizeBeginDate': '2022-08-09T00:00:00.000Z', "
              "'FormatDescription': 'Network Common Data Format Version 4', "
              "'AverageFileSizeUnit': 'GB', 'Media': ['Earthdata Cloud', 'HTTPS']}]",
 'get-data': ['https://search.earthdata.nasa.gov/search/granules?p=C2408750690-LPCLOUD',
              'https://appeears.earthdatacloud.nasa.gov/'],
 'short-name': 'EMITL2ARFL',
 'version': '001'}
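Since summary() returns a plain dictionary, individual fields such as the concept id or the S3 prefixes can be pulled out directly; a short sketch using the keys shown above:
summary = results[0].summary()
print(summary["concept-id"])
print(summary["cloud-info"]["S3BucketAndObjectPrefixNames"])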
Searching for the data with .search_data() over Ecuador¶
# ~Ecuador = -82.05,-3.17,-76.94,-0.52
granules = earthaccess.search_data(
short_name="EMITL2ARFL",
bounding_box=(-82.05, -3.17, -76.94, -0.52),
count=10,
)
print(len(granules))
70
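Besides a bounding box, search_data() accepts other filters such as a temporal range; as a sketch (the date range below is an arbitrary example):
granules_2023 = earthaccess.search_data(
    short_name="EMITL2ARFL",
    bounding_box=(-82.05, -3.17, -76.94, -0.52),
    temporal=("2023-01-01", "2023-03-31"),  # arbitrary example range
    count=10,
)
print(len(granules_2023))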
earthaccess can print a preview of the data using the metadata from CMR¶
Note: there is a bug in earthaccess where the reported granule size is always 0; a fix is coming next week
granules[7]
Data: EMIT_L2A_RFL_001_20230304T151234_2306310_003.nc
      EMIT_L2A_RFLUNCERT_001_20230304T151234_2306310_003.nc
      EMIT_L2A_MASK_001_20230304T151234_2306310_003.nc
Size: 3578.78 MB
Cloud Hosted: True
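Each result also exposes its links and metadata-reported size programmatically; a small sketch using the DataGranule helpers (the size bug mentioned above may affect the numbers):
granule = granules[7]
print(granule.data_links(access="external"))  # HTTPS links
print(granule.size())  # size in MB, from CMR metadata
print(f"total: {sum(g.size() for g in granules):.2f} MB")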
Streaming data from S3 with fsspec¶
Opening the data with earthaccess.open() and accessing the NetCDF as if it were local
If we run this code in AWS (us-west-2), earthaccess can use direct S3 links. If we run it outside of AWS, earthaccess can only use HTTPS links, since direct S3 access to NASA data is only allowed in region.
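For in-region workflows that bypass earthaccess.open(), temporary S3 credentials can also be fetched from the LP DAAC endpoint listed in the collection metadata above; a hedged sketch (this only works inside us-west-2, and the returned key names are those of NASA's s3credentials endpoints):
# daac="LPDAAC" matches the provider of EMITL2ARFL
creds = earthaccess.get_s3_credentials(daac="LPDAAC")
# typically: accessKeyId, secretAccessKey, sessionToken, expiration
print(list(creds.keys()))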
# open() accepts a list of results or a list of links
file_handlers = earthaccess.open(granules)
file_handlers
%%time
# we can use any file handler from the list
file_p = file_handlers[4]
# reflectance lives in the root group; band and geolocation info live in subgroups
refl = xr.open_dataset(file_p)
wvl = xr.open_dataset(file_p, group="sensor_band_parameters")
loc = xr.open_dataset(file_p, group="location")
# merge the groups and attach downtrack/crosstrack/wavelength coordinates
ds = xr.merge([refl, loc])
ds = ds.assign_coords(
    {
        "downtrack": (["downtrack"], refl.downtrack.data),
        "crosstrack": (["crosstrack"], refl.crosstrack.data),
        **wvl.variables,
    }
)
ds
Plotting non-orthorectified data¶
Use the following code to render the interactive Panel widget when running this notebook in AWS (us-west-2)
import holoviews as hv
import hvplot.xarray
import numpy as np
import panel as pn
pn.extension()
# Find the band nearest to 850 nm (NIR)
b850 = np.nanargmin(abs(ds["wavelengths"].values - 850))
ref_unc = ds["reflectance_uncertainty"]

# image of the NIR reflectance uncertainty; a Tap stream captures the clicked pixel
image = ref_unc.sel(bands=b850).hvplot("crosstrack", "downtrack", cmap="viridis")
stream = hv.streams.Tap(source=image, x=255, y=484)


def wavelengths_histogram(x, y):
    # plot the uncertainty spectrum at the tapped crosstrack/downtrack location
    histo = ref_unc.sel(crosstrack=x, downtrack=y, method="nearest").hvplot(
        x="wavelengths", color="green"
    )
    return histo


tap_dmap = hv.DynamicMap(wavelengths_histogram, streams=[stream])
pn.Column(image, tap_dmap)
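To keep the figure around outside a live session, the Panel layout can be written to a standalone HTML file; a sketch using Panel's save method (the tap callback still needs a running kernel to update, so the saved file only captures the current state):
pn.Column(image, tap_dmap).save("emit_reflectance_uncertainty.html")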