gpm.dataset package#

Subpackages#

gpm.dataset.decoding package

Submodules#

gpm.dataset.attrs module#

This module contains functions to parse GPM granule attributes.

gpm.dataset.attrs.add_history(ds)[source]#: Add the history attribute to the xr.Dataset.

gpm.dataset.attrs.decode_attrs(attrs)[source]#: Decode GPM nested dictionary attributes from an xarray object.

gpm.dataset.attrs.decode_string(string)[source]#

Decode string dictionary.

Format: "<key>=<value>\n".

It removes ";" and "\t" characters before parsing the string.
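The parsing rule described above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function name `parse_attr_string` and the sample header text are made up for the example, and the choice to keep every value as a string is an assumption.

```python
def parse_attr_string(string):
    """Parse newline-separated '<key>=<value>' records into a dict (illustrative only)."""
    # Drop ';' terminators and tab characters before splitting into records.
    cleaned = string.replace(";", "").replace("\t", "")
    attrs = {}
    for line in cleaned.split("\n"):
        # Skip blank lines and lines without a key/value separator.
        if "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = line.split("=", 1)
        attrs[key.strip()] = value.strip()
    return attrs


# Made-up sample text in the documented format.
sample = "ProcessingSystem=PPS;\n\tGranuleNumber=12345;\n"
parsed = parse_attr_string(sample)
```

In this sketch all values stay as strings; any type conversion would be a separate step.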

gpm.dataset.attrs.get_granule_attrs(dt)[source]#: Get granule global attributes.

gpm.dataset.conventions module#

This module contains functions to enforce CF conventions on GPM-API objects.

gpm.dataset.conventions.finalize_dataset(ds, product, decode_cf, scan_mode, start_time=None, end_time=None)[source]#: Finalize GPM xr.Dataset object.

gpm.dataset.conventions.reshape_dataset(ds)[source]#

Define the dataset dimension order.

It ensures that the output dimension order is (y, x). This shape is expected by, e.g., pyresample and matplotlib.

- For GPM GRID objects: (…, time, lat, lon)
- For GPM ORBIT objects: (cross_track, along_track, …)

gpm.dataset.coords module#

This module contains functions to extract the coordinates from GPM files.

gpm.dataset.coords.get_coords(dt, scan_mode)[source]#: Get coordinates from GPM objects.

gpm.dataset.coords.get_coords_attrs_dict(ds)[source]#: Return relevant GPM coordinates attributes.

gpm.dataset.coords.get_grid_coords(dt, scan_mode)[source]#

Get coordinates from Grid objects.

Set ‘time’ to the end of the accumulation period. Example: IMERG provides the average rain rate (mm/hr) over the half-hour period.

NOTE: IMERG and GRID products do not have a GranuleNumber!
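The time convention above can be illustrated with plain `datetime` arithmetic. This is a sketch under stated assumptions: the start time, the rain rate, and the variable names are invented for the example, and the code does not call any GPM-API function.

```python
import datetime

# Hypothetical start of a half-hour accumulation period.
period_start = datetime.datetime(2020, 7, 5, 2, 0, 0)
accumulation = datetime.timedelta(minutes=30)

# The 'time' coordinate is placed at the END of the accumulation period.
time_coord = period_start + accumulation

# An average rate in mm/hr over the period converts to an accumulated depth
# by multiplying with the period length expressed in hours.
mean_rate_mm_per_hr = 4.0  # made-up value
period_hours = accumulation / datetime.timedelta(hours=1)
accumulated_mm = mean_rate_mm_per_hr * period_hours
```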

gpm.dataset.coords.get_orbit_coords(dt, scan_mode)[source]#: Get coordinates from Orbit objects.

gpm.dataset.coords.get_time_delta_from_time_interval(time_interval)[source]#

gpm.dataset.coords.set_coords_attrs(ds)[source]#: Set dataset coordinate attributes.

gpm.dataset.crs module#

This module contains functions to define and create CF-compliant CRS.

gpm.dataset.crs.get_pyproj_crs(xr_obj)[source]#

Return pyproj.crs.CoordinateSystem from CRS coordinate(s).

If both a geographic and a projected CRS are present, the projected CRS is returned.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) –

Returns:

proj_crs

Return type:

CoordinateSystem

gpm.dataset.crs.get_pyresample_area(xr_obj)[source]#

Define pyresample area from CF-compliant xarray object.

To be used by the pyresample accessor: ds.pyresample.area

gpm.dataset.crs.get_pyresample_projection(xr_obj)[source]#: Get pyresample AreaDefinition from CF-compliant xarray object.

gpm.dataset.crs.get_pyresample_swath(xr_obj)[source]#: Get pyresample SwathDefinition from CF-compliant xarray object.

gpm.dataset.crs.has_proj_coords(xr_obj)[source]#

gpm.dataset.crs.has_swath_coords(xr_obj)[source]#

gpm.dataset.crs.remove_existing_crs_info(ds)[source]#: Remove existing grid_mapping attributes.

gpm.dataset.crs.set_dataset_crs(ds, crs, grid_mapping_name='spatial_ref', inplace=False)[source]#

Add CF-compliant CRS information to an xr.Dataset.

It assumes that all dataset variables have the same CRS! For a projected CRS, it expects the CRS dimension coordinates to be specified. For a swath dataset, it expects the geographic coordinates to be specified.

For a projected CRS, if 2D latitude/longitude arrays are specified, it assumes they refer to the WGS84 CRS!

Parameters:

ds (xarray.Dataset) –
crs (CoordinateSystem) – CRS information to be added to the xr.Dataset
grid_mapping_name (str) – Name of the grid_mapping coordinate used to store the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.

Returns:

ds – Dataset with CF-compliant CRS information.

Return type:

xarray.Dataset

gpm.dataset.crs.set_dataset_single_crs(ds, crs, grid_mapping_name='spatial_ref', inplace=False)[source]#

Add CF-compliant CRS information to an xr.Dataset.

Parameters:

ds (xarray.Dataset) –
crs (CoordinateSystem) – CRS information to be added to the xr.Dataset
grid_mapping_name (str) – Name of the grid_mapping coordinate used to store the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.

Returns:

ds – Dataset with CF-compliant CRS information.

Return type:

xarray.Dataset

gpm.dataset.crs.simplify_grid_mapping_values(ds)[source]#

Simplify grid_mapping value.

GDAL does not support a grid_mapping defined as “crs_wgs84: lat lon”. If only one CRS is specified in this format, it returns “crs_wgs84”.
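The string rewrite described above can be sketched on a single attribute value. The helper name `simplify_grid_mapping` is hypothetical, and leaving values with several CRS entries unchanged is an assumption; the source only documents the single-CRS case.

```python
def simplify_grid_mapping(value):
    """Reduce '<crs_name>: <coord> <coord>' to '<crs_name>' when one CRS is listed."""
    tokens = value.split()
    # In the extended form, each CRS variable name is the token ending with ':'.
    crs_names = [token[:-1] for token in tokens if token.endswith(":")]
    if len(crs_names) == 1:
        return crs_names[0]
    # Already simple, or several CRS listed: returned unchanged (assumption).
    return value


simplified = simplify_grid_mapping("crs_wgs84: lat lon")
```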

gpm.dataset.dataset module#

This module contains functions to read files into a GPM-API Dataset.

gpm.dataset.dataset.open_dataset(product, start_time, end_time, variables=None, groups=None, scan_mode=None, version=None, product_type='RS', chunks={}, decode_cf=True, parallel=False, prefix_group=False, verbose=False)[source]#

Lazily map HDF5 data into an xarray.Dataset with relevant GPM data and attributes.

Note:

- gpm.open_dataset does not load GPM granules whose FileHeader flag 'EmptyGranule' is not 'NOT_EMPTY'.
- The group ScanStatus provides relevant data flags for Swath products.
- The variable dataQuality provides an overall quality flag status. If dataQuality = 0, no issues have been detected.
- The variable SCorientation provides the orientation of the sensor from the forward track of the satellite.

Parameters:

product (str) – GPM product acronym.
start_time (datetime.datetime, datetime.date, np.datetime64 or str) – Start time. If a string is given, the ISO format YYYY-MM-DD hh:mm:ss is expected.
end_time (datetime.datetime, datetime.date, np.datetime64 or str) – End time. If a string is given, the ISO format YYYY-MM-DD hh:mm:ss is expected.
variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).
groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).
scan_mode (str, optional) –
Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:
- 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).
- 'NS': Normal Scan. For Ku band and DPR (up to version 6 products).
- 'MS': Matched Scan. For Ka band and DPR (up to version 6 products).
- 'HS': High-sensitivity Scan. For Ka band and DPR.
product_type (str, optional) – GPM product type. Either 'RS' (Research) or 'NRT' (Near-Real-Time). The default is 'RS'.
version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support versions 4, 5, 6 and 7.
chunks (int, dict, 'auto' or None, optional) –
Chunk size for dask array:
- chunks=-1 loads the dataset with dask using a single chunk for all arrays.
- chunks={} loads the dataset with dask using the file chunks.
- chunks='auto' will use dask auto chunking taking into account the file chunks.
If you want to load data in memory directly, specify chunks=None. The default is {}.

Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().
decode_cf (bool, optional) – Whether to decode the dataset. The default is True.
prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. If you aim to save the Dataset to disk as netCDF or Zarr, you need to set prefix_group=False or later remove the prefix before writing the dataset. The default is False.
parallel (bool) – If True, the datasets are opened in parallel using dask.delayed. If parallel=True, 'chunks' cannot be None: the underlying data must be dask arrays. The default is False.

Return type:

xarray.Dataset
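A call using the parameters documented above might look like the following sketch. The keyword names come from the signature; the product acronym "2A-DPR", the variable name, and the time window are assumptions chosen for illustration, and the call only succeeds if the gpm package is installed and matching granules are available.

```python
import datetime

# Keyword arguments follow the documented signature; the values are illustrative.
request = {
    "product": "2A-DPR",  # assumed product acronym
    "start_time": datetime.datetime(2020, 7, 5, 2, 0, 0),
    "end_time": datetime.datetime(2020, 7, 5, 4, 0, 0),
    "variables": ["precipRateNearSurface"],  # assumed variable name
    "scan_mode": "FS",
    "version": 7,
    "product_type": "RS",
    "chunks": {},  # lazy loading with the file chunks (the documented default)
}


def open_example(request):
    """Open the requested data lazily; needs gpm installed and data on disk."""
    # Imported inside the function so the request can be built without gpm.
    import gpm

    return gpm.open_dataset(**request)
```

Calling `open_example(request)` would return a lazy xarray.Dataset; invoking `.compute()` on it loads the data into memory, as the hint above suggests.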

gpm.dataset.datatree module#

This module contains functions to read a GPM granule into a DataTree object.

gpm.dataset.datatree.check_non_empty_granule(dt, filepath)[source]#: Check that the datatree (or dataset) is not empty.

gpm.dataset.datatree.check_valid_granule(filepath)[source]#: Raise an explanatory error if the GPM granule is not readable.

gpm.dataset.datatree.open_datatree(filepath, chunks={}, decode_cf=False, use_api_defaults=True)[source]#

Open an HDF5 file as a datatree object.

- chunks={}: lazy map to dask.array. This is pending pydata/xarray#7948; an “auto” option that defaults to the full shape may need to be implemented manually.
- chunks=”auto”: datatree fails because the size of the object dtype cannot be estimated.
- chunks=None: lazy map to numpy.array.

gpm.dataset.dimensions module#

This module contains functions to retrieve the dimensions associated with each GPM variable.

gpm.dataset.granule module#

This module contains functions to read a single file into a GPM-API Dataset.

gpm.dataset.granule.get_variables(ds)[source]#: Retrieve the dataset variables.

gpm.dataset.granule.get_variables_dims(ds)[source]#: Retrieve the dimensions used by the xr.Dataset variables.

gpm.dataset.granule.open_granule(filepath, scan_mode=None, groups=None, variables=None, decode_cf=True, chunks={}, prefix_group=False)[source]#

Create a lazy xarray.Dataset with relevant GPM data and attributes for a specific granule.

Parameters:

filepath (str) – Filepath of GPM granule dataset
scan_mode (str, optional) –
Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:
- 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).
- 'NS': Normal Scan. For Ku band and DPR (up to version 6 products).
- 'MS': Matched Scan. For Ka band and DPR (up to version 6 products).
- 'HS': High-sensitivity Scan. For Ka band and DPR.
variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).
groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).
chunks (int, dict, 'auto' or None, optional) –
Chunk size for dask array:
- chunks=-1 loads the dataset with dask using a single chunk for all arrays.
- chunks={} loads the dataset with dask using the file chunks.
- chunks='auto' will use dask auto chunking taking into account the file chunks.
If you want to load data in memory directly, specify chunks=None. The default is {}.

Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().
decode_cf (bool, optional) – Whether to decode the dataset. The default is True.
prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. The default is False.

Return type:

xarray.Dataset

gpm.dataset.granule.remove_unused_var_dims(ds)[source]#: Remove coordinates and dimensions not used by the xr.Dataset variables.

gpm.dataset.granule.unused_var_dims(ds)[source]#: Retrieve the dimensions not used by the xr.Dataset variables.
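The idea of "dimensions not used by any variable" can be sketched without xarray by using plain mappings. The helper name `find_unused_dims` and the variable and dimension names are hypothetical; the library function operates on an xr.Dataset, not on dictionaries.

```python
def find_unused_dims(all_dims, variable_dims):
    """Return the dimensions that no variable uses, preserving input order."""
    # Collect every dimension referenced by at least one variable.
    used = {dim for dims in variable_dims.values() for dim in dims}
    return [dim for dim in all_dims if dim not in used]


# Hypothetical dataset layout: 'extra' is declared but no variable uses it.
all_dims = ["cross_track", "along_track", "range", "extra"]
variable_dims = {
    "var_a": ("cross_track", "along_track"),
    "var_b": ("cross_track", "along_track", "range"),
}
unused = find_unused_dims(all_dims, variable_dims)
```

Dropping the dimensions returned here, together with their coordinates, corresponds to what remove_unused_var_dims is described as doing.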

gpm.dataset.groups_variables module#

This module contains functions to read GPM file groups, sub-groups and variables.

Module contents#

This directory defines the GPM-API datasets.
