gpm.dataset package#

Subpackages#

Submodules#

gpm.dataset.attrs module#

This module contains functions to parse GPM granule attributes.

gpm.dataset.attrs.add_history(ds)[source]#

Add the history attribute to the xr.Dataset.

gpm.dataset.attrs.decode_attrs(attrs)[source]#

Decode GPM nested dictionary attributes from an xarray object.

gpm.dataset.attrs.decode_string(string)[source]#

Decode string dictionary.

Format: "<key>=<value>\n".

It removes ";" and "\t" characters prior to parsing the string.
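The parsing described above can be illustrated with a minimal standard-library sketch. The helper name parse_key_value_string and the sample header values are hypothetical and are not taken from the gpm source; the actual decode_string may treat edge cases differently.

```python
def parse_key_value_string(string):
    """Parse newline-separated key=value records into a dict (hypothetical helper)."""
    # Drop the ';' terminators and tab characters before splitting into lines.
    cleaned = string.replace(";", "").replace("\t", "")
    parsed = {}
    for line in cleaned.split("\n"):
        if "=" not in line:
            continue  # skip empty or malformed lines
        key, value = line.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


# Illustrative header content only; real granule headers contain other fields.
header = "SatelliteName=GPM;\nInstrumentName=DPR;\n\tGranuleNumber=12345;\n"
attrs = parse_key_value_string(header)
```

Note that every value stays a string in this sketch; any type conversion would be a separate step.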

gpm.dataset.attrs.get_granule_attrs(dt)[source]#

Get granule global attributes.

gpm.dataset.conventions module#

This module contains functions to enforce CF conventions on GPM-API objects.

gpm.dataset.conventions.finalize_dataset(ds, product, decode_cf, scan_mode, start_time=None, end_time=None)[source]#

Finalize GPM xr.Dataset object.

gpm.dataset.conventions.reshape_dataset(ds)[source]#

Define the dataset dimension order.

It ensures that the output dimension order is (y, x). This shape is expected by libraries such as pyresample and matplotlib.

  • For GPM GRID objects: (…, time, lat, lon)

  • For GPM ORBIT objects: (cross_track, along_track, …)
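As a rough illustration of the ordering convention, the sketch below reorders a tuple of dimension names. The helper target_dim_order is hypothetical and works on plain tuples; it is not the library's implementation, and the dimension name "range" is only an example of a non-spatial dimension.

```python
def target_dim_order(dims):
    """Return dims reordered following the convention above (hypothetical helper)."""
    dims = list(dims)
    if "cross_track" in dims and "along_track" in dims:
        # ORBIT objects: spatial dimensions first, all other dimensions after.
        spatial = ["cross_track", "along_track"]
        others = [d for d in dims if d not in spatial]
        return tuple(spatial + others)
    if "lat" in dims and "lon" in dims:
        # GRID objects: (..., time, lat, lon).
        trailing = [d for d in ("time", "lat", "lon") if d in dims]
        others = [d for d in dims if d not in trailing]
        return tuple(others + trailing)
    # Unknown layout: leave the order unchanged.
    return tuple(dims)


orbit_order = target_dim_order(("along_track", "cross_track", "range"))
grid_order = target_dim_order(("lon", "lat", "time"))
```

With an xarray object, an order computed this way could be applied with ds.transpose(*order).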

gpm.dataset.coords module#

This module contains functions to extract the coordinates from GPM files.

gpm.dataset.coords.get_coords(dt, scan_mode)[source]#

Get coordinates from GPM objects.

gpm.dataset.coords.get_coords_attrs_dict(ds)[source]#

Return relevant GPM coordinates attributes.

gpm.dataset.coords.get_grid_coords(dt, scan_mode)[source]#

Get coordinates from Grid objects.

Set ‘time’ to the end of the accumulation period. Example: IMERG provides the average rain rate (mm/hr) over the half-hour period.

NOTE: IMERG and GRID products do not have a GranuleNumber!
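To make the time convention concrete, here is a small sketch using only the standard library. The function name end_of_accumulation and the sample start time are hypothetical; the sketch only shows that a half-hour accumulation starting at 02:00 would be labeled 02:30.

```python
from datetime import datetime, timedelta


def end_of_accumulation(start_time, accumulation):
    """Return the timestamp marking the end of the accumulation period."""
    return start_time + accumulation


# Half-hour accumulation, as for the IMERG example above.
half_hour = timedelta(minutes=30)
start = datetime(2020, 7, 5, 2, 0, 0)
time_coord = end_of_accumulation(start, half_hour)
```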

gpm.dataset.coords.get_orbit_coords(dt, scan_mode)[source]#

Get coordinates from Orbit objects.

gpm.dataset.coords.get_time_delta_from_time_interval(time_interval)[source]#

gpm.dataset.coords.set_coords_attrs(ds)[source]#

Set dataset coordinate attributes.

gpm.dataset.crs module#

This module contains functions to define and create CF-compliant CRS.

gpm.dataset.crs.get_pyproj_crs(xr_obj)[source]#

Return pyproj.crs.CoordinateSystem from CRS coordinate(s).

If both a geographic and a projected CRS are present, it returns the projected one.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) –

Returns:

proj_crs

Return type:

CoordinateSystem

gpm.dataset.crs.get_pyresample_area(xr_obj)[source]#

Define pyresample area from CF-compliant xarray object.

To be used by the pyresample accessor: ds.pyresample.area

gpm.dataset.crs.get_pyresample_projection(xr_obj)[source]#

Get pyresample AreaDefinition from CF-compliant xarray object.

gpm.dataset.crs.get_pyresample_swath(xr_obj)[source]#

Get pyresample SwathDefinition from CF-compliant xarray object.

gpm.dataset.crs.has_proj_coords(xr_obj)[source]#

gpm.dataset.crs.has_swath_coords(xr_obj)[source]#

gpm.dataset.crs.remove_existing_crs_info(ds)[source]#

Remove existing grid_mapping attributes.

gpm.dataset.crs.set_dataset_crs(ds, crs, grid_mapping_name='spatial_ref', inplace=False)[source]#

Add CF-compliant CRS information to an xr.Dataset.

It assumes all dataset variables have the same CRS! For a projected CRS, it expects that the CRS dimension coordinates are specified. For a swath dataset, it expects that the geographic coordinates are specified.

For a projected CRS, if 2D latitude/longitude arrays are specified, it assumes they refer to the WGS84 CRS!

Parameters:
  • ds (xarray.Dataset) –

  • crs (CoordinateSystem) – CRS information to be added to the xr.Dataset

  • grid_mapping_name (str) – Name of the grid_mapping coordinate to store the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.

Returns:

ds – Dataset with CF-compliant CRS information.

Return type:

xarray.Dataset

gpm.dataset.crs.set_dataset_single_crs(ds, crs, grid_mapping_name='spatial_ref', inplace=False)[source]#

Add CF-compliant CRS information to an xr.Dataset.

It assumes all dataset variables have the same CRS! For a projected CRS, it expects that the CRS dimension coordinates are specified. For a swath dataset, it expects that the geographic coordinates are specified.

Parameters:
  • ds (xarray.Dataset) –

  • crs (CoordinateSystem) – CRS information to be added to the xr.Dataset

  • grid_mapping_name (str) – Name of the grid_mapping coordinate to store the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.

Returns:

ds – Dataset with CF-compliant CRS information.

Return type:

xarray.Dataset

gpm.dataset.crs.simplify_grid_mapping_values(ds)[source]#

Simplify grid_mapping value.

GDAL does not support a grid_mapping defined as “crs_wgs84: lat lon”. If only one CRS is specified in this format, it returns “crs_wgs84”.
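The simplification can be sketched on a plain string as follows. The helper simplify_grid_mapping is hypothetical; in particular, leaving a value with several CRS names unchanged is an assumption of this sketch, since the description above only covers the single-CRS case.

```python
def simplify_grid_mapping(value):
    """Reduce '<crs_name>: <coord> <coord>' to '<crs_name>' when only one CRS is named."""
    tokens = value.split()
    # In the extended format, grid mapping variable names are followed by a colon.
    crs_names = [token[:-1] for token in tokens if token.endswith(":")]
    if len(crs_names) == 1:
        return crs_names[0]
    # No colon (already simple) or several CRS: returned unchanged (assumption).
    return value


simple = simplify_grid_mapping("crs_wgs84: lat lon")
already_simple = simplify_grid_mapping("spatial_ref")
multiple = simplify_grid_mapping("crs_proj: x y crs_wgs84: lat lon")
```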

gpm.dataset.dataset module#

This module contains functions to read files into a GPM-API Dataset.

gpm.dataset.dataset.open_dataset(product, start_time, end_time, variables=None, groups=None, scan_mode=None, version=None, product_type='RS', chunks={}, decode_cf=True, parallel=False, prefix_group=False, verbose=False)[source]#

Lazily map HDF5 data into xarray.Dataset with relevant GPM data and attributes.

Note:

  • gpm.open_dataset does not load GPM granules whose FileHeader flag 'EmptyGranule' is not 'NOT_EMPTY'.

  • The group ScanStatus provides relevant data flags for Swath products.

  • The variable dataQuality provides an overall quality flag status. If dataQuality = 0, no issues have been detected.

  • The variable SCorientation provides the orientation of the sensor from the forward track of the satellite.

Parameters:
  • product (str) – GPM product acronym.

  • start_time ((datetime.datetime, datetime.date, np.datetime64, str)) – Start time. Accepted types: datetime.datetime, datetime.date, np.datetime64 or str. If string type, it expects the isoformat YYYY-MM-DD hh:mm:ss.

  • end_time ((datetime.datetime, datetime.date, np.datetime64, str)) – End time. Accepted types: datetime.datetime, datetime.date, np.datetime64 or str. If string type, it expects the isoformat YYYY-MM-DD hh:mm:ss.

  • variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).

  • groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).

  • scan_mode (str, optional) –

    Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:

    • 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).

    • 'NS': Normal Scan. For Ku band and DPR (up to version 6 products).

    • 'MS': Matched Scan. For Ka band and DPR (up to version 6 products).

    • 'HS': High-sensitivity Scan. For Ka band and DPR.

  • product_type (str, optional) – GPM product type. Either 'RS' (Research) or 'NRT' (Near-Real-Time). The default is 'RS'.

  • version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support versions 4, 5, 6 and 7.

  • chunks (int, dict, 'auto' or None, optional) –

    Chunk size for dask array:

    • chunks=-1 loads the dataset with dask using a single chunk for all arrays.

    • chunks={} loads the dataset with dask using the file chunks.

    • chunks='auto' will use dask auto chunking taking into account the file chunks.

    If you want to load data in memory directly, specify chunks=None. The default is {}.

    Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().

  • decode_cf (bool, optional) – Whether to decode the dataset. The default is True.

  • prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. If you aim to save the Dataset to disk as netCDF or Zarr, you need to set prefix_group=False or later remove the prefix before writing the dataset. The default is False.

  • parallel (bool) – If True, the datasets are opened in parallel using dask.delayed. If parallel=True, 'chunks' cannot be None, because the underlying data must be dask.Array. The default is False.

Return type:

xarray.Dataset
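The time formats and the parallel/chunks constraint listed above can be checked before calling the reader. The sketch below uses only the standard library; normalize_time and check_request are hypothetical helpers, np.datetime64 inputs are left out to avoid a NumPy dependency, and both the midnight interpretation of a plain date and the start/end ordering check are assumptions of this sketch.

```python
from datetime import date, datetime


def normalize_time(value):
    """Convert a supported time input to datetime.datetime (hypothetical helper)."""
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        # Assumption: a plain date is interpreted as midnight of that day.
        return datetime(value.year, value.month, value.day)
    if isinstance(value, str):
        # Documented string format: YYYY-MM-DD hh:mm:ss
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    raise TypeError(f"Unsupported time type: {type(value).__name__}")


def check_request(start_time, end_time, chunks, parallel):
    """Validate the time range and the parallel/chunks combination."""
    start, end = normalize_time(start_time), normalize_time(end_time)
    if start > end:
        # Assumption of this sketch: the start must not follow the end.
        raise ValueError("start_time must not be later than end_time.")
    if parallel and chunks is None:
        raise ValueError("parallel=True requires dask arrays, so chunks cannot be None.")
    return start, end


start, end = check_request("2020-07-05 02:00:00", date(2020, 7, 6), chunks={}, parallel=True)

try:
    check_request("2020-07-05 02:00:00", "2020-07-05 04:00:00", chunks=None, parallel=True)
except ValueError as err:
    message = str(err)
```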

gpm.dataset.datatree module#

This module contains functions to read a GPM granule into a DataTree object.

gpm.dataset.datatree.check_non_empty_granule(dt, filepath)[source]#

Check that the datatree (or dataset) is not empty.

gpm.dataset.datatree.check_valid_granule(filepath)[source]#

Raise an explanatory error if the GPM granule is not readable.

gpm.dataset.datatree.open_datatree(filepath, chunks={}, decode_cf=False, use_api_defaults=True)[source]#

Open an HDF5 file as a datatree object.

  • chunks={} –> Lazy map to dask.array –> Wait for pydata/xarray#7948 –> May need to manually implement an “auto” option that defaults to the full shape.

  • chunks=“auto” –> datatree fails: it cannot estimate the size of object dtype!

  • chunks=None –> lazy map to numpy.array

gpm.dataset.dimensions module#

This module contains functions to retrieve the dimensions associated with each GPM variable.

gpm.dataset.granule module#

This module contains functions to read a single file into a GPM-API Dataset.

gpm.dataset.granule.get_variables(ds)[source]#

Retrieve the dataset variables.

gpm.dataset.granule.get_variables_dims(ds)[source]#

Retrieve the dimensions used by the xr.Dataset variables.

gpm.dataset.granule.open_granule(filepath, scan_mode=None, groups=None, variables=None, decode_cf=True, chunks={}, prefix_group=False)[source]#

Create a lazy xarray.Dataset with relevant GPM data and attributes for a specific granule.

Parameters:
  • filepath (str) – Filepath of GPM granule dataset

  • scan_mode (str, optional) –

    Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:

    • 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).

    • 'NS': Normal Scan. For Ku band and DPR (up to version 6 products).

    • 'MS': Matched Scan. For Ka band and DPR (up to version 6 products).

    • 'HS': High-sensitivity Scan. For Ka band and DPR.

  • variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).

  • groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).

  • chunks (int, dict, 'auto' or None, optional) –

    Chunk size for dask array:

    • chunks=-1 loads the dataset with dask using a single chunk for all arrays.

    • chunks={} loads the dataset with dask using the file chunks.

    • chunks='auto' will use dask auto chunking taking into account the file chunks.

    If you want to load data in memory directly, specify chunks=None. The default is {}.

    Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().

  • decode_cf (bool, optional) – Whether to decode the dataset. The default is True.

  • prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. The default is False.

Returns:

ds

Return type:

xarray.Dataset
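The radar scan mode list in the parameters above can be read as a small lookup table. The sketch below encodes only what that list states; SCAN_MODE_RULES and radar_scan_modes are hypothetical names, and gpm.available_scan_modes(product, version) remains the function to rely on in practice.

```python
# mode: (instruments, first version, last version); None means no bound is stated.
SCAN_MODE_RULES = {
    "FS": ({"Ku", "Ka", "DPR"}, 7, None),
    "NS": ({"Ku", "DPR"}, None, 6),
    "MS": ({"Ka", "DPR"}, None, 6),
    "HS": ({"Ka", "DPR"}, None, None),
}


def radar_scan_modes(instrument, version):
    """List the scan modes that the table above allows for an instrument and version."""
    modes = []
    for mode, (instruments, first, last) in SCAN_MODE_RULES.items():
        if instrument not in instruments:
            continue
        if first is not None and version < first:
            continue
        if last is not None and version > last:
            continue
        modes.append(mode)
    return modes


ku_v7 = radar_scan_modes("Ku", 7)
dpr_v6 = radar_scan_modes("DPR", 6)
ka_v7 = radar_scan_modes("Ka", 7)
```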

gpm.dataset.granule.remove_unused_var_dims(ds)[source]#

Remove coordinates and dimensions not used by the xr.Dataset variables.

gpm.dataset.granule.unused_var_dims(ds)[source]#

Retrieve the dimensions not used by the xr.Dataset variables.
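The relation between used and unused dimensions can be shown with plain Python containers. In this sketch a dataset is represented as a list of dimension names plus a mapping from variable name to its dimensions; the helper names and the sample variables are illustrative and do not reproduce the functions documented above.

```python
def get_used_dims(variable_dims):
    """Collect every dimension referenced by at least one variable."""
    used = set()
    for dims in variable_dims.values():
        used.update(dims)
    return used


def get_unused_dims(dataset_dims, variable_dims):
    """Return the dataset dimensions that no variable references."""
    return set(dataset_dims) - get_used_dims(variable_dims)


# Illustrative layout: 'nfreq' is declared but not used by any variable.
dataset_dims = ["cross_track", "along_track", "range", "nfreq"]
variable_dims = {
    "precipRateNearSurface": ("cross_track", "along_track"),
    "zFactorFinal": ("cross_track", "along_track", "range"),
}
used = get_used_dims(variable_dims)
unused = get_unused_dims(dataset_dims, variable_dims)
```

Dropping the names in unused, together with their coordinates, corresponds to the clean-up that remove_unused_var_dims describes.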

gpm.dataset.groups_variables module#

This module contains functions to read GPM file groups, sub-groups and variables.

Module contents#

This directory defines the GPM-API datasets.