gpm.dataset package#

Subpackages#

Submodules#

gpm.dataset.attrs module#

This module contains functions to parse GPM granule attributes.

gpm.dataset.attrs.add_history(ds)[source][source]#

Add the history attribute to the xarray.Dataset.

gpm.dataset.attrs.decode_attrs(attrs)[source][source]#

Decode GPM nested dictionary attributes from an xarray object.

gpm.dataset.attrs.decode_string(string)[source][source]#

Decode string dictionary.

Format: "<key>=<value>\n".

It removes ; and \t characters before parsing the string.
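As a rough illustration of the format described above, the following sketch parses such a string with plain Python. It is not the GPM-API implementation: the helper name parse_key_value_block and the sample string are made up for this example.

```python
def parse_key_value_block(text: str) -> dict[str, str]:
    """Parse lines of the form '<key>=<value>' into a dictionary."""
    # Drop ';' and tab characters before splitting, as described above.
    cleaned = text.replace(";", "").replace("\t", "")
    parsed = {}
    for line in cleaned.split("\n"):
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


# Hypothetical sample string, for illustration only.
sample = "SatelliteName=GPM;\nInstrumentName=DPR;\n\tEmptyGranule=NOT_EMPTY;\n"
print(parse_key_value_block(sample))
```

Splitting on the first "=" only keeps any further "=" characters inside the value.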

gpm.dataset.attrs.get_granule_attrs(dt)[source][source]#

Get granule global attributes.

gpm.dataset.conventions module#

This module contains functions to enforce CF conventions on GPM-API objects.

gpm.dataset.conventions.add_gpm_api_product(ds, product)[source][source]#

Add gpm_api_product attribute to Dataset and DataArray variables.

gpm.dataset.conventions.finalize_dataset(ds, product, decode_cf, scan_mode, start_time=None, end_time=None)[source][source]#

Finalize GPM xarray.Dataset object.

gpm.dataset.conventions.reshape_dataset(ds)[source][source]#

Define the dataset dimension order.

It ensures that the output dimension order is (y, x). This shape is expected by, e.g., pyresample and matplotlib.

  • For GPM GRID objects: (…, time, lat, lon)

  • For GPM ORBIT objects: (cross_track, along_track, …)
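The two layouts above can be sketched with a small helper that only reorders dimension names. This is an assumption-laden illustration, not the library code: reorder_dims is a hypothetical name, and the object type is guessed from the dimension names alone.

```python
def reorder_dims(dims: tuple[str, ...]) -> tuple[str, ...]:
    """Return the dimension names reordered following the two layouts above."""
    if "lat" in dims and "lon" in dims:
        # GRID layout: (..., time, lat, lon)
        tail = [d for d in ("time", "lat", "lon") if d in dims]
        head = [d for d in dims if d not in tail]
        return tuple(head + tail)
    if "cross_track" in dims and "along_track" in dims:
        # ORBIT layout: (cross_track, along_track, ...)
        head = ["cross_track", "along_track"]
        tail = [d for d in dims if d not in head]
        return tuple(head + tail)
    return tuple(dims)  # unknown layout: leave the order untouched


print(reorder_dims(("lon", "time", "lat")))
print(reorder_dims(("along_track", "range", "cross_track")))
```

With xarray, a tuple like this could then be passed to Dataset.transpose(*dims).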

gpm.dataset.coords module#

This module contains functions to extract the coordinates from GPM files.

gpm.dataset.coords.get_coords(dt, scan_mode)[source][source]#

Get coordinates from GPM objects.

gpm.dataset.coords.get_coords_attrs_dict(ds)[source][source]#

Return relevant GPM coordinates attributes.

gpm.dataset.coords.get_grid_coords(dt, scan_mode)[source][source]#

Get coordinates from Grid objects.

Set ‘time’ to the end of the accumulation period. Example: IMERG provides the average rain rate (mm/hr) over the half-hour period.

NOTE: IMERG and GRID products do not have GranuleNumber!
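To make the time convention concrete, here is a minimal sketch of shifting a timestamp to the end of an accumulation period. It assumes the start time of the period is known; the function name and the sample date are hypothetical.

```python
from datetime import datetime, timedelta


def end_of_accumulation(start: datetime, period: timedelta) -> datetime:
    """Return the timestamp marking the end of the accumulation period."""
    return start + period


# Hypothetical half-hour period starting at midnight.
start = datetime(2020, 7, 1, 0, 0, 0)
print(end_of_accumulation(start, timedelta(minutes=30)))  # 2020-07-01 00:30:00
```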

gpm.dataset.coords.get_orbit_coords(dt, scan_mode)[source][source]#

Get coordinates from Orbit objects.

gpm.dataset.coords.get_time_delta_from_time_interval(time_interval)[source][source]#
gpm.dataset.coords.set_coords_attrs(ds)[source][source]#

Set dataset coordinate attributes.

gpm.dataset.crs module#

This module contains functions to define and create CF-compliant CRS.

gpm.dataset.crs.compute_extent(x_coords, y_coords)[source][source]#

Compute the extent (x_min, x_max, y_min, y_max) from pixel centroids.

This function assumes that the spacing between each pixel is uniform. It takes into account the decreasing/increasing order of the coordinates.

The output extent format is the one expected by matplotlib and cartopy. Please note that the pyresample area_extent is instead [x_min, y_min, x_max, y_max].
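The computation can be sketched as follows, assuming uniformly spaced centroids given as plain lists. The function names are illustrative and not part of GPM-API. Half a pixel is added on each side, and min/max make the result independent of the coordinate ordering. The last helper reorders the values into pyresample's area_extent ordering (lower-left x, lower-left y, upper-right x, upper-right y).

```python
def axis_bounds(coords):
    """Return the (min, max) outer pixel edges for uniformly spaced centroids."""
    if len(coords) < 2:
        raise ValueError("At least two centroids are needed to infer the spacing.")
    half_pixel = abs(coords[1] - coords[0]) / 2
    return min(coords) - half_pixel, max(coords) + half_pixel


def extent_from_centroids(x_coords, y_coords):
    """Return [x_min, x_max, y_min, y_max] (matplotlib/cartopy ordering)."""
    x_min, x_max = axis_bounds(x_coords)
    y_min, y_max = axis_bounds(y_coords)
    return [x_min, x_max, y_min, y_max]


def to_area_extent(extent):
    """Reorder an extent into [x_min, y_min, x_max, y_max]."""
    x_min, x_max, y_min, y_max = extent
    return [x_min, y_min, x_max, y_max]


x = [0.5, 1.5, 2.5]  # increasing x centroids
y = [9.5, 8.5, 7.5]  # decreasing y centroids
extent = extent_from_centroids(x, y)
print(extent)                  # [0.0, 3.0, 7.0, 10.0]
print(to_area_extent(extent))  # [0.0, 7.0, 3.0, 10.0]
```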

gpm.dataset.crs.compute_pyresample_area_extent(x_coords, y_coords)[source][source]#

Compute the pyresample area extent [x_min, y_min, x_max, y_max] from pixel centroids.

gpm.dataset.crs.get_pyproj_crs(xr_obj)[source][source]#

Return a pyproj.crs.CRS from CRS coordinate(s).

If both a geographic and a projected CRS are present, it returns the projected one.

This method is also available as a property through the xarray accessor gpm.pyproj_crs.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) –

Returns:

proj_crs

Return type:

pyproj.crs.CRS

gpm.dataset.crs.get_pyresample_area(xr_obj)[source][source]#

Define pyresample area from CF-compliant xarray.DataArray or xarray.Dataset.

To be used by the pyresample accessor: ds.pyresample.area

gpm.dataset.crs.get_pyresample_projection(xr_obj)[source][source]#

Get pyresample AreaDefinition from CF-compliant xarray.DataArray or xarray.Dataset.

gpm.dataset.crs.get_pyresample_swath(xr_obj)[source][source]#

Get pyresample SwathDefinition from CF-compliant xarray.DataArray or xarray.Dataset.

gpm.dataset.crs.get_spatial_coordinates(xr_obj)[source][source]#

Return the xarray object x and y spatial coordinates.

gpm.dataset.crs.get_x_coordinate(xr_obj)[source][source]#

Return the xarray object x spatial coordinate.

gpm.dataset.crs.get_y_coordinate(xr_obj)[source][source]#

Return the xarray object y spatial coordinate.

gpm.dataset.crs.has_proj_coords(xr_obj)[source][source]#
gpm.dataset.crs.has_swath_coords(xr_obj)[source][source]#
gpm.dataset.crs.remove_existing_crs_info(xr_obj)[source][source]#

Remove existing grid_mapping attributes.

gpm.dataset.crs.set_dataset_crs(ds, crs, grid_mapping_name='spatial_ref', inplace=False)[source][source]#

Add CF-compliant CRS information to an xarray DataArray or Dataset.

If ds is an xarray.Dataset, it assumes that all dataset variables share the same CRS! For a projected CRS, it expects the CRS dimension coordinates to be specified. For a swath dataset, it expects the geographic coordinates to be specified.

For a projected CRS, if 2D latitude/longitude arrays are specified, it assumes they refer to the WGS84 CRS!

Parameters:
  • ds (xarray.Dataset or xarray.DataArray) –

  • crs (pyproj.crs.CRS) – CRS information to be added to the xarray.Dataset

  • grid_mapping_name (str) – Name of the grid_mapping coordinate to store the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.

Returns:

ds – Dataset or DataArray with CF-compliant CRS information.

Return type:

xarray.Dataset or xarray.DataArray

gpm.dataset.crs.set_dataset_single_crs(xr_obj, crs, grid_mapping_name='spatial_ref', inplace=False)[source][source]#

Add CF-compliant CRS information to an xarray.Dataset.

It assumes that all dataset variables share the same CRS! For a projected CRS, it expects the CRS dimension coordinates to be specified. For a swath dataset, it expects the geographic coordinates to be specified.

Parameters:
  • xr_obj (xarray.Dataset) –

  • crs (pyproj.crs.CRS) – CRS information to be added to the xarray.Dataset

  • grid_mapping_name (str) – Name of the grid_mapping coordinate to store the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.

Returns:

ds – Dataset with CF-compliant CRS information.

Return type:

xarray.Dataset

gpm.dataset.crs.simplify_grid_mapping_values(xr_obj)[source][source]#

Simplify grid_mapping value.

GDAL does not support a grid_mapping defined as “crs_wgs84: lat lon”. If only one CRS is specified in such a format, it returns “crs_wgs84”.
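The simplification can be illustrated on plain strings. The sketch below assumes that grid mapping variable names are the whitespace-separated tokens ending with a colon; simplify_grid_mapping is a hypothetical helper, not the library function.

```python
def simplify_grid_mapping(value: str) -> str:
    """Reduce '<crs_name>: <coord> <coord>' to '<crs_name>' when only one CRS is listed."""
    # Tokens ending with ':' are taken to be grid mapping variable names.
    names = [token[:-1] for token in value.split() if token.endswith(":")]
    if len(names) == 1:
        return names[0]
    return value  # zero or several CRS: leave the value unchanged


print(simplify_grid_mapping("crs_wgs84: lat lon"))  # crs_wgs84
print(simplify_grid_mapping("spatial_ref"))         # spatial_ref
```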

gpm.dataset.dataset module#

This module contains functions to read files into a GPM-API Dataset or DataTree.

gpm.dataset.dataset.open_dataset(product, start_time, end_time, variables=None, groups=None, scan_mode=None, version=None, product_type='RS', chunks=-1, decode_cf=True, parallel=False, prefix_group=False, verbose=False, base_dir=None, **kwargs)[source][source]#

Lazily map HDF5 data into xarray.Dataset with relevant GPM data and attributes.

Note:

  • gpm.open_dataset does not load GPM granules with the FileHeader flag 'EmptyGranule' != 'NOT_EMPTY'.

  • The coordinates Quality or dataQuality provide an overall quality flag status.

  • The coordinate SCorientation provides the orientation of the sensor from the forward track of the satellite.

Parameters:
  • product (str) – GPM product acronym.

  • start_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – Start time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If string type, it expects the isoformat YYYY-MM-DD hh:mm:ss.

  • end_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – End time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If string type, it expects the isoformat YYYY-MM-DD hh:mm:ss.

  • variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).

  • groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).

  • scan_mode (str, optional) –

    Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:

    • 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).

    • 'NS': Normal Scan. For Ku band and DPR (till version 6 products).

    • 'MS': Matched Scan. For Ka band and DPR (till version 6 products).

    • 'HS': High-sensitivity Scan. For Ka band and DPR.

  • product_type (str, optional) – GPM product type. Either 'RS' (Research) or 'NRT' (Near-Real-Time). The default is 'RS'.

  • version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support versions 4, 5, 6 and 7.

  • chunks (int, dict, str or None, optional) –

    Chunk size for dask array:

    • chunks=-1 loads the dataset with dask using a single chunk for each granule array.

    • chunks={} loads the dataset with dask using the file chunks.

    • chunks='auto' will use dask auto chunking taking into account the file chunks.

    If you want to load data in memory directly, specify chunks=None. The default is -1.

    Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().

  • decode_cf (bool, optional) – Whether to decode the dataset. The default is True.

  • prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. If you aim to save the Dataset to disk as netCDF or Zarr, you need to set prefix_group=False or later remove the prefix before writing the dataset. The default is False.

  • parallel (bool) – If True, the datasets are opened in parallel using dask.delayed.delayed. If parallel=True, chunks cannot be None. The underlying data must be dask.array.Array. The default is False.

  • **kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.

Return type:

xarray.Dataset
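The start_time and end_time parameters accept several types. A minimal sketch of how such inputs could be normalised to datetime.datetime is given below. It is an illustration under stated assumptions rather than the library's own logic: to_datetime is a hypothetical helper, and numpy.datetime64 is left out to keep the example limited to the standard library.

```python
from datetime import date, datetime


def to_datetime(value):
    """Normalise a time given as datetime, date or 'YYYY-MM-DD hh:mm:ss' string."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, str):
        return datetime.fromisoformat(value)
    raise TypeError(f"Unsupported time type: {type(value).__name__}")


print(to_datetime("2020-07-01 12:30:00"))  # 2020-07-01 12:30:00
print(to_datetime(date(2020, 7, 1)))       # 2020-07-01 00:00:00
```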

gpm.dataset.dataset.open_datatree(product, start_time, end_time, variables=None, groups=None, scan_modes=None, version=None, product_type='RS', chunks=-1, decode_cf=True, parallel=False, prefix_group=False, verbose=False, base_dir=None, **kwargs)[source][source]#

Lazily map HDF5 data into xarray.DataTree objects with relevant GPM data and attributes.

Note:

  • gpm.open_datatree does not load GPM granules with the FileHeader flag 'EmptyGranule' != 'NOT_EMPTY'.

  • The coordinates Quality or dataQuality provide an overall quality flag status.

  • The coordinate SCorientation provides the orientation of the sensor from the forward track of the satellite.

Parameters:
  • product (str) – GPM product acronym.

  • start_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – Start time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If string type, it expects the isoformat YYYY-MM-DD hh:mm:ss.

  • end_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – End time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If string type, it expects the isoformat YYYY-MM-DD hh:mm:ss.

  • variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).

  • groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).

  • scan_modes (str, optional) –

    Scan modes of the GPM product. If None (the default), all scan modes are loaded. Use gpm.available_scan_modes(product, version) to see the available scan modes for a specific product. The radar products have the following scan modes:

    • 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).

    • 'NS': Normal Scan. For Ku band and DPR (till version 6 products).

    • 'MS': Matched Scan. For Ka band and DPR (till version 6 products).

    • 'HS': High-sensitivity Scan. For Ka band and DPR.

  • product_type (str, optional) – GPM product type. Either 'RS' (Research) or 'NRT' (Near-Real-Time). The default is 'RS'.

  • version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support versions 4, 5, 6 and 7.

  • chunks (int, dict, str or None, optional) –

    Chunk size for dask array:

    • chunks=-1 loads the dataset with dask using a single chunk for each granule array.

    • chunks={} loads the dataset with dask using the file chunks.

    • chunks='auto' will use dask auto chunking taking into account the file chunks.

    If you want to load data in memory directly, specify chunks=None. The default is -1.

    Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().

  • decode_cf (bool, optional) – Whether to decode the dataset. The default is True.

  • prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. If you aim to save the Dataset to disk as netCDF or Zarr, you need to set prefix_group=False or later remove the prefix before writing the dataset. The default is False.

  • parallel (bool) – If True, the datasets are opened in parallel using dask.delayed.delayed. If parallel=True, chunks cannot be None. The underlying data must be dask.array.Array. The default is False.

  • **kwargs (dict) – Additional keyword arguments passed to open_datatree() for each group.

Return type:

xarray.DataTree
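The chunks options and the parallel constraint described in the parameter list can be summarised in a small sketch. The function describe_chunks is hypothetical and only maps each documented option to a short description; it does not open any file.

```python
def describe_chunks(chunks, parallel=False):
    """Describe the loading strategy implied by the chunks argument."""
    if chunks is None:
        if parallel:
            # Parallel opening relies on dask arrays, so eager loading is rejected.
            raise ValueError("chunks cannot be None when parallel=True.")
        return "load the data directly in memory"
    if chunks == -1:
        return "dask, single chunk for each granule array"
    if chunks == {}:
        return "dask, file chunks"
    if chunks == "auto":
        return "dask, auto chunking based on the file chunks"
    return "dask, user-defined chunk sizes"


for option in (-1, {}, "auto", None):
    print(repr(option), "->", describe_chunks(option))
```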

gpm.dataset.dataset.open_files(filepaths, parallel=False, scan_modes=None, groups=None, variables=None, prefix_group=False, start_time=None, end_time=None, chunks=-1, decode_cf=True, **kwargs)[source][source]#

gpm.dataset.datatree module#

This module contains functions to read a GPM granule into a DataTree object.

gpm.dataset.datatree.check_non_empty_granule(dt, filepath)[source][source]#

Check that the datatree (or dataset) is not empty.

gpm.dataset.datatree.check_valid_granule(filepath)[source][source]#

Raise an explanatory error if the GPM granule is not readable.

gpm.dataset.datatree.open_raw_datatree(filepath, chunks={}, decode_cf=False, use_api_defaults=True, **kwargs)[source][source]#

Open a GPM HDF5 file into an xarray.DataTree object with intuitive dimension names.

Parameters:
  • chunks (int, dict, str or None, optional) –

    Chunk size for dask array:

    • chunks=-1 loads the dataset with dask using a single chunk for each granule array.

    • chunks={} loads the dataset with dask using the file chunks.

    • chunks='auto' will use dask auto chunking taking into account the file chunks.

    If you want to load data in memory directly, specify chunks=None. The default is {}.

    Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().

  • decode_cf (bool, optional) – Whether to decode the dataset. The default is False.

  • **kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.

Return type:

xarray.DataTree

gpm.dataset.dimensions module#

This module contains functions to retrieve the dimensions associated with each GPM variable.

gpm.dataset.dimensions.rename_dataarray_dimensions(da)[source][source]#

Rename xarray.DataArray dimensions.

gpm.dataset.dimensions.rename_dataset_dimensions(ds, use_api_defaults=True)[source][source]#

Rename xarray.Dataset dimensions to the actual dimension names.

The actual dimension names are retrieved from the DimensionNames attribute of each xarray.DataArray. The dimension renaming is performed at each Dataset level. If use_api_defaults is True (the default), it sets the GPM-API dimension names.

gpm.dataset.dimensions.rename_datatree_dimensions(dt, use_api_defaults=True)[source][source]#

Rename xarray.DataTree dimensions to the actual dimension names.

The actual dimension names are retrieved from the DimensionNames attribute of each xarray.DataArray. The renaming is performed at the xarray.DataArray level because DataArrays sharing the same dimension size (but with semantically different dimensions) are given the same phony_dim_number within an xarray.Dataset!

The dimension renaming is performed at each Dataset level. If use_api_defaults is True (the default), it sets the GPM-API dimension names.
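The per-variable renaming can be sketched without xarray. The example assumes that DimensionNames is a comma-separated string listing one name per dimension; the variable names var_a and var_b, the dimension names and the helper build_dim_mapping are made up for illustration.

```python
def build_dim_mapping(phony_dims, dimension_names):
    """Map placeholder dimension names to the names listed in DimensionNames."""
    actual = [name.strip() for name in dimension_names.split(",") if name.strip()]
    if len(actual) != len(phony_dims):
        raise ValueError("DimensionNames does not match the number of dimensions.")
    return dict(zip(phony_dims, actual))


# Two hypothetical variables sharing the same placeholder dimensions
# (same sizes) but carrying different actual dimension names.
variables = {
    "var_a": (("phony_dim_0", "phony_dim_1"), "nscan,nray"),
    "var_b": (("phony_dim_0", "phony_dim_1"), "nscan,nfreq"),
}

renamed = {
    name: tuple(build_dim_mapping(dims, attr)[dim] for dim in dims)
    for name, (dims, attr) in variables.items()
}
print(renamed)  # {'var_a': ('nscan', 'nray'), 'var_b': ('nscan', 'nfreq')}
```

Building the mapping per variable is what keeps phony_dim_1 from being renamed identically for both variables.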

gpm.dataset.granule module#

This module contains functions to read a single file into a GPM-API Dataset.

gpm.dataset.granule.get_scan_modes_datasets(filepath, groups, variables, decode_cf, chunks, prefix_group, scan_modes=None, **kwargs)[source][source]#

Return a dictionary with a dataset for each scan mode.

gpm.dataset.granule.get_variables(ds)[source][source]#

Retrieve the dataset variables.

gpm.dataset.granule.get_variables_dims(ds)[source][source]#

Retrieve the dimensions used by the xarray.Dataset variables.

gpm.dataset.granule.open_granule(*args, **kwargs)[source][source]#
gpm.dataset.granule.open_granule_dataset(filepath, scan_mode=None, groups=None, variables=None, decode_cf=True, chunks={}, prefix_group=False, **kwargs)[source][source]#

Create a lazy xarray.Dataset with relevant GPM data and attributes for a specific granule.

Parameters:
  • filepath (str) – Filepath of GPM granule dataset

  • scan_mode (str, optional) –

    Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:

    • 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).

    • 'NS': Normal Scan. For Ku band and DPR (till version 6 products).

    • 'MS': Matched Scan. For Ka band and DPR (till version 6 products).

    • 'HS': High-sensitivity Scan. For Ka band and DPR.

  • variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).

  • groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).

  • chunks (int, dict, str or None, optional) –

    Chunk size for dask array:

    • chunks=-1 loads the dataset with dask using a single chunk for all arrays.

    • chunks={} loads the dataset with dask using the file chunks.

    • chunks='auto' will use dask auto chunking taking into account the file chunks.

    If you want to load data in memory directly, specify chunks=None. The default is {}.

    Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().

  • decode_cf (bool, optional) – Whether to decode the dataset. The default is True.

  • prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. The default is False.

  • **kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.

Returns:

ds

Return type:

xarray.Dataset

gpm.dataset.granule.open_granule_datatree(filepath, scan_modes=None, groups=None, variables=None, decode_cf=True, chunks={}, prefix_group=False, **kwargs)[source][source]#

Create a lazy xarray.Dataset with relevant GPM data and attributes for a specific granule.

Parameters:
  • filepath (str) – Filepath of GPM granule dataset

  • scan_modes (str, optional) –

    Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:

    • 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).

    • 'NS': Normal Scan. For Ku band and DPR (till version 6 products).

    • 'MS': Matched Scan. For Ka band and DPR (till version 6 products).

    • 'HS': High-sensitivity Scan. For Ka band and DPR.

  • variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).

  • groups (list, str, optional) – HDF5 Groups from which to read all variables. The default is None (all groups).

  • chunks (int, dict, str or None, optional) –

    Chunk size for dask array:

    • chunks=-1 loads the dataset with dask using a single chunk for all arrays.

    • chunks={} loads the dataset with dask using the file chunks.

    • chunks='auto' will use dask auto chunking taking into account the file chunks.

    If you want to load data in memory directly, specify chunks=None. The default is {}.

    Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().

  • decode_cf (bool, optional) – Whether to decode the dataset. The default is True.

  • prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. The default is False.

  • **kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.

Returns:

ds

Return type:

xarray.Dataset

gpm.dataset.granule.remove_unused_var_dims(ds)[source][source]#

Remove coordinates and dimensions not used by the xarray.Dataset variables.

An exception is made for the nv, lonv and latv bounds dimensions.
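The selection logic can be sketched with sets. The helpers below are hypothetical and operate on plain dimension names rather than on an xarray.Dataset; the sample dimension and variable names are illustrative.

```python
BOUNDS_DIMS = {"nv", "lonv", "latv"}  # bounds dimensions that are always kept


def find_unused_dims(all_dims, variables_dims):
    """Return the dimensions that no variable uses."""
    used = set()
    for dims in variables_dims.values():
        used.update(dims)
    return set(all_dims) - used


def dims_to_drop(all_dims, variables_dims):
    """Return the unused dimensions, excluding the bounds dimensions."""
    return find_unused_dims(all_dims, variables_dims) - BOUNDS_DIMS


all_dims = ["cross_track", "along_track", "range", "nv"]
variables_dims = {"precip": ("cross_track", "along_track")}
print(sorted(find_unused_dims(all_dims, variables_dims)))  # ['nv', 'range']
print(sorted(dims_to_drop(all_dims, variables_dims)))      # ['range']
```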

gpm.dataset.granule.unused_var_dims(ds)[source][source]#

Retrieve the dimensions not used by the xarray.Dataset variables.

gpm.dataset.groups_variables module#

This module contains functions to read GPM file groups, sub-groups and variables.

gpm.dataset.tcprimed module#

gpm.dataset.tcprimed.convert_passive_microwave(ds)[source][source]#
gpm.dataset.tcprimed.ensure_standard_longitude_values(da_lon)[source][source]#
gpm.dataset.tcprimed.open_granule_tcprimed(filepath, chunks={}, **kwargs)[source][source]#

Module contents#

This directory defines the GPM-API datasets.