gpm.dataset package#
Subpackages#
- gpm.dataset.decoding package
- Submodules
- gpm.dataset.decoding.cf module
- gpm.dataset.decoding.coordinates module
- gpm.dataset.decoding.dataarray_attrs module
- gpm.dataset.decoding.decode_1b_radar module
- gpm.dataset.decoding.decode_1c_pmw module
- gpm.dataset.decoding.decode_2a_pmw module
decode_airmassLiftIndex(), decode_cloudWaterPath(), decode_iceWaterPath(), decode_pixelStatus(), decode_precip1stTertial(), decode_precip2ndTertial(), decode_precipitationYesNoFlag(), decode_product(), decode_qualityFlag(), decode_rainWaterPath(), decode_sunGlintAngle(), decode_surfacePrecipitation(), decode_surfaceTypeIndex()
- gpm.dataset.decoding.decode_2a_radar module
decode_attenuationNP(), decode_flagAnvil(), decode_flagBB(), decode_flagGraupelHail(), decode_flagHail(), decode_flagHeavyIcePrecip(), decode_flagPrecip(), decode_flagShallowRain(), decode_flagSurfaceSnowfall(), decode_heightBB(), decode_landSurfaceType(), decode_phase(), decode_phaseNearSurface(), decode_product(), decode_qualityBB(), decode_qualityFlag(), decode_qualityTypePrecip(), decode_reliabFlag(), decode_snowIceCover(), decode_widthBB(), decode_zFactorMeasured()
- gpm.dataset.decoding.decode_2b_corra module
- gpm.dataset.decoding.decode_imerg module
decode_HQobservationTime(), decode_HQprecipSource(), decode_HQprecipitation(), decode_IRinfluence(), decode_IRkalmanFilterWeight(), decode_IRprecipitation(), decode_MWobservationTime(), decode_MWprecipSource(), decode_MWprecipitation(), decode_precipitation(), decode_precipitationCal(), decode_precipitationQualityIndex(), decode_precipitationUncal(), decode_probabilityLiquidPrecipitation(), decode_product(), decode_randomError()
- gpm.dataset.decoding.routines module
- gpm.dataset.decoding.utils module
- Module contents
Submodules#
gpm.dataset.attrs module#
This module contains functions to parse GPM granule attributes.
- gpm.dataset.attrs.decode_attrs(attrs)#
Decode GPM nested dictionary attributes from an xarray object.
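The idea of decoding nested attributes can be sketched as follows. This is a minimal illustration, not the GPM-API implementation: it assumes that a header attribute such as FileHeader is stored as a single string of `key=value;` lines, and the helper names `parse_header_string` and `decode_nested_attrs` are hypothetical.

```python
def parse_header_string(text):
    """Parse 'key=value;' lines into a dictionary (hypothetical helper)."""
    parsed = {}
    for line in text.splitlines():
        line = line.strip().rstrip(";")
        if "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        parsed[key.strip()] = value.strip()
    return parsed


def decode_nested_attrs(attrs):
    """Return a copy of attrs where multi-line 'key=value;' strings become dictionaries."""
    decoded = {}
    for name, value in attrs.items():
        if isinstance(value, str) and "=" in value and "\n" in value:
            decoded[name] = parse_header_string(value)
        else:
            decoded[name] = value  # leave ordinary attributes untouched
    return decoded


# Made-up attribute values, used only to exercise the sketch.
raw_attrs = {
    "FileHeader": "SatelliteName=GPM;\nInstrumentName=DPR;\nEmptyGranule=NOT_EMPTY;\n",
    "units": "mm/hr",
}
decoded = decode_nested_attrs(raw_attrs)
print(decoded["FileHeader"]["EmptyGranule"])
```

Once decoded, a flag such as EmptyGranule can be read with ordinary dictionary access instead of string searching.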
gpm.dataset.conventions module#
This module contains functions to enforce CF conventions on GPM-API objects.
- gpm.dataset.conventions.add_gpm_api_product(ds, product)#
Add gpm_api_product attribute to Dataset and DataArray variables.
- gpm.dataset.conventions.finalize_dataset(ds, product, decode_cf, scan_mode, start_time=None, end_time=None)#
Finalize GPM xarray.Dataset object.
- gpm.dataset.conventions.reshape_dataset(ds)#
Define the dataset dimension order.
It ensures that the output dimension order is (y, x), the shape expected by libraries such as pyresample and matplotlib.
For GPM GRID objects: (…, time, lat, lon)
For GPM ORBIT objects: (cross_track, along_track, …)
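The ordering rule can be illustrated with plain tuples of dimension names. This is only a sketch under the assumption that the rule depends solely on which dimension names are present; `target_dim_order` is a hypothetical helper, not a GPM-API function.

```python
def target_dim_order(dims):
    """Return dims reordered following the GRID / ORBIT conventions described above."""
    dims = tuple(dims)
    if "cross_track" in dims and "along_track" in dims:
        # ORBIT objects: (cross_track, along_track, ...)
        leading = ("cross_track", "along_track")
        rest = tuple(d for d in dims if d not in leading)
        return leading + rest
    if "lat" in dims and "lon" in dims:
        # GRID objects: (..., time, lat, lon)
        trailing = tuple(d for d in ("time", "lat", "lon") if d in dims)
        rest = tuple(d for d in dims if d not in trailing)
        return rest + trailing
    return dims  # unknown layout: leave the order unchanged


orbit_order = target_dim_order(("along_track", "cross_track", "range"))
grid_order = target_dim_order(("lon", "lat", "time"))
print(orbit_order)
print(grid_order)
```

With an xarray object, such a tuple could then be passed to `transpose(*order)` to apply the order.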
gpm.dataset.coords module#
This module contains functions to extract the coordinates from GPM files.
- gpm.dataset.coords.get_coords_attrs_dict(ds)#
Return relevant GPM coordinates attributes.
- gpm.dataset.coords.get_grid_coords(dt, scan_mode)#
Get coordinates from Grid objects.
Set ‘time’ to the end of the accumulation period. Example: IMERG provides the average rain rate (mm/hr) over the half-hour period.
NOTE: IMERG and GRID products do not have a GranuleNumber!
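As a small worked example of this time convention, the sketch below shifts a period start time to the end of the accumulation period. The half-hour length comes from the IMERG example above; the function name `accumulation_end_time` and the sample timestamp are illustrative assumptions.

```python
from datetime import datetime, timedelta


def accumulation_end_time(start_time, accumulation=timedelta(minutes=30)):
    """Return the timestamp marking the end of the accumulation period."""
    return start_time + accumulation


# A half-hourly period starting at 02:00 is labelled with 02:30.
period_start = datetime(2020, 7, 5, 2, 0, 0)
period_end = accumulation_end_time(period_start)
print(period_end)
```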
gpm.dataset.crs module#
This module contains functions to define and create CF-compliant CRS.
- gpm.dataset.crs.compute_extent(x_coords, y_coords)#
Compute the extent (x_min, x_max, y_min, y_max) from pixel centroids.
This function assumes that the spacing between each pixel is uniform. It takes into account the decreasing/increasing order of the coordinates.
The output extent format is the one expected by matplotlib and cartopy. Note that the pyresample area_extent is instead ordered as [x_min, y_min, x_max, y_max].
- gpm.dataset.crs.compute_pyresample_area_extent(x_coords, y_coords)#
Compute the pyresample area extent [x_min, y_min, x_max, y_max] from pixel centroids.
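The two extent conventions can be compared with a short, pure-Python sketch. It is not the library code: it assumes uniformly spaced centroids (as stated above), extends the outermost centroids by half a pixel, and uses hypothetical helper names.

```python
def extent_from_centroids(x_coords, y_coords):
    """Return (x_min, x_max, y_min, y_max) from uniformly spaced pixel centroids."""

    def bounds(coords):
        coords = list(coords)
        if len(coords) < 2:
            raise ValueError("At least two centroids are needed to infer the spacing.")
        half_pixel = abs(coords[1] - coords[0]) / 2
        # min/max make the result independent of increasing or decreasing order.
        return min(coords) - half_pixel, max(coords) + half_pixel

    x_min, x_max = bounds(x_coords)
    y_min, y_max = bounds(y_coords)
    return (x_min, x_max, y_min, y_max)


def to_pyresample_area_extent(extent):
    """Reorder (x_min, x_max, y_min, y_max) into [x_min, y_min, x_max, y_max]."""
    x_min, x_max, y_min, y_max = extent
    return [x_min, y_min, x_max, y_max]


x = [0.5, 1.5, 2.5]  # increasing x centroids
y = [9.5, 8.5, 7.5]  # decreasing y centroids
extent = extent_from_centroids(x, y)
area_extent = to_pyresample_area_extent(extent)
print(extent)
print(area_extent)
```

The first tuple is in the order matplotlib and cartopy expect, while the second follows the lower-left, upper-right corner order used by pyresample.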
- gpm.dataset.crs.get_pyproj_crs(xr_obj)#
Return a pyproj.crs.CRS from CRS coordinate(s).
If both a geographic and a projected CRS are present, it returns the projected one.
This method is also available as a property through the xarray accessor: gpm.pyproj_crs.
- Parameters:
xr_obj (xarray.DataArray or xarray.Dataset) –
- Returns:
proj_crs
- Return type:
pyproj.crs.CRS
- gpm.dataset.crs.get_pyresample_area(xr_obj)#
Define pyresample area from CF-compliant xarray.DataArray or xarray.Dataset.
To be used by the pyresample accessor: ds.pyresample.area
- gpm.dataset.crs.get_pyresample_projection(xr_obj)#
Get pyresample AreaDefinition from CF-compliant xarray.DataArray or xarray.Dataset.
- gpm.dataset.crs.get_pyresample_swath(xr_obj)#
Get pyresample SwathDefinition from CF-compliant xarray.DataArray or xarray.Dataset.
- gpm.dataset.crs.get_spatial_coordinates(xr_obj)#
Return the xarray object x and y spatial coordinates.
- gpm.dataset.crs.get_x_coordinate(xr_obj)#
Return the xarray object x spatial coordinate.
- gpm.dataset.crs.get_y_coordinate(xr_obj)#
Return the xarray object y spatial coordinate.
- gpm.dataset.crs.remove_existing_crs_info(xr_obj)#
Remove existing grid_mapping attributes.
- gpm.dataset.crs.set_dataset_crs(ds, crs, grid_mapping_name='spatial_ref', inplace=False)#
Add CF-compliant CRS information to an xarray DataArray or Dataset.
If an xarray Dataset is provided, it assumes that all dataset variables have the same CRS! For a projected CRS, it expects that the CRS dimension coordinates are specified. For a swath dataset, it expects that the geographic coordinates are specified.
For a projected CRS, if 2D latitude/longitude arrays are specified, it assumes they refer to the WGS84 CRS!
- Parameters:
ds (xarray.Dataset or xarray.DataArray) –
crs (pyproj.crs.CRS) – CRS information to be added to the xarray.Dataset.
grid_mapping_name (str) – Name of the grid_mapping coordinate that stores the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.
- Returns:
ds – Dataset or DataArray with CF-compliant CRS information.
- Return type:
xarray.Dataset or xarray.DataArray
- gpm.dataset.crs.set_dataset_single_crs(xr_obj, crs, grid_mapping_name='spatial_ref', inplace=False)#
Add CF-compliant CRS information to an xarray.Dataset.
It assumes that all dataset variables have the same CRS! For a projected CRS, it expects that the CRS dimension coordinates are specified. For a swath dataset, it expects that the geographic coordinates are specified.
- Parameters:
xr_obj (xarray.Dataset) –
crs (pyproj.crs.CRS) – CRS information to be added to the xarray.Dataset.
grid_mapping_name (str) – Name of the grid_mapping coordinate that stores the CRS information. The default is spatial_ref. Other common names are grid_mapping and crs.
- Returns:
ds – Dataset with CF-compliant CRS information.
- Return type:
xarray.Dataset
gpm.dataset.dataset module#
This module contains functions to read files into a GPM-API Dataset or DataTree.
- gpm.dataset.dataset.open_dataset(product, start_time, end_time, variables=None, groups=None, scan_mode=None, version=None, product_type='RS', chunks=-1, decode_cf=True, parallel=False, prefix_group=False, verbose=False, base_dir=None, **kwargs)#
Lazily map HDF5 data into an xarray.Dataset with relevant GPM data and attributes.
Note:
- gpm.open_dataset does not load GPM granules with the FileHeader flag 'EmptyGranule' != 'NOT_EMPTY'.
- The coordinates Quality or dataQuality provide an overall quality flag status.
- The coordinate SCorientation provides the orientation of the sensor from the forward track of the satellite.
- Parameters:
product (str) – GPM product acronym.
start_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – Start time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If a string, it must follow the ISO format YYYY-MM-DD hh:mm:ss.
end_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – End time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If a string, it must follow the ISO format YYYY-MM-DD hh:mm:ss.
variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).
groups (list, str, optional) – HDF5 groups from which to read all variables. The default is None (all groups).
scan_mode (str, optional) – Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:
- 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).
- 'NS': Normal Scan. For Ku band and DPR (till version 6 products).
- 'MS': Matched Scan. For Ka band and DPR (till version 6 products).
- 'HS': High-sensitivity Scan. For Ka band and DPR.
product_type (str, optional) – GPM product type. Either 'RS' (Research) or 'NRT' (Near-Real-Time). The default is 'RS'.
version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support versions 4, 5, 6 and 7.
chunks (int, dict, str or None, optional) – Chunk size for dask array:
- chunks=-1 loads the dataset with dask using a single chunk for each granule array.
- chunks={} loads the dataset with dask using the file chunks.
- chunks='auto' uses dask auto-chunking, taking into account the file chunks.
If you want to load data in memory directly, specify chunks=None. The default is -1.
Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().
decode_cf (bool, optional) – Whether to decode the dataset. The default is True.
prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. If you aim to save the Dataset to disk as netCDF or Zarr, you need to set prefix_group=False or later remove the prefix before writing the dataset. The default is False.
parallel (bool) – If True, the datasets are opened in parallel using dask.delayed.delayed. If parallel=True, chunks cannot be None. The underlying data must be dask.array.Array. The default is False.
**kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.
- Return type:
xarray.Dataset
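The accepted start_time and end_time formats can be illustrated with a small normalization sketch. It is not the GPM-API implementation: `to_datetime` and `check_time_window` are hypothetical helpers, and numpy.datetime64 inputs are left out so that the example needs only the standard library.

```python
from datetime import date, datetime


def to_datetime(value):
    """Normalize a datetime, a date or a 'YYYY-MM-DD hh:mm:ss' string to a datetime."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        return value
    if isinstance(value, date):
        return datetime(value.year, value.month, value.day)
    if isinstance(value, str):
        return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    raise TypeError(f"Unsupported time type: {type(value).__name__}")


def check_time_window(start_time, end_time):
    """Return (start, end) as datetimes, checking that start precedes end."""
    start, end = to_datetime(start_time), to_datetime(end_time)
    if start > end:
        raise ValueError("start_time must not be later than end_time.")
    return start, end


start, end = check_time_window("2020-07-05 02:00:00", date(2020, 7, 6))
print(start, end)
```

A date is interpreted here as midnight of that day, which is an assumption of the sketch rather than documented behaviour.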
- gpm.dataset.dataset.open_datatree(product, start_time, end_time, variables=None, groups=None, scan_modes=None, version=None, product_type='RS', chunks=-1, decode_cf=True, parallel=False, prefix_group=False, verbose=False, base_dir=None, **kwargs)#
Lazily map HDF5 data into xarray.DataTree objects with relevant GPM data and attributes.
Note:
- gpm.open_datatree does not load GPM granules with the FileHeader flag 'EmptyGranule' != 'NOT_EMPTY'.
- The coordinates Quality or dataQuality provide an overall quality flag status.
- The coordinate SCorientation provides the orientation of the sensor from the forward track of the satellite.
- Parameters:
product (str) – GPM product acronym.
start_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – Start time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If a string, it must follow the ISO format YYYY-MM-DD hh:mm:ss.
end_time (datetime.datetime, datetime.date, numpy.datetime64 or str) – End time. Accepted types: datetime.datetime, datetime.date, numpy.datetime64 or str. If a string, it must follow the ISO format YYYY-MM-DD hh:mm:ss.
variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).
groups (list, str, optional) – HDF5 groups from which to read all variables. The default is None (all groups).
scan_modes (str, optional) – Scan mode of the GPM product. If None (the default), loads all scan modes. Use gpm.available_scan_modes(product, version) to see the available scan modes for a specific product. The radar products have the following scan modes:
- 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).
- 'NS': Normal Scan. For Ku band and DPR (till version 6 products).
- 'MS': Matched Scan. For Ka band and DPR (till version 6 products).
- 'HS': High-sensitivity Scan. For Ka band and DPR.
product_type (str, optional) – GPM product type. Either 'RS' (Research) or 'NRT' (Near-Real-Time). The default is 'RS'.
version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support versions 4, 5, 6 and 7.
chunks (int, dict, str or None, optional) – Chunk size for dask array:
- chunks=-1 loads the dataset with dask using a single chunk for each granule array.
- chunks={} loads the dataset with dask using the file chunks.
- chunks='auto' uses dask auto-chunking, taking into account the file chunks.
If you want to load data in memory directly, specify chunks=None. The default is -1.
Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().
decode_cf (bool, optional) – Whether to decode the dataset. The default is True.
prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. If you aim to save the Dataset to disk as netCDF or Zarr, you need to set prefix_group=False or later remove the prefix before writing the dataset. The default is False.
parallel (bool) – If True, the datasets are opened in parallel using dask.delayed.delayed. If parallel=True, chunks cannot be None. The underlying data must be dask.array.Array. The default is False.
**kwargs (dict) – Additional keyword arguments passed to open_datatree() for each group.
- Return type:
xarray.DataTree
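The scan-mode rules listed for the radar products can be written down as a small lookup. This sketch only encodes the four rules quoted above; the function `radar_scan_modes` and its sensor labels are hypothetical, and gpm.available_scan_modes(product, version) remains the reference for real products.

```python
def radar_scan_modes(sensor, version):
    """Return the scan modes implied by the rules above for 'Ku', 'Ka' or 'DPR'."""
    if sensor not in ("Ku", "Ka", "DPR"):
        raise ValueError(f"Unknown radar sensor: {sensor!r}")
    modes = []
    if version >= 7:
        modes.append("FS")  # Full Scan: Ku, Ka and DPR since version 7
    else:
        if sensor in ("Ku", "DPR"):
            modes.append("NS")  # Normal Scan: Ku and DPR till version 6
        if sensor in ("Ka", "DPR"):
            modes.append("MS")  # Matched Scan: Ka and DPR till version 6
    if sensor in ("Ka", "DPR"):
        modes.append("HS")  # High-sensitivity Scan: Ka and DPR
    return modes


print(radar_scan_modes("DPR", 7))
print(radar_scan_modes("Ku", 6))
```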
gpm.dataset.datatree module#
This module contains functions to read a GPM granule into a DataTree object.
- gpm.dataset.datatree.check_non_empty_granule(dt, filepath)#
Check that the datatree (or dataset) is not empty.
- gpm.dataset.datatree.check_valid_granule(filepath)#
Raise an explanatory error if the GPM granule is not readable.
- gpm.dataset.datatree.open_raw_datatree(filepath, chunks={}, decode_cf=False, use_api_defaults=True, **kwargs)#
Open a GPM HDF5 file into an xarray.DataTree object with intuitive dimension names.
- Parameters:
chunks (int, dict, str or None, optional) – Chunk size for dask array:
- chunks=-1 loads the dataset with dask using a single chunk for each granule array.
- chunks={} loads the dataset with dask using the file chunks.
- chunks='auto' uses dask auto-chunking, taking into account the file chunks.
If you want to load data in memory directly, specify chunks=None. The default is {}.
Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().
decode_cf (bool, optional) – Whether to decode the dataset. The default is False.
**kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.
- Return type:
xarray.DataTree
gpm.dataset.dimensions module#
This module contains functions to retrieve the dimensions associated with each GPM variable.
- gpm.dataset.dimensions.rename_dataarray_dimensions(da)#
Rename xarray.DataArray dimensions.
- gpm.dataset.dimensions.rename_dataset_dimensions(ds, use_api_defaults=True)#
Rename xarray.Dataset dimensions to the actual dimension names.
The actual dimension names are retrieved from the DimensionNames attribute of each xarray.DataArray. The dimension renaming is performed at each Dataset level. If use_api_defaults is True (the default), it sets the GPM-API dimension names.
- gpm.dataset.dimensions.rename_datatree_dimensions(dt, use_api_defaults=True)#
Rename xarray.DataTree dimensions to the actual dimension names.
The actual dimension names are retrieved from the DimensionNames attribute of each xarray.DataArray. The renaming is performed at the xarray.DataArray level because DataArrays sharing the same dimension size (but with semantically different dimensions) are given the same phony_dim_number within an xarray.Dataset!
The dimension renaming is performed at each Dataset level. If use_api_defaults is True (the default), it sets the GPM-API dimension names.
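The renaming logic can be sketched for a single variable. The example assumes that DimensionNames is a comma-separated string and that the GPM-API defaults map names as shown in `API_DEFAULT_NAMES`; both the mapping and the helper `dimension_rename_map` are illustrative assumptions rather than the library's own definitions.

```python
# Assumed mapping from native GPM names to GPM-API default names (illustration only).
API_DEFAULT_NAMES = {"nscan": "along_track", "nray": "cross_track", "nbin": "range"}


def dimension_rename_map(phony_dims, dimension_names, use_api_defaults=True):
    """Map placeholder dimensions to the names listed in a DimensionNames string."""
    names = [name.strip() for name in dimension_names.split(",") if name.strip()]
    if len(names) != len(phony_dims):
        raise ValueError("DimensionNames does not match the number of dimensions.")
    if use_api_defaults:
        # Unknown names are kept as they are.
        names = [API_DEFAULT_NAMES.get(name, name) for name in names]
    return dict(zip(phony_dims, names))


phony = ("phony_dim_0", "phony_dim_1", "phony_dim_2")
api_names = dimension_rename_map(phony, "nscan,nray,nbin")
native_names = dimension_rename_map(phony, "nscan,nray,nbin", use_api_defaults=False)
print(api_names)
print(native_names)
```

Because the mapping is built per variable, two variables that share a placeholder such as phony_dim_0 can still end up with different names; the resulting dictionary could be passed to the `rename` method of an xarray.DataArray.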
gpm.dataset.granule module#
This module contains functions to read a single file into a GPM-API Dataset.
- gpm.dataset.granule.get_scan_modes_datasets(filepath, groups, variables, decode_cf, chunks, prefix_group, scan_modes=None, **kwargs)#
Return a dictionary with a dataset for each scan mode.
- gpm.dataset.granule.get_variables_dims(ds)#
Retrieve the dimensions used by the xarray.Dataset variables.
- gpm.dataset.granule.open_granule_dataset(filepath, scan_mode=None, groups=None, variables=None, decode_cf=True, chunks={}, prefix_group=False, **kwargs)#
Create a lazy xarray.Dataset with relevant GPM data and attributes for a specific granule.
- Parameters:
filepath (str) – Filepath of the GPM granule dataset.
scan_mode (str, optional) – Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:
- 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).
- 'NS': Normal Scan. For Ku band and DPR (till version 6 products).
- 'MS': Matched Scan. For Ka band and DPR (till version 6 products).
- 'HS': High-sensitivity Scan. For Ka band and DPR.
variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).
groups (list, str, optional) – HDF5 groups from which to read all variables. The default is None (all groups).
chunks (int, dict, str or None, optional) – Chunk size for dask array:
- chunks=-1 loads the dataset with dask using a single chunk for all arrays.
- chunks={} loads the dataset with dask using the file chunks.
- chunks='auto' uses dask auto-chunking, taking into account the file chunks.
If you want to load data in memory directly, specify chunks=None. The default is {}.
Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().
decode_cf (bool, optional) – Whether to decode the dataset. The default is True.
prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. The default is False.
**kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.
- Returns:
ds
- Return type:
xarray.Dataset
- gpm.dataset.granule.open_granule_datatree(filepath, scan_modes=None, groups=None, variables=None, decode_cf=True, chunks={}, prefix_group=False, **kwargs)#
Create a lazy xarray.DataTree with relevant GPM data and attributes for a specific granule.
- Parameters:
filepath (str) – Filepath of the GPM granule dataset.
scan_modes (str, optional) – Scan mode of the GPM product. The default is None. Use gpm.available_scan_modes(product, version) to get the available scan modes for a specific product. The radar products have the following scan modes:
- 'FS': Full Scan. For Ku, Ka and DPR (since version 7 products).
- 'NS': Normal Scan. For Ku band and DPR (till version 6 products).
- 'MS': Matched Scan. For Ka band and DPR (till version 6 products).
- 'HS': High-sensitivity Scan. For Ka band and DPR.
variables (list, str, optional) – Variables to read from the HDF5 file. The default is None (all variables).
groups (list, str, optional) – HDF5 groups from which to read all variables. The default is None (all groups).
chunks (int, dict, str or None, optional) – Chunk size for dask array:
- chunks=-1 loads the dataset with dask using a single chunk for all arrays.
- chunks={} loads the dataset with dask using the file chunks.
- chunks='auto' uses dask auto-chunking, taking into account the file chunks.
If you want to load data in memory directly, specify chunks=None. The default is {}.
Hint: xarray’s lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, load the dataset entirely into memory by invoking ds.compute().
decode_cf (bool, optional) – Whether to decode the dataset. The default is True.
prefix_group (bool, optional) – Whether to add the group as a prefix to the variable names. The default is False.
**kwargs (dict) – Additional keyword arguments passed to open_dataset() for each group.
- Returns:
ds
- Return type:
xarray.DataTree
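When prefix_group=True is used, variable names carry their group as a prefix. The sketch below shows one way to build a mapping that removes such prefixes. It assumes a "/" separator between group and variable name, which is not confirmed by this page, and `strip_group_prefix` is a hypothetical helper.

```python
def strip_group_prefix(names, separator="/"):
    """Return a {prefixed_name: short_name} mapping, refusing duplicate short names."""
    renamed = {}
    seen = set()
    for name in names:
        short = name.rsplit(separator, 1)[-1]  # keep the part after the last separator
        if short in seen:
            raise ValueError(f"Removing the prefix would duplicate the name {short!r}.")
        seen.add(short)
        renamed[name] = short
    return renamed


# Example variable names, chosen only for illustration.
mapping = strip_group_prefix(["SLV/precipRate", "PRE/zFactorMeasured", "Quality"])
print(mapping)
```

The duplicate check matters because two groups may contain variables with the same short name; the mapping could then be passed to the `rename` method of an xarray.Dataset.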
gpm.dataset.groups_variables module#
This module contains functions to read GPM file groups, sub-groups and variables.
gpm.dataset.tcprimed module#
Module contents#
This directory defines the GPM-API datasets.