gpm.utils package

gpm.utils.checks.get_slices_contiguous_scans(xr_obj, min_size=2, min_n_scans=3, x='lon', y='lat', along_track_dim='along_track', cross_track_dim='cross_track')

Return a list of slices ensuring contiguous scans (and granules).

Scan contiguity is checked only in the middle of the cross-track dimension. A scan whose geolocation is NaN is considered non-contiguous.

An input with fewer than 3 scans (along-track) returns an empty list, since scan contiguity can’t be verified. Consecutive non-contiguous scans are discarded and not included in the output. The minimum size of the output slices is 2.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – GPM xarray object.
min_size (int) – Minimum size for a slice to be returned.

Returns:

list_slices – List of slice objects to select contiguous scans. Output format: [slice(start,stop), slice(start,stop),...]

Return type:

list
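The grouping rule described for get_slices_contiguous_scans can be pictured with a small pure-Python sketch. It is not the library's implementation: the name contiguous_runs and the linked_to_next input (one flag per pair of adjacent scans) are assumptions made for illustration, and the min_n_scans check is not modelled.

```python
def contiguous_runs(linked_to_next, min_size=2):
    """Split scans 0..n-1 into runs of contiguous scans (illustrative only).

    linked_to_next[i] is True when scan i and scan i + 1 are contiguous.
    Runs shorter than min_size are dropped.
    """
    n_scans = len(linked_to_next) + 1
    slices = []
    start = 0
    for i, linked in enumerate(linked_to_next):
        if not linked:
            # The current run ends at scan i (inclusive); a new run starts at i + 1.
            if i + 1 - start >= min_size:
                slices.append(slice(start, i + 1))
            start = i + 1
    # Close the last run.
    if n_scans - start >= min_size:
        slices.append(slice(start, n_scans))
    return slices


# 7 scans with a break after scan 2 and another after scan 3:
# scan 3 sits between two breaks, forms a run of size 1 and is discarded.
flags = [True, True, False, False, True, True]
print(contiguous_runs(flags))  # [slice(0, 3, None), slice(4, 7, None)]
```

With min_size=2, a scan enclosed by two discontinuities never reaches the output, which matches the statement that consecutive non-contiguous scans are discarded.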

gpm.utils.checks.get_slices_non_contiguous_scans(xr_obj, x='lon', y='lat', along_track_dim='along_track', cross_track_dim='cross_track')

Return a list of slices where scan discontinuities occur.

An input with fewer than 2 scans (along-track) returns an empty list.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – GPM xarray object.

Returns:

list_slices – List of slice objects to select discontiguous scans. Output format: [slice(start,stop), slice(start,stop),...]

Return type:

list

gpm.utils.checks.get_slices_non_regular_time(xr_obj, tolerance=None)

Return a list of slices where there are supposedly missing timesteps.

The output slices have size 2. An input with fewer than 2 scans (along-track) returns an empty list.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – GPM xarray object.
tolerance (numpy.timedelta64, optional) – The timedelta tolerance used to distinguish regular from non-regular timesteps. The default is None. For GPM GRID objects, the tolerance is derived from the first 2 timesteps. For GPM ORBIT objects, ORBIT_TIME_TOLERANCE is used. Using this function on GPM ORBIT objects is discouraged.

Returns:

list_slices – List of slice objects to select intervals with non-regular timesteps. Output format: [slice(start,stop), slice(start,stop),...]

Return type:

list
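As a rough illustration of the size-2 slices described for get_slices_non_regular_time, the sketch below flags every pair of consecutive timesteps whose spacing exceeds the tolerance. The function name non_regular_time_slices is hypothetical, plain datetime objects stand in for the xarray time coordinate, and treating "gap larger than tolerance" as non-regular is an assumption rather than the documented rule.

```python
from datetime import datetime, timedelta


def non_regular_time_slices(timesteps, tolerance=None):
    """Return size-2 slices around time gaps larger than tolerance (illustrative only)."""
    if len(timesteps) < 2:
        return []
    if tolerance is None:
        # Assumption: mimic the GRID case, where the tolerance is
        # derived from the first two timesteps.
        tolerance = timesteps[1] - timesteps[0]
    return [
        slice(i, i + 2)
        for i in range(len(timesteps) - 1)
        if timesteps[i + 1] - timesteps[i] > tolerance
    ]


t0 = datetime(2020, 1, 1)
# Half-hourly timesteps with three missing steps between index 2 and index 3.
timesteps = [t0 + timedelta(minutes=30 * k) for k in (0, 1, 2, 5, 6)]
print(non_regular_time_slices(timesteps))  # [slice(2, 4, None)]
```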

gpm.utils.checks.get_slices_non_valid_geolocation(xr_obj, x='lon', y='lat', along_track_dim='along_track', cross_track_dim='cross_track')

Return a list of GPM ORBIT along-track slices with non-valid geolocation.

The minimum size of the output slices is 2.

If the geolocation is invalid at every scan for a given cross-track index, that cross-track index is discarded before the along-track slices are identified.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – GPM xarray object.
min_size (int) – Minimum size for a slice to be returned. The default is 1.

Returns:

list_slices – List of slice objects with non-valid geolocation. Output format: [slice(start,stop), slice(start,stop),...]

Return type:

list

gpm.utils.checks.get_slices_non_wobbling_swath(xr_obj, threshold=100, y='lat', along_track_dim='along_track', cross_track_dim='cross_track')

Return the GPM ORBIT along-track slices along which the swath is not wobbling.

Wobbling is defined as a change in latitude direction that occurs within fewer than threshold scans. The function extracts the along-track boundary on both sides of the swath and identifies where the orbit direction changes.

gpm.utils.checks.get_slices_regular(xr_obj, min_size=None, min_n_scans=3, x='lon', y='lat', along_track_dim='along_track', cross_track_dim='cross_track')

Return a list of slices to select regular GPM objects.

For GPM ORBITS, it returns slices to select contiguous scans with valid geolocation. For GPM GRID, it returns slices to select periods with regular timesteps.

For more information, see the documentation of gpm.utils.checks.get_slices_contiguous_scans and gpm.utils.checks.get_slices_regular_time.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – GPM xarray object.
min_size (int) – Minimum size for a slice to be returned. If None, defaults to 1 for GRID objects and 2 for ORBIT objects.
min_n_scans (int) – Minimum number of scans required to check for scan contiguity. For visualization purposes, it can be useful to set this value to 2. This parameter applies only to ORBIT objects.

Returns:

list_slices – List of slice objects to select regular portions. Output format: [slice(start,stop), slice(start,stop),...]

Return type:

list

gpm.utils.checks.get_slices_regular_time(xr_obj, tolerance=None, min_size=1)

Return a list of slices ensuring timesteps to be regular.

Output format: [slice(start,stop), slice(start,stop),…]

Consecutive non-regular timesteps lead to slices of size 1. An xarray object with a single timestep leads to a slice of size 1. If min_size=1 (the default), such slices are returned.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – GPM xarray object.
tolerance (numpy.timedelta64, optional) – The timedelta tolerance used to distinguish regular from non-regular timesteps. The default is None. For GPM GRID objects, the tolerance is derived from the first 2 timesteps. For GPM ORBIT objects, ORBIT_TIME_TOLERANCE is used.
min_size (int) – Minimum size for a slice to be returned.

Returns:

list_slices – List of slice objects to select regular time intervals. Output format: [slice(start,stop), slice(start,stop),...]

Return type:

list

gpm.utils.checks.get_slices_valid_geolocation(xr_obj, min_size=2, x='lon', y='lat', along_track_dim='along_track', cross_track_dim='cross_track')

Return a list of GPM ORBIT along-track slices with valid geolocation.

The minimum size of the output slices is 2.

If the geolocation is invalid at every scan for a given cross-track index, that cross-track index is discarded before the along-track slices are identified.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – GPM ORBIT xarray object.
min_size (int) – Minimum size for a slice to be returned. The default is 2.

Returns:

list_slices – List of slice objects with valid geolocation. Output format: [slice(start,stop), slice(start,stop),...]

Return type:

list

gpm.utils.checks.get_slices_var_between(da, dim, vmin=-inf, vmax=inf, criteria='all')

Return a list of slices along the dim dimension where values fall within the interval defined by vmin and vmax.

If the xarray.DataArray has additional dimensions, the criteria parameter determines whether all values within each slice index must fall within the interval (if set to "all") or whether at least one value within the slice index must fall within the interval (if set to "any").
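The effect of criteria can be sketched without xarray by treating the data as a list of rows, where each row holds the values found at one index along dim. The names slices_between and mask_to_slices are hypothetical, and the inclusive bounds are an assumption; this is an illustration of the idea, not the library's code.

```python
import math


def mask_to_slices(mask):
    """Convert a boolean sequence into slices covering each run of True values."""
    slices, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            slices.append(slice(start, i))
            start = None
    if start is not None:
        slices.append(slice(start, len(mask)))
    return slices


def slices_between(rows, vmin=-math.inf, vmax=math.inf, criteria="all"):
    """Slices over rows whose values lie in [vmin, vmax] (bounds assumed inclusive)."""
    if criteria not in ("all", "any"):
        raise ValueError("criteria must be 'all' or 'any'.")
    reducer = all if criteria == "all" else any
    mask = [reducer(vmin <= value <= vmax for value in row) for row in rows]
    return mask_to_slices(mask)


rows = [[1, 2], [2, 9], [3, 4], [8, 9]]
print(slices_between(rows, vmin=0, vmax=5, criteria="all"))  # [slice(0, 1, None), slice(2, 3, None)]
print(slices_between(rows, vmin=0, vmax=5, criteria="any"))  # [slice(0, 3, None)]
```

The row [2, 9] is the one that separates the two settings: it fails "all" because 9 is outside the interval, but passes "any" because 2 is inside.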

gpm.utils.checks.get_slices_var_equals(da, dim, values, union=True, criteria='all')

Return a list of slices along the dim dimension where the specified values occur.

The function is applied recursively to each value in values. If the xarray.DataArray has additional dimensions, the criteria parameter determines whether all values within each slice index must be equal to value (if set to "all") or whether at least one value within the slice index must be equal to value (if set to "any").

If values is a list of values:
- if union=True, it returns slices corresponding to sequences of consecutive values.
- if union=False, it returns slices for each value in values.

For example, with values=[0,1] applied to [0,0, 1, 1]:
- union=False returns [slice(0,2), slice(2,4)]
- union=True returns [slice(0,4)]

union matters when multiple values are specified. criteria matters when the xarray.DataArray has multiple dimensions.
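The union behaviour can be reproduced on a plain Python list. The sketch below is only an illustration for one-dimensional input: slices_where_equal and mask_to_slices are hypothetical names, and the criteria parameter is not modelled.

```python
def mask_to_slices(mask):
    """Convert a boolean sequence into slices covering each run of True values."""
    slices, start = [], None
    for i, flag in enumerate(mask):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            slices.append(slice(start, i))
            start = None
    if start is not None:
        slices.append(slice(start, len(mask)))
    return slices


def slices_where_equal(sequence, values, union=True):
    """Slices where the sequence takes one of the requested values (1D sketch)."""
    if union:
        # A single mask: runs may mix any of the requested values.
        return mask_to_slices([item in values for item in sequence])
    # One mask per value: runs never mix different values.
    slices = []
    for value in values:
        slices.extend(mask_to_slices([item == value for item in sequence]))
    return slices


data = [0, 0, 1, 1]
print(slices_where_equal(data, [0, 1], union=False))  # [slice(0, 2, None), slice(2, 4, None)]
print(slices_where_equal(data, [0, 1], union=True))   # [slice(0, 4, None)]
```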

gpm.utils.checks.get_slices_wobbling_swath(xr_obj, threshold=100, y='lat', cross_track_dim='cross_track', along_track_dim='along_track')

Return the GPM ORBIT along-track slices along which the swath is wobbling.

Wobbling is defined as a change in latitude direction that occurs within fewer than threshold scans. The function extracts the along-track boundary on both sides of the swath and identifies where the orbit direction changes.

gpm.utils.checks.has_contiguous_granules(xr_obj)

Check whether the GPM object is composed of consecutive granules.

For ORBIT objects, it checks the gpm_granule_id. For GRID objects, it checks timesteps regularity.

gpm.utils.checks.has_contiguous_scans(xr_obj, x='lon', y='lat', along_track_dim='along_track', cross_track_dim='cross_track')

Return True if all scans are contiguous. False otherwise.

This function also works with nadir-only orbits.

gpm.utils.checks.has_missing_granules(xr_obj)

Check whether the GPM object has missing granules.

For ORBIT objects, it checks the gpm_granule_id. For GRID objects, it checks timesteps regularity.

gpm.utils.checks.has_regular_time(xr_obj): Return True if all timesteps are regular. False otherwise.

gpm.utils.checks.has_valid_geolocation(xr_obj, x='lon', y='lat', along_track_dim='along_track', cross_track_dim='cross_track'): Check whether the GPM object has valid geolocation.

gpm.utils.checks.is_regular(xr_obj)

Check whether the GPM object is regular.

For GPM ORBITS, it checks that the scans are contiguous. For GPM GRID, it checks that the timesteps are regularly spaced.

gpm.utils.collocation module

This module contains utilities for GPM product collocation.

gpm.utils.collocation.collocate_product(ds, product, product_type='RS', version=None, storage='GES_DISC', scan_modes=None, variables=None, groups=None, verbose=True, decode_cf=True, chunks={})

Collocate a product on the provided dataset.

It assumes that an approximately collocated product is available along the entire input dataset.

gpm.utils.collocation.preprocess_datatree(dt, exclude_vars=None, fixed_vars=None)

Preprocess DataTree for remapping by handling variables consistently.

Parameters:

dt (xarray.DataTree) – DataTree with multiple scan modes.
exclude_vars (list, optional) – Variables to exclude from processing.
fixed_vars (list, optional) – Variables to preserve as-is.

Returns:

Preprocessed DataTree ready for remapping.

Return type:

xarray.DataTree

gpm.utils.collocation.regrid_pmw_l1(dt, scan_mode_reference='S1', radius_of_influence=20000)

Regrid the scan modes of a PMW Level 1 product into a common grid.

Parameters:

dt (xarray.DataTree) – DataTree containing multiple scan modes (nodes).
scan_mode_reference (str, optional) – The scan mode/node with the spatial coordinates to use as reference grid.

Returns:

The collocated dataset, with PMW channels concatenated along a ‘pmw_frequency’ dimension.

Return type:

See https://distributed.dask.org/en/latest/worker-memory.html#manually-trim-memory

gpm.utils.collocation.remap_era5(ds, variables): Remap ERA5 variables onto the input dataset using nearest neighbour.

gpm.utils.dask module

This module contains utilities for Dask Distributed processing.

gpm.utils.dask.clean_memory(client)

Call the garbage collector on each process.

gpm.utils.dask.close_dask_cluster(cluster, client): Close Dask Cluster.

gpm.utils.dask.get_client()

gpm.utils.dask.get_scheduler(get=None, collection=None)

Determine the dask scheduler that is being used.

None is returned if no dask scheduler is active.

See also

dask.base.get_scheduler

gpm.utils.dask.initialize_dask_cluster(minimum_memory=None): Initialize Dask Cluster.

gpm.utils.dask.trim_memory() → int

gpm.utils.dataframe module

This module contains general utilities to convert xarray objects to dataframes.

gpm.utils.dataframe.compute_2d_histogram(df, x, y, var=None, x_bins=10, y_bins=10, x_labels=None, y_labels=None, prefix_name=True)

Compute bivariate statistics.

Parameters:

df (pandas.DataFrame) – Input dataframe
x (str) – Column name for x-axis binning (will be rounded to integers)
y (str) – Column name for y-axis binning
var (str, optional) – Column name for which statistics will be computed. If None, only counts are computed.
x_bins (int or array-like) – Number of bins or bin edges for x
y_bins (int or array-like) – Number of bins or bin edges for y
x_labels (array-like, optional) – Labels for x bins. If None, uses bin centers
y_labels (array-like, optional) – Labels for y bins. If None, uses bin centers

Returns:

Dataset with dimensions corresponding to binned variables and data variables for each statistic

Return type:

xarray.Dataset

gpm.utils.dataframe.drop_undesired_columns(df): Drop undesired columns like dataset dimensions without coordinates.

gpm.utils.dataframe.ensure_pyarrow_string_columns(df): Convert ‘object’ type columns to pyarrow strings.

gpm.utils.dataframe.get_df_object_columns(df): Get the dataframe columns which have ‘object’ type.

gpm.utils.dataframe.to_dask_dataframe(ds): Convert an xarray.Dataset to a dask.dataframe.DataFrame.

gpm.utils.dataframe.to_pandas_dataframe(ds, drop_index=True): Convert an xarray.Dataset to a pandas.DataFrame.

gpm.utils.decorators module

This module contains function decorators that check the GPM-API object type.

gpm.utils.decorators.check_has_along_track_dimension(function)

Check that the along-track dimension is available.

If not available, raise an error.

gpm.utils.decorators.check_has_cross_track_dimension(function)

Check that the cross-track dimension is available.

If not available, raise an error.

gpm.utils.decorators.check_is_gpm_object(function): Decorator function to check if input is a GPM object. Raise ValueError if not.

gpm.utils.decorators.check_is_grid(function): Decorator function to check if input is a GPM GRID object. Raise ValueError if not.

gpm.utils.decorators.check_is_orbit(function): Decorator function to check if input is a GPM ORBIT object. Raise ValueError if not.

gpm.utils.decorators.check_software_availability(software, conda_package)

A decorator to ensure that a software package is installed.

Parameters:

software (str) – The package name as recognized by Python’s import system.
conda_package (str) – The package name as recognized by conda-forge.
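A decorator of this kind is commonly built as a decorator factory that checks for the package at call time. The sketch below shows the general pattern only: require_software is a hypothetical name, and both the exception type and the wording of the message are assumptions, not the behaviour of check_software_availability itself.

```python
import functools
import importlib.util


def require_software(software, conda_package):
    """Return a decorator that fails early when `software` cannot be imported."""

    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            # find_spec returns None when a top-level module is not installed.
            if importlib.util.find_spec(software) is None:
                raise ModuleNotFoundError(
                    f"The '{software}' package is required. "
                    f"Install it with: conda install -c conda-forge {conda_package}"
                )
            return function(*args, **kwargs)

        return wrapper

    return decorator


@require_software("json", "python")  # json ships with Python, so the check passes
def dump_size(obj):
    import json

    return len(json.dumps(obj))


print(dump_size({"a": 1}))
```

Performing the check inside the wrapper, rather than when the decorator is applied, keeps the decorated module importable even when the optional dependency is missing.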

gpm.utils.directories module

This module contains functions to search for files and directories on the local machine.

gpm.utils.directories.check_glob_pattern(pattern: str) → None

Check if glob pattern is a string and is a valid pattern.

Parameters:

pattern (str) – String to be checked.

gpm.utils.directories.check_glob_patterns(patterns: str | list) → list: Check if glob patterns are valid.

gpm.utils.directories.get_filepaths_by_path(paths, parallel=True, file_extension=None, glob_pattern=None, regex_pattern=None): Return a dictionary with the files within each directory path matching the filename filtering criteria.

gpm.utils.directories.get_filepaths_within_paths(paths, parallel=True, file_extension=None, glob_pattern=None, regex_pattern=None): Return a list with all filepaths within a list of directories matching the filename filtering criteria.

gpm.utils.directories.get_first_file(directory): Retrieve the filepath of the first file inside a directory.

gpm.utils.directories.get_parallel_dict_results(function, inputs, **kwargs)

gpm.utils.directories.get_parallel_list_results(function, inputs, **kwargs)

gpm.utils.directories.get_subdirectories(base_dir, path=True): Return the name or path of the directories present in the input directory.

gpm.utils.directories.list_and_filter_files(path, file_extension=None, glob_pattern=None, regex_pattern=None, sort=True): Retrieve list of files (filtered by extension and custom patterns).

gpm.utils.directories.list_directories(dir_path, glob_pattern='*', recursive=False, skip_hidden=True, return_paths=True): Return a list of directory paths (exclude file paths).

gpm.utils.directories.list_files(dir_path, glob_pattern='*', recursive=False, skip_hidden=True, return_paths=True): Return a list of filepaths (exclude directory paths).

gpm.utils.directories.list_paths(dir_path, glob_pattern, recursive=False, skip_hidden=True)

Return a list of filepaths and directory paths.

This function also accepts a list of glob patterns.

gpm.utils.directories.match_extension(filename, extension=None)

gpm.utils.directories.match_filters(filename, file_extension=None, glob_pattern=None, regex_pattern=None)

gpm.utils.directories.match_glob_pattern(filename, pattern=None)

gpm.utils.directories.match_regex_pattern(filename, pattern=None)

gpm.utils.directories.search_leaf_directories(base_dir, parallel=True, remove_base_path=True): Search leaf directories.

gpm.utils.directories.search_leaf_files(base_dir, parallel=True, file_extension=None, glob_pattern=None, regex_pattern=None): Search files in leaf directories.

gpm.utils.events module

This module contains utilities to define events.

gpm.utils.events.get_event_slices(indices, neighbor_min_size, neighbor_interval, intra_event_max_distance)

gpm.utils.events.group_indices_into_events(indices, intra_event_max_distance)

Group indices into events based on intra_event_max_distance.

Parameters:

indices (array-like) – Sorted array of valid indices. datetime64 arrays are also accepted.
intra_event_max_distance (int or numpy.timedelta64) – Maximum distance allowed between consecutive indices for them to be considered part of the same event. If indices are datetime64 arrays, specify intra_event_max_distance as numpy.timedelta64.

Returns:

A list of events, where each event is an array of indices.

Return type:

list of numpy.ndarray
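The grouping rule can be sketched in a few lines of plain Python. This is an illustration rather than the library's code: group_into_events is a hypothetical name, it returns lists instead of numpy arrays, and treating a gap exactly equal to the maximum distance as part of the same event is an assumption.

```python
def group_into_events(indices, max_distance):
    """Group sorted indices: a new event starts when the gap exceeds max_distance."""
    events = []
    for index in indices:
        if events and index - events[-1][-1] <= max_distance:
            # Close enough to the last index of the current event.
            events[-1].append(index)
        else:
            events.append([index])
    return events


print(group_into_events([1, 2, 3, 10, 11, 30], max_distance=2))
# [[1, 2, 3], [10, 11], [30]]
```

Because only subtraction and comparison are used, the same sketch works with datetime values and a timedelta as max_distance.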

gpm.utils.events.remove_isolated_indices(indices, neighbor_min_size, neighbor_interval)

Remove isolated indices that do not have enough neighboring indices within a specified time gap.

An index is considered isolated (and thus removed) if it does not have at least neighbor_min_size other indices within the neighbor_interval before or after it. In other words, for each index, the function counts how many other indices fall into the neighborhood defined as [index - neighbor_interval, index + neighbor_interval], excluding the index itself. If the count of such neighbors is less than neighbor_min_size, that index is removed.

Parameters:

indices (array-like of numpy.datetime64) – Sorted or unsorted array of indices.
neighbor_interval (int or numpy.timedelta64) – The size of the neighborhood. Only indices that fall in the [index - neighbor_interval, index + neighbor_interval] are considered neighbors.
neighbor_min_size (int, optional) – The minimum number of indices required to fall into the neighborhood for an index to be considered non-isolated. If neighbor_min_size=0, no index is considered isolated and no filtering occurs. If neighbor_min_size=1, the index must have at least one other index within the neighborhood. If neighbor_min_size=2, the index must have at least two other indices within the neighborhood. Defaults to 1.

Returns:

Array of indices with isolated entries removed.

Return type:

numpy.ndarray
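The neighbor-counting rule described above can be written directly as a brute-force sketch. The name drop_isolated is hypothetical, the result is a list rather than a numpy array, and the quadratic loop is chosen for readability, not because the library works this way.

```python
def drop_isolated(indices, neighbor_min_size=1, neighbor_interval=1):
    """Keep indices with at least neighbor_min_size other indices nearby."""
    kept = []
    for i, index in enumerate(indices):
        # Count the other indices within [index - interval, index + interval].
        n_neighbors = sum(
            1
            for j, other in enumerate(indices)
            if j != i and abs(other - index) <= neighbor_interval
        )
        if n_neighbors >= neighbor_min_size:
            kept.append(index)
    return kept


indices = [1, 2, 3, 10, 20, 21]
print(drop_isolated(indices, neighbor_min_size=1, neighbor_interval=2))  # [1, 2, 3, 20, 21]
print(drop_isolated(indices, neighbor_min_size=2, neighbor_interval=2))  # [1, 2, 3]
```

With neighbor_min_size=1 only the lone index 10 is dropped; raising it to 2 also drops the pair 20 and 21, since each has a single neighbor.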

gpm.utils.geospatial module

This module contains functions for geospatial processing.

class gpm.utils.geospatial.Extent(xmin, xmax, ymin, ymax)

Bases: tuple

Create new instance of Extent(xmin, xmax, ymin, ymax)

xmax: Alias for field number 1

xmin: Alias for field number 0

ymax: Alias for field number 3

ymin: Alias for field number 2

gpm.utils.geospatial.adjust_extent(extent, size)

Adjust the extent to have the desired size.

Parameters:

extent (tuple) – A tuple of four values representing the extent. The extent format must be [xmin, xmax, ymin, ymax].
size (int, float, tuple, list) – The size in degrees of the extent in each direction. If size is a single number, the same size is ensured in all directions. If size is a tuple or list, it must have size 2 and specify the desired size of the extent in the x direction and the y direction.

Returns:

The adjusted extent.

Return type:

tuple

gpm.utils.geospatial.adjust_geographic_extent(extent, size)

Adjust the extent to have the desired size.

Parameters:

extent (tuple) – A tuple of four values representing the lat/lon extent. The extent format must be [xmin, xmax, ymin, ymax].
size (int, float, tuple, list) – The size in degrees of the extent in each direction. If size is a single number, the same size is ensured in all directions. If size is a tuple or list, it must have size 2 and specify the desired size of the extent in the x direction (longitude) and the y direction (latitude).

Returns:

The adjusted extent.

Return type:

tuple

gpm.utils.geospatial.check_extent(extent)

Validate the extent to ensure it has the correct format and logical consistency.

Note: this function does not check whether the extent values are realistic.

Parameters:

extent (list or tuple) – The extent specified as [xmin, xmax, ymin, ymax].

Returns:

extent

Return type:

tuple
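A format and consistency check of this kind might look like the sketch below. It is an assumption-laden illustration, not the body of check_extent: validate_extent is a hypothetical name, and the specific rules (exactly four elements, xmin not larger than xmax, ymin not larger than ymax) are a guess at what "logical consistency" covers.

```python
def validate_extent(extent):
    """Check the [xmin, xmax, ymin, ymax] format and return the extent as a tuple."""
    if not isinstance(extent, (list, tuple)) or len(extent) != 4:
        raise ValueError("extent must be a list or tuple: [xmin, xmax, ymin, ymax].")
    xmin, xmax, ymin, ymax = extent
    # Assumed consistency rules; value ranges are deliberately not checked.
    if xmin > xmax:
        raise ValueError("xmin must not be larger than xmax.")
    if ymin > ymax:
        raise ValueError("ymin must not be larger than ymax.")
    return tuple(extent)


print(validate_extent([5, 15, 40, 50]))  # (5, 15, 40, 50)
```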

gpm.utils.geospatial.crop(xr_obj, extent)

Crop an xarray object based on the provided bounding box.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
extent (list or tuple) – The bounding box over which to crop the xarray object. extent must follow the matplotlib and cartopy extent conventions: extent = [x_min, x_max, y_min, y_max]

Returns:

xr_obj – Cropped xarray object.

Return type:

xarray.DataArray or xarray.Dataset

gpm.utils.geospatial.crop_around_point(xr_obj, lon: float, lat: float, distance=None, size=None)

Crop an xarray object around a point.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
lon (float) – Longitude of the point.
lat (float) – Latitude of the point.
distance (float) – Distance (in meters) from the point in each direction.
size (int, float, tuple, list) – The size in degrees of the extent in each direction. If size is a single number, the same size is ensured in all directions. If size is a tuple or list, it must have size 2 and specify the desired size of the extent in the x direction (longitude) and the y direction (latitude).

Returns:

xr_obj – Cropped xarray object.

Return type:

xarray.DataArray or xarray.Dataset

gpm.utils.geospatial.crop_by_continent(xr_obj, name: str)

Crop an xarray object based on the specified continent name.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
name (str) – Continent name.

Returns:

xr_obj – Cropped xarray object.

Return type:

xarray.DataArray or xarray.Dataset

gpm.utils.geospatial.crop_by_country(xr_obj, name: str)

Crop an xarray object based on the specified country name.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
name (str) – Country name.

Returns:

xr_obj – Cropped xarray object.

Return type:

xarray.DataArray or xarray.Dataset

gpm.utils.geospatial.extend_extent(extent, padding: int | float | tuple | list = 0)

Extend the extent by padding in every direction.

Parameters:

extent (tuple) – A tuple of four values representing the extent. The extent format must be [xmin, xmax, ymin, ymax].
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as x and y padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom).

Returns:

The extended extent.

Return type:

tuple
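The single-number and two-value forms of padding can be illustrated as follows. This is a sketch under stated assumptions: pad_extent is a hypothetical name, the four-value form is intentionally left out, and no clipping to valid longitude or latitude ranges is attempted.

```python
def pad_extent(extent, padding=0):
    """Extend [xmin, xmax, ymin, ymax] by a scalar or an (x, y) padding."""
    if isinstance(padding, (int, float)):
        pad_x = pad_y = padding
    elif len(padding) == 2:
        pad_x, pad_y = padding
    else:
        raise ValueError("This sketch only handles a number or an (x, y) pair.")
    xmin, xmax, ymin, ymax = extent
    return (xmin - pad_x, xmax + pad_x, ymin - pad_y, ymax + pad_y)


print(pad_extent((0, 10, 40, 50), padding=1))         # (-1, 11, 39, 51)
print(pad_extent((0, 10, 40, 50), padding=(2, 0.5)))  # (-2, 12, 39.5, 50.5)
```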

gpm.utils.geospatial.extend_geographic_extent(extent, padding: int | float | tuple | list = 0)

Extend the lat/lon extent by the specified padding (in degrees) in every direction.

Parameters:

extent (tuple) – A tuple of four values representing the lat/lon extent. The extent format must be [xmin, xmax, ymin, ymax].
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom).

Returns:

The extended extent.

Return type:

tuple

gpm.utils.geospatial.get_circle_coordinates_around_point(lon, lat, radius, num_vertices=360)

Get the coordinates of a circle with custom radius around a point.

Parameters:

lon (float) – Longitude of the point.
lat (float) – Latitude of the point.
radius (float) – Radius (in meters) around the point.
num_vertices (int, optional) – Number of circle coordinates to return. The default is 360.

Returns:

lons (numpy.ndarray) – Longitude vertices of the circle around the point.
lats (numpy.ndarray) – Latitude vertices of the circle around the point.

gpm.utils.geospatial.get_continent_extent(name: str, padding: int | float | tuple | list = 0)

Retrieves the extent of a continent.

Parameters:

name (str) – The name of the continent.
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.

Returns:

extent – A tuple containing the longitude and latitude extent of the continent.

Return type:

tuple

Raises:

TypeError – If the continent name is not provided as a string.
ValueError – If the provided continent name is not valid or does not match any continent. When a similar continent name is found, it is suggested as a possible match.

gpm.utils.geospatial.get_country_extent(name, padding=0.2)

Retrieves the extent of a country.

Parameters:

name (str) – The name of the country.
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.2.

Returns:

extent – A tuple containing the longitude and latitude extent of the country.

Return type:

tuple

Raises:

TypeError – If the country name is not provided as a string.
ValueError – If the country name is not valid or if there is no matching country.

Notes

This function retrieves the extent of a country from a dictionary of country extents. The country extent is defined as the longitude and latitude range that encompasses the country’s borders. The extent is returned as a tuple of four values: (xmin, xmax, ymin, ymax). The extent can be optionally padded by specifying the padding parameter.

gpm.utils.geospatial.get_crop_slices_around_point(xr_obj, lon: float, lat: float, distance=None, size=None)

Compute the xarray object slices which are within the specified distance from a point.

If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
lon (float) – Longitude of the point.
lat (float) – Latitude of the point.
distance (float) – Distance (in meters) from the point in each direction.
size (int, float, tuple, list) – The size in degrees of the extent in each direction. If size is a single number, the same size is ensured in all directions. If size is a tuple or list, it must have size 2 and specify the desired size of the extent in the x direction (longitude) and the y direction (latitude).

Returns:

A list of along-track slices (GPM ORBIT) or a dictionary of lon/lat slices (GPM GRID).

Return type:

list or dict

gpm.utils.geospatial.get_crop_slices_by_continent(xr_obj, name)

Compute the xarray object slices which are within the specified continent.

If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
name (str) – Continent name.

gpm.utils.geospatial.get_crop_slices_by_country(xr_obj, name)

Compute the xarray object slices which are within the specified country.

If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
name (str) – Country name.

gpm.utils.geospatial.get_crop_slices_by_extent(xr_obj, extent)

Compute the xarray object slices which are within the specified extent.

If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
extent (list or tuple) – The extent over which to crop the xarray object. extent must follow the matplotlib and cartopy conventions: extent = [x_min, x_max, y_min, y_max]

gpm.utils.geospatial.get_extent_around_point(x, y, distance=None, size=None)

Get the extent around a point.

Specify either distance or the desired extent size (in the unit of the extent).

Parameters:

x (float) – X coordinate of the point.
y (float) – Y coordinate of the point.
distance (float) – Distance from the point in each direction.
size (int, float, tuple, list) – The size of the extent in each direction. If size is a single number, the same size is ensured in all directions. If size is a tuple or list, it must have size 2 and specify the desired size of the extent in the x direction and the y direction.

Returns:

The adjusted extent.

Return type:

tuple
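The two ways of specifying the extent can be sketched as below. Several points are assumptions rather than documented behaviour: extent_around_point is a hypothetical name, distance is treated as the half-width on each side of the point, size is treated as the full width centred on the point, and exactly one of the two must be given.

```python
def extent_around_point(x, y, distance=None, size=None):
    """Build (xmin, xmax, ymin, ymax) around a point (illustrative only)."""
    if (distance is None) == (size is None):
        raise ValueError("Specify exactly one of 'distance' or 'size'.")
    if distance is not None:
        # Assumed: the extent reaches `distance` away from the point on every side.
        half_x = half_y = distance
    elif isinstance(size, (int, float)):
        half_x = half_y = size / 2
    else:
        size_x, size_y = size
        half_x, half_y = size_x / 2, size_y / 2
    return (x - half_x, x + half_x, y - half_y, y + half_y)


print(extent_around_point(10, 20, distance=5))   # (5, 15, 15, 25)
print(extent_around_point(10, 20, size=(4, 2)))  # (8.0, 12.0, 19.0, 21.0)
```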

gpm.utils.geospatial.get_geodesic_line(start_point, end_point, steps, geod=None)

Construct a geodesic path between two points.

This function acts as a wrapper for the geodesic construction available in pyproj.

Parameters:

start_point (tuple) – A longitude-latitude pair designating the start point of the cross section (units are degrees east and degrees north).
end_point (tuple) – A longitude-latitude pair designating the end point of the cross section (units are degrees east and degrees north).
steps (int, optional) – The number of points along the geodesic between the start and the end point (including the end points) to use in the cross section.

Returns:

The list of x, y points in the given CRS of length steps along the geodesic.

Return type:

numpy.ndarray

gpm.utils.geospatial.get_geographic_extent_around_point(lon, lat, distance=None, size=None)

Get the geographic extent around a point.

Specify either distance (in meters) or the desired extent size (in degrees).

NOTE: this function is not yet designed to define an extent when the area of interest would cross the antimeridian or the poles.

Parameters:

lon (float) – Longitude of the point.
lat (float) – Latitude of the point.
distance (float) – Distance (in meters) from the point in each direction.
size (int, float, tuple, list) – The size in degrees of the extent in each direction. If size is a single number, the same size is ensured in all directions. If size is a tuple or list, it must have size 2 and specify the desired size of the extent in the x direction (longitude) and the y direction (latitude).

Returns:

The adjusted extent.

Return type:

Get the geographic extent from an xarray object.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object.
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). The default is 0.
size (int, float, tuple, list) – The desired size in degrees of the extent in each direction. If size is a single number, the same size is enforced in all directions. If size is a tuple or list, it must be of size 2 and specify the desired size of the extent in the x direction (longitude) and the y direction (latitude). The default is None.

Returns:

extent – A tuple containing the longitude and latitude extent of the xarray object. The extent follows the matplotlib/cartopy format (xmin, xmax, ymin, ymax).

Return type:

gpm.utils.geospatial.get_great_circle_arc_endpoints(point, azimuth, distance)[source][source]#

Get great circle arc vertices.

Calculate two points at a given distance from a central point in both the specified azimuth direction and its opposite direction along the great circle path.

Parameters:

point (tuple of float) – A tuple representing the middle point (longitude, latitude) of the great circle arc.
azimuth (float) – The azimuth (in degrees) from the starting point. 0 corresponds to North and 180 to South. The opposite direction is automatically calculated as (azimuth + 180) % 360.
distance (float) – The distance (in meters) to the points from the center point.

Returns:

start_point (tuple of float) – The point (longitude, latitude) at the specified distance in the given azimuth direction.
end_point (tuple of float) – The point (longitude, latitude) at the specified distance in the opposite azimuth direction.

Examples

>>> point = (-74.0060, 40.7128)  # New York City
>>> azimuth = 90  # East
>>> distance = 100000  # 100 km
>>> get_great_circle_arc_endpoints(point, azimuth, distance)
((-72.54170804504108, 40.65355582184445), (-75.47074533517052, 40.77179828472569))

gpm.utils.geospatial.merge_extents(list_extent)[source][source]#: Return the outer extent of a list of extents.
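As a minimal sketch of what an "outer extent" is, the snippet below assumes each extent follows the matplotlib/cartopy (xmin, xmax, ymin, ymax) format used elsewhere in this module. The helper name outer_extent is hypothetical, and extents crossing the antimeridian are not handled.

```python
def outer_extent(list_extent):
    """Hypothetical helper: smallest extent enclosing all the given extents.

    Each extent is assumed to be a (xmin, xmax, ymin, ymax) tuple.
    """
    xmins, xmaxs, ymins, ymaxs = zip(*list_extent)
    return (min(xmins), max(xmaxs), min(ymins), max(ymaxs))
```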

gpm.utils.geospatial.read_continents_extent_dictionary()[source][source]#

Read and return a dictionary containing the extents of continents.

Returns:: A dictionary containing the extents of continents.
Return type:: dict

gpm.utils.geospatial.read_countries_extent_dictionary()[source][source]#

Reads a YAML file containing countries extent information and returns it as a dictionary.

Returns:: A dictionary containing countries extent information.
Return type:: dict

gpm.utils.geospatial.unwrap_longitude_degree(x, period=360)[source][source]#: Unwrap longitude array.

gpm.utils.list module#

This module contains functions for list processing.

gpm.utils.list.flatten_list(nested_list)[source][source]#: Flatten a nested list into a single-level list.
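The idea can be sketched with a short recursive helper. Whether the library flattens arbitrarily deep nesting or only one level is not stated here, so the recursive behaviour and the name flatten are assumptions.

```python
def flatten(nested_list):
    """Hypothetical helper: flatten arbitrarily nested lists into one list."""
    flat = []
    for item in nested_list:
        if isinstance(item, list):
            flat.extend(flatten(item))  # recurse into sub-lists
        else:
            flat.append(item)
    return flat
```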

gpm.utils.manipulations module#

This module contains functions for manipulating GPM-API Datasets.

gpm.utils.manipulations.conversion_factors_degree_to_meter(latitude, earth_radius=None)[source][source]#

Calculate conversion factors from degrees to meters as a function of latitude.

Parameters:: latitude (numpy.ndarray) – Latitude in degrees where the conversion is needed
Returns:: (cx, cy) – Tuple containing conversion factors for longitude and latitude
Return type:: tuple
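A common way to obtain such factors on a spherical Earth is sketched below. The formula, the helper name degree_to_meter_factors and the default radius of 6371 km are assumptions for illustration; the value the library uses when earth_radius=None is not documented here.

```python
import math


def degree_to_meter_factors(latitude, earth_radius=6371000.0):
    """Hypothetical helper: meters per degree of longitude (cx) and latitude (cy).

    Assumes a spherical Earth; latitude is in degrees.
    """
    cy = math.pi / 180.0 * earth_radius  # one degree of latitude, in meters
    cx = cy * math.cos(math.radians(latitude))  # longitude degrees shrink with cos(lat)
    return cx, cy
```

Under these assumptions one degree of longitude at 60° latitude spans half the distance it spans at the equator.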

gpm.utils.manipulations.convert_from_decibel(da)[source][source]#: Convert dB to unit.

gpm.utils.manipulations.convert_to_decibel(da)[source][source]#: Convert unit to dB.
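The two conversions are inverses of each other. The sketch below assumes the usual power-ratio definition, dB = 10 * log10(x), and works on plain floats rather than on xarray.DataArray objects; the helper names are hypothetical.

```python
import math


def to_decibel(value):
    """Hypothetical helper: linear unit to dB (value must be positive)."""
    return 10.0 * math.log10(value)


def from_decibel(value_db):
    """Hypothetical helper: dB back to linear unit."""
    return 10.0 ** (value_db / 10.0)
```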

gpm.utils.manipulations.crop_around_valid_data(xr_obj, variable=None)[source][source]#

Return a sub-region of the specified DataArray containing all the non-NaN values.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – An xarray object to crop around valid data (of variable).
variable (str, optional) – Name of the variable to use to crop the dataset. Only to be specified if xr_obj is a xr.Dataset.

Returns:

Cropped DataArray such that NaN-only outer rows/columns are removed.

Return type:

gpm.utils.manipulations.define_transect_isel_dict(xr_obj, point, dim)[source][source]#: Define the isel dictionary required to extract a transect along the specified dimension.

gpm.utils.manipulations.ensure_vertical_datarray_prototype(da)[source][source]#: Return a xarray.DataArray with only spatial and vertical dimensions.

gpm.utils.manipulations.extract_at_points(xr_obj, points, method='nearest', new_dim='points')[source][source]#

Extract values at a set of points.

This routine is particularly useful to extract values observed close to meteorological stations or along a trajectory.

You could also use this function to perform “nearest-neighbour” remapping of values to another 2D grid/orbit: stack that object, pass its coordinates to this function, and then unstack the result. However, for this last application, it is better to use the remap function.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – Dataset or DataArray from which to extract values at points.
points (numpy.ndarray) – An array of shape (N, 2) with the lon, lat points at which to interpolate the data.
method (str, optional) – The interpolation method. The default method is 'nearest'. If input data have 2D-coordinates, only the 'nearest' method is implemented. If input data have 1D-coordinates, see xarray.DataArray.interp for other methods.
new_dim (str, optional) – The name of the new points dimension. Defaults to “points”.

Returns:

The values at the specified points.

Return type:

gpm.utils.manipulations.extract_dataset_above_bin(ds, bins, new_range_size=None, strict=False, reverse=False)[source][source]#

Extract a radar dataset with the range bins above the <bins> index.

If reverse=False, the new last range bin corresponds to the <bins> index. If reverse=True, the new first range bin corresponds to the <bins> index.

Parameters:

ds (xarray.Dataset) – GPM RADAR xarray dataset.
bins (str) – The variable name containing the radar gate bin index of interest. GPM bin variables are assumed to start at 1, not 0!
new_range_size (int, optional) – If specified, the size of the new range dimension. The dataset is shortened along the range dimension (from the top). The default is None.
strict (bool, optional) – If True, it extracts only radar gates above the bin index. If False, it also extracts the radar gate at the bin index. The default is False.
reverse (bool, optional) – If False (the default), the last range bin corresponds to the <bins> index. If True, the first range bin corresponds to the <bins> index.

Returns:

ds – xarray dataset with the range bins above the specified bin.

Return type:

gpm.utils.manipulations.extract_dataset_below_bin(ds, bins, new_range_size=None, strict=False, reverse=False)[source][source]#

Extract a radar dataset with the range bins below the <bins> index.

If reverse=False, the new first range bin corresponds to the <bins> index. If reverse=True, the last range bin corresponds to the <bins> index.

Parameters:

ds (xarray.Dataset) – GPM RADAR xarray dataset.
bins (str) – The variable name containing the radar gate bin index of interest. GPM bin variables are assumed to start at 1, not 0!
new_range_size (int, optional) – If specified, the size of the new range dimension. The dataset is shortened along the range dimension (from the top). The default is None.
strict (bool, optional) – If True, it extracts only radar gates below the bin index. If False, it also extracts the radar gate at the bin index. The default is False.
reverse (bool, optional) – If False (the default), the new first range bin corresponds to the <bins> index. If True, the last range bin corresponds to the <bins> index.

Returns:

ds – xarray dataset with the range bins below the specified bin.

Return type:

gpm.utils.manipulations.extract_l2_dataset(ds, bin_ellipsoid='binEllipsoid', shortened_range=True, new_range_size=None)[source][source]#

Returns the radar dataset with the last range bin corresponding to the ellipsoid (as in L2 products).

After extraction, ‘echoLowResBinNumber’ and ‘echoHighResBinNumber’ are no longer meaningful. Retrieve ‘sampling_type’ before extraction!

Parameters:

ds (xarray.Dataset) – GPM RADAR L1B xarray dataset.
bin_ellipsoid (str, optional) – The variable name containing the bin index of the ellipsoid. The default is binEllipsoid.
shortened_range (bool, optional) – Whether to shorten the range dimension of the dataset. This procedure is applied to generate the L2 products. The default is True. Note that the range is also shortened if new_range_size is specified.
new_range_size (int, optional) – The size of the new range dimension. If shortened_range=True and new_range_size=None, new_range_size takes the default values employed by the L2 PRE module. The default values are 176 for Ku and 88 for Ka. The default is None.

Returns:

ds – xarray dataset with the last range bin corresponding to the ellipsoid.

Return type:

gpm.utils.manipulations.extract_transect_along_dimension(xr_obj, point, dim)[source][source]#

Extract a transect along the specified spatial dimension passing through the specified location.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – Dataset or DataArray from which to extract a transect.
point (tuple of float) – A tuple representing the location (longitude, latitude) through which the transect passes.
dim (str) – The desired spatial dimension of the transect.

Returns:

The transect object with spatial dimension dim.

Return type:

gpm.utils.manipulations.extract_transect_around_point(xr_obj, point, azimuth, distance, steps=100, method='linear', new_dim='transect')[source][source]#

Extract a transect following the great circle arc centered on the specified point.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – Dataset or DataArray from which to extract a transect.
point (tuple of float) – A tuple representing the middle point (longitude, latitude) of the great circle arc.
azimuth (float) – The azimuth (in degrees) from the starting point. 0 corresponds to North and 180 to South. The opposite direction is automatically calculated as (azimuth + 180) % 360.
distance (float) – The distance (in meters) to the points from the center point.
steps (int, optional) – The number of points along the geodesic between the start and the end point (including the end points) to use in the cross section. Defaults to 100.
method (str, optional) – The interpolation method, either 'linear' or 'nearest'. If input data have 2D-coordinates, only 'nearest' method is implemented. If input data have 1D-coordinates, the default method is 'linear'. See xarray.DataArray.interp for other methods.
new_dim (str, optional) – The name of the new transect dimension. Defaults to “transect”.

Returns:

The transect object, with the new_dim dimension (of size steps).

Return type:

gpm.utils.manipulations.extract_transect_at_points(xr_obj, points, method='linear', new_dim='transect')[source][source]#

Obtain a transect through a series of points.

It allows extracting data along a custom curvilinear track or trajectory.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – Dataset or DataArray from which to extract a transect.
points (numpy.ndarray) – An array of shape (N, 2) with the lon, lat points at which to interpolate the data.
method (str, optional) – The interpolation method, either 'linear' or 'nearest'. If input data have 2D-coordinates, only 'nearest' method is implemented. If input data have 1D-coordinates, the default method is 'linear'. See xarray.DataArray.interp for other methods.
new_dim (str, optional) – The name of the new transect dimension. Defaults to “transect”.

Returns:

The transect object, with the new_dim dimension (of size N).

Return type:

gpm.utils.manipulations.extract_transect_between_points(xr_obj, start_point, end_point, steps=100, method='linear', new_dim='transect')[source][source]#

Extract an interpolated transect between two points on a sphere.

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – Dataset or DataArray from which to extract a transect.
start_point (tuple) – A longitude-latitude pair designating the start point of the cross section (units are degrees east and degrees north).
end_point (tuple) – A longitude-latitude pair designating the end point of the cross section (units are degrees east and degrees north).
steps (int, optional) – The number of points along the geodesic between the start and the end point (including the end points) to use in the cross section. Defaults to 100.
method (str, optional) – The interpolation method, either 'linear' or 'nearest'. If input data have 2D-coordinates, only 'nearest' method is implemented. If input data have 1D-coordinates, the default method is 'linear'. See xarray.DataArray.interp for other methods.
new_dim (str, optional) – The name of the new transect dimension. Defaults to “transect”.

Returns:

The transect object, with the new_dim dimension (of size steps).

Return type:

gpm.utils.manipulations.get_bin_dataarray(xr_obj, bins, mask_first_bin=False, mask_last_bin=False, fillvalue=None)[source][source]#: Get bin xarray.DataArray.

gpm.utils.manipulations.get_bright_band_mask(ds)[source][source]#

Retrieve bright band mask defined by binBBBottom and binBBTop bin variables.

Bins are numbered from top to bottom, so binBBTop has lower values than binBBBottom. binBBBottom and binBBTop are NaN when the bright band limit is not detected!
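For a single vertical profile, the masking logic can be sketched as follows. This is an illustration on plain Python values, not the xarray implementation: the helper name bright_band_mask, the 1-based bin numbering and the inclusive bounds are assumptions.

```python
import math


def bright_band_mask(n_bins, bin_bb_top, bin_bb_bottom):
    """Hypothetical helper: True for the gates between binBBTop and binBBBottom.

    Bins are assumed to be numbered from 1 (top) to n_bins (bottom).
    """
    if math.isnan(bin_bb_top) or math.isnan(bin_bb_bottom):
        # No bright band detected: nothing is flagged.
        return [False] * n_bins
    return [bin_bb_top <= b <= bin_bb_bottom for b in range(1, n_bins + 1)]
```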

gpm.utils.manipulations.get_height_at_bin(xr_obj, bins)[source][source]#: Retrieve height values at range bins specified by bins.

gpm.utils.manipulations.get_height_at_temperature(da_height, da_temperature, temperature)[source][source]#: Retrieve height at a specific temperature.

gpm.utils.manipulations.get_height_dataarray(xr_obj)[source][source]#

gpm.utils.manipulations.get_liquid_phase_mask(ds)[source][source]#: Retrieve the mask of the liquid phase profile.

gpm.utils.manipulations.get_range_axis(da)[source][source]#: Get range dimension axis index.

gpm.utils.manipulations.get_range_index_at_max(da)[source][source]#: Retrieve index along the range dimension where the xarray.DataArray has maximum values.

gpm.utils.manipulations.get_range_index_at_min(da)[source][source]#: Retrieve index along the range dimension where the xarray.DataArray has minimum values.

gpm.utils.manipulations.get_range_index_at_value(da, value)[source][source]#: Retrieve index along the range dimension where the xarray.DataArray values are closest to value.

gpm.utils.manipulations.get_range_slices_with_valid_data(xr_obj, variable=None)[source][source]#: Get the vertical (‘range’/’height’) slices with valid data.

gpm.utils.manipulations.get_range_slices_within_values(xr_obj, variable=None, vmin=-inf, vmax=inf)[source][source]#: Get the ‘range’ slices with data within a given data interval.

gpm.utils.manipulations.get_solid_phase_mask(ds)[source][source]#: Retrieve the mask of the solid phase profile.

gpm.utils.manipulations.get_spatial_2d_datarray_template(ds, fill_value=nan)[source][source]#: Get spatial 2D DataArray template.

gpm.utils.manipulations.get_spatial_3d_datarray_template(ds, fill_value=nan)[source][source]#: Get spatial 3D DataArray template.

gpm.utils.manipulations.get_vertical_coords_and_vars(ds)[source][source]#: Return a ‘prototype’ with only spatial and vertical dimensions.

gpm.utils.manipulations.get_vertical_datarray_prototype(ds, fill_value=nan)[source][source]#: Return a xarray.DataArray ‘prototype’ with only spatial and vertical dimensions.

gpm.utils.manipulations.infill_below_bin(xr_obj, bins)[source][source]#

Infill values below a spatially variable range bin.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) – GPM RADAR xarray object.
bins (str or xarray.DataArray) – Either a xarray.DataArray or a string pointing to the dataset variable with the range bins. GPM bin variables are assumed to start at 1, not 0!

Returns:

Infilled GPM RADAR xarray object.

Return type:

gpm.utils.manipulations.integrate_profile_concentration(dataarray, name, scale_factor=None, units=None)[source][source]#

Utility to convert LWC or IWC to LWP or IWP.

Input data have units of g/m³. Output data will have units of kg/m² if scale_factor=1000.

height a list or array of corresponding heights for each level.
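The unit conversion can be made concrete with a small sketch: integrating a concentration in g/m³ over height in meters gives g/m², and dividing by 1000 gives kg/m². The trapezoidal rule, the division by scale_factor and the helper name integrate_concentration are assumptions; the library's integration scheme is not described here.

```python
def integrate_concentration(concentration, height, scale_factor=1000.0):
    """Hypothetical helper: vertical integral of a concentration profile.

    concentration is in g/m³ and height in meters, one value per level.
    """
    total = 0.0
    for i in range(len(height) - 1):
        layer_depth = abs(height[i + 1] - height[i])
        # Trapezoidal rule: mean concentration of the layer times its depth.
        total += 0.5 * (concentration[i] + concentration[i + 1]) * layer_depth
    return total / scale_factor
```

For example, a constant 1 g/m³ over a 1000 m deep column integrates to 1 kg/m².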

gpm.utils.manipulations.locate_max_value(da, return_isel_dict=False)[source][source]#

Find the geographic point where the maximum value occurs in the data array.

Parameters:

da (xarray.DataArray) – The data array to analyze.
return_isel_dict (bool, optional) – If True, returns a dictionary with the spatial dimension indices corresponding to the maximum value. If False (the default), returns a (lon, lat) tuple of the point where the maximum value occurs.

Returns:

If return_isel_dict=True, returns a dictionary with the spatial dimension and indices corresponding to the maximum value. If return_isel_dict=False (the default), returns a (lon, lat) tuple of the point where the maximum value occurs.

Return type:

tuple or dict

gpm.utils.manipulations.locate_min_value(da, return_isel_dict=False)[source][source]#

Find the geographic point where the minimum value occurs in the data array.

Parameters:

da (xarray.DataArray) – The data array to analyze.
return_isel_dict (bool, optional) – If True, returns a dictionary with the spatial dimension indices corresponding to the minimum value. If False (the default), returns a (lon, lat) tuple of the point where the minimum value occurs.

Returns:

If return_isel_dict=True, returns a dictionary with the spatial dimension and indices corresponding to the minimum value. If return_isel_dict=False (the default), returns a (lon, lat) tuple of the point where the minimum value occurs.

Return type:

tuple or dict

gpm.utils.manipulations.locate_points(xr_obj, points)[source][source]#: Return a list of isel dictionaries corresponding to the nearest location of each point in the set.

gpm.utils.manipulations.mask_above_bin(xr_obj, bins, strict=True, fillvalue=nan)[source][source]#

Mask the xarray object above the <bins> index.

The method does not mask where bins values are NaN or invalid.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) – GPM RADAR xarray object.
bins (str or xarray.DataArray) – Either a xarray.DataArray or a string pointing to the dataset variable with the range bins above which to mask. GPM bin variables are assumed to start at 1, not 0!
strict (bool, optional) – If False, it masks only radar gates above the bin index. If True, it also masks the radar gate at the bin index. The default is True.

Returns:

Masked GPM RADAR xarray object.

Return type:

gpm.utils.manipulations.mask_below_bin(xr_obj, bins, strict=True, fillvalue=nan)[source][source]#

Mask the xarray object below the <bins> index.

The method does not mask where bins values are NaN or invalid.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) – GPM RADAR xarray object.
bins (str or xarray.DataArray) – Either a xarray.DataArray or a string pointing to the dataset variable with the range bins below which to mask. GPM bin variables are assumed to start at 1, not 0!
strict (bool, optional) – If False, it masks only radar gates below the bin index. If True, it also masks the radar gate at the bin index. The default is True.

Returns:

Masked GPM RADAR xarray object.

Return type:

gpm.utils.manipulations.mask_between_bins(xr_obj, bottom_bins, top_bins, strict=True, fillvalue=nan)[source][source]#

Mask the xarray object between bottom and top <bins> indices.

The method does not mask where bins values are NaN or invalid.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) – GPM RADAR xarray object.
bottom_bins (str or xarray.DataArray) – Either a xarray.DataArray or a string pointing to the dataset variable with the bottom range bins. GPM bin variables are assumed to start at 1, not 0!
top_bins (str or xarray.DataArray) – Either a xarray.DataArray or a string pointing to the dataset variable with the top range bins. GPM bin variables are assumed to start at 1, not 0!
strict (bool, optional) – If False, it masks only radar gates between the bin indices. If True, it also masks the radar gates at the bin indices. The default is True.

Returns:

Masked GPM RADAR xarray object.

Return type:

gpm.utils.manipulations.mask_vertical_variables(ds, mask, fillvalue)[source][source]#

gpm.utils.manipulations.reverse_range(ds)[source][source]#

Reverse the range dimension of a dataset.

The bin variables are updated accordingly.

gpm.utils.manipulations.select_bin_variables(ds)[source][source]#: Return xarray.Dataset with only bin variables.

gpm.utils.manipulations.select_cross_section_variables(ds, strict=False, squeeze=True)[source][source]#

Return xarray.Dataset with only cross-section variables.

It selects variables with only a single horizontal and vertical dimension.

gpm.utils.manipulations.select_frequency_variables(ds)[source][source]#: Return xarray.Dataset with only multifrequency variables.

gpm.utils.manipulations.select_spatial_2d_variables(ds, strict=False, squeeze=True)[source][source]#: Return xarray.Dataset with only 2D spatial variables.

gpm.utils.manipulations.select_spatial_3d_variables(ds, strict=False, squeeze=True)[source][source]#: Return xarray.Dataset with only 3D spatial variables.

gpm.utils.manipulations.select_vertical_variables(ds)[source][source]#: Return xarray.Dataset with only variables with vertical dimension.

gpm.utils.manipulations.slice_range_at_bin(xr_obj, bins)[source][source]#

Extract values at the range bins specified by bins.

bins can be a bin xarray.DataArray or the name of a bin variable of the input xarray.Dataset.

The function extracts the gates based on the ‘range’ coordinate values. Bin values are assumed to start at 1, not 0!

If you want to extract a slice at a single range bin, use instead xr_obj.sel(range=range_bin_value).

Parameters:

xr_obj (xarray.DataArray or xarray.Dataset) – xarray object with the ‘range’ dimension (and coordinate).
bins (str or xarray.DataArray) – Either a xarray.DataArray or a string pointing to the dataset variable with the range bins to extract. Bin values are assumed to start at 1, not 0 !

Returns:

xr_out – xarray object with values at the specified range bins.

Return type:

gpm.utils.manipulations.slice_range_at_height(xr_obj, value)[source][source]#: Slice the 3D array at a given height.

gpm.utils.manipulations.slice_range_at_max_value(xr_obj, variable=None)[source][source]#: Slice the 3D arrays where the variable values are at maximum.

gpm.utils.manipulations.slice_range_at_min_value(xr_obj, variable=None)[source][source]#: Slice the 3D arrays where the variable values are at minimum.

gpm.utils.manipulations.slice_range_at_temperature(ds, temperature, variable_temperature='airTemperature')[source][source]#: Slice the 3D arrays along a specific isotherm.

gpm.utils.manipulations.slice_range_at_value(xr_obj, value, variable=None)[source][source]#: Slice the 3D arrays where the variable values are close to value.

gpm.utils.manipulations.subset_range_where_values(xr_obj, variable=None, vmin=-inf, vmax=inf)[source][source]#: Select the ‘range’ interval where values are within the [vmin, vmax] interval.

gpm.utils.manipulations.subset_range_with_valid_data(xr_obj, variable=None)[source][source]#: Select the ‘range’ interval with valid data.

gpm.utils.orbit module#

This module contains utilities for orbit processing.

gpm.utils.orbit.adjust_short_sequences(arr, min_size)[source][source]#

Replace value of short sequences of consecutive identical values.

The function examines contiguous sequences of identical elements in the input array. If a sequence is shorter than min_size, its values are replaced with the value of the adjacent longer sequence, working outward from the first valid sequence.

Parameters:

arr (array-like) – The input array of values.
min_size (int) – The minimum number of consecutive identical elements to not be modified. Shorter sequences will be replaced with the previous sequence value.

Returns:

arr – The modified array with updated values.

Return type:

gpm.utils.orbit.get_orbit_direction(lats, n_tol=1)[source][source]#

Infer the satellite orbit direction from latitude values.

This function determines the orbit direction by computing the sign of the differences between consecutive latitude values.

A positive sign (+1) indicates an ascending orbit (increasing latitude), while a negative sign (-1) indicates a descending orbit (decreasing latitude).

Any zero differences are replaced by the nearest nonzero direction. Additionally, short sequences of direction changes - those lasting fewer than n_tol consecutive data points - are adjusted to reduce the influence of geolocation errors.

Parameters:

lats (array-like) – 1-dimensional array of latitude values corresponding to the satellite’s orbit.
n_tol (int, optional) – The minimum number of consecutive data points required to confirm a change in direction. Sequences shorter than this threshold will be smoothed. Default is 1.

Returns:

A 1-dimensional array of the same length as lats containing the inferred orbit direction. A value of +1 denotes an ascending orbit and -1 denotes a descending orbit.

Return type:

numpy.ndarray

Examples

>>> lats = [10, 10.5, 11, 10.8, 10.3, 10, 9.5, 9.8, 10.1]
>>> get_orbit_direction(lats)
array([ 1,  1,  1, -1, -1, -1, -1,  1,  1])
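The sign-of-differences idea can be sketched in pure Python. This is a simplified illustration, not the library code: it repeats the first difference so that the output has the same length as lats, fills zero differences by forward fill and then backward fill (an assumed reading of "nearest nonzero direction"), and omits the n_tol smoothing entirely. The helper name orbit_direction is hypothetical.

```python
def orbit_direction(lats):
    """Hypothetical sketch: +1 for ascending samples, -1 for descending ones."""
    signs = []
    for previous, current in zip(lats[:-1], lats[1:]):
        diff = current - previous
        signs.append(0 if diff == 0 else (1 if diff > 0 else -1))
    # Repeat the first difference so the output is as long as lats.
    signs = signs[:1] + signs
    # Forward-fill zeros with the previous nonzero direction.
    for i in range(1, len(signs)):
        if signs[i] == 0:
            signs[i] = signs[i - 1]
    # Back-fill any zeros left at the start of the sequence.
    for i in range(len(signs) - 2, -1, -1):
        if signs[i] == 0:
            signs[i] = signs[i + 1]
    return signs
```

Applied to the latitudes of the example above, this sketch yields the same sequence of directions as a plain list.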

gpm.utils.orbit.get_orbit_mode(ds)[source][source]#

gpm.utils.parallel module#

This module contains utilities for parallel processing.

gpm.utils.parallel.compute_list_delayed(list_delayed, max_concurrent_tasks=None)[source][source]#

Compute the list of Dask delayed objects in blocks of max_concurrent_tasks.

Parameters:

list_delayed (list) – List of Dask delayed objects.
max_concurrent_tasks (int) – Maximum number of concurrent tasks to execute.

Returns:

List of computed results.

Return type:

gpm.utils.parallel.create_group_slices(chunksizes, group_size)[source][source]#

Create slices by grouping contiguous chunks along a dimension.

Parameters:

chunksizes (list or tuple of int) – Sizes of chunks along the dimension to be grouped.
group_size (int) – Number of chunks to group together.

Returns:

List of slice objects representing the start and stop positions of each group of contiguous chunks.

Return type:

list of slice
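The grouping can be sketched as a running sum over chunk sizes. The helper name group_slices is hypothetical, and letting the last group hold fewer than group_size chunks is an assumption about how leftover chunks are handled.

```python
def group_slices(chunksizes, group_size):
    """Hypothetical helper: one slice per group of contiguous chunks."""
    slices = []
    start = 0
    for i in range(0, len(chunksizes), group_size):
        # The group spans the combined length of its chunks.
        stop = start + sum(chunksizes[i:i + group_size])
        slices.append(slice(start, stop))
        start = stop
    return slices
```

With chunk sizes (2, 3, 4, 1) and group_size=2, the first two chunks form slice(0, 5) and the last two form slice(5, 10).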

gpm.utils.parallel.get_block_slices(ds, **dim_chunks_kwargs)[source][source]#

Generate a list of slice dictionaries for grouping chunks in an xarray Dataset.

Parameters:

ds (xarray.Dataset) – The dataset for which slices are generated.
**dim_chunks_kwargs (dict) – Keyword arguments where each key is a dimension name and each value is the number of contiguous chunks to group together for that dimension.

Returns:

A list of dictionaries where each dictionary maps dimension names to slice objects, defining groups of contiguous chunks along the specified dimensions.

Return type:

list of dict

gpm.utils.pmw module#

This module provides PMW utilities.

class gpm.utils.pmw.PMWFrequency(center_frequency: float, polarization: str = '', offset=None)[source][source]#

Bases: object

Class to represent a Passive Microwave frequency channel.

center_frequency#

The (nominal) center frequency in GHz.

Type:: float or str

polarization#

Polarization code, e.g. ‘V’, ‘H’, ‘QV’, ‘QH’.

Type:: str

offset#

Offset from the center frequency in GHz (e.g., 3 for 183±3 GHz). None if not applicable.

Type:: float or None

property center_frequency_str[source]#: Return center frequency string.

classmethod from_string(string: str) → PMWFrequency[source][source]#

Create a PMWFrequency object from a string like ‘10.65V’, ‘18.7H’, or ‘183V3’.

Pattern:

Numeric frequency (integer or float)
Polarization (‘V’, ‘H’, ‘QV’, ‘QH’)
Optional numeric offset, e.g. ‘3’

Examples

'10.65V' -> center_frequency=10.65, polarization='V', offset=None
'183V3' -> center_frequency=183, polarization='V', offset=3
'183.31QH7.5' -> center_frequency=183.31, polarization='QH', offset=7.5
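The pattern described above maps naturally onto a regular expression. The sketch below is an assumption about how such a string could be parsed, not the class's actual implementation: it returns a plain tuple instead of a PMWFrequency object, requires a polarization code, and the names CHANNEL_PATTERN and parse_channel are hypothetical.

```python
import re

# Frequency, then polarization code, then an optional numeric offset.
CHANNEL_PATTERN = re.compile(
    r"^(?P<freq>\d+(?:\.\d+)?)(?P<pol>QV|QH|V|H)(?P<offset>\d+(?:\.\d+)?)?$"
)


def parse_channel(string):
    """Hypothetical helper: return (center_frequency, polarization, offset)."""
    match = CHANNEL_PATTERN.match(string)
    if match is None:
        raise ValueError(f"Cannot parse PMW channel string: {string!r}")
    offset = match.group("offset")
    return (
        float(match.group("freq")),
        match.group("pol"),
        float(offset) if offset is not None else None,
    )
```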

has_same_center_frequency(other: PMWFrequency, tol: float = 1e-06) → bool[source][source]#: Return True if the center frequency is the same as other within a specified tolerance.

has_same_offset(other: PMWFrequency) → bool[source][source]#: Return True if offset is the same as other.

has_same_polarization(other: PMWFrequency) → bool[source][source]#: Return True if the polarization is the same as other.

property offset_str[source]#: Return center frequency offset string.

opposite_polarization()[source][source]#

Return a new PMWFrequency object with flipped polarization (V <-> H, QV <-> QH).

Examples

10.65V -> 10.65H
183QH3 -> 183QV3

title() → str[source][source]#: Return a nicely formatted string representation of the frequency channel.

to_string() → str[source][source]#

Recreate the original acronym string from the PMWFrequency object.

Examples

“10.65V”
“183V3”
“89QV7.5”

property wavelength: float[source]#

Returns the channel wavelength in meters.

The wavelength is computed as c / (f * 1e9), where c ~ 3e8 m/s and f is the center_frequency in GHz.
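The formula is easy to check by hand. The sketch below uses the exact speed of light in vacuum instead of the rounded 3e8 m/s, so its results may differ marginally from the property; the helper name wavelength_m is hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def wavelength_m(center_frequency_ghz):
    """Hypothetical helper: wavelength in meters for a frequency in GHz."""
    return SPEED_OF_LIGHT / (center_frequency_ghz * 1e9)
```

Under this formula a 10.65 GHz channel has a wavelength of roughly 2.8 cm, and a 183.31 GHz channel roughly 1.6 mm.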

gpm.utils.pmw.available_pmw_frequencies(sensor)[source][source]#

gpm.utils.pmw.create_rgb_composite(ds, receipt)[source][source]#

Generate an RGB composite from a 1C PMW dataset using the provided receipt.

Parameters:

ds (xarray.Dataset) – Input dataset containing PMW data.
receipt (dict) –
Dictionary containing configuration for each channel (‘R’, ‘G’, ‘B’) and optional global normalization. Each channel configuration should include the following keys:
- ’name’: The PMW feature name.
- ’vmin’: Minimum value for normalization.
- ’vmax’: Maximum value for normalization.
- ’vmin_dynamic’: Boolean flag for dynamic minimum adjustment.
- ’vmax_dynamic’: Boolean flag for dynamic maximum adjustment.
- ’invert’: Boolean flag to invert the channel.
Optionally, the dictionary may include a ‘global_normalization’ key (bool). If True, the vmin and vmax of channels with vmin_dynamic and vmax_dynamic equal to False are updated with the minimum and maximum values across all such channels. The specified vmin and vmax are used to bound the update of vmin and vmax.

Returns:

An RGB composite DataArray with a coordinate ‘rgb’ corresponding to channels [‘r’, ‘g’, ‘b’].

Return type:

gpm.utils.pmw.find_closely_matching_center_frequency(center_frequency, center_frequencies)[source][source]#: Find the closely matching center frequency within a set of frequencies.

gpm.utils.pmw.find_closely_matching_frequency(pmw_frequency, pmw_frequencies, center_frequency_tol)[source][source]#: Find the closely matching frequency within a set of frequencies.

gpm.utils.pmw.find_polarization_pairs(pmw_frequencies)[source][source]#

Identify polarization pairs of PMWFrequency objects.

The PMWFrequency objects must share the same center frequency but differ in polarization (e.g., vertical vs. horizontal).

This function iterates through each PMWFrequency in the input list and attempts to match it with another PMWFrequency that:

Has the same center frequency.
Has the opposite polarization (e.g., V vs. H or QV vs. QH).

Once a valid pair is found, it is stored in a dictionary keyed by the shared center frequency. For consistent ordering of pairs, any item with vertical polarization (e.g., “V”, “QV”) is placed first in the tuple, followed by the corresponding horizontal polarization (“H”, “QH”).

Parameters:: pmw_frequencies (list of PMWFrequency) – A list of PMWFrequency objects to be examined for pairs.
Returns:: A dictionary where keys are center frequencies (float or int), and values are 2-tuples of PMWFrequency objects in (vertical, horizontal) order. If no match is found for a given frequency, that frequency is not included in the dictionary.
Return type:: dict
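
The pairing rule described above can be sketched in plain Python. This is only an illustration, not the library's implementation: the `Freq` named tuple is a hypothetical stand-in for PMWFrequency (whose real attributes are not shown on this page), and the `OPPOSITE` mapping is an assumption based on the polarizations named in the text.

```python
from collections import namedtuple

# Hypothetical stand-in for PMWFrequency: only the two attributes the pairing needs.
Freq = namedtuple("Freq", ["center_frequency", "polarization"])

# Assumed mapping from each vertical polarization to its horizontal counterpart.
OPPOSITE = {"V": "H", "QV": "QH"}


def pair_polarizations(frequencies):
    """Return {center_frequency: (vertical, horizontal)} for matching pairs."""
    pairs = {}
    for vertical in frequencies:
        wanted = OPPOSITE.get(vertical.polarization)
        if wanted is None:
            continue  # horizontal channels can only be the second item of a pair
        for other in frequencies:
            if (other.center_frequency == vertical.center_frequency
                    and other.polarization == wanted):
                pairs[vertical.center_frequency] = (vertical, other)
                break
    return pairs


channels = [Freq(37.0, "H"), Freq(37.0, "V"), Freq(89.0, "V"), Freq(23.8, "V")]
pairs = pair_polarizations(channels)
```

In this sketch only 37.0 ends up in `pairs`, because the 89.0 and 23.8 channels have no horizontal counterpart in the input list.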

gpm.utils.pmw.get_available_pct_features(ds)[source][source]#: Get list of available PCT features.

gpm.utils.pmw.get_available_pd_features(ds)[source][source]#: Get list of available PD features.

gpm.utils.pmw.get_available_pr_features(ds)[source][source]#: Get list of available PR features.

gpm.utils.pmw.get_brightness_temperature(xr_obj, variable)[source][source]#

Retrieve the brightness temperature data array from an xarray object.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) – Input xarray object containing brightness temperature data.
variable (str or None) – Variable name to extract. If None, a default variable is determined based on possible options.

Returns:

dataarray – DataArray of the brightness temperature.

Return type:

gpm.utils.pmw.get_frequencies_polarized_pairs(ds)[source][source]#: Get center frequency of polarized pairs.

gpm.utils.pmw.get_pct(ds, name)[source][source]#

Compute a PCT (Polarization Corrected Temperature) from a PMW L1C product.

Parameters:

ds (xarray.Dataset) – PMW L1C product containing the brightness temperature variable.
name (str) – Name of the PCT feature in the format ‘PCT_<center_freq>’.

Returns:

dataarray – DataArray representing the computed PCT.

Return type:

gpm.utils.pmw.get_pct_coefficient(center_frequency)[source][source]#

gpm.utils.pmw.get_pd(ds, name)[source][source]#

Compute a PD (Polarization Difference) from a PMW L1C product.

Parameters:

ds (xarray.Dataset) – PMW L1C product containing the brightness temperature variable.
name (str) – Name of the PD feature in the format ‘PD_<center_freq>’.

Returns:

dataarray – DataArray representing the computed PD.

Return type:

gpm.utils.pmw.get_pmw_channel(xr_obj, name, variable=None)[source][source]#

Extract a specific PMW (Passive Microwave) channel from an xarray object.

Parameters:

xr_obj (xarray.Dataset or xarray.DataArray) – PMW L1B or L1C product containing the brightness temperature variable.
name (str or PMWFrequency) – PMW channel name or PMWFrequency object representing the desired frequency.
variable (str, optional) – Variable name to extract from the xarray object. If None, the default variable is used.

Returns:

dataarray – DataArray corresponding to the selected PMW channel.

Return type:

gpm.utils.pmw.get_pmw_feature(ds, name)[source][source]#

Retrieve or compute a PMW feature from an L1C PMW product.

This function can handle:

Simple PMW feature names such as “PCT_37”, “PD_37”, or direct channel names.

Complex expressions using PMW feature names combined with basic arithmetic operations.

Parameters:

ds (xarray.Dataset) – PMW L1C product containing the brightness temperature variable.
name (str) – A single PMW feature name (e.g. “PCT_37”, “PD_37”, “37V”, …) or a string expression combining multiple PMW feature names via mathematical operators (+, -, *, /) and parentheses (e.g. `"(PCT_37 + PCT_19)/(PCT_37 - PCT_19)" `).

Returns:

DataArray corresponding to the requested or computed PMW feature.

Return type:

gpm.utils.pmw.get_pmw_frequency(sensor, scan_mode)[source][source]#: Get product info dictionary.

gpm.utils.pmw.get_pmw_frequency_dict()[source][source]#: Get PMW info dictionary.

gpm.utils.pmw.get_pmw_rgb_receipts(sensor)[source][source]#: Return the RGB composite recipes available for a PMW sensor.

gpm.utils.pmw.get_pr(ds, name)[source][source]#

Compute a PR (Polarization Ratio) from a PMW L1C product.

Parameters:

ds (xarray.Dataset) – PMW L1C product containing the brightness temperature variable.
name (str) – Name of the PR feature in the format ‘PR_<center_freq>’.

Returns:

dataarray – DataArray representing the computed PR.

Return type:

gpm.utils.pmw.is_simple_feature_name(name)[source][source]#

Determine if name is a single PMW feature or a more complex expression.

A ‘simple’ feature name is assumed to not contain any arithmetic symbols (+, -, *, /) nor parentheses.
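
A minimal sketch of the rule stated above, assuming the check is a plain character test. The helper name `looks_like_simple_feature` is hypothetical and the actual implementation may differ.

```python
def looks_like_simple_feature(name):
    """Return True if name contains no arithmetic symbol and no parenthesis."""
    return not any(symbol in name for symbol in "+-*/()")


simple = looks_like_simple_feature("PCT_37")
expression = looks_like_simple_feature("(PCT_37 + PCT_19)/(PCT_37 - PCT_19)")
```

Under this rule a name containing a hyphen would be treated as an expression, since `-` is one of the arithmetic symbols.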

gpm.utils.pmw.normalize_channel(arr, vmin, vmax, vmin_dynamic, vmax_dynamic, invert)[source][source]#

Normalize array between 0 and 1 with optional inversion and dynamic min/max adjustment.

Parameters:

arr (array-like) – Input array to be normalized.
vmin (float) – Minimum value for normalization.
vmax (float) – Maximum value for normalization.
vmin_dynamic (bool) – If True, update vmin to the maximum of the provided vmin and the array minimum.
vmax_dynamic (bool) – If True, update vmax to the minimum of the provided vmax and the array maximum.
invert (bool) – If True, invert the normalized values (i.e., compute 1 - normalized array).

Returns:

norm_arr – Normalized array with values clipped between 0 and 1.

Return type:

array-like
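
The parameter semantics above can be illustrated with a pure-Python sketch operating on a list of floats. The function name `normalize_values` is hypothetical. The sketch assumes vmax is larger than vmin after the dynamic update and does not handle NaN values, which the real function may treat differently.

```python
def normalize_values(values, vmin, vmax, vmin_dynamic=False, vmax_dynamic=False, invert=False):
    """Scale values to [0, 1] following the documented parameter semantics."""
    if vmin_dynamic:
        vmin = max(vmin, min(values))  # never go below the provided vmin
    if vmax_dynamic:
        vmax = min(vmax, max(values))  # never go above the provided vmax
    # Linear scaling followed by clipping to the [0, 1] interval.
    normalized = [min(max((v - vmin) / (vmax - vmin), 0.0), 1.0) for v in values]
    if invert:
        normalized = [1.0 - v for v in normalized]
    return normalized


tb = [150.0, 200.0, 250.0, 300.0]
fixed = normalize_values(tb, vmin=100.0, vmax=300.0)
dynamic = normalize_values(tb, vmin=100.0, vmax=300.0, vmin_dynamic=True, invert=True)
```

With the fixed bounds the values map to 0.25, 0.5, 0.75 and 1.0. With `vmin_dynamic=True` the lower bound becomes 150.0, so after inversion the smallest input maps to 1.0 and the largest to 0.0.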

gpm.utils.pmw.strip_trailing_zero_decimals(num: float) → str[source][source]#: Strip trailing zeros from a float, e.g. 183.0 -> ‘183’.
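
One possible way to obtain this behaviour is sketched below. The helper name `strip_zero_decimals` is hypothetical, and the fixed six-decimal formatting is an assumption of this sketch, not necessarily what the library does.

```python
def strip_zero_decimals(num):
    """Format a float and drop trailing zeros and a dangling decimal point."""
    text = "%f" % num  # fixed notation with six decimals
    return text.rstrip("0").rstrip(".")


whole = strip_zero_decimals(183.0)
fractional = strip_zero_decimals(10.65)
```

Stripping the zeros before the decimal point means that integer digits are never removed, so 100.0 becomes '100' rather than '1'.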

gpm.utils.pyresample module#

This module contains pyresample utility functions.

gpm.utils.pyresample.get_cartopy_crs(xr_obj)[source][source]#: Returns the cartopy CRS.

gpm.utils.pyresample.get_pyresample_area(xr_obj)[source][source]#: It returns the corresponding pyresample area.

gpm.utils.pyresample.remap(src_ds, dst_ds, radius_of_influence=20000, fill_value=nan)[source][source]#

Remap dataset to another one using nearest-neighbour.

The spatial non-dimensional coordinates of the source dataset are not remapped. The output dataset has the spatial coordinates of the destination dataset.

gpm.utils.remapping module#

This module contains tools for coordinates transformation and data remapping.

gpm.utils.remapping.chunks_inputs(x, n_blocks=None)[source][source]#

Split array into n chunks.

Parameters:

x (numpy.ndarray or xarray.DataArray) – The input array to be chunked.
n_blocks (int, optional) – Number of blocks. If None (the default), it is set equal to the number of CPUs available.

Returns:

The chunked array.

Return type:

numpy.ndarray or xarray.DataArray

gpm.utils.remapping.reproject_coords(x, y, z=None, parallel=False, **kwargs)[source][source]#

Transform coordinates from a source projection to a target projection.

Longitude coordinates should be provided as x, latitude as y.

Parameters:

x (numpy.ndarray, dask.array.Array or xarray.DataArray) – Array of x coordinates.
y (numpy.ndarray, dask.array.Array or xarray.DataArray) – Array of y coordinates.
z (numpy.ndarray, dask.array.Array or xarray.DataArray, optional) – Array of z coordinates.
parallel (bool, optional) – Whether to use multiple cores to transform coordinates when input arrays are backed by numpy arrays. The default is False.

Keyword Arguments:

src_crs (pyproj.crs.CRS) – Source CRS
dst_crs (pyproj.crs.CRS) – Destination CRS

Returns:

trans – Arrays of reprojected coordinates (X, Y) or (X, Y, Z) depending on input.

Return type:

tuple of numpy.ndarray, dask.array.Array or xarray.DataArray

gpm.utils.slices module#

This module contains utilities for processing lists of slices.

gpm.utils.slices.enlarge_slice(slc, min_size, min_start=0, max_stop=inf)[source][source]#

Enlarge a slice object to have at least a size of min_size.

The function bounds the slice on the left by min_start and on the right by max_stop. If the original slice size is larger than min_size, the original slice will be returned.

Parameters:

slc (slice) – The original slice object to be enlarged.
min_size (int) – The desired minimum size of the new slice.
min_start (int, optional) – The minimum value for the start of the new slice. The default is 0.
max_stop (int, optional) – The maximum value for the stop of the new slice. The default is np.inf.

Returns:

The new slice object with a size of at least min_size and respecting the left and right bounds.

Return type:

slice
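
The sketch below shows one way such an enlargement could work. It assumes the slice is grown symmetrically and then shifted back inside the bounds; this page does not say how the real function distributes the extra elements, so treat the strategy and the helper name `enlarge` as assumptions.

```python
import math


def enlarge(slc, min_size, min_start=0, max_stop=math.inf):
    """Grow slc to at least min_size elements while staying within the bounds."""
    size = slc.stop - slc.start
    if size >= min_size:
        return slc  # already large enough
    missing = min_size - size
    start = slc.start - missing // 2
    stop = slc.stop + (missing - missing // 2)
    # Shift the window back inside the bounds if it overflows on one side.
    if start < min_start:
        stop += min_start - start
        start = min_start
    if stop > max_stop:
        start = max(min_start, start - (stop - max_stop))
        stop = max_stop
    return slice(start, stop)


centered = enlarge(slice(4, 6), min_size=6)
left_bounded = enlarge(slice(0, 2), min_size=6)
right_bounded = enlarge(slice(8, 10), min_size=6, max_stop=10)
```

Here `centered` is slice(2, 8), while the two bounded cases are shifted to slice(0, 6) and slice(4, 10) so that the requested size is still reached.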

gpm.utils.slices.enlarge_slices(list_slices, min_size, valid_shape)[source][source]#

Enlarge a list of slice objects to have at least a size of min_size.

The function enforces the left and right bounds of the slice to be between 0 and valid_shape. If the original slice size is larger than min_size, the original slice will be returned.

Parameters:

list_slices (list) – List of slice objects.
min_size (int or tuple) – Minimum size of the output slice.
valid_shape (int or tuple) – The shape of the array which the slices should be valid on.

Returns:

list_slices – The list of slices after enlarging it (if necessary).

Return type:

gpm.utils.slices.ensure_is_slice(slc)[source][source]#

gpm.utils.slices.get_indices_from_list_slices(list_slices, check_non_intersecting=True)[source][source]#: Return a numpy array of indices from a list of slices.

gpm.utils.slices.get_list_slices_from_bool_arr(bool_arr, include_false=True, skip_consecutive_false=True)[source][source]#

Return the slices corresponding to sequences of True in the input array.

If include_false=True, the last element of each slice sequence (except the last) will be False. If include_false=False, no element in each slice sequence will be False. If skip_consecutive_false=True (default), the first element of each slice must be a True. If skip_consecutive_false=False, it returns also slices of size 1 which selects just the False values. If include_false=False, skip_consecutive_false is automatically True.

Examples

If include_false=True and skip_consecutive_false=False:
[False, False] --> [slice(0,1), slice(1,2)]

If include_false=True and skip_consecutive_false=True:
[False, False] --> []
[False, False, True] --> [slice(2,3)]
[False, False, True, False] --> [slice(2,4)]

If include_false=False:
[False, False, True, False] --> [slice(2,3)]
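
The documented examples can be reproduced with the following pure-Python sketch. It is written only from the behaviour described above, the helper name `slices_from_bools` is hypothetical, and the library's implementation may differ.

```python
def slices_from_bools(flags, include_false=True, skip_consecutive_false=True):
    """Return slices covering runs of True, optionally ending with one False."""
    if not include_false:
        skip_consecutive_false = True  # documented: forced to True in this case
    slices = []
    start = None  # start index of the run of True currently being scanned
    for i, flag in enumerate(flags):
        if start is None:
            if flag:
                start = i
            elif not skip_consecutive_false:
                slices.append(slice(i, i + 1))  # isolated False gets its own slice
        elif not flag:
            # Close the run, keeping the terminating False only if requested.
            slices.append(slice(start, i + 1 if include_false else i))
            start = None
    if start is not None:
        slices.append(slice(start, len(flags)))
    return slices


with_false = slices_from_bools([False, False, True, False])
without_false = slices_from_bools([False, False, True, False], include_false=False)
```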

gpm.utils.slices.get_list_slices_from_indices(indices)[source][source]#

Return a list of slices from a list/array of integer indices.

Example: [0,1,2,4,5,8] --> [slice(0,3), slice(4,6), slice(8,9)]
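
A minimal sketch of the grouping of consecutive indices, assuming unique integer indices. The helper name `slices_from_indices` is hypothetical, and sorting the input first is a choice of this sketch.

```python
def slices_from_indices(indices):
    """Group consecutive integer indices into slices."""
    slices = []
    start = previous = None
    for idx in sorted(indices):
        if start is None:
            start = previous = idx
        elif idx == previous + 1:
            previous = idx  # still inside the same run
        else:
            slices.append(slice(start, previous + 1))
            start = previous = idx
    if start is not None:
        slices.append(slice(start, previous + 1))
    return slices


result = slices_from_indices([0, 1, 2, 4, 5, 8])
```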

gpm.utils.slices.get_slice_from_idx_bounds(idx_start, idx_end)[source][source]#: Return the slice required to include the idx bounds.

gpm.utils.slices.get_slice_size(slc)[source][source]#

Get size of the slice.

Note: The computed slice size may not be representative of the true size if slice.stop is larger than the length of the object to be sliced.
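
The caveat can be illustrated with standard Python only. The sketch assumes the size is computed as stop minus start, which this page does not state explicitly.

```python
slc = slice(2, 10)
data = list(range(5))

# Size derived from the slice bounds alone.
nominal_size = slc.stop - slc.start

# Number of elements actually selected once the slice is clipped to the data length.
actual_size = len(range(*slc.indices(len(data))))
```

Here `nominal_size` is 8 while `actual_size` is 3, because Python silently clips the slice to the length of the sequence.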

gpm.utils.slices.list_slices_combine(*args)[source][source]#: Combine together a list of list_slices, without any additional operation.

gpm.utils.slices.list_slices_difference(list_slices1, list_slices2)[source][source]#: Return the list of slices covered by list_slices1 not intersecting list_slices2.

gpm.utils.slices.list_slices_filter(list_slices, min_size=None, max_size=None)[source][source]#: Filter list of slices by size.

gpm.utils.slices.list_slices_flatten(list_slices)[source][source]#

Flatten a list of slices with two nesting levels.

Examples

[[slice(1, 7934, None)], [slice(1, 2, None)]] --> [slice(1, 7934, None), slice(1, 2, None)]
[slice(1, 7934, None), slice(1, 2, None)] --> [slice(1, 7934, None), slice(1, 2, None)]
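
Both documented cases can be handled by a short loop, sketched here under the assumption that each item is either a slice or a list of slices. The helper name `flatten_slices` is hypothetical.

```python
def flatten_slices(list_slices):
    """Flatten one level of nesting, leaving already flat lists unchanged."""
    flat = []
    for item in list_slices:
        if isinstance(item, slice):
            flat.append(item)
        else:
            flat.extend(item)  # item is assumed to be a list of slices
    return flat


nested = flatten_slices([[slice(1, 7934, None)], [slice(1, 2, None)]])
already_flat = flatten_slices([slice(1, 7934, None), slice(1, 2, None)])
```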

gpm.utils.slices.list_slices_intersection(*args, min_size=1)[source][source]#: Return the intersecting slices from multiple list of slices.

gpm.utils.slices.list_slices_simplify(list_slices)[source][source]#

Simplify a list of sequential slices.

Example: [slice(0,2), slice(2,4)] --> [slice(0,4)]
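
The merging of sequential slices can be sketched as follows, assuming the input list is already sorted by start and that only slices touching end to start are merged. The helper name `simplify_slices` is hypothetical.

```python
def simplify_slices(list_slices):
    """Merge slices whose start equals the stop of the previous slice."""
    simplified = []
    for slc in list_slices:
        if simplified and simplified[-1].stop == slc.start:
            # Extend the previous slice instead of adding a new one.
            simplified[-1] = slice(simplified[-1].start, slc.stop)
        else:
            simplified.append(slc)
    return simplified


merged = simplify_slices([slice(0, 2), slice(2, 4)])
untouched = simplify_slices([slice(0, 2), slice(3, 5)])
```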

gpm.utils.slices.list_slices_sort(*args)[source][source]#

Sort a single or multiple list of slices by slice.start.

It outputs a single list of slices.

gpm.utils.slices.list_slices_union(*args)[source][source]#: Return the union slices from multiple list of slices.

gpm.utils.slices.pad_slice(slc, padding, min_start=0, max_stop=inf)[source][source]#

Increase/decrease the slice with the padding argument.

Does not ensure that all output slices have same size.

Parameters:

slc (slice) – Slice object.
padding (int) – Padding to be applied to the slice.
min_start (int, optional) – The minimum value for the start of the new slice. The default is 0.
max_stop (int) – The maximum value for the stop of the new slice. The default is np.inf.

Returns:

The slice after applying padding.

Return type:

gpm.utils.slices.pad_slices(list_slices, padding, valid_shape)[source][source]#

Increase/decrease the list of slices with the padding argument.

Parameters:

list_slices (list) – List of slice objects.
padding (int or tuple) – Padding to be applied on each slice.
valid_shape (int or tuple) – The shape of the array which the slices should be valid on.

Returns:

list_slices – The list of slices after applying padding.

Return type:

gpm.utils.subsetting module#

This module contains functions for subsetting and aligning GPM ORBIT Datasets.

gpm.utils.subsetting.align_along_track(*args)[source][source]#

Align GPM / GPM-GEO xarray objects in the along-track direction.

Parameters:: args (list) – A list of GPM / GPM-GEO xr.Dataset or xr.DataArray.
Returns:: list_aligned – A list of aligned GPM / GPM-GEO xr.Dataset or xr.DataArray.
Return type:: list

gpm.utils.subsetting.align_cross_track(*args)[source][source]#

Align GPM / GPM-GEO xarray objects in the cross-track direction.

Parameters:: args (list) – A list of GPM / GPM-GEO xr.Dataset or xr.DataArray.
Returns:: list_aligned – A list of aligned GPM / GPM-GEO xr.Dataset or xr.DataArray.
Return type:: list

gpm.utils.subsetting.is_1d_non_dimensional_coord(xr_obj, coord)[source][source]#: Checks if a coordinate is a 1d, non-dimensional coordinate.

gpm.utils.subsetting.isel(xr_obj, indexers=None, drop=False, **indexers_kwargs)[source][source]#: Perform index-based dimension selection.

gpm.utils.subsetting.sel(xr_obj, indexers=None, drop=False, method=None, **indexers_kwargs)[source][source]#

Perform value-based coordinate selection.

Slices are treated as inclusive of both the start and stop values, unlike normal Python indexing. The gpm sel method can additionally:

slice by gpm-id strings
slice by any xarray coordinate value

You can use string shortcuts for datetime coordinates (e.g., ‘2000-01’ to select all values in January 2000).

gpm.utils.time module#

This module contains utilities for time processing.

gpm.utils.time.ensure_time_validity(xr_obj, limit=10)[source][source]#

Attempt to correct the time coordinate if fewer than ‘limit’ consecutive NaT values are present.

It raises a ValueError if more than ‘limit’ consecutive NaT values occur.

Parameters:: xr_obj (xarray.DataArray or xarray.Dataset) – GPM xarray object.
Returns:: xr_obj – GPM xarray object.
Return type:: xarray.DataArray or xarray.Dataset

gpm.utils.time.get_dataset_start_end_time(ds: Dataset, time_dim='time')[source][source]#

Retrieves dataset starting and ending time.

Parameters:

ds (xarray.Dataset) – Input dataset
time_dim (str) – Name of the time dimension. The default is “time”.

Returns:

(starting_time, ending_time)

Return type:

gpm.utils.time.has_nat(timesteps)[source][source]#: Return True if any of the timesteps is NaT.

gpm.utils.time.infill_timesteps(timesteps, limit)[source][source]#: Infill missing timesteps if less than <limit> consecutive.

gpm.utils.time.interpolate_nat(timesteps, method='linear', limit=5, limit_direction=None, limit_area=None)[source][source]#

Fill NaT values using an interpolation method.

For further information, refer to pandas.DataFrame.interpolate.

Parameters:

method (str) – Interpolation technique to use. 'linear', the default, treats the timesteps as equally spaced.
limit (int, optional) – Maximum number of consecutive NaTs to fill. Must be greater than 0.
limit_direction (str, optional) – Valid values are 'forward', 'backward' and 'both'. Consecutive NaTs will be filled in this direction.
limit_area (str, None) – Valid values are None, 'inside' and 'outside'. If limit is specified, consecutive NaTs will be filled with this restriction.
- None: No fill restriction.
- 'inside': Only fill NaTs surrounded by valid values (interpolate).
- 'outside': Only fill NaTs outside valid values (extrapolate).

Notes

Depending on the interpolation method (e.g. linear), the infilled values could have ns resolution. For further information, refer to pandas.DataFrame.interpolate.

Returns:: timesteps – Timesteps array of type datetime64[ns].
Return type:: numpy.ndarray

gpm.utils.time.is_nat(timesteps)[source][source]#: Return a boolean array indicating timesteps which are NaT.

gpm.utils.time.regularize_dataset(ds: Dataset, freq: str, time_dim: str = 'time', method: str | None = None, fill_value=None)[source][source]#

Regularize a dataset across time dimension with uniform resolution.

Parameters:

ds (xarray.Dataset) – xarray Dataset.
time_dim (str, optional) – The time dimension in the xarray.Dataset. The default is "time".
freq (str) – The freq string to pass to pd.date_range() to define the new time coordinates. Examples: freq="2min".
method (str, optional) – Method to use for filling missing timesteps. If None, fill with fill_value. The default is None. For other possible methods, see xarray.Dataset.reindex().
fill_value ((float, dict), optional) – Fill value to fill missing timesteps. If not specified, for float variables it uses dtypes.NA, while for integer variables it uses the maximum allowed integer value or, in case of undecoded variables, the _FillValue DataArray attribute.

Returns:

ds_reindexed – Regularized dataset.

Return type:

gpm.utils.time.subset_by_time(xr_obj, start_time=None, end_time=None)[source][source]#

Filter a GPM xarray object by start_time and end_time.

Parameters:

xr_obj – An xarray object.
start_time (datetime.datetime) – Start time. The default is None.
end_time (datetime.datetime) – End time. The default is None.

Returns:

xr_obj – GPM xarray object

Return type:

gpm.utils.time.subset_by_time_slice(xr_obj, slice)[source][source]#

gpm.utils.timing module#

This module contains decorators which measure the execution time of a function.

gpm.utils.timing.print_elapsed_time(fn)[source][source]#

gpm.utils.timing.print_task_elapsed_time(prefix=' - ')[source][source]#

gpm.utils.warnings module#

This module defines GPM Warning classes.

exception gpm.utils.warnings.GPMDownloadWarning(message)[source][source]#: Bases: Warning

exception gpm.utils.warnings.GPM_Warning(message)[source][source]#: Bases: Warning

gpm.utils.xarray module#

This module contains general utilities for xarray objects.

gpm.utils.xarray.broadcast_like(xr_obj, other, add_coords=True)[source][source]#: Broadcast an xarray object against another one.

gpm.utils.xarray.check_is_xarray(x)[source][source]#

gpm.utils.xarray.check_is_xarray_dataarray(x)[source][source]#

gpm.utils.xarray.check_is_xarray_dataset(x)[source][source]#

gpm.utils.xarray.check_variable_availabilty(ds, variable, argname)[source][source]#: Check variable availability in an xarray Dataset.

gpm.utils.xarray.ensure_dim_order_dataarray(da, func, *args, **kwargs)[source][source]#

Ensure that the output DataArray has the same dimensions order as the input.

New dimensions are moved to the last positions.

gpm.utils.xarray.ensure_dim_order_dataset(ds, func, *args, **kwargs)[source][source]#

Ensure that the output Dataset has the same dimensions order as the input.

New dimensions are moved to the last positions.

gpm.utils.xarray.ensure_unique_chunking(ds)[source][source]#

Ensure the dataset has unique chunking.

Conversion to dask.dataframe.DataFrame requires unique chunking. If the xarray.Dataset does not have unique chunking, perform ds.unify_chunks.

Variable chunks can be visualized with:

for var in ds.data_vars:
    print(var, ds[var].chunks)

gpm.utils.xarray.get_dataset_variables(ds, sort=False)[source][source]#: Get list of xarray.Dataset variables.

gpm.utils.xarray.get_default_variable(ds: Dataset, possible_variables) → str[source][source]#

Return one of the possible default variables.

Check if one of the variables in ‘possible_variables’ is present in the xarray.Dataset. If none of the variables is present, raise an error. If more than one is present, raise an error. Otherwise, return the name of the single available variable in the xarray.Dataset.

Parameters:

ds (xarray.Dataset) – The xarray dataset to inspect.
possible_variables (list of str) – The variable names to look for.

Returns:

The name of the variable found in the xarray.Dataset.

Return type:

str
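
The selection rule can be sketched without xarray by using a plain list of names in place of the Dataset variables. The helper name `pick_default_variable`, the exception type and the example variable names are assumptions of this sketch.

```python
def pick_default_variable(available, possible_variables):
    """Return the single candidate present in available, otherwise raise."""
    found = [name for name in possible_variables if name in available]
    if not found:
        raise ValueError(f"None of {possible_variables} is available.")
    if len(found) > 1:
        raise ValueError(f"Multiple candidates are available: {found}.")
    return found[0]


# 'available' stands in for the variable names of an xarray.Dataset.
chosen = pick_default_variable(["Tc", "Quality"], possible_variables=["Tb", "Tc"])
```

In this example `chosen` is 'Tc'. Passing a list that contains both 'Tb' and 'Tc', or neither of them, raises an error in this sketch.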

gpm.utils.xarray.get_dimensions_without(xr_obj, dims)[source][source]#: Return the dimensions of the xarray object without the specified dimensions.

gpm.utils.xarray.get_xarray_variable(xr_obj, variable=None)[source][source]#

Return variable DataArray from xarray object.

If variable is an xr.DataArray, it returns it.
If variable is None and the input is an xr.DataArray, it returns it.
If the input is an xr.Dataset, it returns the specified variable.

gpm.utils.xarray.has_unique_chunking(ds)[source][source]#: Check if a dataset has unique chunking.

gpm.utils.xarray.squeeze_unsqueeze_dataarray(da, func, *args, **kwargs)[source][source]#

Ensure that the output DataArray has the same dimensions as the input.

Dimensions of size 1 are kept even if the function drops them. New dimensions are moved to the last positions.

gpm.utils.xarray.squeeze_unsqueeze_dataset(ds, func, *args, **kwargs)[source][source]#

Ensure that the output Dataset has the same dimensions as the input.

Dimensions of size 1 are kept even if the function drops them. New dimensions are moved to the last positions.

gpm.utils.xarray.unstack_datarray_dimension(da, dim, coord_handling='keep', prefix='', suffix='')[source][source]#

Split a DataArray along a specified dimension into a Dataset with separate prefixed and suffixed variables.

Parameters:

da (xarray.DataArray) – The DataArray to split.
dim (str) – The dimension along which to split the DataArray.
coord_handling (str, optional) – Option to handle coordinates sharing the target dimension. Choices are ‘keep’, ‘drop’, or ‘unstack’. Defaults to ‘keep’.
prefix (str, optional) – String to prepend to each new variable name.
suffix (str, optional) – String to append to each new variable name.

Returns:

A Dataset with each variable split along the specified dimension. The Dataset variables are named “{prefix}{name}{suffix}{dim_value}”. Coordinates sharing the target dimension are handled based on coord_handling.

Return type:

gpm.utils.xarray.unstack_dataset_dimension(ds, dim, coord_handling='keep', prefix='', suffix='')[source][source]#

Split Dataset variables with the specified dimension into separate prefixed and suffixed variables.

Parameters:

ds (xarray.Dataset) – The Dataset to split.
dim (str) – The dimension along which to split the Dataset variables.
coord_handling (str, optional) – Option to handle coordinates sharing the target dimension. Choices are ‘keep’, ‘drop’, or ‘unstack’. Defaults to ‘keep’.
prefix (str, optional) – String to prepend to each new variable name.
suffix (str, optional) – String to append to each new variable name.

Returns:

A Dataset with each variable with dimension dim split into new variables. The new Dataset variables are named “{prefix}{name}{suffix}{dim_value}”. Coordinates sharing the target dimension are handled based on coord_handling.

Return type:

gpm.utils.xarray.unstack_dimension(xr_obj, dim, coord_handling='keep', prefix='', suffix='')[source][source]#

Split xarray object with the specified dimension into separate prefixed and suffixed Dataset variables.

Parameters:

xr_obj (xarray.DataArray, xarray.Dataset) – The xarray object to split.
dim (str) – The dimension along which to split the xarray object.
coord_handling (str, optional) – Option to handle coordinates sharing the target dimension. Choices are ‘keep’, ‘drop’, or ‘unstack’. Defaults to ‘keep’.
prefix (str, optional) – String to prepend to each new variable name.
suffix (str, optional) – String to append to each new variable name.

Returns:

Return type:

gpm.utils.xarray.xr_drop_constant_dimension(xr_obj)[source][source]#: Return the first valid (non-NaN) value along the specified dimension.

gpm.utils.xarray.xr_ensure_dimension_order(func)[source][source]#

Decorator which ensures the output xarray object has same dimension order as input.

The decorator expects the decorated function to return the same type of xarray object as its input.

The decorator can deal with functions that:
- return an xarray object with new dimensions
- return an xarray object with fewer dimensions than the original

New dimensions are moved to the last positions.

gpm.utils.xarray.xr_first(xr_obj, dim)[source][source]#: Return the first valid (non-NaN) value along the specified dimension.

gpm.utils.xarray.xr_sorted_distribution(da, values, dim)[source][source]#

Compute the ranked frequency distribution of integer values along a given dimension.

Parameters:

da (xarray.DataArray) – The input data array containing integer values along the specified dimension.
values (array-like) – An array of the expected values (e.g. np.arange(1, 13) for months, np.arange(0, 24) for hours, etc.).
dim (str) – The name of the dimension along which to compute the ranked distribution (e.g., “year”).

Returns:

ds_out –

A dataset with three DataArrays along a new dimension “rank”:

sorted_values: The provided values sorted in descending order of occurrence.
occurrence: The count of occurrences for each sorted value.
percentage: The percentage occurrence (relative to the size along dim).

For each pixel (or location), index along “rank” to retrieve, for example, the most frequent value at rank 0, the second most at rank 1, etc.

Return type:

gpm.utils.xarray.xr_squeeze_unsqueeze(func)[source][source]#

Decorator that squeezes the xarray object before passing it to the function and unsqueezes the output.

This decorator keeps the dimensions of the xarray object intact. Dimensions of size 1 are kept even if the function drops them. The dimension order of the arrays is conserved. New dimensions are moved to the last positions.

gpm.utils.yaml module#

This module defines a YAML file reader and writer.

class gpm.utils.yaml.NoAliasDumper(stream, default_style=None, default_flow_style=False, canonical=None, indent=None, width=None, allow_unicode=None, line_break=None, encoding=None, explicit_start=None, explicit_end=None, version=None, tags=None, sort_keys=True)[source][source]#

Bases: SafeDumper

YAML Safe Dumper class avoiding use of aliases.

ignore_aliases(data)[source][source]#: Ignore aliases.

gpm.utils.yaml.read_yaml(filepath: str) → dict[source][source]#

Read a YAML file into a dictionary.

Parameters:: filepath (str) – Input YAML file path.
Returns:: Dictionary with the attributes read from the YAML file.
Return type:: dict

gpm.utils.yaml.write_yaml(dictionary, filepath, sort_keys=False)[source][source]#

Write a dictionary into a YAML file.

Parameters:: dictionary (dict) – Dictionary to write into a YAML file.

gpm.utils.zonal_stats module#

This module contains tools for zonal statistics computations.

class gpm.utils.zonal_stats.BarnesGaussianWeights(kappa)[source][source]#

Bases: object

Class for calculating weights for Barnes Gaussian Weighting.

Parameters:: kappa (float or numpy.ndarray) – The smoothing parameter(s).

get_weights(idx, distances)[source][source]#

set_size(n)[source][source]#

class gpm.utils.zonal_stats.CressmanWeights(max_distance)[source][source]#

Bases: object

Class for calculating weights using Cressman Weighting.

Parameters:: max_distance (float or array-like) – The maximum allowable distance(s). If scalar, it will be replicated.

get_weights(idx, distances)[source][source]#

set_size(n)[source][source]#

class gpm.utils.zonal_stats.InverseDistanceWeights(order=1)[source][source]#

Bases: object

Class for calculating weights for Inverse Distance Weighting.

Parameters:: order (int, float or numpy.ndarray) – The order(s) of the inverse distance weighting.

get_weights(idx, distances)[source][source]#

set_size(n)[source][source]#
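
For orientation, the textbook form of inverse distance weighting assigns each source a weight proportional to 1 / distance**order. The sketch below implements that generic formula in plain Python. It is not taken from the InverseDistanceWeights class: the function name, the normalization to unit sum and the lack of special handling for zero distances are all assumptions.

```python
def inverse_distance_weights(distances, order=1):
    """Return weights proportional to 1 / distance**order, normalized to sum to 1."""
    raw = [1.0 / d**order for d in distances]  # assumes strictly positive distances
    total = sum(raw)
    return [w / total for w in raw]


weights = inverse_distance_weights([1.0, 2.0, 4.0], order=2)
```

A larger order concentrates the weight on the nearest source. With order=2 the closest of the three sources above receives 16/21 of the total weight.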

class gpm.utils.zonal_stats.PolyAggregator(source_polygons, target_polygons, parallel=False)[source][source]#

Bases: object

Initialize the PolyAggregator.

Parameters:

source_polygons (list of shapely.Polygon) – List of source polygons.
target_polygons (list of shapely.Polygon) – List of target polygons.
parallel (bool, optional) – Whether to run in parallel. Default is False.
use_multiprocessing (bool, optional) – Whether to use multiprocessing (if parallel is True). Default is False.

apply(func, values, weights=None, area_weighting=True, distance_weighting=False, skip_na=True)[source][source]#

Apply a custom aggregation function to the data.

Parameters:

func (callable) – Aggregation function to apply.
values (list or array-like, optional) – Array of source values to aggregate.
weights (list or array-like, optional) – Array of source weights. Default is None.
area_weighting (bool, optional) – Whether to weight by the intersection area. Default is True.
distance_weighting (bool or gpm.utils.zonal_stats.BaseDistanceWeights class, optional) – Whether to weight by distance between polygon centroids. Default is False. Currently accepted classes are InverseDistanceWeights, BarnesGaussianWeights, CressmanWeights.
skip_na (bool, optional) – Whether to discard NaN values before applying the aggregation function. Default is True.

Returns:

List of aggregated values for each target polygon.

Return type:

average(values, weights=None, area_weighting=True, distance_weighting=False, skip_na=True)[source][source]#

counts()[source][source]#: Compute the number of source polygons intersecting each target polygon.

property dict_distances[source]#

property dict_intersection_areas[source]#

first(values, skip_na=True)[source][source]#

fraction_covered_area()[source][source]#: Compute the fraction of covered area of each target polygon by the source polygons.

property list_distances[source]#

property list_intersection_areas[source]#

max(values, skip_na=True)[source][source]#

mean(values, weights=None, area_weighting=True, distance_weighting=False, skip_na=True)[source][source]#

min(values, skip_na=True)[source][source]#

property n_source_polygons[source]#

property n_target_polygons[source]#

property source_centroids[source]#

property source_polygons_areas[source]#

std(values, weights=None, area_weighting=True, distance_weighting=False, skip_na=True)[source][source]#

sum(values, weights=None, area_weighting=True, distance_weighting=False, skip_na=True)[source][source]#

property target_centroids[source]#

property target_intersecting_indices[source]#

property target_non_intersecting_indices[source]#

property target_polygons_areas[source]#

var(values, weights=None, area_weighting=True, distance_weighting=False, skip_na=True)[source][source]#

gpm.utils.zonal_stats.create_list_indices(source_polygons, target_polygons)[source][source]#

gpm.utils.zonal_stats.create_target_dictionary(target_indices, values)[source][source]#

gpm.utils.zonal_stats.split_dict(dictionary, npartitions)[source][source]#

Split a dictionary into npartitions parts of approximately equal size.

Parameters:

dictionary (dict) – The dictionary to split.
npartitions (int) – The number of parts to split the dictionary into.

Returns:

list_dicts – A list of dictionaries.

Return type: