gpm.utils package

Submodules#

gpm.utils.archive module#

This module contains utilities for GPM Data Archiving.

gpm.utils.archive.check_archive_completeness(product, start_time, end_time, version=None, product_type='RS', download=True, transfer_tool='WGET', n_threads=4, verbose=True)[source]#

Check that the GPM product archive is not missing granules over a given period.

This function does not require a connection to the PPS to search for the missing files. However, the start and end of the checked period are based on the first and last file found on disk!

If download=True, it attempts to download the missing granules.

Parameters:
  • product (str) – GPM product acronym.

  • start_time (datetime.datetime) – Start time.

  • end_time (datetime.datetime) – End time.

  • product_type (str, optional) – GPM product type. Either RS (Research) or NRT (Near-Real-Time).

  • version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support version 4, 5, 6 and 7.

  • download (bool, optional) – Whether to download the missing files. The default is True.

  • n_threads (int, optional) – Number of parallel downloads. The default is 4.

  • transfer_tool (str, optional) – Whether to use curl or wget for data download. The default is WGET.

  • verbose (bool, optional) – Whether to print processing details. The default is True.

gpm.utils.archive.check_no_duplicated_files(product, start_time, end_time, version=None, product_type='RS', verbose=True)[source]#

Check that there are no duplicated files based on the granule number.

gpm.utils.archive.check_time_period_coverage(filepaths, start_time, end_time, raise_error=False)[source]#

Check that the time period between start_time and end_time is covered.

If raise_error=True, it raises an error if the time period is not covered. If raise_error=False, it raises a GPM warning.

gpm.utils.archive.get_time_period_with_missing_files(filepaths)[source]#

Return the time periods where there are missing granules.

It assumes the input filepaths are for a single GPM product.

Parameters:

filepaths (list) – List of GPM file paths.

Returns:

list_missing – List of tuple (start_time, end_time).

Return type:

list
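The gap search described above can be sketched without any GPM dependency. This is a minimal illustration, not the library's implementation: it assumes the granule (start_time, end_time) intervals have already been extracted from the file paths, and the helper name find_missing_periods and its tolerance argument are hypothetical.

```python
from datetime import datetime, timedelta


def find_missing_periods(intervals, tolerance=timedelta(seconds=0)):
    """Return (gap_start, gap_end) tuples between consecutive granule intervals.

    `intervals` is a list of (start_time, end_time) tuples, one per granule.
    This is a hypothetical helper, not part of gpm.utils.archive.
    """
    ordered = sorted(intervals)
    list_missing = []
    # Compare the end of each granule with the start of the next one.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start - prev_end > tolerance:
            list_missing.append((prev_end, next_start))
    return list_missing


granules = [
    (datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 1, 30)),
    (datetime(2020, 1, 1, 1, 30), datetime(2020, 1, 1, 3, 0)),
    (datetime(2020, 1, 1, 6, 0), datetime(2020, 1, 1, 7, 30)),
]
missing = find_missing_periods(granules)
```

Here the third granule starts three hours after the second one ends, so a single missing period is reported.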

gpm.utils.area module#

gpm.utils.area.get_quadmesh_vertices(x, y, order='counterclockwise')[source]#

Convert (x, y) 2D centroid coordinates array to (N*M, 4, 2) QuadMesh vertices.

The output vertices can be passed directly to a matplotlib.collections.PolyCollection. For plotting with cartopy, the polygon order must be “counterclockwise”.

Vertices are defined from the top left corner.

gpm.utils.checks module#

This module contains utilities to check GPM-API Dataset coordinates.

gpm.utils.checks.apply_on_valid_geolocation(function)[source]#

Decorator applying the input function on valid geolocation GPM ORBIT slices.

gpm.utils.checks.check_contiguous_granules(xr_obj)[source]#

Check no missing granules in the GPM Dataset.

It assumes xr_obj is a GPM ORBIT object.

Parameters:

xr_obj (xr.Dataset or xr.DataArray) – xarray object.

gpm.utils.checks.check_contiguous_scans(xr_obj, verbose=True)[source]#

Check no missing scans across the along_track direction.

Note:
  • This sometimes occurs between orbit granules.
  • This sometimes occurs within an orbit granule.

Parameters:
  • xr_obj (xr.Dataset or xr.DataArray) – xarray object.

  • verbose (bool) – If True, it prints the time intervals where non-contiguous scans occur.

Return type:

None.

gpm.utils.checks.check_missing_granules(xr_obj)[source]#

Check no missing granules in the GPM Dataset.

It assumes xr_obj is a GPM ORBIT object.

Parameters:

xr_obj (xr.Dataset or xr.DataArray) – xarray object.

gpm.utils.checks.check_regular_time(xr_obj, tolerance=None, verbose=True)[source]#

Check no missing timesteps for longer than ‘tolerance’ seconds.

Note:
  • This sometimes occurs between orbit/grid granules.
  • This sometimes occurs within an orbit granule.

Parameters:
  • xr_obj (xr.Dataset or xr.DataArray) – xarray object.

  • tolerance (np.timedelta, optional) – The timedelta tolerance to define regular vs. non-regular timesteps. The default is None. If GPM GRID object, it uses the first 2 timesteps to derive the tolerance timedelta. If GPM ORBIT object, it uses the ORBIT_TIME_TOLERANCE.

  • verbose (bool) – If True, it prints the time intervals where non-contiguous scans occur. The default is True.

gpm.utils.checks.check_valid_geolocation(xr_obj, verbose=True)[source]#

Check no geolocation errors in the GPM Dataset.

Parameters:

xr_obj (xr.Dataset or xr.DataArray) – xarray object.

gpm.utils.checks.get_missing_granule_numbers(xr_obj)[source]#

Return ID numbers of missing granules.

It assumes xr_obj is a GPM ORBIT object.

gpm.utils.checks.get_slices_contiguous_granules(xr_obj, min_size=2)[source]#

Return a list of slices ensuring contiguous granules.

The minimum size of the output slices is 2.

Note: for GRID (i.e. IMERG) products, it checks for regular timesteps! No granule_id is provided for GRID products.

Parameters:
  • xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

  • min_size (int) – Minimum size for a slice to be returned.

Returns:

list_slices – List of slice objects to select contiguous granules. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_contiguous_scans(xr_obj, min_size=2, min_n_scans=3)[source]#

Return a list of slices ensuring contiguous scans (and granules).

It checks for contiguous scans only in the middle of the cross-track! If a scan geolocation is NaN, it will be considered non-contiguous.

An input with fewer than 3 scans (along-track) returns an empty list, since scan contiguity can’t be verified. Consecutive non-contiguous scans are discarded and not included in the outputs. The minimum size of the output slices is 2.

Parameters:
  • xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

  • min_size (int) – Minimum size for a slice to be returned.

Returns:

list_slices – List of slice objects to select contiguous scans. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_non_contiguous_scans(xr_obj)[source]#

Return a list of slices where the scan discontinuities occur.

An input with fewer than 2 scans (along-track) returns an empty list.

Parameters:

xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

Returns:

list_slices – List of slice objects to select discontiguous scans. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_non_regular_time(xr_obj, tolerance=None)[source]#

Return a list of slices where there are supposedly missing timesteps.

The output slices have size 2. An input with fewer than 2 scans (along-track) returns an empty list.

Parameters:
  • xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

  • tolerance (np.timedelta, optional) – The timedelta tolerance to define regular vs. non-regular timesteps. The default is None. If GPM GRID object, it uses the first 2 timesteps to derive the tolerance timedelta. If GPM ORBIT object, it uses the ORBIT_TIME_TOLERANCE. Using this function for GPM ORBIT objects is discouraged!

Returns:

list_slices – List of slice objects to select intervals with non-regular timesteps. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_non_valid_geolocation(xr_obj)[source]#

Return a list of GPM ORBIT along-track slices with non-valid geolocation.

The minimum size of the output slices is 2.

If the geolocation is always invalid at a given cross-track index, such cross-track index(es) are discarded before identifying the along-track slices.

Parameters:
  • xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

  • min_size (int) – Minimum size for a slice to be returned. The default is 1.

Returns:

list_slices – List of slice objects with non-valid geolocation. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_non_wobbling_swath(xr_obj, threshold=100)[source]#

Return the GPM ORBIT along-track slices along which the swath is not wobbling.

Wobbling is defined as the occurrence of changes in latitude direction within fewer than threshold scans. The function extracts the along-track boundary on both swath sides and identifies where the change in orbit direction occurs.

gpm.utils.checks.get_slices_regular(xr_obj, min_size=None, min_n_scans=3)[source]#

Return a list of slices to select regular GPM objects.

For GPM ORBITS, it returns slices to select contiguous scans with valid geolocation. For GPM GRID, it returns slices to select periods with regular timesteps.

For more information, read the documentation of:
  • gpm.utils.checks.get_slices_contiguous_scans
  • gpm.utils.checks.get_slices_regular_time

Parameters:
  • xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

  • min_size (int) – Minimum size for a slice to be returned. If None, default to 1 for GRID objects, 2 for ORBIT objects.

  • min_n_scans (int) – Minimum number of scans to be able to check for scan contiguity. For visualization purposes, it may be useful to set this value to 2. This parameter applies only to ORBIT objects.

Returns:

list_slices – List of slice objects to select regular portions. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_regular_time(xr_obj, tolerance=None, min_size=1)[source]#

Return a list of slices ensuring timesteps to be regular.

Output format: [slice(start,stop), slice(start,stop),…]

Consecutive non-regular timesteps lead to slices of size 1. An xarray object with a single timestep leads to a slice of size 1. If min_size=1 (the default), such slices are returned.

Parameters:
  • xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

  • tolerance (np.timedelta, optional) – The timedelta tolerance to define regular vs. non-regular timesteps. The default is None. If GPM GRID object, it uses the first 2 timesteps to derive the tolerance timedelta. If GPM ORBIT object, it uses the ORBIT_TIME_TOLERANCE.

  • min_size (int) – Minimum size for a slice to be returned.

Returns:

list_slices – List of slice objects to select regular time intervals. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_valid_geolocation(xr_obj, min_size=2)[source]#

Return a list of GPM ORBIT along-track slices with valid geolocation.

The minimum size of the output slices is 2.

If the geolocation is always invalid at a given cross-track index, such cross-track index(es) are discarded before identifying the along-track slices.

Parameters:
  • xr_obj ((xr.Dataset, xr.DataArray)) – GPM ORBIT xarray object.

  • min_size (int) – Minimum size for a slice to be returned. The default is 2.

Returns:

list_slices – List of slice objects with valid geolocation. Output format: [slice(start,stop), slice(start,stop),…]

Return type:

list

gpm.utils.checks.get_slices_var_between(da, dim, vmin=-inf, vmax=inf, criteria='all')[source]#

Return a list of slices along the dim dimension where values are between the interval.

If the DataArray has additional dimensions, the “criteria” parameter is used to determine whether all values within each slice index must be between the interval (if set to “all”) or if at least one value within the slice index must be between the interval (if set to “any”).

gpm.utils.checks.get_slices_var_equals(da, dim, values, union=True, criteria='all')[source]#

Return a list of slices along the dim dimension where the specified values occur.

The function is applied recursively to each value in values. If the DataArray has additional dimensions, the “criteria” parameter is used to determine whether all values within each slice index must be equal to value (if set to “all”) or if at least one value within the slice index must be equal to value (if set to “any”).

If values is a list of values:
  • if union=True, it returns slices corresponding to the sequence of consecutive values.
  • if union=False, it returns slices for each value in values.

For example, with the data [0, 0, 1, 1] and values=[0, 1]:
  • union=False returns [slice(0,2), slice(2,4)].
  • union=True returns [slice(0,4)].

union matters when multiple values are specified; criteria matters when the DataArray has multiple dimensions.
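The effect of union can be reproduced on a plain 1D list. The following is a rough sketch under simplifying assumptions: it ignores the dim and criteria arguments, works on a Python list rather than a DataArray, and the helper name slices_where_equal is hypothetical rather than part of gpm.utils.checks.

```python
def slices_where_equal(values_1d, targets, union=True):
    """Return slices over runs of `values_1d` whose elements are in `targets`.

    With union=True all targets are searched together; with union=False
    each target is searched separately. Hypothetical 1D illustration only.
    """
    targets = list(targets)
    groups = [set(targets)] if union else [{t} for t in targets]
    list_slices = []
    for group in groups:
        start = None  # start index of the run currently being tracked
        for i, value in enumerate(values_1d):
            if value in group:
                if start is None:
                    start = i
            elif start is not None:
                list_slices.append(slice(start, i))
                start = None
        if start is not None:  # close a run reaching the end of the data
            list_slices.append(slice(start, len(values_1d)))
    return list_slices


data = [0, 0, 1, 1]
separate = slices_where_equal(data, [0, 1], union=False)
merged = slices_where_equal(data, [0, 1], union=True)
```

With this data, separate holds one slice per value while merged holds a single slice spanning both runs, matching the behaviour documented for union.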

gpm.utils.checks.get_slices_wobbling_swath(xr_obj, threshold=100)[source]#

Return the GPM ORBIT along-track slices along which the swath is wobbling.

Wobbling is defined as the occurrence of changes in latitude direction within fewer than threshold scans. The function extracts the along-track boundary on both swath sides and identifies where the change in orbit direction occurs.

gpm.utils.checks.has_contiguous_granules(xr_obj)[source]#

Check that the GPM object is composed of consecutive granules.

For ORBIT objects, it checks the gpm_granule_id. For GRID objects, it checks timesteps regularity.

gpm.utils.checks.has_contiguous_scans(xr_obj)[source]#

Return True if all scans are contiguous. False otherwise.

gpm.utils.checks.has_missing_granules(xr_obj)[source]#

Check whether the GPM object has missing granules.

For ORBIT objects, it checks the gpm_granule_id. For GRID objects, it checks timesteps regularity.

gpm.utils.checks.has_regular_time(xr_obj)[source]#

Return True if all timesteps are regular. False otherwise.

gpm.utils.checks.has_valid_geolocation(xr_obj)[source]#

Check whether the GPM object has valid geolocation.

gpm.utils.checks.is_regular(xr_obj)[source]#

Check whether the GPM object is regular.

For GPM ORBITS, it checks that the scans are contiguous. For GPM GRID, it checks that the timesteps are regularly spaced.

gpm.utils.collocation module#

This module contains utilities for GPM product collocation.

gpm.utils.collocation.collocate_product(ds, product, product_type='RS', version=None, scan_modes=None, variables=None, groups=None, verbose=True, decode_cf=True, chunks={})[source]#

Collocate a product on the provided dataset.

gpm.utils.dask module#

This module contains utilities for Dask Distributed processing.

gpm.utils.dask.clean_memory(client)[source]#

Call the garbage collector on each process.

See https://distributed.dask.org/en/latest/worker-memory.html#manually-trim-memory

gpm.utils.dask.get_client()[source]#
gpm.utils.dask.trim_memory() int[source]#

gpm.utils.decorators module#

This module contains function decorators that check the GPM-API object type.

gpm.utils.decorators.check_has_along_track_dimension(function)[source]#

Check that the along-track dimension is available.

If not available, raise an error.

gpm.utils.decorators.check_has_cross_track_dimension(function)[source]#

Check that the cross-track dimension is available.

If not available, raise an error.

gpm.utils.decorators.check_is_gpm_object(function)[source]#

Decorator function to check if input is a GPM object. Raise ValueError if not.

gpm.utils.decorators.check_is_grid(function)[source]#

Decorator function to check if input is a GPM GRID object. Raise ValueError if not.

gpm.utils.decorators.check_is_orbit(function)[source]#

Decorator function to check if input is a GPM ORBIT object. Raise ValueError if not.

gpm.utils.geospatial module#

This module contains functions for geospatial processing.

class gpm.utils.geospatial.Extent(xmin, xmax, ymin, ymax)#

Bases: tuple

xmax#

Alias for field number 1

xmin#

Alias for field number 0

ymax#

Alias for field number 3

ymin#

Alias for field number 2
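The "Bases: tuple" line and the "Alias for field number" entries are what Sphinx prints for a named tuple. As a minimal sketch, an equivalent type can be declared as below; the coordinate values are purely illustrative.

```python
from collections import namedtuple

# Same field order as documented: xmin (0), xmax (1), ymin (2), ymax (3).
Extent = namedtuple("Extent", ["xmin", "xmax", "ymin", "ymax"])

extent = Extent(xmin=5.0, xmax=10.0, ymin=45.0, ymax=48.0)

# Fields can be read by name, by position, or by unpacking.
xmin, xmax, ymin, ymax = extent
width = extent.xmax - extent.xmin
```

Because the object is still a plain tuple, it can be passed anywhere a [x_min, x_max, y_min, y_max] sequence is expected.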

gpm.utils.geospatial.crop(xr_obj, extent)[source]#

Crop an xarray object based on the provided bounding box.

Parameters:
  • xr_obj (xr.DataArray or xr.Dataset) – xarray object.

  • extent (list or tuple) – The bounding box over which to crop the xarray object. extent must follow the matplotlib and cartopy extent conventions: extent = [x_min, x_max, y_min, y_max]

Returns:

xr_obj – Cropped xarray object.

Return type:

xr.DataArray or xr.Dataset

gpm.utils.geospatial.crop_by_continent(xr_obj, name: str)[source]#

Crop an xarray object based on the specified continent name.

Parameters:
  • xr_obj (xr.DataArray or xr.Dataset) – xarray object.

  • name (str) – Continent name.

Returns:

xr_obj – Cropped xarray object.

Return type:

xr.DataArray or xr.Dataset

gpm.utils.geospatial.crop_by_country(xr_obj, name: str)[source]#

Crop an xarray object based on the specified country name.

Parameters:
  • xr_obj (xr.DataArray or xr.Dataset) – xarray object.

  • name (str) – Country name.

Returns:

xr_obj – Cropped xarray object.

Return type:

xr.DataArray or xr.Dataset

gpm.utils.geospatial.extend_geographic_extent(extent, padding: int | float | tuple | list = 0)[source]#

Extend the lat/lon extent by x degrees in every direction.

Parameters:
  • extent ((tuple)) – A tuple of four values representing the lat/lon extent. The extent format must be [xmin, xmax, ymin, ymax]

  • padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom).

Returns:

new_extent – The extended extent.

Return type:

tuple
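The three accepted padding forms can be sketched as follows. This is an illustration of the documented rules only: the helper name pad_extent is hypothetical, and the sketch does not clip the result to valid longitude/latitude ranges, which the library function may well do.

```python
def pad_extent(extent, padding=0):
    """Extend (xmin, xmax, ymin, ymax) by `padding` degrees.

    Hypothetical illustration of the documented padding rules;
    no clipping to [-180, 180] / [-90, 90] is applied here.
    """
    xmin, xmax, ymin, ymax = extent
    if isinstance(padding, (int, float)):
        # Single number: same padding on every side.
        left = right = top = bottom = padding
    elif len(padding) == 2:
        # (x, y): longitude padding, then latitude padding.
        left = right = padding[0]
        top = bottom = padding[1]
    elif len(padding) == 4:
        # (left, right, top, bottom): one value per side.
        left, right, top, bottom = padding
    else:
        raise ValueError("padding must be a number or a sequence of 2 or 4 values.")
    return (xmin - left, xmax + right, ymin - bottom, ymax + top)


same_everywhere = pad_extent((0, 10, 40, 50), 1)
lon_lat = pad_extent((0, 10, 40, 50), (1, 2))
per_side = pad_extent((0, 10, 40, 50), (1, 2, 3, 4))
```

Note that in the four-value form the top padding is added to ymax and the bottom padding is subtracted from ymin.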

gpm.utils.geospatial.get_continent_extent(name: str, padding: int | float | tuple | list = 0)[source]#

Retrieves the extent of a continent.

Parameters:
  • name (str) – The name of the continent.

  • padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.

Returns:

extent – A tuple containing the longitude and latitude extent of the continent.

Return type:

tuple

Raises:
  • TypeError – If the continent name is not provided as a string.

  • ValueError – If the provided continent name is not valid or does not match any continent. If a similar continent name is found, it is suggested as a possible match.

gpm.utils.geospatial.get_country_extent(name, padding=0.2)[source]#

Retrieves the extent of a country.

Parameters:
  • name (str) – The name of the country.

  • padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.2.

Returns:

extent – A tuple containing the longitude and latitude extent of the country.

Return type:

tuple

Raises:
  • TypeError – If the country name is not provided as a string.

  • ValueError – If the country name is not valid or if there is no matching country.

Notes

This function retrieves the extent of a country from a dictionary of country extents. The country extent is defined as the longitude and latitude range that encompasses the country’s borders. The extent is returned as a tuple of four values: (xmin, xmax, ymin, ymax). The extent can be optionally padded by specifying the padding parameter.

gpm.utils.geospatial.get_crop_slices_by_continent(xr_obj, name)[source]#

Compute the xarray object slices which are within the specified continent.

If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.

Parameters:
  • xr_obj (xr.DataArray or xr.Dataset) – xarray object.

  • name (str) – Continent name.

gpm.utils.geospatial.get_crop_slices_by_country(xr_obj, name)[source]#

Compute the xarray object slices which are within the specified country.

If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.

Parameters:
  • xr_obj (xr.DataArray or xr.Dataset) – xarray object.

  • name (str) – Country name.

gpm.utils.geospatial.get_crop_slices_by_extent(xr_obj, extent)[source]#

Compute the xarray object slices which are within the specified extent.

If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.

Parameters:
  • xr_obj (xr.DataArray or xr.Dataset) – xarray object.

  • extent (list or tuple) – The extent over which to crop the xarray object. extent must follow the matplotlib and cartopy conventions: extent = [x_min, x_max, y_min, y_max]

gpm.utils.geospatial.get_extent(xr_obj, padding: int | float | tuple | list = 0)[source]#

Get the geographic extent from an xarray object.

Parameters:
  • xr_obj (xr.DataArray or xr.Dataset) – xarray object.

  • padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.

Returns:

extent – A tuple containing the longitude and latitude extent of the xarray object. The extent follows the matplotlib/cartopy format (xmin, xmax, ymin, ymax)

Return type:

tuple

gpm.utils.geospatial.read_continents_extent_dictionary()[source]#

Read and return a dictionary containing the extents of continents.

Returns:

A dictionary containing the extents of continents.

Return type:

dict

gpm.utils.geospatial.read_countries_extent_dictionary()[source]#

Reads a YAML file containing countries extent information and returns it as a dictionary.

Returns:

A dictionary containing countries extent information.

Return type:

dict

gpm.utils.geospatial.unwrap_longitude_degree(x, period=360)[source]#

Unwrap longitude array.

gpm.utils.list module#

This module contains functions for list processing.

gpm.utils.list.flatten_list(nested_list)[source]#

Flatten a nested list into a single-level list.
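A flattening helper of this kind is commonly written recursively. The sketch below is one possible version and assumes arbitrary nesting depth should be handled; the library function may behave differently (for example, flattening only one level), and the name flatten is hypothetical.

```python
def flatten(nested):
    """Recursively flatten nested lists into a single-level list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            # Descend into sub-lists and append their elements in order.
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


result = flatten([1, [2, [3, 4]], [5]])
```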

gpm.utils.manipulations module#

This module contains functions for manipulating GPM-API Datasets.

gpm.utils.manipulations.check_variable_availabilty(ds, variable, argname)[source]#
gpm.utils.manipulations.create_bin_idx_data_array(xr_obj)[source]#

Create a 3D DataArray with the bin index along the range dimension.

The GPM bin index starts at 1! The GPM bin index is equivalent to gpm_range_id + 1.

gpm.utils.manipulations.get_bright_band_mask(ds)[source]#

Retrieve bright band mask defined by binBBBottom and binBBTop.

The bins are numbered from top to bottom. binBBTop has lower values than binBBBottom. binBBBottom and binBBTop are 0 when the bright band limit is not detected!

gpm.utils.manipulations.get_dims_without(da, dims)[source]#

Remove the specified ‘dims’ from the list of DataArray dimensions.

gpm.utils.manipulations.get_height_at_bin(xr_obj, bin)[source]#
gpm.utils.manipulations.get_height_at_temperature(da_height, da_temperature, temperature)[source]#

Retrieve height at a specific temperature.

gpm.utils.manipulations.get_liquid_phase_mask(ds)[source]#

Retrieve the mask of the liquid phase profile.

gpm.utils.manipulations.get_range_axis(da)[source]#

Get range dimension axis index.

gpm.utils.manipulations.get_range_index_at_max(da)[source]#

Retrieve index along the range dimension where the DataArray has maximum values.

gpm.utils.manipulations.get_range_index_at_min(da)[source]#

Retrieve index along the range dimension where the DataArray has minimum values.

gpm.utils.manipulations.get_range_index_at_value(da, value)[source]#

Retrieve index along the range dimension where the DataArray values are closest to value.

gpm.utils.manipulations.get_range_slices_with_valid_data(xr_obj, variable=None)[source]#

Get the vertical (‘range’/‘height’) slices with valid data.

gpm.utils.manipulations.get_range_slices_within_values(xr_obj, variable=None, vmin=-inf, vmax=inf)[source]#

Get the ‘range’ slices with data within a given data interval.

gpm.utils.manipulations.get_solid_phase_mask(ds)[source]#

Retrieve the mask of the solid phase profile.

gpm.utils.manipulations.get_variable_at_bin(xr_obj, bin, variable=None)[source]#

Retrieve variable values at range bin provided by bin_variable.

It assumes that bin values go from 1 to 176.

gpm.utils.manipulations.get_variable_dataarray(xr_obj, variable)[source]#
gpm.utils.manipulations.get_xr_shape(xr_obj, dims)[source]#

Get xarray shape for specific dimensions.

gpm.utils.manipulations.integrate_profile_concentration(dataarray, name, scale_factor=None, units=None)[source]#

Utility to convert LWC or IWC to LWP or IWP.

Input data have units of g/m³. Output data will have units of kg/m² if scale_factor=1000.

height a list or array of corresponding heights for each level.

gpm.utils.manipulations.select_radar_frequency(xr_obj, radar_frequency)[source]#

Select data related to a specific radar frequency.

gpm.utils.manipulations.select_spatial_2d_variables(ds, strict=False, squeeze=True)[source]#

Return xr.Dataset with only 2D spatial variables.

gpm.utils.manipulations.select_spatial_3d_variables(ds, strict=False, squeeze=True)[source]#

Return xr.Dataset with only 3D spatial variables.

gpm.utils.manipulations.select_transect_variables(ds, strict=False, squeeze=True)[source]#

Return xr.Dataset with only transect variables.

gpm.utils.manipulations.slice_range_at_height(xr_obj, height)[source]#

Slice the 3D array at a given height.

gpm.utils.manipulations.slice_range_at_max_value(xr_obj, variable=None)[source]#

Slice the 3D arrays where the variable values are at maximum.

gpm.utils.manipulations.slice_range_at_min_value(xr_obj, variable=None)[source]#

Slice the 3D arrays where the variable values are at minimum.

gpm.utils.manipulations.slice_range_at_temperature(ds, temperature, variable_temperature='airTemperature')[source]#

Slice the 3D arrays along a specific isotherm.

gpm.utils.manipulations.slice_range_at_value(xr_obj, value, variable=None)[source]#

Slice the 3D arrays where the variable values are close to value.

gpm.utils.manipulations.slice_range_where_values(xr_obj, variable=None, vmin=-inf, vmax=inf)[source]#

Select the ‘range’ interval where values are within the [vmin, vmax] interval.

gpm.utils.manipulations.slice_range_with_valid_data(xr_obj, variable=None)[source]#

Select the ‘range’ interval with valid data.

gpm.utils.parallel module#

This module contains utilities for parallel processing.

gpm.utils.parallel.compute_list_delayed(list_delayed, max_concurrent_tasks=None)[source]#

Compute the list of Dask delayed objects in blocks of max_concurrent_tasks.

Parameters:
  • list_delayed (list) – List of Dask delayed objects.

  • max_concurrent_tasks (int) – Maximum number of delayed objects to compute in each block.

Returns:

List of computed results.

Return type:

list
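The block-wise strategy can be illustrated without Dask. In the sketch below plain callables stand in for delayed objects and are simply called one after another; in a Dask setting each block would instead be handed to the scheduler in a single call. The helper name compute_in_blocks is hypothetical and this is not the library's code.

```python
def compute_in_blocks(tasks, max_concurrent_tasks=None):
    """Evaluate `tasks` block by block and return the results in order.

    `tasks` are zero-argument callables standing in for delayed objects.
    If max_concurrent_tasks is None, everything goes into one block.
    """
    block_size = max_concurrent_tasks or max(len(tasks), 1)
    results = []
    for i in range(0, len(tasks), block_size):
        block = tasks[i:i + block_size]
        # Only the tasks of the current block are evaluated at this step.
        results.extend(task() for task in block)
    return results


tasks = [lambda i=i: i * i for i in range(5)]
results = compute_in_blocks(tasks, max_concurrent_tasks=2)
```

Limiting the block size bounds how many tasks are in flight at once, which is the usual motivation for this pattern when each task is memory-hungry.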

gpm.utils.pyresample module#

This module contains pyresample utility functions.

gpm.utils.pyresample.get_pyresample_area(xr_obj)[source]#

Return the corresponding pyresample area.

gpm.utils.pyresample.remap(src_ds, dst_ds, radius_of_influence=20000, fill_value=nan)[source]#

Remap data from one dataset to another one.

gpm.utils.slices module#

This module contains utilities for processing lists of slices.

gpm.utils.slices.enlarge_slice(slc, min_size, min_start=0, max_stop=inf)[source]#

Enlarge a slice object to have at least a size of min_size.

The function enforces the left and right bounds of the slice with min_start and max_stop, respectively. If the original slice size is larger than min_size, the original slice will be returned.

Parameters:
  • slc (slice) – The original slice object to be enlarged.

  • min_size (int) – The desired minimum size of the new slice.

  • min_start (int, optional) – The minimum value for the start of the new slice. The default is 0.

  • max_stop (int) – The maximum value for the stop of the new slice. The default is np.inf.

Returns:

The new slice object with a size of at least min_size and respecting the left and right bounds.

Return type:

slice

gpm.utils.slices.enlarge_slices(list_slices, min_size, valid_shape)[source]#

Enlarge a list of slice objects so that each has at least a size of min_size.

The function enforces the left and right bounds of the slice to be between 0 and valid_shape. If the original slice size is larger than min_size, the original slice will be returned.

Parameters:
  • list_slices (list) – List of slice objects.

  • min_size ((int or tuple)) – Minimum size of the output slice.

  • valid_shape ((int or tuple)) – The shape of the array which the slices should be valid on.

Returns:

list_slices – The list of slices after enlarging it (if necessary).

Return type:

list

gpm.utils.slices.ensure_is_slice(slc)[source]#
gpm.utils.slices.get_indices_from_list_slices(list_slices, check_non_intersecting=True)[source]#

Return a numpy array of indices from a list of slices.

gpm.utils.slices.get_list_slices_from_bool_arr(bool_arr, include_false=True, skip_consecutive_false=True)[source]#

Return the slices corresponding to sequences of True in the input array.

  • If include_false=True, the last element of each slice sequence (except the last) will be False.
  • If include_false=False, no element in each slice sequence will be False.
  • If skip_consecutive_false=True (default), the first element of each slice must be a True.
  • If skip_consecutive_false=False, it also returns slices of size 1 which select just the False value.
  • If include_false=False, skip_consecutive_false is automatically True.

Examples

If include_false=True and skip_consecutive_false=False:
  • [False, False] –> [slice(0,1), slice(1,2)]
If include_false=True and skip_consecutive_false=True:
  • [False, False] –> []
  • [False, False, True] –> [slice(2,3)]
  • [False, False, True, False] –> [slice(2,4)]
If include_false=False:
  • [False, False, True, False] –> [slice(2,3)]
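The include_false behaviour can be sketched on a plain Python list. This simplified version covers only the case skip_consecutive_false=True; the helper name slices_from_bools is hypothetical and it is not the library's implementation.

```python
def slices_from_bools(flags, include_false=True):
    """Return slices over runs of True in `flags`.

    With include_false=True, a run that is followed by a False is extended
    by one element so that it ends on that False. Slices never start on a
    False (i.e. skip_consecutive_false=True is assumed).
    """
    list_slices = []
    start = None  # start index of the run of True currently being tracked
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            stop = i + 1 if include_false else i
            list_slices.append(slice(start, stop))
            start = None
    if start is not None:  # run of True reaching the end of the input
        list_slices.append(slice(start, len(flags)))
    return list_slices


with_false = slices_from_bools([False, False, True, False], include_false=True)
without_false = slices_from_bools([False, False, True, False], include_false=False)
```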

gpm.utils.slices.get_list_slices_from_indices(indices)[source]#

Return a list of slices from a list/array of integer indices.

Example

[0,1,2,4,5,8] –> [slice(0,3), slice(4,6), slice(8,9)]
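Grouping consecutive integers into slices can be sketched as below. This is an independent illustration that assumes the indices are unique integers; the helper name slices_from_indices is hypothetical.

```python
def slices_from_indices(indices):
    """Group unique integer indices into slices of consecutive values."""
    ordered = sorted(indices)
    if not ordered:
        return []
    list_slices = []
    start = prev = ordered[0]
    for idx in ordered[1:]:
        if idx != prev + 1:
            # A jump closes the current run; slice.stop is exclusive.
            list_slices.append(slice(start, prev + 1))
            start = idx
        prev = idx
    list_slices.append(slice(start, prev + 1))
    return list_slices


list_slices = slices_from_indices([0, 1, 2, 4, 5, 8])
```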

gpm.utils.slices.get_slice_from_idx_bounds(idx_start, idx_end)[source]#

Return the slice required to include the idx bounds.

gpm.utils.slices.get_slice_size(slc)[source]#

Get size of the slice.

Note: The computed slice size may not be representative of the true size if slice.stop is larger than the length of the object to be sliced.

gpm.utils.slices.list_slices_combine(*args)[source]#

Combine together a list of list_slices, without any additional operation.

gpm.utils.slices.list_slices_difference(list_slices1, list_slices2)[source]#

Return the list of slices covered by list_slices1 not intersecting list_slices2.

gpm.utils.slices.list_slices_filter(list_slices, min_size=None, max_size=None)[source]#

Filter list of slices by size.

gpm.utils.slices.list_slices_flatten(list_slices)[source]#

Flatten out a list of slices with 2 nested levels.

Examples

[[slice(1, 7934, None)], [slice(1, 2, None)]] –> [slice(1, 7934, None), slice(1, 2, None)]
[slice(1, 7934, None), slice(1, 2, None)] –> [slice(1, 7934, None), slice(1, 2, None)]

gpm.utils.slices.list_slices_intersection(*args, min_size=1)[source]#

Return the intersecting slices from multiple list of slices.
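For two lists of slices, the intersection can be sketched with a pairwise comparison. This is a simplified illustration restricted to two inputs; the helper name intersect_slices is hypothetical, and treating min_size as the minimum length of a returned slice is an assumption.

```python
def intersect_slices(list_a, list_b, min_size=1):
    """Return the overlapping parts of two lists of slices."""
    list_slices = []
    for a in list_a:
        for b in list_b:
            start = max(a.start, b.start)
            stop = min(a.stop, b.stop)
            # Keep the overlap only if it is long enough.
            if stop - start >= min_size:
                list_slices.append(slice(start, stop))
    return list_slices


common = intersect_slices([slice(0, 5), slice(8, 12)], [slice(3, 10)])
```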

gpm.utils.slices.list_slices_simplify(list_slices)[source]#

Simplify a list of sequential slices.

Example 1: [slice(0,2), slice(2,4)] –> [slice(0,4)]
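Merging sequential slices can be sketched as follows. The sketch sorts its input by slice.start and only merges slices that touch exactly (stop equal to the next start); both choices are assumptions, and the helper name simplify_slices is hypothetical.

```python
def simplify_slices(list_slices):
    """Merge slices whose stop equals the start of the following slice."""
    merged = []
    for slc in sorted(list_slices, key=lambda s: s.start):
        if merged and merged[-1].stop == slc.start:
            # Extend the previous slice instead of adding a new one.
            merged[-1] = slice(merged[-1].start, slc.stop)
        else:
            merged.append(slc)
    return merged


simplified = simplify_slices([slice(0, 2), slice(2, 4)])
```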

gpm.utils.slices.list_slices_sort(*args)[source]#

Sort one or more lists of slices by slice.start.

It outputs a single list of slices.

gpm.utils.slices.list_slices_union(*args)[source]#

Return the union slices from multiple lists of slices.

gpm.utils.slices.pad_slice(slc, padding, min_start=0, max_stop=inf)[source]#

Increase/decrease the slice with the padding argument.

Does not ensure that all output slices have same size.

Parameters:
  • slc (slice) – Slice object.

  • padding (int) – Padding to be applied to the slice.

  • min_start (int, optional) – The minimum value for the start of the new slice. The default is 0.

  • max_stop (int, optional) – The maximum value for the stop of the new slice. The default is np.inf.

Returns:

slc – The slice after applying padding.

Return type:

slice
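The clamping described by min_start and max_stop can be sketched as follows. This is an assumption-based illustration with a hypothetical name (pad_slice_sketch), not the library's code.

```python
import math


def pad_slice_sketch(slc, padding, min_start=0, max_stop=math.inf):
    """Illustrative sketch: widen (or shrink) a slice, clamped to the given bounds."""
    start = max(slc.start - padding, min_start)
    stop = min(slc.stop + padding, max_stop)
    return slice(start, stop)


# Padding by 2 would give slice(0, 7), but max_stop=6 truncates the stop.
padded = pad_slice_sketch(slice(2, 5), padding=2, max_stop=6)
```

Because of the clamping, the padded slice covers 6 elements rather than the 7 one might expect, which is why equal output sizes are not guaranteed.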

gpm.utils.slices.pad_slices(list_slices, padding, valid_shape)[source]#

Increase/decrease the list of slices with the padding argument.

Parameters:
  • list_slices (list) – List of slice objects.

  • padding ((int or tuple)) – Padding to be applied on each slice.

  • valid_shape ((int or tuple)) – The shape of the array which the slices should be valid on.

Returns:

list_slices – The list of slices after applying padding.

Return type:

list

gpm.utils.time module#

This module contains utilities for time processing.

gpm.utils.time.ensure_time_validity(xr_obj, limit=10)[source]#

Attempt to correct the time coordinate if fewer than ‘limit’ consecutive NaT values are present.

It raises a ValueError if more than ‘limit’ consecutive NaT values occur.

Parameters:

xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.

Returns:

xr_obj – GPM xarray object.

Return type:

(xr.Dataset, xr.DataArray)

gpm.utils.time.get_dataset_start_end_time(ds: Dataset, time_dim='time')[source]#

Retrieves dataset starting and ending time.

Parameters:
  • ds (xr.Dataset) – Input dataset

  • time_dim (str) – Name of the time dimension. The default is “time”.

Returns:

(starting_time, ending_time)

Return type:

tuple

gpm.utils.time.has_nat(timesteps)[source]#

Return True if any of the timesteps is NaT.

gpm.utils.time.infill_timesteps(timesteps, limit)[source]#

Infill missing timesteps if fewer than ‘limit’ consecutive timesteps are missing.

gpm.utils.time.interpolate_nat(timesteps, method='linear', limit=5, limit_direction=None, limit_area=None)[source]#

Fill NaT values using an interpolation method.

Parameters:
  • method (str, default 'linear') –

    Interpolation technique to use. One of:
    • ‘linear’: Treat the timesteps as equally spaced.

    • ‘pad’: Fill in NaTs using existing values.

    • ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. interpolate_nat(method='polynomial', order=5).

    • ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes in https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html

  • limit (int, optional) – Maximum number of consecutive NaTs to fill. Must be greater than 0.

  • limit_direction ({'forward', 'backward', 'both'}, optional) –

    Consecutive NaTs will be filled in this direction.

    If limit is specified:
    • If ‘method’ is ‘pad’ or ‘ffill’, ‘limit_direction’ must be ‘forward’.

    • If ‘method’ is ‘backfill’ or ‘bfill’, ‘limit_direction’ must be ‘backward’.

    If ‘limit’ is not specified:
    • If ‘method’ is ‘backfill’ or ‘bfill’, the default is ‘backward’ else the default is ‘forward’

  • limit_area ({None, ‘inside’, ‘outside’}, default None) –

    If limit is specified, consecutive NaTs will be filled with this restriction.
    • None: No fill restriction.

    • ‘inside’: Only fill NaTs surrounded by valid values (interpolate).

    • ‘outside’: Only fill NaTs outside valid values (extrapolate).

Notes

Depending on the interpolation method (e.g. linear), the infilled values could have ns resolution. For further information, refer to https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html

Returns:

Timesteps array of type datetime64[ns]

Return type:

timesteps, np.array
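To illustrate what the ‘linear’ method means (timesteps treated as equally spaced), the following NumPy sketch fills NaT values by interpolating over array positions. It is not the library's implementation: the name interpolate_nat_linear is hypothetical, the limit, limit_direction and limit_area options are ignored, and at least one valid timestep is assumed.

```python
import numpy as np


def interpolate_nat_linear(timesteps):
    """Illustrative sketch: linearly fill NaT values over array positions."""
    timesteps = np.asarray(timesteps, dtype="datetime64[ns]")
    valid = ~np.isnat(timesteps)
    positions = np.arange(timesteps.size)
    # Work with offsets from the first valid timestep to keep float precision.
    reference = timesteps[valid][0]
    offsets_ns = (timesteps[valid] - reference).astype("int64")
    filled_ns = np.interp(positions, positions[valid], offsets_ns)
    # Convert the interpolated offsets back to datetime64[ns].
    return reference + np.round(filled_ns).astype("int64").astype("timedelta64[ns]")


filled = interpolate_nat_linear(
    ["2020-01-01T00:00:00", "NaT", "2020-01-01T00:00:02"]
)
```

Note that np.interp holds the edge values constant outside the valid range, so leading or trailing NaT values are filled with the nearest valid timestep in this sketch.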

gpm.utils.time.is_nat(timesteps)[source]#

Return a boolean array indicating timesteps which are NaT.

gpm.utils.time.regularize_dataset(ds: Dataset, freq: str, time_dim: str = 'time', method: str | None = None, fill_value=<NA>)[source]#

Regularize a dataset across time dimension with uniform resolution.

Parameters:
  • ds (xr.Dataset) – xarray Dataset.

  • time_dim (str, optional) – The time dimension in the xr.Dataset. The default is "time".

  • freq (str) – The freq string to pass to pd.date_range to define the new time coordinates. Examples: freq="2min".

  • method (str, optional) – Method to use for filling missing timesteps. If None, fill with fill_value. The default is None. For other possible methods, see https://docs.xarray.dev/en/stable/generated/xarray.Dataset.reindex.html

  • fill_value (float, optional) – Fill value to fill missing timesteps. The default is dtypes.NA.

Returns:

ds_reindexed – Regularized dataset.

Return type:

xr.Dataset

gpm.utils.time.subset_by_time(xr_obj, start_time=None, end_time=None)[source]#

Filter a GPM xarray object by start_time and end_time.

Parameters:
  • xr_obj – An xarray object.

  • start_time (datetime.datetime) – Start time. The default is None.

  • end_time (datetime.datetime) – End time. The default is None.

Returns:

xr_obj – GPM xarray object

Return type:

(xr.Dataset, xr.DataArray)

gpm.utils.time.subset_by_time_slice(xr_obj, slice)[source]#

gpm.utils.timing module#

This module contains decorators that measure the execution time of a function.

gpm.utils.timing.print_elapsed_time(fn)[source]#
gpm.utils.timing.print_task_elapsed_time(prefix=' - ')[source]#
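The two signatures suggest a plain decorator and a decorator factory taking a prefix. The following standard-library sketch shows how such decorators are commonly written; the names, the printed message format and the use of time.perf_counter are assumptions, not the library's actual code.

```python
import functools
import time


def print_elapsed_time_sketch(fn):
    """Illustrative decorator: print how long each call to fn takes."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Elapsed time: {elapsed:.3f} s")
        return result
    return wrapper


def print_task_elapsed_time_sketch(prefix=" - "):
    """Illustrative decorator factory: prepend a prefix to the timing message."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            elapsed = time.perf_counter() - start
            print(f"{prefix}{fn.__name__}: {elapsed:.3f} s")
            return result
        return wrapper
    return decorator


@print_elapsed_time_sketch
def add(a, b):
    return a + b


@print_task_elapsed_time_sketch(prefix=" * ")
def multiply(a, b):
    return a * b
```

The first form is applied without parentheses, while the second must be called (optionally with a prefix) to produce the actual decorator.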

gpm.utils.warnings module#

This module defines GPM Warning classes.

exception gpm.utils.warnings.GPMDownloadWarning(message)[source]#

Bases: Warning

exception gpm.utils.warnings.GPM_Warning(message)[source]#

Bases: Warning

gpm.utils.yaml module#

This module defines a YAML file reader and writer.

gpm.utils.yaml.read_yaml(filepath: str) → dict[source]#

Read a YAML file into a dictionary.

Parameters:

filepath (str) – Input YAML file path.

Returns:

Dictionary with the attributes read from the YAML file.

Return type:

dict

gpm.utils.yaml.write_yaml(dictionary, filepath, sort_keys=False)[source]#

Write a dictionary into a YAML file.

Parameters:

dictionary (dict) – Dictionary to write into a YAML file.

Module contents#

This directory contains the GPM-API utility functions.