gpm.utils package#
Submodules#
gpm.utils.archive module#
This module contains utilities for GPM Data Archiving.
- gpm.utils.archive.check_archive_completeness(product, start_time, end_time, version=None, product_type='RS', download=True, transfer_tool='WGET', n_threads=4, verbose=True)[source]#
Check that the GPM product archive is not missing granules over a given period.
This function does not require a connection to the PPS to search for missing files. However, the start and end of the checked period are based on the first and last files found on disk!
If download=True, it attempts to download the missing granules.
- Parameters:
product (str) – GPM product acronym.
start_time (datetime.datetime) – Start time.
end_time (datetime.datetime) – End time.
product_type (str, optional) – GPM product type. Either RS (Research) or NRT (Near-Real-Time).
version (int, optional) – GPM version of the data to retrieve if product_type = "RS". GPM data readers currently support versions 4, 5, 6 and 7.
download (bool, optional) – Whether to download the missing files. The default is True.
n_threads (int, optional) – Number of parallel downloads. The default in the signature is 4.
transfer_tool (str, optional) – Whether to use curl or wget for data download. The default in the signature is WGET.
verbose (bool, optional) – Whether to print processing details. The default in the signature is True.
- gpm.utils.archive.check_no_duplicated_files(product, start_time, end_time, version=None, product_type='RS', verbose=True)[source]#
Check that there are no duplicated files based on granule number.
- gpm.utils.archive.check_time_period_coverage(filepaths, start_time, end_time, raise_error=False)[source]#
Check that the time period between start_time and end_time is covered.
If raise_error=True, it raises an error if the time period is not covered. If raise_error=False, it raises a GPM warning instead.
- gpm.utils.archive.get_time_period_with_missing_files(filepaths)[source]#
Return the time periods where there are missing granules.
It assumes the input filepaths are for a single GPM product.
- Parameters:
filepaths (list) – List of GPM file paths.
- Returns:
list_missing – List of tuples (start_time, end_time).
- Return type:
list
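The idea of reporting gaps as (start_time, end_time) tuples can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper name find_missing_periods, the tolerance argument, and the use of (start, end) tuples instead of file paths are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def find_missing_periods(granules, tolerance=timedelta(seconds=1)):
    """Return (gap_start, gap_end) tuples between consecutive granules.

    `granules` is a list of (start_time, end_time) tuples, one per file
    (hypothetical input; the real function takes file paths).
    """
    ordered = sorted(granules)
    missing = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        # A gap exists when the next granule starts later than the previous one ends.
        if next_start - prev_end > tolerance:
            missing.append((prev_end, next_start))
    return missing


granules = [
    (datetime(2020, 1, 1, 0, 0), datetime(2020, 1, 1, 1, 30)),
    (datetime(2020, 1, 1, 1, 30), datetime(2020, 1, 1, 3, 0)),
    (datetime(2020, 1, 1, 6, 0), datetime(2020, 1, 1, 7, 30)),
]
missing = find_missing_periods(granules)
print(missing)
```

Here the only gap lies between the end of the second granule and the start of the third.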
gpm.utils.area module#
- gpm.utils.area.get_quadmesh_vertices(x, y, order='counterclockwise')[source]#
Convert (x, y) 2D centroid coordinates array to (N*M, 4, 2) QuadMesh vertices.
The output vertices can be passed directly to a matplotlib.PolyCollection. For plotting with cartopy, the polygon order must be “counterclockwise”.
Vertices are defined from the top left corner.
gpm.utils.checks module#
This module contains utilities to check GPM-API Dataset coordinates.
- gpm.utils.checks.apply_on_valid_geolocation(function)[source]#
Decorator applying the input function on GPM ORBIT slices with valid geolocation.
- gpm.utils.checks.check_contiguous_granules(xr_obj)[source]#
Check no missing granules in the GPM Dataset.
It assumes xr_obj is a GPM ORBIT object.
- Parameters:
xr_obj (xr.Dataset or xr.DataArray) – xarray object.
- gpm.utils.checks.check_contiguous_scans(xr_obj, verbose=True)[source]#
Check no missing scans across the along_track direction.
Note: this sometimes occurs between orbit granules, and sometimes within an orbit granule.
- Parameters:
xr_obj (xr.Dataset or xr.DataArray) – xarray object.
verbose (bool) – If True, it prints the time intervals where non-contiguous scans occur.
- Return type:
None.
- gpm.utils.checks.check_missing_granules(xr_obj)[source]#
Check no missing granules in the GPM Dataset.
It assumes xr_obj is a GPM ORBIT object.
- Parameters:
xr_obj (xr.Dataset or xr.DataArray) – xarray object.
- gpm.utils.checks.check_regular_time(xr_obj, tolerance=None, verbose=True)[source]#
Check no missing timesteps for longer than ‘tolerance’ seconds.
Note: this sometimes occurs between orbit/grid granules, and sometimes within an orbit granule.
- Parameters:
xr_obj (xr.Dataset or xr.DataArray) – xarray object.
tolerance (np.timedelta, optional) – The timedelta tolerance to define regular vs. non-regular timesteps. The default is None. If GPM GRID object, it uses the first 2 timesteps to derive the tolerance timedelta. If GPM ORBIT object, it uses the ORBIT_TIME_TOLERANCE.
verbose (bool) – If True, it prints the time intervals where non-contiguous scans occur. The default is True.
- gpm.utils.checks.check_valid_geolocation(xr_obj, verbose=True)[source]#
Check no geolocation errors in the GPM Dataset.
- Parameters:
xr_obj (xr.Dataset or xr.DataArray) – xarray object.
- gpm.utils.checks.get_missing_granule_numbers(xr_obj)[source]#
Return ID numbers of missing granules.
It assumes xr_obj is a GPM ORBIT object.
- gpm.utils.checks.get_slices_contiguous_granules(xr_obj, min_size=2)[source]#
Return a list of slices ensuring contiguous granules.
The minimum size of the output slices is 2.
Note: for GRID (i.e. IMERG) products, it checks for regular timesteps! No granule_id is provided for GRID products.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
min_size (int) – Minimum size for a slice to be returned.
- Returns:
list_slices – List of slice objects to select contiguous granules. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
- gpm.utils.checks.get_slices_contiguous_scans(xr_obj, min_size=2, min_n_scans=3)[source]#
Return a list of slices ensuring contiguous scans (and granules).
It checks for contiguous scans only in the middle of the cross-track! If a scan geolocation is NaN, it will be considered non-contiguous.
An input with fewer than 3 scans (along-track) returns an empty list, since scan contiguity can’t be verified. Consecutive non-contiguous scans are discarded and not included in the outputs. The minimum size of the output slices is 2.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
min_size (int) – Minimum size for a slice to be returned.
- Returns:
list_slices – List of slice objects to select contiguous scans. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
- gpm.utils.checks.get_slices_non_contiguous_scans(xr_obj)[source]#
Return a list of slices where scan discontinuities occur.
An input with fewer than 2 scans (along-track) returns an empty list.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
- Returns:
list_slices – List of slice objects to select discontiguous scans. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
- gpm.utils.checks.get_slices_non_regular_time(xr_obj, tolerance=None)[source]#
Return a list of slices where there are supposedly missing timesteps.
The output slices have size 2. An input with fewer than 2 scans (along-track) returns an empty list.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
tolerance (np.timedelta, optional) – The timedelta tolerance to define regular vs. non-regular timesteps. The default is None. If GPM GRID object, it uses the first 2 timesteps to derive the tolerance timedelta. If GPM ORBIT object, it uses the ORBIT_TIME_TOLERANCE. Using this function on GPM ORBIT objects is discouraged!
- Returns:
list_slices – List of slice objects to select intervals with non-regular timesteps. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
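The statement that each output slice has size 2 can be illustrated with a plain-Python sketch. It assumes that a timestep pair is non-regular when the two timesteps are further apart than tolerance; the helper name non_regular_time_slices is hypothetical and the library's actual logic may differ.

```python
from datetime import datetime, timedelta


def non_regular_time_slices(timesteps, tolerance):
    """Return slice(i, i + 2) for each pair of timesteps further apart than tolerance."""
    if len(timesteps) < 2:
        return []
    return [
        slice(i, i + 2)
        for i in range(len(timesteps) - 1)
        if timesteps[i + 1] - timesteps[i] > tolerance
    ]


start = datetime(2020, 1, 1)
# 30-minute timesteps with the 01:00 timestep missing.
timesteps = [start + timedelta(minutes=m) for m in (0, 30, 90, 120)]
list_slices = non_regular_time_slices(timesteps, tolerance=timedelta(minutes=30))
print(list_slices)
```

Each returned slice selects the two timesteps on either side of a gap, which is why the slices always have size 2.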
- gpm.utils.checks.get_slices_non_valid_geolocation(xr_obj)[source]#
Return a list of GPM ORBIT along-track slices with non-valid geolocation.
The minimum size of the output slices is 2.
If the geolocation is always invalid at a given cross-track index, that cross-track index is discarded before identifying the along-track slices.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
min_size (int) – Minimum size for a slice to be returned. The default is 1.
- Returns:
list_slices – List of slice objects with non-valid geolocation. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
- gpm.utils.checks.get_slices_non_wobbling_swath(xr_obj, threshold=100)[source]#
Return the GPM ORBIT along-track slices along which the swath is not wobbling.
Wobbling is defined as a change in latitude direction occurring within fewer than threshold scans. The function extracts the along-track boundary on both swath sides and identifies where the change in orbit direction occurs.
- gpm.utils.checks.get_slices_regular(xr_obj, min_size=None, min_n_scans=3)[source]#
Return a list of slices to select regular GPM objects.
For GPM ORBITS, it returns slices to select contiguous scans with valid geolocation. For GPM GRID, it returns slices to select periods with regular timesteps.
For more information, read the documentation of gpm.utils.checks.get_slices_contiguous_scans and gpm.utils.checks.get_slices_regular_time.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
min_size (int) – Minimum size for a slice to be returned. If None, it defaults to 1 for GRID objects and 2 for ORBIT objects.
min_n_scans (int) – Minimum number of scans required to check for scan contiguity. For visualization purposes, it may be useful to set this value to 2. This parameter applies only to ORBIT objects.
- Returns:
list_slices – List of slice objects to select regular portions. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
- gpm.utils.checks.get_slices_regular_time(xr_obj, tolerance=None, min_size=1)[source]#
Return a list of slices ensuring timesteps to be regular.
Output format: [slice(start,stop), slice(start,stop),…]
Consecutive non-regular timesteps lead to slices of size 1. An xarray object with a single timestep leads to a slice of size 1. If min_size=1 (the default), such slices are returned.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
tolerance (np.timedelta, optional) – The timedelta tolerance to define regular vs. non-regular timesteps. The default is None. If GPM GRID object, it uses the first 2 timesteps to derive the tolerance timedelta. If GPM ORBIT object, it uses the ORBIT_TIME_TOLERANCE.
min_size (int) – Minimum size for a slice to be returned.
- Returns:
list_slices – List of slice objects to select regular time intervals. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
- gpm.utils.checks.get_slices_valid_geolocation(xr_obj, min_size=2)[source]#
Return a list of GPM ORBIT along-track slices with valid geolocation.
The minimum size of the output slices is 2.
If the geolocation is always invalid at a given cross-track index, that cross-track index is discarded before identifying the along-track slices.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM ORBIT xarray object.
min_size (int) – Minimum size for a slice to be returned. The default is 2.
- Returns:
list_slices – List of slice objects with valid geolocation. Output format: [slice(start,stop), slice(start,stop),…]
- Return type:
list
- gpm.utils.checks.get_slices_var_between(da, dim, vmin=-inf, vmax=inf, criteria='all')[source]#
Return a list of slices along the dim dimension where values lie within the interval defined by vmin and vmax.
If the DataArray has additional dimensions, the “criteria” parameter is used to determine whether all values within each slice index must lie within the interval (if set to “all”) or if at least one value within the slice index must lie within the interval (if set to “any”).
- gpm.utils.checks.get_slices_var_equals(da, dim, values, union=True, criteria='all')[source]#
Return a list of slices along the dim dimension where the specified values occur.
The function is applied recursively to each value in values. If the DataArray has additional dimensions, the “criteria” parameter is used to determine whether all values within each slice index must be equal to value (if set to “all”) or if at least one value within the slice index must be equal to value (if set to “any”).
If values is a list of values: if union=True, it returns slices corresponding to the sequence of consecutive values; if union=False, it returns slices for each value in values.
For example, with the input [0, 0, 1, 1] and values=[0, 1], union=False returns [slice(0,2), slice(2,4)], while union=True returns [slice(0,4)].
union matters when multiple values are specified; criteria matters when the DataArray has multiple dimensions.
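The effect of union on a 1D sequence can be sketched in plain Python. This is only an illustration of the documented behaviour on a list, not the library's code: the helper names runs and slices_where_equal are hypothetical, and the criteria handling for extra dimensions is left out.

```python
from itertools import groupby


def runs(labels):
    """Yield (label, slice) for each run of identical consecutive labels."""
    start = 0
    for label, group in groupby(labels):
        stop = start + len(list(group))
        yield label, slice(start, stop)
        start = stop


def slices_where_equal(data, values, union=True):
    """Return slices over the runs of `data` whose elements are in `values`."""
    if union:
        # Any requested value counts as a match, so adjacent runs merge.
        labels = [x in values for x in data]
        return [slc for label, slc in runs(labels) if label]
    list_slices = []
    for value in values:
        # One pass per requested value, so each value gets its own slices.
        labels = [x == value for x in data]
        list_slices += [slc for label, slc in runs(labels) if label]
    return list_slices


data = [0, 0, 1, 1]
separate = slices_where_equal(data, values=[0, 1], union=False)
merged = slices_where_equal(data, values=[0, 1], union=True)
print(separate, merged)
```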
- gpm.utils.checks.get_slices_wobbling_swath(xr_obj, threshold=100)[source]#
Return the GPM ORBIT along-track slices along which the swath is wobbling.
Wobbling is defined as a change in latitude direction occurring within fewer than threshold scans. The function extracts the along-track boundary on both swath sides and identifies where the change in orbit direction occurs.
- gpm.utils.checks.has_contiguous_granules(xr_obj)[source]#
Check whether the GPM object is composed of consecutive granules.
For ORBIT objects, it checks the gpm_granule_id. For GRID objects, it checks timesteps regularity.
- gpm.utils.checks.has_contiguous_scans(xr_obj)[source]#
Return True if all scans are contiguous. False otherwise.
- gpm.utils.checks.has_missing_granules(xr_obj)[source]#
Check whether the GPM object has missing granules.
For ORBIT objects, it checks the gpm_granule_id. For GRID objects, it checks timesteps regularity.
gpm.utils.collocation module#
This module contains utilities for GPM product collocation.
gpm.utils.dask module#
This module contains utilities for Dask Distributed processing.
- gpm.utils.dask.clean_memory(client)[source]#
Call the garbage collector on each process.
See https://distributed.dask.org/en/latest/worker-memory.html#manually-trim-memory
gpm.utils.decorators module#
This module contains function decorators checking the GPM-API object type.
- gpm.utils.decorators.check_has_along_track_dimension(function)[source]#
Check that the along-track dimension is available.
If not available, raise an error.
- gpm.utils.decorators.check_has_cross_track_dimension(function)[source]#
Check that the cross-track dimension is available.
If not available, raise an error.
- gpm.utils.decorators.check_is_gpm_object(function)[source]#
Decorator function to check if input is a GPM object. Raise ValueError if not.
gpm.utils.geospatial module#
This module contains functions for geospatial processing.
- class gpm.utils.geospatial.Extent(xmin, xmax, ymin, ymax)#
Bases:
tuple
- xmax#
Alias for field number 1
- xmin#
Alias for field number 0
- ymax#
Alias for field number 3
- ymin#
Alias for field number 2
- gpm.utils.geospatial.crop(xr_obj, extent)[source]#
Crop an xarray object based on the provided bounding box.
- Parameters:
xr_obj (xr.DataArray or xr.Dataset) – xarray object.
extent (list or tuple) – The bounding box over which to crop the xarray object. extent must follow the matplotlib and cartopy extent conventions: extent = [x_min, x_max, y_min, y_max]
- Returns:
xr_obj – Cropped xarray object.
- Return type:
xr.DataArray or xr.Dataset
- gpm.utils.geospatial.crop_by_continent(xr_obj, name: str)[source]#
Crop an xarray object based on the specified continent name.
- Parameters:
xr_obj (xr.DataArray or xr.Dataset) – xarray object.
name (str) – Continent name.
- Returns:
xr_obj – Cropped xarray object.
- Return type:
xr.DataArray or xr.Dataset
- gpm.utils.geospatial.crop_by_country(xr_obj, name: str)[source]#
Crop an xarray object based on the specified country name.
- Parameters:
xr_obj (xr.DataArray or xr.Dataset) – xarray object.
name (str) – Country name.
- Returns:
xr_obj – Cropped xarray object.
- Return type:
xr.DataArray or xr.Dataset
- gpm.utils.geospatial.extend_geographic_extent(extent, padding: int | float | tuple | list = 0)[source]#
Extend the lat/lon extent by x degrees in every direction.
- Parameters:
extent ((tuple)) – A tuple of four values representing the lat/lon extent. The extent format must be [xmin, xmax, ymin, ymax]
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom).
- Returns:
new_extent – The extended extent.
- Return type:
tuple
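The scalar and two-element padding cases can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's implementation: the helper name pad_extent is hypothetical, the four-element case is deliberately not covered, and no clipping to valid longitude/latitude ranges is applied.

```python
def pad_extent(extent, padding=0):
    """Pad an (xmin, xmax, ymin, ymax) extent by a scalar or an (x, y) pair."""
    if isinstance(padding, (int, float)):
        pad_x = pad_y = padding          # same padding in every direction
    elif len(padding) == 2:
        pad_x, pad_y = padding           # (longitude padding, latitude padding)
    else:
        raise ValueError("This sketch only handles a scalar or an (x, y) pair.")
    xmin, xmax, ymin, ymax = extent
    return (xmin - pad_x, xmax + pad_x, ymin - pad_y, ymax + pad_y)


extent = (5.0, 10.0, 45.0, 48.0)
padded_uniform = pad_extent(extent, 1)
padded_xy = pad_extent(extent, (2, 0.5))
print(padded_uniform, padded_xy)
```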
- gpm.utils.geospatial.get_continent_extent(name: str, padding: int | float | tuple | list = 0)[source]#
Retrieves the extent of a continent.
- Parameters:
name (str) – The name of the continent.
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.
- Returns:
extent – A tuple containing the longitude and latitude extent of the continent.
- Return type:
tuple
- Raises:
TypeError – If the continent name is not provided as a string.
ValueError – If the provided continent name is not valid or does not match any continent. When a similar continent name is found, it is suggested as a possible match.
- gpm.utils.geospatial.get_country_extent(name, padding=0.2)[source]#
Retrieves the extent of a country.
- Parameters:
name (str) – The name of the country.
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.2.
- Returns:
extent – A tuple containing the longitude and latitude extent of the country.
- Return type:
tuple
- Raises:
TypeError – If the country name is not provided as a string.
ValueError – If the country name is not valid or if there is no matching country.
Notes
This function retrieves the extent of a country from a dictionary of country extents. The country extent is defined as the longitude and latitude range that encompasses the country’s borders. The extent is returned as a tuple of four values: (xmin, xmax, ymin, ymax). The extent can be optionally padded by specifying the padding parameter.
- gpm.utils.geospatial.get_crop_slices_by_continent(xr_obj, name)[source]#
Compute the xarray object slices which are within the specified continent.
If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.
- Parameters:
xr_obj (xr.DataArray or xr.Dataset) – xarray object.
name (str) – Continent name.
- gpm.utils.geospatial.get_crop_slices_by_country(xr_obj, name)[source]#
Compute the xarray object slices which are within the specified country.
If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.
- Parameters:
xr_obj (xr.DataArray or xr.Dataset) – xarray object.
name (str) – Country name.
- gpm.utils.geospatial.get_crop_slices_by_extent(xr_obj, extent)[source]#
Compute the xarray object slices which are within the specified extent.
If the input is a GPM Orbit, it returns a list of along-track slices. If the input is a GPM Grid, it returns a dictionary of the lon/lat slices.
- Parameters:
xr_obj (xr.DataArray or xr.Dataset) – xarray object.
extent (list or tuple) – The extent over which to crop the xarray object. extent must follow the matplotlib and cartopy conventions: extent = [x_min, x_max, y_min, y_max]
- gpm.utils.geospatial.get_extent(xr_obj, padding: int | float | tuple | list = 0)[source]#
Get the geographic extent from an xarray object.
- Parameters:
xr_obj (xr.DataArray or xr.Dataset) – xarray object.
padding (int, float, tuple, list) – The number of degrees to extend the extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.
- Returns:
extent – A tuple containing the longitude and latitude extent of the xarray object. The extent follows the matplotlib/cartopy format (xmin, xmax, ymin, ymax)
- Return type:
tuple
- gpm.utils.geospatial.read_continents_extent_dictionary()[source]#
Read and return a dictionary containing the extents of continents.
- Returns:
A dictionary containing the extents of continents.
- Return type:
dict
gpm.utils.list module#
This module contains functions for list processing.
gpm.utils.manipulations module#
This module contains functions for manipulating GPM-API Datasets.
- gpm.utils.manipulations.create_bin_idx_data_array(xr_obj)[source]#
Create a 3D DataArray with the bin index along the range dimension.
The GPM bin index starts at 1! It is equivalent to gpm_range_id + 1.
- gpm.utils.manipulations.get_bright_band_mask(ds)[source]#
Retrieve bright band mask defined by binBBBottom and binBBTop.
Bins are numbered from top to bottom, so binBBTop has lower values than binBBBottom. binBBBottom and binBBTop are 0 when the bright band limit is not detected!
- gpm.utils.manipulations.get_dims_without(da, dims)[source]#
Remove the specified ‘dims’ from the list of DataArray dimensions.
- gpm.utils.manipulations.get_height_at_temperature(da_height, da_temperature, temperature)[source]#
Retrieve height at a specific temperature.
- gpm.utils.manipulations.get_liquid_phase_mask(ds)[source]#
Retrieve the mask of the liquid phase profile.
- gpm.utils.manipulations.get_range_index_at_max(da)[source]#
Retrieve index along the range dimension where the DataArray has maximum values.
- gpm.utils.manipulations.get_range_index_at_min(da)[source]#
Retrieve index along the range dimension where the DataArray has minimum values.
- gpm.utils.manipulations.get_range_index_at_value(da, value)[source]#
Retrieve index along the range dimension where the DataArray values are closest to value.
- gpm.utils.manipulations.get_range_slices_with_valid_data(xr_obj, variable=None)[source]#
Get the vertical (‘range’/‘height’) slices with valid data.
- gpm.utils.manipulations.get_range_slices_within_values(xr_obj, variable=None, vmin=-inf, vmax=inf)[source]#
Get the ‘range’ slices with data within a given data interval.
- gpm.utils.manipulations.get_solid_phase_mask(ds)[source]#
Retrieve the mask of the solid phase profile.
- gpm.utils.manipulations.get_variable_at_bin(xr_obj, bin, variable=None)[source]#
Retrieve variable values at range bin provided by bin_variable.
It assumes bin values go from 1 to 176.
- gpm.utils.manipulations.get_xr_shape(xr_obj, dims)[source]#
Get xarray shape for specific dimensions.
- gpm.utils.manipulations.integrate_profile_concentration(dataarray, name, scale_factor=None, units=None)[source]#
Utility to convert LWC or IWC to LWP or IWP.
Input data have units of g/m³. Output data will have units of kg/m² if scale_factor=1000.
height: a list or array of the corresponding heights for each level.
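The unit conversion can be checked with a small worked example. This sketch assumes a trapezoidal integration over height in metres and a division by scale_factor; the integration scheme actually used by the library is not stated here, and the helper name integrate_concentration is hypothetical.

```python
def integrate_concentration(concentration, heights, scale_factor=1000):
    """Trapezoidal integral of a concentration profile (g/m³) over height (m)."""
    total = 0.0
    for i in range(len(heights) - 1):
        layer_depth = heights[i + 1] - heights[i]                  # m
        mean_conc = 0.5 * (concentration[i] + concentration[i + 1])  # g/m³
        total += mean_conc * layer_depth                           # g/m²
    # Dividing g/m² by 1000 gives kg/m².
    return total / scale_factor


heights = [0.0, 1000.0, 2000.0]  # m
lwc = [0.2, 0.4, 0.2]            # g/m³
lwp = integrate_concentration(lwc, heights)
print(lwp)
```

Two 1000 m layers with a mean content of 0.3 g/m³ each give 600 g/m², i.e. about 0.6 kg/m².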
- gpm.utils.manipulations.select_radar_frequency(xr_obj, radar_frequency)[source]#
Select data related to a specific radar frequency.
- gpm.utils.manipulations.select_spatial_2d_variables(ds, strict=False, squeeze=True)[source]#
Return xr.Dataset with only 2D spatial variables.
- gpm.utils.manipulations.select_spatial_3d_variables(ds, strict=False, squeeze=True)[source]#
Return xr.Dataset with only 3D spatial variables.
- gpm.utils.manipulations.select_transect_variables(ds, strict=False, squeeze=True)[source]#
Return xr.Dataset with only transect variables.
- gpm.utils.manipulations.slice_range_at_height(xr_obj, height)[source]#
Slice the 3D array at a given height.
- gpm.utils.manipulations.slice_range_at_max_value(xr_obj, variable=None)[source]#
Slice the 3D arrays where the variable values are at maximum.
- gpm.utils.manipulations.slice_range_at_min_value(xr_obj, variable=None)[source]#
Slice the 3D arrays where the variable values are at minimum.
- gpm.utils.manipulations.slice_range_at_temperature(ds, temperature, variable_temperature='airTemperature')[source]#
Slice the 3D arrays along a specific isotherm.
- gpm.utils.manipulations.slice_range_at_value(xr_obj, value, variable=None)[source]#
Slice the 3D arrays where the variable values are close to value.
gpm.utils.parallel module#
This module contains utilities for parallel processing.
gpm.utils.pyresample module#
This module contains pyresample utility functions.
gpm.utils.slices module#
This module contains utilities for list of slices processing.
- gpm.utils.slices.enlarge_slice(slc, min_size, min_start=0, max_stop=inf)[source]#
Enlarge a slice object to have at least a size of min_size.
The function enforces the left and right bounds of the slice through min_start and max_stop, respectively. If the original slice size is larger than min_size, the original slice will be returned.
- Parameters:
slc (slice) – The original slice object to be enlarged.
min_size (int) – The desired minimum size of the new slice.
min_start (int, optional) – The minimum value for the start of the new slice. The default is 0.
max_stop (int) – The maximum value for the stop of the new slice. The default is np.inf.
- Returns:
The new slice object with a size of at least min_size and respecting the left and right bounds.
- Return type:
slice
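One way to satisfy the contract described above is sketched below. How the library distributes the extra elements between the two sides is an assumption here (roughly symmetric growth, then a shift back inside the bounds), and the helper name enlarge is hypothetical.

```python
import math


def enlarge(slc, min_size, min_start=0, max_stop=math.inf):
    """Return a slice of at least min_size that stays within [min_start, max_stop]."""
    size = slc.stop - slc.start
    if size >= min_size:
        return slc  # already large enough: return the original slice
    if max_stop - min_start < min_size:
        raise ValueError("The bounds are too narrow for the requested min_size.")
    missing = min_size - size
    start = slc.start - missing // 2
    stop = slc.stop + (missing - missing // 2)
    # Shift the window back inside the bounds if it spilled over either side.
    if start < min_start:
        start, stop = min_start, min_start + min_size
    elif stop > max_stop:
        start, stop = max_stop - min_size, max_stop
    return slice(start, stop)


centered = enlarge(slice(4, 6), min_size=6)
at_left_edge = enlarge(slice(0, 2), min_size=6)
at_right_edge = enlarge(slice(8, 10), min_size=6, max_stop=10)
print(centered, at_left_edge, at_right_edge)
```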
- gpm.utils.slices.enlarge_slices(list_slices, min_size, valid_shape)[source]#
Enlarge a list of slice objects to have at least a size of min_size.
The function enforces the left and right bounds of the slice to be between 0 and valid_shape. If the original slice size is larger than min_size, the original slice will be returned.
- Parameters:
list_slices (list) – List of slice objects.
min_size ((int or tuple)) – Minimum size of the output slice.
valid_shape ((int or tuple)) – The shape of the array which the slices should be valid on.
- Returns:
list_slices – The list of slices after enlarging it (if necessary).
- Return type:
list
- gpm.utils.slices.get_indices_from_list_slices(list_slices, check_non_intersecting=True)[source]#
Return a numpy array of indices from a list of slices.
- gpm.utils.slices.get_list_slices_from_bool_arr(bool_arr, include_false=True, skip_consecutive_false=True)[source]#
Return the slices corresponding to sequences of True in the input arrays.
If include_false=True, the last element of each slice sequence (except the last) will be False. If include_false=False, no element in each slice sequence will be False. If skip_consecutive_false=True (default), the first element of each slice must be a True. If skip_consecutive_false=False, it returns also slices of size 1 which selects just the False value. If include_false = False, skip_consecutive_false is automatically True.
Examples
If include_false=True and skip_consecutive_false=False:
[False, False] –> [slice(0,1), slice(1,2)]
If include_false=True and skip_consecutive_false=True:
[False, False] –> []
[False, False, True] –> [slice(2,3)]
[False, False, True, False] –> [slice(2,4)]
If include_false=False:
[False, False, True, False] –> [slice(2,3)]
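The documented examples can be reproduced with a short plain-Python sketch. It is written only to match the cases listed above and may differ from the library in edge cases; the helper name slices_from_bool is hypothetical.

```python
def slices_from_bool(bool_arr, include_false=True, skip_consecutive_false=True):
    """Return slices over runs of True, optionally closed by the following False."""
    list_slices = []
    start = None
    for i, value in enumerate(bool_arr):
        if value:
            if start is None:
                start = i  # a new run of True begins
        elif start is not None:
            # A False ends the current run; include it only if requested.
            list_slices.append(slice(start, i + 1 if include_false else i))
            start = None
        elif include_false and not skip_consecutive_false:
            # A False not preceded by True becomes its own size-1 slice.
            list_slices.append(slice(i, i + 1))
    if start is not None:
        list_slices.append(slice(start, len(bool_arr)))
    return list_slices


print(slices_from_bool([False, False, True, False]))
print(slices_from_bool([False, False, True, False], include_false=False))
```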
- gpm.utils.slices.get_list_slices_from_indices(indices)[source]#
Return a list of slices from a list/array of integer indices.
Examples
[0,1,2,4,5,8] –> [slice(0,3), slice(4,6), slice(8,9)]
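The grouping of consecutive integers into slices can be sketched in plain Python. This is an illustrative re-implementation, not the library's code; the helper name slices_from_indices is hypothetical, and sorting and de-duplicating the input is an assumption of the sketch.

```python
def slices_from_indices(indices):
    """Group consecutive integer indices into slices."""
    indices = sorted(set(indices))
    if not indices:
        return []
    list_slices = []
    start = prev = indices[0]
    for idx in indices[1:]:
        if idx != prev + 1:
            # The run is broken: close the current slice and start a new one.
            list_slices.append(slice(start, prev + 1))
            start = idx
        prev = idx
    list_slices.append(slice(start, prev + 1))
    return list_slices


list_slices = slices_from_indices([0, 1, 2, 4, 5, 8])
print(list_slices)
```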
- gpm.utils.slices.get_slice_from_idx_bounds(idx_start, idx_end)[source]#
Return the slice required to include the idx bounds.
- gpm.utils.slices.get_slice_size(slc)[source]#
Get size of the slice.
Note: The computed slice size may not be representative of the number of selected elements if slice.stop is larger than the length of the object to be sliced.
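The caveat in the note can be seen with a built-in list. The sketch assumes the size is computed as stop minus start (the helper name slice_size is hypothetical); slice.indices is the standard Python method that clips a slice to a given length.

```python
def slice_size(slc):
    """Nominal size of a slice with explicit start and stop."""
    return slc.stop - slc.start


slc = slice(2, 10)
data = list(range(5))

nominal = slice_size(slc)                            # ignores the length of data
selected = len(data[slc])                            # what slicing really returns
clipped = len(range(*slc.indices(len(data))))        # size after clipping to len(data)
print(nominal, selected, clipped)
```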
- gpm.utils.slices.list_slices_combine(*args)[source]#
Combine together a list of list_slices, without any additional operation.
- gpm.utils.slices.list_slices_difference(list_slices1, list_slices2)[source]#
Return the list of slices covered by list_slices1 not intersecting list_slices2.
- gpm.utils.slices.list_slices_filter(list_slices, min_size=None, max_size=None)[source]#
Filter list of slices by size.
- gpm.utils.slices.list_slices_flatten(list_slices)[source]#
Flatten a list of slices with up to 2 nesting levels.
Examples
[[slice(1, 7934, None)], [slice(1, 2, None)]] –> [slice(1, 7934, None), slice(1, 2, None)]
[slice(1, 7934, None), slice(1, 2, None)] –> [slice(1, 7934, None), slice(1, 2, None)]
- gpm.utils.slices.list_slices_intersection(*args, min_size=1)[source]#
Return the intersecting slices from multiple list of slices.
- gpm.utils.slices.list_slices_simplify(list_slices)[source]#
Simplify a list of sequential slices.
Example 1: [slice(0,2), slice(2,4)] –> [slice(0,4)]
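Merging sequential slices, as in the example above, can be sketched as follows. The sketch assumes the input is already sorted and only merges a slice that starts exactly where the previous one stops; the helper name simplify is hypothetical.

```python
def simplify(list_slices):
    """Merge each slice that starts exactly where the previous one stops."""
    merged = []
    for slc in list_slices:
        if merged and merged[-1].stop == slc.start:
            # Extend the previous slice instead of adding a new one.
            merged[-1] = slice(merged[-1].start, slc.stop)
        else:
            merged.append(slc)
    return merged


simplified = simplify([slice(0, 2), slice(2, 4)])
print(simplified)
```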
- gpm.utils.slices.list_slices_sort(*args)[source]#
Sort a single or multiple list of slices by slice.start.
It outputs a single list of slices!
- gpm.utils.slices.list_slices_union(*args)[source]#
Return the union slices from multiple list of slices.
- gpm.utils.slices.pad_slice(slc, padding, min_start=0, max_stop=inf)[source]#
Increase/decrease the slice with the padding argument.
Does not ensure that all output slices have same size.
- Parameters:
slc (slice) – Slice object.
padding (int) – Padding to be applied to the slice.
min_start (int, optional) – The minimum value for the start of the new slice. The default is 0.
max_stop (int) – The maximum value for the stop of the new slice. The default is np.inf.
- Returns:
slc – The slice after applying padding.
- Return type:
slice
- gpm.utils.slices.pad_slices(list_slices, padding, valid_shape)[source]#
Increase/decrease the list of slices with the padding argument.
- Parameters:
list_slices (list) – List of slice objects.
padding ((int or tuple)) – Padding to be applied on each slice.
valid_shape ((int or tuple)) – The shape of the array which the slices should be valid on.
- Returns:
list_slices – The list of slices after applying padding.
- Return type:
list
gpm.utils.time module#
This module contains utilities for time processing.
- gpm.utils.time.ensure_time_validity(xr_obj, limit=10)[source]#
Attempt to correct the time coordinate if fewer than ‘limit’ consecutive NaT values are present.
It raises a ValueError if more than ‘limit’ consecutive NaT values occur.
- Parameters:
xr_obj ((xr.Dataset, xr.DataArray)) – GPM xarray object.
- Returns:
xr_obj – GPM xarray object.
- Return type:
(xr.Dataset, xr.DataArray)
- gpm.utils.time.get_dataset_start_end_time(ds: Dataset, time_dim='time')[source]#
Retrieves dataset starting and ending time.
- Parameters:
ds (xr.Dataset) – Input dataset
time_dim (str) – Name of the time dimension. The default is “time”.
- Returns:
(starting_time, ending_time)
- Return type:
tuple
- gpm.utils.time.infill_timesteps(timesteps, limit)[source]#
Infill missing timesteps if less than <limit> consecutive.
- gpm.utils.time.interpolate_nat(timesteps, method='linear', limit=5, limit_direction=None, limit_area=None)[source]#
Fill NaT values using an interpolation method.
- Parameters:
method (str, default 'linear') – Interpolation technique to use. One of:
* ‘linear’: Treat the timesteps as equally spaced.
* ‘pad’: Fill in NaTs using existing values.
* ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘spline’, ‘barycentric’, ‘polynomial’: Passed to scipy.interpolate.interp1d. Both ‘polynomial’ and ‘spline’ require that you also specify an order (int), e.g. interpolate_nat(method='polynomial', order=5).
* ‘krogh’, ‘piecewise_polynomial’, ‘spline’, ‘pchip’, ‘akima’, ‘cubicspline’: Wrappers around the SciPy interpolation methods of similar names. See Notes in https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html
limit (int, optional) – Maximum number of consecutive NaTs to fill. Must be greater than 0.
limit_direction ({'forward', 'backward', 'both'}, optional) – Consecutive NaTs will be filled in this direction.
- If ‘limit’ is specified:
If ‘method’ is ‘pad’ or ‘ffill’, ‘limit_direction’ must be ‘forward’.
If ‘method’ is ‘backfill’ or ‘bfill’, ‘limit_direction’ must be ‘backward’.
- If ‘limit’ is not specified:
If ‘method’ is ‘backfill’ or ‘bfill’, the default is ‘backward’; otherwise the default is ‘forward’.
limit_area ({None, ‘inside’, ‘outside’}, default None) – If limit is specified, consecutive NaTs will be filled with this restriction.
* None: No fill restriction.
* ‘inside’: Only fill NaTs surrounded by valid values (interpolate).
* ‘outside’: Only fill NaTs outside valid values (extrapolate).
Notes
Depending on the interpolation method (e.g. linear), the infilled values could have ns resolution. For further information, refer to https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.interpolate.html
- Returns:
timesteps – Timesteps array of type datetime64[ns].
- Return type:
np.array
- gpm.utils.time.is_nat(timesteps)[source]#
Return a boolean array indicating timesteps which are NaT.
- gpm.utils.time.regularize_dataset(ds: ~xarray.core.dataset.Dataset, freq: str, time_dim: str = 'time', method: str | None = None, fill_value=<NA>)[source]#
Regularize a dataset across time dimension with uniform resolution.
- Parameters:
ds (xr.Dataset) – xarray Dataset.
time_dim (str, optional) – The time dimension in the xr.Dataset. The default is "time".
freq (str) – The freq string to pass to pd.date_range to define the new time coordinates. Examples: freq="2min".
method (str, optional) – Method to use for filling missing timesteps. If None, fill with fill_value. The default is None. For other possible methods, see https://docs.xarray.dev/en/stable/generated/xarray.Dataset.reindex.html
fill_value (float, optional) – Fill value to fill missing timesteps. The default is dtypes.NA.
- Returns:
ds_reindexed – Regularized dataset.
- Return type:
xr.Dataset
- gpm.utils.time.subset_by_time(xr_obj, start_time=None, end_time=None)[source]#
Filter a GPM xarray object by start_time and end_time.
- Parameters:
xr_obj – An xarray object.
start_time (datetime.datetime) – Start time. The default is None.
end_time (datetime.datetime) – End time. The default is None.
- Returns:
xr_obj – GPM xarray object
- Return type:
(xr.Dataset, xr.DataArray)
gpm.utils.timing module#
This module contains decorators which measure function execution time.
gpm.utils.warnings module#
This module defines GPM Warning classes.
gpm.utils.yaml module#
This module defines a YAML file reader and writer.
Module contents#
This directory contains the GPM-API utility functions.