gpm.bucket package#
Submodules#
gpm.bucket.analysis module#
This module contains a mix of functions to analyze bucket archives.
gpm.bucket.dataframe module#
This module implements manipulation wrappers for multiple DataFrame classes.
gpm.bucket.filters module#
gpm.bucket.io module#
This module provides utilities to search GPM Geographic Bucket files.
- gpm.bucket.io.get_exisiting_partitions_paths(bucket_dir, dir_trees)[source]#
Get the paths of existing bucket partitions on disk.
- gpm.bucket.io.get_filepaths(bucket_dir, parallel=True, file_extension=None, glob_pattern=None, regex_pattern=None)[source]#
Return the filepaths matching the specified filename filtering criteria.
- gpm.bucket.io.get_filepaths_by_partition(bucket_dir, parallel=True, file_extension=None, glob_pattern=None, regex_pattern=None)[source]#
Return a dictionary with the list of filepaths for each bucket partition.
gpm.bucket.partitioning module#
This module implements Spatial Partitioning classes.
- class gpm.bucket.partitioning.Base2DPartitioning(x_bounds, y_bounds, levels, flavor=None, order=None)[source]#
Bases:
object
Handles partitioning of 2D data into rectangular tiles.
The size of the partitions can vary between and across the x and y directions.
- Parameters:
levels (str or list) – Name or names of the partition levels. If partitioning by a single level (i.e. by a unique partition id), specify a single partition name. If partitioning by two or more levels (i.e. by x and y), specify the x, y (z, …) partition level names.
x_bounds (numpy.ndarray) – The partition bounds across the x (horizontal) dimension.
y_bounds (numpy.ndarray) – The partition bounds across the y (vertical) dimension. Please provide the bounds in increasing order. The origin of the partition class indices is the top left corner.
order (list) – The order of the partitions when writing multi-level partitions (i.e. x, y) to disk. The default, None, corresponds to levels.
flavor (str) – This argument governs the directory names of partitioned datasets. The default, None, names the directories with the partition labels (DirectoryPartitioning). The option "hive" names the directories with the format {partition_name}={partition_label}.
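A minimal construction sketch (the bounds, level names and coordinate values are illustrative; the unpacking of query_indices into x and y index arrays is an assumption):

    import numpy as np
    from gpm.bucket.partitioning import Base2DPartitioning

    # Irregular x bins and regular y bins over a 0-100 x 0-50 domain
    x_bounds = np.array([0, 10, 30, 60, 100])
    y_bounds = np.array([0, 25, 50])
    partitioning = Base2DPartitioning(
        x_bounds=x_bounds,
        y_bounds=y_bounds,
        levels=["x_bin", "y_bin"],
    )
    # Map coordinates to 2D partition indices
    # (assumed return value: the x and y index arrays)
    x_indices, y_indices = partitioning.query_indices(x=[5, 75], y=[10, 40])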
- add_centroids(df, x, y, x_coord=None, y_coord=None, remove_invalid_rows=True)[source]#
Add partitions centroids to the dataframe.
- Parameters:
df (pandas.DataFrame, dask.dataframe.DataFrame, polars.DataFrame, pyarrow.Table or polars.LazyFrame) – Dataframe to which to add the partition centroids.
x (str) – Column name with the x coordinate.
y (str) – Column name with the y coordinate.
x_coord (str, optional) – Name of the new column with the centroid x coordinates. The default is “x_c”.
y_coord (str, optional) – Name of the new column with the centroid y coordinates. The default is “y_c”.
remove_invalid_rows (bool, optional) – Whether to remove dataframe rows for which the coordinates are invalid or outside the partitioning extent. The default is True.
- Returns:
df – Dataframe with the partition centroid x and y coordinate columns.
- Return type:
pandas.DataFrame, dask.dataframe.DataFrame, polars.DataFrame, pyarrow.Table or polars.LazyFrame
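A hedged example with a pandas dataframe (the "lon"/"lat" column names and the 5° LonLatPartitioning are illustrative; LonLatPartitioning is documented below):

    import pandas as pd
    from gpm.bucket import LonLatPartitioning

    df = pd.DataFrame({"lon": [5.2, -120.8], "lat": [45.1, 34.3], "value": [1.0, 2.0]})
    partitioning = LonLatPartitioning(size=5)
    # Adds the "x_c" and "y_c" centroid columns
    df = partitioning.add_centroids(df, x="lon", y="lat")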
- add_labels(df, x, y, remove_invalid_rows=True)[source]#
Add partitions labels to the dataframe.
- Parameters:
df (pandas.DataFrame, dask.dataframe.DataFrame, polars.DataFrame, pyarrow.Table or polars.LazyFrame) – Dataframe to which to add the partition labels.
x (str) – Column name with the x coordinate.
y (str) – Column name with the y coordinate.
remove_invalid_rows (bool, optional) – Whether to remove dataframe rows for which the coordinates are invalid or outside the partitioning extent. The default is True.
- Returns:
df – Dataframe with the partition label(s) column(s).
- Return type:
pandas.DataFrame, dask.dataframe.DataFrame, polars.DataFrame, pyarrow.Table or polars.LazyFrame
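A hedged example with a polars dataframe (the input column names are assumptions; the default LonLatPartitioning levels are ["lon_bin", "lat_bin"]):

    import polars as pl
    from gpm.bucket import LonLatPartitioning

    df = pl.DataFrame({"lon": [5.2, -120.8], "lat": [45.1, 34.3]})
    partitioning = LonLatPartitioning(size=5)
    # Adds the "lon_bin" and "lat_bin" label columns
    df = partitioning.add_labels(df, x="lon", y="lat")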
- directories_around_point(x, y, distance=None, size=None)[source]#
Return the directory trees with data within the specified distance from a point.
- directories_by_extent(extent)[source]#
Return the directory trees with data within the specified extent.
- get_partitions_around_point(x, y, distance=None, size=None)[source]#
Return the partition labels with data within the distance/size from a point.
- get_partitions_by_extent(extent)[source]#
Return the partition labels containing data within the extent.
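A hedged sketch of the spatial query methods above (extent values are illustrative; partitioning is any Base2DPartitioning subclass instance):

    # Partition labels intersecting an [xmin, xmax, ymin, ymax] extent
    labels = partitioning.get_partitions_by_extent(extent=[0, 20, 25, 50])
    # Directory trees covering the same extent (useful to build read paths)
    directories = partitioning.directories_by_extent(extent=[0, 20, 25, 50])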
- quadmesh_corners(origin='bottom')[source]#
Return the quadrilateral mesh corners.
A quadrilateral mesh is a grid of M by N adjacent quadrilaterals that are defined via a (M+1, N+1) grid of vertices.
The quadrilateral mesh is accepted by matplotlib.pyplot.pcolormesh, matplotlib.collections.QuadMesh and matplotlib.collections.PolyQuadMesh.
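A hedged plotting sketch, assuming quadmesh_corners() returns the x and y corner arrays of shape (M+1, N+1); the plotted counts array is fabricated for illustration:

    import matplotlib.pyplot as plt
    import numpy as np

    x_corners, y_corners = partitioning.quadmesh_corners(origin="bottom")
    m, n = x_corners.shape
    counts = np.random.rand(m - 1, n - 1)  # one value per quadrilateral
    plt.pcolormesh(x_corners, y_corners, counts)
    plt.show()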
- query_centroids(x, y)[source]#
Return the partition centroids for the specified x,y coordinates.
- query_centroids_by_indices(x_indices, y_indices)[source]#
Return the partition centroids for the specified x,y indices.
- query_indices(x, y)[source]#
Return the 2D partition indices for the specified x,y coordinates.
- query_labels_by_indices(x_indices, y_indices)[source]#
Return the partition labels as a function of the specified 2D partition indices.
- query_vertices_by_indices(x_indices, y_indices, ccw=True)[source]#
Return the partition vertices in an array of shape (indices, 4, 2).
- to_xarray(df, spatial_coords=None, aux_coords=None)[source]#
Convert dataframe to spatial xarray Dataset based on partitions centroids.
This routine assumes that you have grouped and aggregated the dataframe over the partition labels or the partition centroids!
Please add the partition centroids to the dataframe with add_centroids before calling this method, and specify the partition centroid x and y columns in the spatial_coords argument.
Please also specify the presence of auxiliary coordinates (indices) with aux_coords.
The array cells with coordinates not included in the dataframe will have NaN values.
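A hedged sketch of the expected workflow with polars (the "value" column and the default centroid column names "x_c"/"y_c" are assumptions):

    import polars as pl

    df = partitioning.add_centroids(df, x="lon", y="lat")  # adds "x_c" and "y_c"
    df_agg = df.group_by(["x_c", "y_c"]).agg(pl.col("value").mean())
    ds = partitioning.to_xarray(df_agg, spatial_coords=["x_c", "y_c"])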
- vertices(ccw=True, origin='bottom')[source]#
Return the partition vertices in an array of shape (N, M, 4, 2).
The output vertices, once the first 2 dimensions are flattened, can be passed directly to a matplotlib.collections.PolyCollection. For plotting with cartopy, the polygon vertices must be counterclockwise ordered.
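A hedged matplotlib sketch, flattening the (N, M, 4, 2) array over its first two dimensions as described above:

    import matplotlib.pyplot as plt
    from matplotlib.collections import PolyCollection

    verts = partitioning.vertices(ccw=True, origin="bottom")  # shape (N, M, 4, 2)
    collection = PolyCollection(verts.reshape(-1, 4, 2), edgecolor="gray", facecolor="none")
    fig, ax = plt.subplots()
    ax.add_collection(collection)
    ax.autoscale()
    plt.show()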
- class gpm.bucket.partitioning.LonLatPartitioning(size, extent=[-180, 180, -90, 90], levels=None, flavor='hive', order=None, labels_decimals=None)[source]#
Bases:
XYPartitioning
Handles geographic partitioning of data based on longitude and latitude bin sizes within a defined extent.
The last bin (in the lon and lat directions) might not be of size size!
- Parameters:
size (float) – The uniform size for longitude and latitude binning. Carefully consider the size of the partitions. Partitioning the Earth by:
- 1° corresponds to 64800 directories (360*180)
- 5° corresponds to 2592 directories (72*36)
- 10° corresponds to 648 directories (36*18)
- 15° corresponds to 288 directories (24*12)
levels (list, optional) – Names of the longitude and latitude partitions. The default is ["lon_bin", "lat_bin"].
extent (list, optional) – The geographical extent for the partitioning specified as [xmin, xmax, ymin, ymax]. Default is the whole Earth: [-180, 180, -90, 90].
order (list, optional) – The order of the partitions when writing partitioned datasets. The default, None, corresponds to levels.
flavor (str, optional) – This argument governs the directory names of partitioned datasets. The default, "hive", names the directories with the format {partition_name}={partition_label}. If None, names the directories with the partition labels (DirectoryPartitioning).
- directories_around_point(lon, lat, distance=None, size=None)[source]#
Return the directory trees with data within the distance/size from a point.
- directories_by_continent(name, padding=None)[source]#
Return the directory trees with data within a continent.
- directories_by_country(name, padding=None)[source]#
Return the directory trees with data within a country.
- get_partitions_around_point(lon, lat, distance=None, size=None)[source]#
Return the partition labels with data within the distance/size from a point.
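A hedged end-to-end sketch (the point coordinates and country name are illustrative):

    from gpm.bucket import LonLatPartitioning

    partitioning = LonLatPartitioning(size=5)  # 5° x 5° global partitions
    # Directory trees with data within 200 km of a point (distance in meters)
    dirs_point = partitioning.directories_around_point(lon=8.5, lat=46.0, distance=200_000)
    # Directory trees covering a country, with an optional padding in degrees
    dirs_country = partitioning.directories_by_country(name="Switzerland", padding=1)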
- class gpm.bucket.partitioning.TilePartitioning(size, extent, n_levels, levels=None, origin='bottom', direction='x', justify=False, flavor=None, order=None)[source]#
Bases:
Base2DPartitioning
Handles partitioning of data into tiles.
- Parameters:
size (int, float, tuple, list) – The size value(s) of the bins. An int or float enforces the same size in both the x and y directions; a tuple or list specifies the bin size for the x and y directions separately.
extent (list) – The extent for the partitioning specified as [xmin, xmax, ymin, ymax].
n_levels (int) – The number of tile partitioning levels. If n_levels=2, an (x, y) label is assigned to each tile. If n_levels=1, a unique id label is assigned to each tile by combining the x and y tile indices; the origin and direction parameters govern its value.
levels (list, optional) – If n_levels>=2, the first two names must correspond to the x and y partitions. The default with n_levels=1 is ["tile"]. The default with n_levels=2 is ["x", "y"].
origin (str, optional) – The origin of the y axis. Either "bottom" or "top". TMS tiles assume origin="top". Google Maps tiles assume origin="bottom". The default is "bottom".
direction (str, optional) – The direction to follow to define tile ids if n_levels=1 is specified. Valid direction values are "x" and "y". direction="x" numbers the tiles row by row; direction="y" numbers the tiles column by column.
justify (bool, optional) – Whether to justify the labels so that they all have the same number of characters. Zeros are added on the left side of the labels to pad the length. The default is False.
order (list, optional) – The order of the partitions when writing partitioned datasets. The default, None, corresponds to levels.
flavor (str, optional) – This argument governs the directory names of partitioned datasets. The default, None, names the directories with the partition labels (DirectoryPartitioning). The option "hive" names the directories with the format {partition_name}={partition_label}.
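A hedged construction sketch (the extent and tile size are illustrative):

    from gpm.bucket import TilePartitioning

    partitioning = TilePartitioning(
        size=10,                     # 10 x 10 tiles (in extent units)
        extent=[-180, 180, -90, 90],
        n_levels=1,                  # a single "tile" id per tile
        origin="top",                # TMS-style y-axis origin
        direction="x",               # number the tiles row by row
    )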
- class gpm.bucket.partitioning.XYPartitioning(size, extent, levels=None, order=None, flavor=None, labels_decimals=None)[source]#
Bases:
Base2DPartitioning
Handles partitioning of data into x and y regularly spaced bins.
- Parameters:
size (int, float, tuple, list) – The size value(s) of the bins. An int or float enforces the same size in both the x and y directions; a tuple or list specifies the bin size for the x and y directions separately.
extent (list) – The extent for the partitioning specified as [xmin, xmax, ymin, ymax].
levels (list, optional) – Names of the x and y partitions. The default is ["xbin", "ybin"].
order (list, optional) – The order of the x and y partitions when writing partitioned datasets. The default, None, corresponds to levels.
flavor (str, optional) – This argument governs the directory names of partitioned datasets. The default, None, names the directories with the partition labels (DirectoryPartitioning). The option "hive" names the directories with the format {partition_name}={partition_label}.
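A hedged construction sketch (the extent and bin sizes are illustrative):

    from gpm.bucket.partitioning import XYPartitioning

    # 0.25-wide x bins and 0.5-tall y bins; default levels are ["xbin", "ybin"]
    partitioning = XYPartitioning(size=(0.25, 0.5), extent=[0, 10, 0, 5], flavor="hive")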
- gpm.bucket.partitioning.check_partitioning_flavor(flavor)[source]#
Validate the flavor argument.
If None, defaults to “directory”.
- gpm.bucket.partitioning.check_valid_x_y(df, x, y)[source]#
Check if the x and y columns are in the dataframe.
- gpm.bucket.partitioning.get_array_combinations(x, y)[source]#
Return all combinations between the two input arrays.
- gpm.bucket.partitioning.get_centroids_from_bounds(bounds)[source]#
Define partition centroids from bounds.
- gpm.bucket.partitioning.get_directories(dict_labels, order, flavor)[source]#
Return the directory trees of a partitioned dataset.
- gpm.bucket.partitioning.get_n_decimals(number)[source]#
Get the number of decimals of a number.
- gpm.bucket.partitioning.get_partition_dir_name(partition_name, partition_labels, flavor)[source]#
Return the directory names of a partition.
- gpm.bucket.partitioning.get_tile_id_labels(x_indices, y_indices, origin, direction, n_x, n_y, justify)[source]#
Return the 1D tile labels for the specified x,y indices.
gpm.bucket.readers module#
This module provides utilities to read GPM Geographic Bucket Apache Parquet files.
- gpm.bucket.readers.read_bucket(bucket_dir, extent=None, country=None, continent=None, point=None, distance=None, size=None, padding=0, file_extension=None, glob_pattern=None, regex_pattern=None, backend='polars', **polars_kwargs)[source]#
Read a geographic bucket.
The extent, country, continent, or point arguments allow reading only a spatial subset of the original bucket. Please specify only one of these arguments!
The file_extension, glob_pattern and regex_pattern arguments allow to further restrict the selection of files read from the partitioned dataset.
- Parameters:
bucket_dir (str) – Base directory of the geographic bucket.
extent (list, optional) – The extent specified as [xmin, xmax, ymin, ymax].
country (str, optional) – The name of the country of interest.
continent (str, optional) – The name of the continent of interest.
point (list or tuple, optional) – The longitude and latitude coordinates of the point around which you are interested to get the data. To effectively subset data around this point, also specify the size or distance arguments.
distance (float, optional) – Distance (in meters) from the specified point in each direction.
size (int, float, tuple, list, optional) – The size in degrees of the extent in each direction centered around the specified point.
padding (int, float, tuple, list) – The number of degrees to extend the (country, continent) extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.
file_extension (str, optional) – Name of the file extension. The default is None.
glob_pattern (str, optional) – Unix shell-style wildcards to subset the files to read in. The default is None.
regex_pattern (str, optional) – Regex pattern to subset the files to read in. The default is None.
backend (str, optional) – The desired type of dataframe returned by the function. The default is a polars.DataFrame. Valid backends are pandas, polars_lazy and pyarrow.
**polars_kwargs (dict) – Arguments to be passed to polars.read_parquet(). columns allows specifying the subset of columns to read. n_rows allows stopping reading data from Parquet files after n_rows rows. For other arguments, please refer to: https://docs.pola.rs/py-polars/html/reference/api/polars.read_parquet.html
- Returns:
df – Bucket dataframe.
- Return type:
pandas.DataFrame, polars.DataFrame, polars.LazyFrame or pyarrow.Table
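A hedged usage sketch (the bucket directory, extent and column names are illustrative):

    from gpm.bucket.readers import read_bucket

    df = read_bucket(
        bucket_dir="/data/GPM/bucket",
        extent=[5, 10, 44, 48],  # [xmin, xmax, ymin, ymax]
        backend="polars",
        # Forwarded to polars.read_parquet; column names are hypothetical
        columns=["lon", "lat", "precipRateNearSurface"],
    )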
gpm.bucket.routines module#
This module provides the routines for the creation of GPM Geographic Buckets.
- gpm.bucket.routines.write_granule_bucket(src_filepath, bucket_dir, partitioning, granule_to_df_func, x='lon', y='lat', **writer_kwargs)[source]#
Write a geographically partitioned Parquet Dataset of a GPM granule.
- Parameters:
src_filepath (str) – File path of the granule to store in the bucket archive.
bucket_dir (str) – Base directory of the per-granule bucket archive.
partitioning (gpm.bucket.SpatialPartitioning) – A spatial partitioning class.
granule_to_df_func (Callable) – Function taking a granule filepath, opening it and returning a pandas or dask dataframe.
x (str) – The name of the x column. The default is “lon”.
y (str) – The name of the y column. The default is “lat”.
**writer_kwargs (dict) – Optional arguments to be passed to the pyarrow Dataset Writer. Common arguments are ‘format’ and ‘use_threads’. The default file format is ‘parquet’. The default use_threads is True, which enables multithreaded file writing. More information available at https://arrow.apache.org/docs/python/generated/pyarrow.dataset.write_dataset.html
gpm.bucket.writers module#
This module provides utilities to write a GPM Geographic Bucket Apache Parquet Dataset.
- gpm.bucket.writers.estimate_row_group_size(df, size='200MB')[source]#
Estimate the row_group_size parameter based on the desired row group memory size.
row_group_size is a Parquet argument controlling the number of rows in each Apache Parquet file row group.
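A hedged one-liner (df is any dataframe already in memory):

    from gpm.bucket.writers import estimate_row_group_size

    row_group_size = estimate_row_group_size(df, size="200MB")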
- gpm.bucket.writers.write_arrow_partitioned_dataset(table, base_dir, filename_prefix, partitions, **writer_kwargs)[source]#
- gpm.bucket.writers.write_dask_partitioned_dataset(df, base_dir, filename_prefix, partitions, **writer_kwargs)[source]#
Write a Dask DataFrame to a partitioned dataset.
It loops over the dataframe partitions and writes them to disk. If row_group_size or max_file_size are specified as a string, it loads the first dataframe partition to estimate the row numbers.
- gpm.bucket.writers.write_pandas_partitioned_dataset(df, base_dir, filename_prefix, partitions, **writer_kwargs)[source]#
Module contents#
This directory defines the GPM-API geographic binning toolbox.
- class gpm.bucket.LonLatPartitioning(size, extent=[-180, 180, -90, 90], levels=None, flavor='hive', order=None, labels_decimals=None)[source]#
Bases:
XYPartitioning
Handles geographic partitioning of data based on longitude and latitude bin sizes within a defined extent.
The last bin (in the lon and lat directions) might not be of size size!
- Parameters:
size (float) – The uniform size for longitude and latitude binning. Carefully consider the size of the partitions. Partitioning the Earth by:
- 1° corresponds to 64800 directories (360*180)
- 5° corresponds to 2592 directories (72*36)
- 10° corresponds to 648 directories (36*18)
- 15° corresponds to 288 directories (24*12)
levels (list, optional) – Names of the longitude and latitude partitions. The default is ["lon_bin", "lat_bin"].
extent (list, optional) – The geographical extent for the partitioning specified as [xmin, xmax, ymin, ymax]. Default is the whole Earth: [-180, 180, -90, 90].
order (list, optional) – The order of the partitions when writing partitioned datasets. The default, None, corresponds to levels.
flavor (str, optional) – This argument governs the directory names of partitioned datasets. The default, "hive", names the directories with the format {partition_name}={partition_label}. If None, names the directories with the partition labels (DirectoryPartitioning).
- directories_around_point(lon, lat, distance=None, size=None)[source]#
Return the directory trees with data within the distance/size from a point.
- directories_by_continent(name, padding=None)[source]#
Return the directory trees with data within a continent.
- directories_by_country(name, padding=None)[source]#
Return the directory trees with data within a country.
- get_partitions_around_point(lon, lat, distance=None, size=None)[source]#
Return the partition labels with data within the distance/size from a point.
- class gpm.bucket.TilePartitioning(size, extent, n_levels, levels=None, origin='bottom', direction='x', justify=False, flavor=None, order=None)[source]#
Bases:
Base2DPartitioning
Handles partitioning of data into tiles.
- Parameters:
size (int, float, tuple, list) – The size value(s) of the bins. An int or float enforces the same size in both the x and y directions; a tuple or list specifies the bin size for the x and y directions separately.
extent (list) – The extent for the partitioning specified as [xmin, xmax, ymin, ymax].
n_levels (int) – The number of tile partitioning levels. If n_levels=2, an (x, y) label is assigned to each tile. If n_levels=1, a unique id label is assigned to each tile by combining the x and y tile indices; the origin and direction parameters govern its value.
levels (list, optional) – If n_levels>=2, the first two names must correspond to the x and y partitions. The default with n_levels=1 is ["tile"]. The default with n_levels=2 is ["x", "y"].
origin (str, optional) – The origin of the y axis. Either "bottom" or "top". TMS tiles assume origin="top". Google Maps tiles assume origin="bottom". The default is "bottom".
direction (str, optional) – The direction to follow to define tile ids if n_levels=1 is specified. Valid direction values are "x" and "y". direction="x" numbers the tiles row by row; direction="y" numbers the tiles column by column.
justify (bool, optional) – Whether to justify the labels so that they all have the same number of characters. Zeros are added on the left side of the labels to pad the length. The default is False.
order (list, optional) – The order of the partitions when writing partitioned datasets. The default, None, corresponds to levels.
flavor (str, optional) – This argument governs the directory names of partitioned datasets. The default, None, names the directories with the partition labels (DirectoryPartitioning). The option "hive" names the directories with the format {partition_name}={partition_label}.
- gpm.bucket.read(bucket_dir, extent=None, country=None, continent=None, point=None, distance=None, size=None, padding=0, file_extension=None, glob_pattern=None, regex_pattern=None, backend='polars', **polars_kwargs)[source]#
Read a geographic bucket.
The extent, country, continent, or point arguments allow reading only a spatial subset of the original bucket. Please specify only one of these arguments!
The file_extension, glob_pattern and regex_pattern arguments allow to further restrict the selection of files read from the partitioned dataset.
- Parameters:
bucket_dir (str) – Base directory of the geographic bucket.
extent (list, optional) – The extent specified as [xmin, xmax, ymin, ymax].
country (str, optional) – The name of the country of interest.
continent (str, optional) – The name of the continent of interest.
point (list or tuple, optional) – The longitude and latitude coordinates of the point around which you are interested to get the data. To effectively subset data around this point, also specify the size or distance arguments.
distance (float, optional) – Distance (in meters) from the specified point in each direction.
size (int, float, tuple, list, optional) – The size in degrees of the extent in each direction centered around the specified point.
padding (int, float, tuple, list) – The number of degrees to extend the (country, continent) extent in each direction. If padding is a single number, the same padding is applied in all directions. If padding is a tuple or list, it must contain 2 or 4 elements. If two values are provided (x, y), they are interpreted as longitude and latitude padding, respectively. If four values are provided, they directly correspond to padding for each side (left, right, top, bottom). Default is 0.
file_extension (str, optional) – Name of the file extension. The default is None.
glob_pattern (str, optional) – Unix shell-style wildcards to subset the files to read in. The default is None.
regex_pattern (str, optional) – Regex pattern to subset the files to read in. The default is None.
backend (str, optional) – The desired type of dataframe returned by the function. The default is a polars.DataFrame. Valid backends are pandas, polars_lazy and pyarrow.
**polars_kwargs (dict) – Arguments to be passed to polars.read_parquet(). columns allows specifying the subset of columns to read. n_rows allows stopping reading data from Parquet files after n_rows rows. For other arguments, please refer to: https://docs.pola.rs/py-polars/html/reference/api/polars.read_parquet.html
- Returns:
df – Bucket dataframe.
- Return type:
pandas.DataFrame, polars.DataFrame, polars.LazyFrame or pyarrow.Table
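A hedged usage sketch reading around a point (the path and coordinates are illustrative):

    import gpm.bucket

    df = gpm.bucket.read(
        bucket_dir="/data/GPM/bucket",
        point=(8.5, 46.0),    # (longitude, latitude)
        distance=100_000,     # meters in each direction
    )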