SizePartitioner

class SizePartitioner(num_partitions: int, partition_id_to_size_fn: Callable)

Bases: Partitioner

Base class for the deterministic size partitioning based on the partition_id.

The number of samples assigned to a partition is determined by its partition_id through the following relationship:

partition_id_to_size_fn(partition_id) ~ number of samples for partition_id

If the function doesn't transform the partition_id, the number of samples in a partition grows linearly with the value of its partition_id. For instance, if the partition ids range from 1 to M, the partition with id 1 gets 1 unit of data, the partition with id 2 gets 2 units, and so on, up to partition M, which gets M units.

Note that the size corresponding to each partition_id is deterministic; however, if the dataset is shuffled differently, the particular samples assigned to each partition_id will vary.
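The relationship above can be illustrated with a minimal, library-independent sketch. The helper name `sketch_partition_sizes`, the rounding-down step, and the choice to give leftover samples to the last partition are assumptions made for illustration; they are not taken from the SizePartitioner source.

```python
import math
from typing import Callable, Dict


def sketch_partition_sizes(
    num_samples: int,
    num_partitions: int,
    partition_id_to_size_fn: Callable[[int], float],
) -> Dict[int, int]:
    """Hypothetical helper: turn per-partition 'units' into sample counts."""
    # Units of data per partition id, using ids 1..M as in the example above.
    units = {
        pid: partition_id_to_size_fn(pid)
        for pid in range(1, num_partitions + 1)
    }
    total_units = sum(units.values())
    # Scale units to sample counts, rounding down.
    sizes = {
        pid: math.floor(num_samples * u / total_units)
        for pid, u in units.items()
    }
    # Assumption: samples left over from rounding go to the last partition.
    sizes[num_partitions] += num_samples - sum(sizes.values())
    return sizes


# Identity function: sizes grow linearly with the partition id.
linear_sizes = sketch_partition_sizes(100, 4, lambda pid: pid)
print(linear_sizes)  # {1: 10, 2: 20, 3: 30, 4: 40}

# Squaring the id: sizes grow quadratically instead.
square_sizes = sketch_partition_sizes(100, 4, lambda pid: pid ** 2)
print(square_sizes)  # {1: 3, 2: 13, 3: 30, 4: 54}
```

In both cases the sizes depend only on the number of samples, the number of partitions, and the function, which is what makes them deterministic.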

Parameters:

num_partitions (int) – The total number of partitions that the data will be divided into.
partition_id_to_size_fn (Callable) – Function that defines the relationship between partition id and the number of samples.

Methods

`is_dataset_assigned`()	Check if a dataset has been assigned to the partitioner.
`load_partition`(partition_id)	Load a single partition based on the partition index.

Attributes

`dataset`	Dataset property.
`num_partitions`	Total number of partitions.
`partition_id_to_indices`	Mapping from partition id to the list of sample indices.
`partition_id_to_size`	Mapping from partition id to the number of samples.

property dataset: Dataset

Dataset property.

is_dataset_assigned() → bool

Check if a dataset has been assigned to the partitioner.

This method returns True if a dataset is already set for the partitioner; otherwise, it returns False.

Returns: dataset_assigned – True if a dataset is assigned, otherwise False.
Return type: bool

load_partition(partition_id: int) → Dataset

Load a single partition based on the partition index.

The number of samples in the returned partition depends on the partition_id.

Parameters: partition_id (int) – the index that corresponds to the requested partition
Returns: dataset_partition – single dataset partition
Return type: Dataset
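As a rough illustration of how partition sizes, index lists, and partition loading fit together, the sketch below uses a plain Python list in place of a Dataset. The helpers `sketch_indices` and `sketch_load_partition` are hypothetical, and the contiguous index ranges are an assumption, not the library's confirmed behaviour.

```python
import random
from itertools import accumulate
from typing import Dict, List, Sequence


def sketch_indices(sizes: Dict[int, int]) -> Dict[int, List[int]]:
    """Hypothetical helper: give each partition id a contiguous index range."""
    ends = list(accumulate(sizes.values()))
    starts = [0] + ends[:-1]
    return {
        pid: list(range(start, end))
        for pid, start, end in zip(sizes, starts, ends)
    }


def sketch_load_partition(
    dataset: Sequence, indices: Dict[int, List[int]], partition_id: int
) -> list:
    """Hypothetical helper: select the samples that belong to one partition."""
    return [dataset[i] for i in indices[partition_id]]


sizes = {1: 1, 2: 2, 3: 3}          # partition id -> number of samples
indices = sketch_indices(sizes)     # {1: [0], 2: [1, 2], 3: [3, 4, 5]}

dataset = list("abcdef")
part = sketch_load_partition(dataset, indices, 2)
print(part)  # ['b', 'c']

# Shuffling the dataset changes which samples land in the partition,
# but the partition still holds exactly sizes[2] samples.
shuffled = dataset[:]
random.Random(0).shuffle(shuffled)
part_shuffled = sketch_load_partition(shuffled, indices, 2)
print(len(part_shuffled))  # 2
```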

property num_partitions: int

Total number of partitions.

property partition_id_to_indices: Dict[int, List[int]]

Mapping from partition id to the list of sample indices.

property partition_id_to_size: Dict[int, int]

Mapping from partition id to the number of samples.