InnerDirichletPartitioner#

class InnerDirichletPartitioner(partition_sizes: List[int] | ndarray[Any, dtype[int64]], partition_by: str, alpha: int | float | List[float] | ndarray[Any, dtype[float64]], shuffle: bool = True, seed: int | None = 42)[source]#

Bases: Partitioner

Partitioner based on Dirichlet distribution.

Each partition is created based on the Dirichlet distribution, where the probability corresponds to the fractions of samples of specific classes. This process is iterative (sample-by-sample assignment): first, the partition ID to which the sample will be assigned is chosen uniformly at random, and then the sample's class is drawn from that partition's Dirichlet probabilities. Note that when a class gets exhausted (no more samples exist to draw from), its sampling probability is set to zero and the remaining probabilities are renormalized.

Implementation based on: Federated Learning Based on Dynamic Regularization (https://arxiv.org/abs/2111.04263).
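The assignment loop described above can be sketched in plain NumPy. This is an illustrative re-implementation for a scalar (symmetric) alpha; the function name and structure are assumptions for exposition, not the library's actual code:

```python
import numpy as np

def inner_dirichlet_assign(labels, partition_sizes, alpha, rng):
    """Illustrative sketch of the inner Dirichlet assignment (symmetric alpha)."""
    labels = np.asarray(labels)
    num_classes = int(labels.max()) + 1
    # One class-probability vector per partition, drawn from Dirichlet(alpha).
    class_probs = rng.dirichlet(np.full(num_classes, alpha), size=len(partition_sizes))
    # Pools of sample indices per class, in random order.
    pools = [list(rng.permutation(np.where(labels == c)[0])) for c in range(num_classes)]
    assignments = [[] for _ in partition_sizes]
    remaining = np.array(partition_sizes, dtype=float)
    while remaining.sum() > 0:
        # 1) Choose a partition uniformly among those still needing samples.
        pid = rng.choice(np.where(remaining > 0)[0])
        probs = class_probs[pid].copy()
        # 2) Zero out exhausted classes and renormalize the rest.
        empty = [c for c in range(num_classes) if not pools[c]]
        probs[empty] = 0.0
        probs /= probs.sum()
        # 3) Draw a class, then assign one sample of that class.
        cls = rng.choice(num_classes, p=probs)
        assignments[pid].append(pools[cls].pop())
        remaining[pid] -= 1
    return assignments
```

Every sample is assigned exactly once, and each partition ends up with exactly its requested size, which mirrors the behavior the description above specifies.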

Parameters:
  • partition_sizes (Union[List[int], NDArrayInt]) – The sizes of all partitions.

  • partition_by (str) – Column name of the labels (targets) based on which Dirichlet sampling works.

  • alpha (Union[int, float, List[float], NDArrayFloat]) – Concentration parameter of the Dirichlet distribution (a single value for a symmetric Dirichlet distribution, or a list/NDArray with length equal to the number of unique classes).

  • shuffle (bool) – Whether to randomize the order of samples. Shuffling is applied after the samples are assigned to partitions.

  • seed (Optional[int]) – Seed used for dataset shuffling. It has no effect if shuffle is False.
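As a rough intuition for alpha (an illustrative NumPy sketch, not part of the library's API): small values concentrate each partition's class-probability mass on a few classes, while large values spread it near-uniformly.

```python
import numpy as np

rng = np.random.default_rng(42)
# 10 classes; each call draws one partition's class-probability vector.
skewed = rng.dirichlet(np.full(10, 0.1))     # low alpha: a few dominant classes
balanced = rng.dirichlet(np.full(10, 100.0)) # high alpha: close to uniform (0.1 each)
```

Passing a list/NDArray of per-class alphas instead of a scalar makes the expected class fractions asymmetric.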

Examples

>>> from flwr_datasets import FederatedDataset
>>> from flwr_datasets.partitioner import InnerDirichletPartitioner
>>>
>>> partitioner = InnerDirichletPartitioner(
...     partition_sizes=[6_000] * 10, partition_by="label", alpha=0.5
... )
>>> fds = FederatedDataset(dataset="mnist", partitioners={"train": partitioner})
>>> partition = fds.load_partition(0)
>>> print(partition[0])  # Print the first example

Methods

is_dataset_assigned()

Check if a dataset has been assigned to the partitioner.

load_partition(partition_id)

Load a partition based on the partition index.

Attributes

dataset

Dataset property.

num_partitions

Total number of partitions.

property dataset: Dataset#

Dataset property.

is_dataset_assigned() → bool#

Check if a dataset has been assigned to the partitioner.

This method returns True if a dataset is already set for the partitioner, otherwise, it returns False.

Returns:

dataset_assigned – True if a dataset is assigned, otherwise False.

Return type:

bool

load_partition(partition_id: int) → Dataset[source]#

Load a partition based on the partition index.

Parameters:

partition_id (int) – The index that corresponds to the requested partition.

Returns:

dataset_partition – A single partition of the dataset.

Return type:

Dataset

property num_partitions: int#

Total number of partitions.