Use with NumPy#

Let’s integrate flwr-datasets with NumPy.

Create a FederatedDataset:

from flwr_datasets import FederatedDataset

# Partition the "train" split into 10 partitions (IID by default)
fds = FederatedDataset(dataset="cifar10", partitioners={"train": 10})
# Load the first partition; keep the full "test" split for centralized evaluation
partition = fds.load_partition(0, "train")
centralized_dataset = fds.load_split("test")
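
A quick sanity check: the default partitioner splits the "train" split evenly, so with CIFAR10's 50,000 training images each of the 10 partitions should hold 5,000 examples:

print(len(partition))  # 5000
print(len(centralized_dataset))  # 10000 images in the CIFAR10 test split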

Inspect the names of the features:

partition.features

For CIFAR10, you should see the following output:

{'img': Image(decode=True, id=None),
 'label': ClassLabel(names=['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog',
                            'frog', 'horse', 'ship', 'truck'], id=None)}
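
You can also look at a single example; since decode=True, the "img" feature is decoded to a PIL image, and "label" is an integer:

example = partition[0]
print(type(example["img"]), example["label"])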

We will use these feature names to apply transformations to the data or to pass the data to an ML model. Let’s move on to the transformations.

NumPy#

Transform to NumPy:

# Return columns as NumPy arrays instead of PIL images and Python ints
partition_np = partition.with_format("numpy")
X_train, y_train = partition_np["img"], partition_np["label"]

That’s all. Let’s check the dimensions and data types of our X_train and y_train:

print(f"The shape of X_train is: {X_train.shape}, dtype: {X_train.dtype}.")
print(f"The shape of y_train is: {y_train.shape}, dtype: {y_train.dtype}.")

You should see:

The shape of X_train is: (5000, 32, 32, 3), dtype: uint8.
The shape of y_train is: (5000,), dtype: int64.

Note that the X_train values are of type uint8. That is not a problem when passing the data to a TensorFlow model as input, but it is a reminder to normalize the data: global normalization, per-channel normalization, or simply rescaling to the [0, 1] range:

X_train = (X_train - X_train.mean()) / X_train.std() # Global normalization
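
For reference, the other two options could be sketched as follows (starting from the original uint8 X_train; apply only one normalization scheme):

# Rescale to the [0, 1] range
X_train = X_train / 255.0

# Or: per-channel normalization using each color channel's mean and std
mean = X_train.mean(axis=(0, 1, 2), keepdims=True)
std = X_train.std(axis=(0, 1, 2), keepdims=True)
X_train = (X_train - mean) / std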

CNN Keras model#

Here’s a quick example of how you can use that data with a simple CNN model:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D(2, 2),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=20, batch_size=64)

You should see a training accuracy of about 98% by the end of training.

Note that we used "sparse_categorical_crossentropy" as the loss. Keep it that way if you don’t want to one-hot encode the labels.
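
Finally, the centralized_dataset loaded at the start can serve as a held-out test set. A minimal sketch, assuming the same global normalization as above:

centralized_np = centralized_dataset.with_format("numpy")
X_test, y_test = centralized_np["img"], centralized_np["label"]
# Normalize the same way as the training data
# (ideally reusing the training-set mean and std)
X_test = (X_test - X_test.mean()) / X_test.std()
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.4f}")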