Google Summer of Code: Project Ideas

Dr. Maria Börner
Flower Core Architecture

Flower is a framework-agnostic federated learning framework: it works with any machine learning framework (Keras, TensorFlow, PyTorch, MXNet, ...), is programming language independent (Python, Java, C++, Swift, ...), and runs on any operating system (iOS, Android, Linux, Windows, macOS). The Google Summer of Code program is a great way to get started with federated learning, Flower, and open source work in general! This blog post proposes Google Summer of Code project ideas that will help to improve the usability of Flower on different platforms, add new federated learning algorithms to the core framework, and make federated learning more accessible in general.

Possible mentors for Google Summer of Code project ideas are:

The following list shows some suitable project ideas, but students can also suggest their own ideas.

JAX + Flower Federated Learning Quickstart Example

Description

JAX is a high-performance machine learning framework based on Python and NumPy. In this project, you will develop a federated learning example with JAX and Flower. The example can be based on one of the centralized JAX ML training projects and is intended to showcase how Flower can be used to federate such projects. First, you choose the model (sequential, CNN) and the training dataset (MNIST, CIFAR-10, ...). You then create a training and evaluation process that runs centralized. Afterwards, the Flower example reuses the dataset, model, training, and evaluation process from the centralized example to run the JAX example federated. You also need to write Flower-based code that extracts and updates the weights of the previously created model.
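The weight-handling step mentioned above can be sketched without any JAX dependency. The snippet below is only an illustration, assuming the model parameters live in a nested dictionary of plain lists (JAX models commonly keep parameters in nested containers). The helper names `flatten_params` and `unflatten_params` are hypothetical and are not part of Flower or JAX.

```python
from typing import Dict, List, Tuple

# Assumed parameter layout: {layer name: {parameter name: values}}.
Params = Dict[str, Dict[str, List[float]]]
Keys = List[Tuple[str, str]]


def flatten_params(params: Params) -> Tuple[List[List[float]], Keys]:
    """Return the weights as a flat list plus the keys needed to rebuild them."""
    keys = [(layer, name) for layer in sorted(params) for name in sorted(params[layer])]
    weights = [list(params[layer][name]) for layer, name in keys]
    return weights, keys


def unflatten_params(weights: List[List[float]], keys: Keys) -> Params:
    """Rebuild the nested parameter dictionary from a flat list of weights."""
    params: Params = {}
    for (layer, name), values in zip(keys, weights):
        params.setdefault(layer, {})[name] = list(values)
    return params


# Hypothetical two-layer model.
model = {"dense1": {"w": [0.1, 0.2], "b": [0.0]}, "dense2": {"w": [0.3], "b": [0.5]}}
weights, keys = flatten_params(model)

# Pretend the server sent back updated weights (here: every value doubled).
updated = [[2 * v for v in layer] for layer in weights]
new_model = unflatten_params(updated, keys)
```

A flat list of arrays is easy to send over the wire and to average on the server, while the recorded keys let the client restore the structure its training code expects.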

Expected Outcome:

  • Provide a JAX example that runs federated
  • Write documentation and a README for the GitHub repository
  • Write a blog post about the newly available example

Required Skills:

  • Intermediate knowledge of Python
  • Basic knowledge of Git
  • Basic understanding of JAX is an advantage but not necessary

Mentor: Dr. Maria Börner

Difficulty: easy

Monitoring tools for FL/Flower

Description

When training ML models, being able to track training statistics such as accuracy/loss curves or ranges of distributions can be very insightful: it helps to choose better hyperparameters, to introduce learning rate decay schedulers, etc. For FL, especially when using large pools of clients, it can become challenging to monitor and track such statistics because the clients run on different machines. This project involves building a centralized monitoring system to track training statistics from all clients participating in the experiment. The design should be framework-agnostic so that it works with existing tools such as TensorBoard or W&B, if the user wants to rely on those.
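One possible shape for such a system is a central hub that stores every client report and forwards it to user-defined callbacks. The sketch below is an assumption about how this could look, not an existing Flower API: `MetricsHub`, `register`, `report`, and `mean` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Assumed callback signature: (client_id, round number, metrics) -> None
MetricsCallback = Callable[[str, int, Dict[str, float]], None]


class MetricsHub:
    """Collects per-client metrics in one place and forwards them to callbacks."""

    def __init__(self) -> None:
        self.callbacks: List[MetricsCallback] = []
        self.history: Dict[str, List[dict]] = defaultdict(list)

    def register(self, callback: MetricsCallback) -> None:
        """Add a user-defined callback, e.g. a TensorBoard or W&B logger."""
        self.callbacks.append(callback)

    def report(self, client_id: str, rnd: int, metrics: Dict[str, float]) -> None:
        """Store one client report and notify every registered callback."""
        self.history[client_id].append({"round": rnd, **metrics})
        for callback in self.callbacks:
            callback(client_id, rnd, metrics)

    def mean(self, name: str, rnd: int) -> float:
        """Average one metric over all clients that reported it in a round."""
        values = [
            entry[name]
            for entries in self.history.values()
            for entry in entries
            if entry["round"] == rnd and name in entry
        ]
        if not values:
            raise ValueError(f"no client reported {name!r} in round {rnd}")
        return sum(values) / len(values)


hub = MetricsHub()
log_lines: List[str] = []
# A plain list stands in for a real logging backend.
hub.register(lambda cid, rnd, m: log_lines.append(f"{cid} round={rnd} loss={m['loss']:.2f}"))
hub.report("client-0", 1, {"loss": 0.9, "accuracy": 0.6})
hub.report("client-1", 1, {"loss": 0.7, "accuracy": 0.8})
```

Keeping the callback signature free of any logging library is what would make the design framework-agnostic: a TensorBoard or W&B integration becomes just another callback.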

Expected Outcome:

  • Add support to clients for (optional) user-defined monitoring callbacks.
  • Create an example demonstrating support for a variety of scenarios.
  • A blog post describing the example above.

Required Skills:

  • Intermediate knowledge of Python
  • Basic experience training ML models with PyTorch and/or TensorFlow
  • Experience with tools for monitoring training: TensorBoard, W&B, etc.

Mentor: Javier Fernandez-Marques

Difficulty: medium

Federated Analytics

Description

Flower’s main focus is Federated Learning, but the framework is general enough to implement related approaches, such as Federated Analytics. Federated learning exchanges ML models within a set of connected devices, each holding its own data partition. Federated analytics is similar, but instead of model updates, each client sends analytical results computed on its local data partition. In this project, you will implement a prototype for Federated Analytics using Flower. This prototype will help to shape future API changes to make Federated Analytics a first-class citizen in the Flower ecosystem.
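The difference from federated learning can be illustrated with a small sketch: each client reduces its raw data to a summary, and the server only combines summaries. The function names `local_summary` and `aggregate` and the example data are assumptions made for illustration, not part of Flower.

```python
from collections import Counter
from typing import Dict, List, Tuple

Summary = Tuple[int, float, Counter]


def local_summary(values: List[float], bin_width: float) -> Summary:
    """Runs on the client: reduce raw data to count, sum, and a histogram."""
    histogram = Counter(int(v // bin_width) for v in values)
    return len(values), sum(values), histogram


def aggregate(summaries: List[Summary]) -> Tuple[float, Dict[int, int]]:
    """Runs on the server: combine summaries without seeing any raw value."""
    total_count = sum(count for count, _, _ in summaries)
    total_sum = sum(value_sum for _, value_sum, _ in summaries)
    histogram: Counter = Counter()
    for _, _, client_histogram in summaries:
        histogram.update(client_histogram)
    return total_sum / total_count, dict(histogram)


# Two hypothetical clients, each holding its own data partition.
client_a = local_summary([1.0, 2.0, 3.0], bin_width=5.0)
client_b = local_summary([10.0, 12.0], bin_width=5.0)
global_mean, global_histogram = aggregate([client_a, client_b])
```

Sending counts and sums rather than local means lets the server compute the exact global mean even when the partitions differ in size.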

Expected Outcome:

  • Create a prototype for federated analytics in Flower
    • Define the general approach
    • Create a code example
    • Implement changes to the underlying Flower core framework if necessary
  • Document the final approach
  • Write a blog post about the approach

Required Skills:

  • Knowledge of Python programming
  • Knowledge of analytics tools in the Python ecosystem
  • Basic understanding of federated analytics

Mentor: Daniel Beutel

Difficulty: hard

Secure Aggregation

Description

Federated learning allows us to train a model over a set of connected devices, each holding its own data partition. Each connected device trains the model on its local dataset and then sends the updated model to a central server. Secure Aggregation protects these model updates from being analysed by the server: the server only gets to see the model parameters after aggregation. This prevents the server from being able to “peek” into the model update from a single device. In this project, you will implement Secure Aggregation in the Flower federated learning framework and create a code example that demonstrates the usage of Secure Aggregation with Flower.
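One common building block for this idea is pairwise additive masking: every pair of clients shares a random mask that one of them adds and the other subtracts, so the masks cancel in the sum. The sketch below illustrates only that cancellation, under strong assumptions: updates are already quantized to integers, the pairwise seeds are simply given (a real protocol would derive them via key agreement), and no client drops out. All names are hypothetical.

```python
import random
from typing import Dict, List

MODULUS = 2**31 - 1  # all arithmetic is done on integers modulo this value


def pairwise_mask(seed: int, length: int) -> List[int]:
    """Expand a seed shared by two clients into a pseudo-random mask."""
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(length)]


def mask_update(client_id: int, update: List[int], pair_seeds: Dict[int, int]) -> List[int]:
    """Mask an update; pair_seeds maps each peer id to the seed shared with it."""
    masked = list(update)
    for peer_id, seed in pair_seeds.items():
        # The client with the smaller id adds the mask, the other subtracts it.
        sign = 1 if client_id < peer_id else -1
        mask = pairwise_mask(seed, len(update))
        masked = [(m + sign * r) % MODULUS for m, r in zip(masked, mask)]
    return masked


def aggregate(masked_updates: List[List[int]]) -> List[int]:
    """Server side: sum the masked updates; the pairwise masks cancel out."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]


updates = {0: [5, 1], 1: [7, 2], 2: [9, 3]}
shared_seeds = {(0, 1): 11, (0, 2): 22, (1, 2): 33}  # placeholder seeds

masked = {}
for cid, update in updates.items():
    seeds = {}
    for (a, b), seed in shared_seeds.items():
        if cid == a:
            seeds[b] = seed
        elif cid == b:
            seeds[a] = seed
    masked[cid] = mask_update(cid, update, seeds)

total = aggregate(list(masked.values()))
```

The server recovers the element-wise sum of the three updates, while each individual masked update looks like random numbers. Python's `random` module is used for brevity only and is not suitable for cryptographic use.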

Expected Outcome:

  • Define an implementation proposal for Secure Aggregation
  • Implement the required changes to the message protocol (i.e. messages exchanged between server and clients)
  • Implement the required server-side and client-side logic
  • Create a code example using the new functionality
  • Documentation on how to use the new feature
  • Optional: a blog post about the new feature

Required Skills:

  • Knowledge of Python programming
  • Good understanding of security principles (encryption/decryption)
  • Basic understanding of machine learning

Mentors: Daniel Beutel and Nicholas Lane, PhD

Difficulty: hard

Differential Privacy using Opacus

Description

Federated learning offers increased data privacy since only the weights of ML models are shared with a server, not the underlying data used to train these models. However, this paper (https://arxiv.org/abs/1906.08935) shows that it is possible to recover private data from the shared model updates. Therefore, an additional layer of protection can help to improve data privacy. In this project, you will implement a Flower example that demonstrates how Differential Privacy can be used on the client.
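The core mechanism behind many differential privacy schemes is to bound the influence of the data by clipping and then to add calibrated noise. The sketch below applies this to a whole client update using only the standard library. It does not use Opacus, which applies clipping and noise to per-sample gradients during PyTorch training, and it does no privacy accounting. The function names and parameter values are assumptions for illustration.

```python
import math
import random
from typing import List


def clip_by_l2_norm(update: List[float], max_norm: float) -> List[float]:
    """Scale the update down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in update]


def privatize(
    update: List[float],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Clip the update, then add Gaussian noise scaled to the clipping bound."""
    clipped = clip_by_l2_norm(update, max_norm)
    sigma = noise_multiplier * max_norm
    return [v + rng.gauss(0.0, sigma) for v in clipped]


raw_update = [3.0, 4.0]  # L2 norm is 5.0
clipped = clip_by_l2_norm(raw_update, max_norm=1.0)
noisy = privatize(raw_update, max_norm=1.0, noise_multiplier=1.0, rng=random.Random(0))
```

Clipping makes the maximum contribution of a client known in advance, which is what allows the noise level to be tied to `max_norm`.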

Expected Outcome:

  • Create a PyTorch-based code example that implements Differential Privacy using Opacus (https://opacus.ai/).
  • Test and document the new example.
  • Write a blog post describing the new code example.

Required Skills:

  • Basic understanding of differential privacy
  • Experience with PyTorch
  • Experience with Python

Mentor: Daniel Beutel

Difficulty: medium

FedProx

Description

Implement the FedProx federated learning Strategy for Flower. Federated learning involves training on thousands of devices that hold different amounts of data with different data distributions. In this project, you will implement FedProx, a proven aggregation Strategy that helps mitigate the problem of heterogeneity through a generalization and re-parametrization of FedAvg.
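The central change FedProx makes to local training is a proximal term: instead of minimizing only its local loss F_k(w), each client minimizes F_k(w) + (mu / 2) * ||w - w_global||^2, which keeps local weights close to the global model. The sketch below shows this on a toy one-parameter problem. The function name, the toy loss, and the hyperparameter values are assumptions for illustration and do not reflect Flower's Strategy interface.

```python
from typing import Callable, List

Vector = List[float]


def fedprox_local_update(
    global_w: Vector,
    grad_fn: Callable[[Vector], Vector],
    mu: float,
    lr: float,
    steps: int,
) -> Vector:
    """Gradient descent on F_k(w) + (mu / 2) * ||w - global_w||^2."""
    w = list(global_w)
    for _ in range(steps):
        grad = grad_fn(w)
        # The proximal term adds mu * (w - global_w) to the local gradient.
        w = [wi - lr * (gi + mu * (wi - gw)) for wi, gi, gw in zip(w, grad, global_w)]
    return w


def toy_grad(w: Vector) -> Vector:
    """Gradient of the toy local loss 0.5 * (w - 4)^2."""
    return [wi - 4.0 for wi in w]


# mu = 0 reduces to plain local training as in FedAvg.
plain = fedprox_local_update([0.0], toy_grad, mu=0.0, lr=0.1, steps=200)
prox = fedprox_local_update([0.0], toy_grad, mu=1.0, lr=0.1, steps=200)
```

With `mu=0.0` the client drifts all the way to its local optimum at 4, while `mu=1.0` settles at 2, halfway between the global weight and the local optimum. This is how the proximal term limits client drift on heterogeneous data.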

Expected outcomes:

  • Source code for the FedProx implementation
  • Unit test for the code above
  • A Flower example that uses the Strategy
  • A blogpost describing the Strategy

Skills required:

  • Intermediate knowledge of Python
  • Basic knowledge of Git

Mentor: Pedro Porto Buarque de Gusmão, PhD

Sources: https://arxiv.org/pdf/1905.10497.pdf

per-FedAvg

Description

Implement a Personalized Federated Learning Strategy. Federated Learning trains on thousands of clients having different data distributions. Is it really possible to find one model that will fit them all? In this project you will be developing a new aggregation strategy called Personalized FedAvg (per-FedAvg), which tries to find an initial shared model that current or new users can easily adapt to their local datasets.
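The idea of a shared model that adapts well after a local step can be sketched on a toy problem. The snippet below uses a first-order simplification: each client evaluates its gradient at the point reached after one personalization step and applies it to the shared weight, ignoring second-order terms. The function names, the scalar quadratic losses, and the hyperparameters are assumptions for illustration, not the reference algorithm from the paper.

```python
from typing import Callable, List

GradFn = Callable[[float], float]


def personalize(w: float, grad_fn: GradFn, alpha: float) -> float:
    """One local gradient step: the adaptation a user performs on their own data."""
    return w - alpha * grad_fn(w)


def client_step(w: float, grad_fn: GradFn, alpha: float, beta: float) -> float:
    """First-order meta step: use the gradient at the adapted point."""
    adapted = personalize(w, grad_fn, alpha)
    return w - beta * grad_fn(adapted)


def server_round(w: float, grad_fns: List[GradFn], alpha: float, beta: float, local_steps: int) -> float:
    """Each client runs a few meta steps; the server averages the results."""
    results = []
    for grad_fn in grad_fns:
        local_w = w
        for _ in range(local_steps):
            local_w = client_step(local_w, grad_fn, alpha, beta)
        results.append(local_w)
    return sum(results) / len(results)


# Two toy clients with losses 0.5 * (w - c)^2 and different optima c.
optima = [-1.0, 3.0]
grad_fns = [lambda w, c=c: w - c for c in optima]

shared_w = 0.0
for _ in range(50):
    shared_w = server_round(shared_w, grad_fns, alpha=0.5, beta=0.1, local_steps=5)

# A single personalization step moves the shared model toward client 0's optimum.
personal_w = personalize(shared_w, grad_fns[0], alpha=0.5)
```

In this toy setting the shared weight settles between the two client optima, and one personalization step brings it closer to the optimum of the client that performs it.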

Expected outcomes:

  • Source code for the per-FedAvg implementation.
  • Unit test for the code above.
  • A Flower example that uses the Strategy.
  • A blogpost describing the Strategy.

Skills required:

  • Intermediate knowledge of Python
  • Basic knowledge of Git

Mentor: Pedro Porto Buarque de Gusmão, PhD

Sources: https://proceedings.neurips.cc/paper/2020/file/24389bfe4fe2eba8bf9aa9203a44cdad-Paper.pdf

Reinforcement Learning

Description

Reinforcement Learning (RL) helps us find solutions to problems where some notion of cumulative reward needs to be maximized, e.g. video games. Given the vast amounts of data being generated on mobile devices, FL offers a good alternative to centralized training, because the agents can be trained directly on the edge. In this project, you will develop a Flower example that trains an RL agent using Federated Learning.

Expected Outcome:

  • Code for a Flower example that shows how to train a Reinforcement Learning model using Federated Learning
  • A blog post describing the example above

Required Skills:

  • Basic understanding of Reinforcement Learning
  • Intermediate knowledge of Python
  • Acquaintance with PyTorch or TensorFlow

Mentor: Pedro Porto Buarque de Gusmão, PhD

Difficulty: hard

Sources:

Dart/Flutter SDK

Description

One of the use cases for federated learning and Flower is to connect a fleet of devices (servers, phones, edge devices, ...) and train AI models on them. Flutter can be used to easily build mobile applications for multiple platforms. In this project, you will build a Flutter SDK for Flower. The SDK will use gRPC to communicate with the server.
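Whatever the target language, an SDK of this kind mainly has to map incoming server messages onto a small client interface. The Python sketch below illustrates that mapping in a language-neutral way. It is loosely modeled on the `get_parameters`, `fit`, and `evaluate` methods of Flower's Python client, but the signatures are simplified, and the dictionary messages and the `handle_message` function are hypothetical stand-ins for the real gRPC messages.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

Weights = List[List[float]]


class Client(ABC):
    """User-facing interface an SDK could ask app developers to implement."""

    @abstractmethod
    def get_parameters(self) -> Weights: ...

    @abstractmethod
    def fit(self, parameters: Weights) -> Tuple[Weights, int]: ...

    @abstractmethod
    def evaluate(self, parameters: Weights) -> Tuple[float, int]: ...


def handle_message(client: Client, message: Dict) -> Dict:
    """Dispatch one server message to the matching client method."""
    kind = message["type"]
    if kind == "get_parameters":
        return {"parameters": client.get_parameters()}
    if kind == "fit":
        weights, num_examples = client.fit(message["parameters"])
        return {"parameters": weights, "num_examples": num_examples}
    if kind == "evaluate":
        loss, num_examples = client.evaluate(message["parameters"])
        return {"loss": loss, "num_examples": num_examples}
    raise ValueError(f"unknown message type: {kind}")


class ToyClient(Client):
    """Placeholder client: 'training' just adds 1.0 to every weight."""

    def __init__(self) -> None:
        self.weights: Weights = [[0.0, 0.0]]

    def get_parameters(self) -> Weights:
        return self.weights

    def fit(self, parameters: Weights) -> Tuple[Weights, int]:
        self.weights = [[v + 1.0 for v in layer] for layer in parameters]
        return self.weights, 10

    def evaluate(self, parameters: Weights) -> Tuple[float, int]:
        return 0.5, 10


reply = handle_message(ToyClient(), {"type": "fit", "parameters": [[1.0, 2.0]]})
```

Separating the dispatch logic from the user-implemented interface keeps the gRPC details inside the SDK, so app developers only write the three methods.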

Expected Outcome:

  • Set up gRPC compilation for Dart/Flutter
  • Define the user-facing API of the Dart/Flutter SDK
  • Implement the API
  • Test and document the new module
  • Build a Dart/Flutter library and publish it on https://pub.dev
  • Build a code example using the SDK
  • Write a blog post about the new feature

Required Skills:

  • Good understanding of Dart
  • Basic understanding of Flutter
  • Interest in gRPC

Mentor: Taner Topal

Difficulty: medium

Java/Android SDK

Description

Android is one of the two important mobile platforms for federated learning, and mobile app users will benefit considerably from protecting their data. An Android SDK allows for easy integration of federated learning in mobile apps. In this project, you will build an Android SDK for Flower, publish the resulting library, and build a usage example demonstrating how to use this library.

Expected Outcome:

  • Set up gRPC compilation for Java/Android
  • Define the user-facing API of the Java/Android SDK
  • Implement the API
  • Test and document the new SDK
  • Build a Java/Android library and publish it
  • Build a code example using the SDK
  • Write a blog post about the new feature

Required Skills:

  • Android programming
  • Basic understanding of gRPC
  • Interest in machine learning / federated learning

Mentors: Daniel Beutel and Taner Topal

Difficulty: medium

Swift/iOS SDK

Description

iOS is one of the two important mobile platforms for federated learning, and mobile app users will benefit considerably from protecting their data. An iOS SDK allows for easy integration of federated learning in mobile apps. In this project, you will build a Swift/iOS SDK for Flower, publish the resulting library, and build a usage example demonstrating how to use this library.

Expected Outcome:

  • Set up gRPC compilation for Swift/iOS
  • Define the user-facing API of the Swift/iOS SDK
  • Implement the API
  • Test and document the new SDK
  • Build a Swift/iOS library and publish it
  • Build a code example using the SDK
  • Write a blog post about the new feature

Required Skills:

  • iOS programming
  • Basic understanding of gRPC
  • Interest in machine learning / federated learning

Mentors: Daniel Beutel and Taner Topal

Difficulty: medium

C++ SDK

Description

C++ is one of the most defining programming languages of our time. It is used in many critical applications and is the go-to language for performance-sensitive domains such as robotics or automotive. Federated Learning can enable entirely new platforms in these domains, and we thus want to support C++ by providing a Flower C++ SDK. Flower communicates between the server and the client using gRPC. At the moment, every C++ user needs to build their own integration with the gRPC message protocol to run Flower. In this project, you will create a Flower SDK for C++.

Expected Outcome:

  • Set up gRPC compilation for C++
  • Define the user-facing API of the C++ SDK
  • Implement the API
  • Test and document the new SDK
  • Build a C++ library and publish it
  • Build a code example using the C++ SDK and libtorch (PyTorch C++ API)
  • Write a blog post about the new feature

Required Skills:

  • Strong experience with C++
  • Interest in gRPC
  • Basic understanding of machine learning
  • Optional: Basic libtorch (PyTorch C++ API) understanding

Mentor: Taner Topal

Difficulty: medium