Flower Example using XGBoost (Comprehensive)#
This example demonstrates a comprehensive federated learning setup using Flower with XGBoost. We use the HIGGS dataset to perform a binary classification task. It differs from the xgboost-quickstart example in the following ways:
- Argument parsers for server and clients to select hyperparameters.
- Customised FL settings.
- Customised number of partitions.
- Customised partitioner type (uniform, linear, square, exponential); see the sketch after this list.
- Bagging/cyclic training methods.
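To give a flavour of how the partitioner types could map to concrete partitioners, here is a minimal sketch. It assumes the built-in partitioners shipped with flwr-datasets; the helper name instantiate_partitioner and the mapping dictionary are illustrative and not necessarily the exact code in dataset.py.

```python
# Illustrative sketch only: map a partitioner-type string to a flwr-datasets
# partitioner class (the helper name and mapping are assumptions).
from flwr_datasets.partitioner import (
    ExponentialPartitioner,
    IidPartitioner,
    LinearPartitioner,
    SquarePartitioner,
)

PARTITIONER_TYPES = {
    "uniform": IidPartitioner,
    "linear": LinearPartitioner,
    "square": SquarePartitioner,
    "exponential": ExponentialPartitioner,
}


def instantiate_partitioner(partitioner_type: str, num_partitions: int):
    """Return a partitioner that splits the data into `num_partitions` parts."""
    return PARTITIONER_TYPES[partitioner_type](num_partitions=num_partitions)
```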
Start by cloning the example project. We prepared a single-line command that you can copy into your shell which will check out the example for you:
git clone --depth=1 https://github.com/adap/flower.git && mv flower/examples/xgboost-comprehensive . && rm -rf flower && cd xgboost-comprehensive
This will create a new directory called xgboost-comprehensive containing the following files:
-- README.md          <- You're reading this right now
-- server.py          <- Defines the server-side logic
-- client.py          <- Defines the client-side logic
-- dataset.py         <- Defines the functions of data loading and partitioning
-- utils.py           <- Defines the arguments parser for clients and server
-- run_bagging.sh     <- Commands to run bagging experiments
-- run_cyclic.sh      <- Commands to run cyclic experiments
-- pyproject.toml     <- Example dependencies (if you use Poetry)
-- requirements.txt   <- Example dependencies
Project dependencies (such as flwr) are defined in requirements.txt. We recommend Poetry to install those dependencies and manage your virtual environment (Poetry installation) or pip, but feel free to use a different way of installing dependencies and managing virtual environments if you have other preferences.
poetry install
poetry shell
Poetry will install all your dependencies in a newly created virtual environment. To verify that everything works correctly you can run the following command:
poetry run python3 -c "import flwr"
If you don’t see any errors you’re good to go!
Write the command below in your terminal to install the dependencies according to the configuration file requirements.txt.
pip install -r requirements.txt
Run Federated Learning with XGBoost and Flower#
We have two scripts to run bagging and cyclic (client-by-client) experiments.
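To illustrate how the two training methods differ on the server side, the following is a hedged sketch of how a strategy might be chosen based on the training method. FedXgbBagging and FedXgbCyclic are Flower's built-in XGBoost strategies, but the build_strategy helper and the keyword arguments shown here are assumptions, not the example's exact code.

```python
# Illustrative sketch: pick a Flower strategy based on the training method.
# The helper name and keyword arguments are assumptions.
from flwr.server.strategy import FedXgbBagging, FedXgbCyclic


def build_strategy(train_method: str, pool_size: int, num_clients_per_round: int):
    if train_method == "bagging":
        # Bagging: aggregate trees from several clients in each round.
        return FedXgbBagging(
            fraction_fit=num_clients_per_round / pool_size,
            min_fit_clients=num_clients_per_round,
            min_available_clients=pool_size,
        )
    # Cyclic: clients train one after another, one client per round.
    return FedXgbCyclic(
        fraction_fit=1.0,
        min_available_clients=pool_size,
    )
```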
run_cyclic.sh will start the Flower server (using server.py), sleep for 15 seconds to ensure that the server is up, and then start 5 Flower clients (using client.py) with a small subset of the data drawn from an exponential partition distribution.
You can simply start everything in a terminal as follows (choose bagging or cyclic):
poetry run ./run_bagging.sh
poetry run ./run_cyclic.sh
The script starts processes in the background so that you don't have to open a separate terminal window for each one.
If you experiment with the code example and something goes wrong, simply using CTRL + C on Linux (or CMD + C on macOS) wouldn't normally kill all these processes, which is why the script ends with trap "trap - SIGTERM && kill -- -$$" SIGINT SIGTERM EXIT. This simply allows you to stop the experiment using CTRL + C (or CMD + C). If you change the script and anything goes wrong you can still use killall python (or killall python3) to kill all background processes (or a more specific command if you have other Python processes running that you don't want to kill).
You can also manually run
poetry run python3 server.py --train-method=bagging/cyclic --pool-size=N --num-clients-per-round=N
and
poetry run python3 client.py --train-method=bagging/cyclic --node-id=NODE_ID --num-partitions=N
for as many clients as you want, but you have to make sure that each command is run in a different terminal window (or a different computer on the network).
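For reference, a server-side argument parser matching the flags above could look roughly like the sketch below. It is a hypothetical reconstruction, not necessarily the parser defined in utils.py, and the default values are assumptions.

```python
# Hypothetical sketch of a server-side argument parser for the flags used above.
import argparse


def server_args_parser() -> argparse.Namespace:
    """Parse server-side hyperparameters such as the training method and pool size."""
    parser = argparse.ArgumentParser(description="Flower + XGBoost server")
    parser.add_argument(
        "--train-method",
        choices=["bagging", "cyclic"],
        default="bagging",
        help="Aggregation method used for training.",
    )
    parser.add_argument("--pool-size", type=int, default=2, help="Total number of clients.")
    parser.add_argument(
        "--num-clients-per-round", type=int, default=2, help="Clients sampled in each FL round."
    )
    return parser.parse_args()
```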
In addition, we provide more options to customise the experimental settings, including data partitioning and centralised/distributed evaluation. Look at the code and tutorial for a detailed explanation.
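As one example of what centralised evaluation can look like, the sketch below builds a server-side evaluation function that loads the aggregated booster and computes AUC on a held-out test set. The function and variable names are assumptions, and reading the model bytes from Flower's Parameters object is a simplification of what the example may actually do.

```python
# Illustrative sketch of a centralised evaluation function computing AUC.
# Names and the exact deserialisation path are assumptions.
import xgboost as xgb
from sklearn.metrics import roc_auc_score


def get_evaluate_fn(test_dmatrix: xgb.DMatrix, y_test):
    """Return a server-side evaluation function for the aggregated model."""

    def evaluate_fn(server_round: int, parameters, config):
        if server_round == 0:
            # No global model has been aggregated yet in the first round.
            return 0.0, {}
        bst = xgb.Booster(params={"objective": "binary:logistic"})
        # Flower's Parameters object stores the serialised booster as bytes.
        bst.load_model(bytearray(parameters.tensors[0]))
        y_pred = bst.predict(test_dmatrix)
        auc = roc_auc_score(y_test, y_pred)
        return 0.0, {"AUC": auc}

    return evaluate_fn
```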
Expected Experimental Results#
Bagging aggregation experiment#
The figure above shows the AUC performance from centralised evaluation over FL rounds for 4 experimental settings. One can see that all settings obtain a stable performance boost over FL rounds (especially noticeable at the start of training). As expected, the uniform client distribution shows higher AUC values (beyond 83% at the end) than the square/exponential setups. Feel free to explore more interesting experiments by yourself!