BEARD
BEnchmarking the Adversarial Robustness for Dataset Distillation

The goal of BEARD is to systematically explore the adversarial robustness in Dataset Distillation (DD). A critical gap exists in the field, where the adversarial robustness of models trained on distilled datasets remains underexplored. Despite advancements in DD, deep neural networks continue to be vulnerable to adversarial attacks, posing significant security risks across various applications. Current research shows that while DD can partially improve adversarial robustness, it falls short of fully addressing these vulnerabilities. Additionally, the lack of a unified, open-source benchmark for evaluating adversarial robustness in DD has slowed progress in the field.

In response, we introduce BEARD, an open benchmark designed to systematically evaluate the adversarial robustness of DD methods. BEARD provides a comprehensive framework for testing DD methods across various datasets using different adversarial attacks, introduces new evaluation metrics, and offers open-source tools to support ongoing research and development.

TO DO List:

  • Supplement More DD Methods: Our benchmark currently evaluates the following distillation methods: DC, DSA, DM, MTT, IDM, BACON, TESLA, and RDED, among others. We will periodically update this list as new methods are developed and evaluated.

    ✓ Completed methods
    ✗ Not yet completed methods


  • Improve the Code: We welcome contributions from the community to help enhance our benchmark. If you have suggestions or want to contribute, please feel free to get involved and make a difference!

The benchmark pipeline consists of two phases:

  • Training phase: models are trained with 6+ distillation methods on 3+ datasets.
  • Evaluation phase: trained models are assessed with 5+ adversarial attacks to measure robustness.
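The two phases above can be sketched as a simple nested loop over methods, datasets, and attacks. This is a schematic illustration only: the method and dataset names come from this page, while `train_on_distilled` and `evaluate_under_attack` are hypothetical placeholders, not actual BEARD APIs.

```python
def train_on_distilled(method: str, dataset: str) -> str:
    # Phase 1: train a model on the dataset distilled by `method`.
    return f"model[{method}/{dataset}]"

def evaluate_under_attack(model: str, attack: str) -> dict:
    # Phase 2: measure the model's robustness under `attack`.
    return {"model": model, "attack": attack, "robust_acc": None}

methods = ["DC", "DSA", "DM", "MTT", "IDM", "BACON"]  # 6+ methods
datasets = ["CIFAR10", "CIFAR100", "TinyImageNet"]    # 3+ datasets
attacks = ["Clean", "FGSM", "PGD"]                    # subset of the 5+ attacks

results = {}
for method in methods:
    for dataset in datasets:
        model = train_on_distilled(method, dataset)
        for attack in attacks:
            results[(method, dataset, attack)] = evaluate_under_attack(model, attack)
```

Each (method, dataset) pair is trained once and then evaluated under every attack, so the grid above yields 6 × 3 × 3 = 54 evaluation entries.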

Training Configuration


Check out the available models and datasets. Below is the JSON file used for the training stage:
                  {
                      "dataset": "CIFAR10",
                      "model": "ConvNet",
                      "method": "XX",
                      "ipc": "50",
                      "dsa_strategy": "color_crop_cutout_flip_scale_rotate",
                      "syn_ce": true,
                      "ce_weight": 1,
                      "aug": false,
                      "data_file": "./data_pool/XX/XXX.pt",
                      "save_path": "./model_pool/XX/",
                      "train_attack": "PGD",
                      "target_attack": false,
                      "test_attack": "None",
                      "src_dataset": false
                  }
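A minimal sketch of consuming such a training configuration in Python. The field values mirror the example above (trimmed to a few keys); note that `ipc` is stored as a string in the config, so it must be cast before numeric use.

```python
import json

# Trimmed copy of the training config shown above (values from this page).
TRAIN_CONFIG = '''{
    "dataset": "CIFAR10",
    "model": "ConvNet",
    "ipc": "50",
    "syn_ce": true,
    "aug": false,
    "train_attack": "PGD",
    "test_attack": "None"
}'''

config = json.loads(TRAIN_CONFIG)
ipc = int(config["ipc"])                      # images per class, stored as a string
adv_train = config["train_attack"] != "None"  # PGD adversarial training enabled
```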
              

Evaluation Configuration


Check out the evaluation results. Below is the JSON file used for the evaluation stage:
                {
                    "dataset": "CIFAR10",
                    "model": "ConvNet",
                    "method": "XX",
                    "ipc": "50",
                    "dsa_strategy": "color_crop_cutout_flip_scale_rotate",
                    "syn_ce": true,
                    "ce_weight": 1,
                    "aug": false,
                    "load_file": "./model_pool/XX/XXX.pth",
                    "save_path": "./model_pool/XX/",
                    "train_attack": "None",
                    "test_attack": ["Clean", "FGSM", "PGD", "XXX"],
                    "target_attack": [true, false],
                    "src_dataset": true,
                    "pgd_eva": false
                }
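A small sketch of the evaluation grid implied by this config: each attack in `test_attack` is paired with each `target_attack` flag. The filtering of the clean run is an assumption here ("Clean" has no notion of a target), and the attack list omits the elided "XXX" placeholder.

```python
from itertools import product

test_attacks = ["Clean", "FGSM", "PGD"]  # from the config above, minus the placeholder
target_flags = [True, False]             # "target_attack": [true, false]

# Cross attacks with targeted/untargeted modes; keep only the untargeted
# entry for "Clean", since a clean evaluation has no attack target.
runs = [
    (attack, targeted)
    for attack, targeted in product(test_attacks, target_flags)
    if not (attack == "Clean" and targeted)
]
```

This yields one clean run plus targeted and untargeted runs for each adversarial attack.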
            
Available Leaderboards
CIFAR-10 (Unified) CIFAR-100 (Unified) TinyImageNet (Unified) CIFAR-10 (IPC-1) CIFAR-10 (IPC-10) CIFAR-10 (IPC-50) CIFAR-100 (IPC-1) CIFAR-100 (IPC-10) CIFAR-100 (IPC-50) TinyImageNet (IPC-1) TinyImageNet (IPC-10) TinyImageNet (IPC-50)

Leaderboard: CIFAR-10 (Unified), untargeted attack

Leaderboard: CIFAR-100 (Unified), untargeted attack

Leaderboard: TinyImageNet (Unified), untargeted attack

Leaderboard: CIFAR-10 (IPC-1), untargeted attack

Leaderboard: CIFAR-10 (IPC-10), untargeted attack

Leaderboard: CIFAR-10 (IPC-50), untargeted attack

Leaderboard: CIFAR-100 (IPC-1), untargeted attack

Leaderboard: CIFAR-100 (IPC-10), untargeted attack

Leaderboard: CIFAR-100 (IPC-50), untargeted attack

Leaderboard: TinyImageNet (IPC-1), untargeted attack

Leaderboard: TinyImageNet (IPC-10), untargeted attack

Leaderboard: TinyImageNet (IPC-50), untargeted attack

FAQ

➤ How does the BEARD leaderboard differ from the DC-Bench leaderboard? 🤔
The DC-Bench leaderboard evaluates the distillation performance of DD methods, whereas BEARD evaluates their adversarial robustness.

➤ How does the BEARD leaderboard differ from RobustBench? 🤔
RobustBench focuses on the adversarial robustness of general training methods, whereas BEARD specifically evaluates the adversarial robustness of DD methods.

Contribute to BEARD!


We welcome contributions of both new robust models and new evaluations.

Maintainers