Learning Responsibility-Attributed Adversarial Scenarios
for Testing Autonomous Vehicles

Yizhuo Xiao1  ·   Haotian Yan2  ·   Ying Wang3  ·   Zhongpan Zhu2,4  ·   Yuxin Zhang5  ·   Xintao Yan6  ·   Mustafa Suphi Erden1  ·   Cheng Wang1,*

1 School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, U.K.
2 State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University, Shanghai, China
3 College of Computer Science and Technology, Jilin University, Changchun, China
4 University of Shanghai for Science and Technology, Shanghai, China
5 National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun, China
6 Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
* Corresponding author: cheng.wang@hw.ac.uk

Abstract

Establishing trustworthy safety assurance for autonomous driving systems (ADSs) requires evidence that failures arise from avoidable system deficiencies rather than unavoidable traffic conflicts. Current adversarial simulation methods can efficiently expose collisions, but generally lack mechanisms to distinguish these fundamentally different failure modes.

Here we present CARS (Context-Aware, Responsibility-attributed Scenario generation), a framework that integrates responsibility attribution directly into adversarial scenario generation. CARS combines context-aware adversary selection with a generative adversarial policy optimized in closed-loop simulation to construct collision scenarios that are both physically feasible and diagnostically attributable.

Across benchmark datasets spanning heterogeneous national traffic environments, CARS consistently discovers feasible collision scenarios with high attribution rates under multiple regulation-prescribed careful and competent driver models. By coupling adversarial generation with normative responsibility assessment, CARS moves simulation testing beyond collision discovery toward the construction of interpretable, regulation-aligned safety evidence for scalable ADS validation.

Method Overview

CARS Closed-loop Pipeline

CARS responsibility-attributed adversarial scenario generation framework.
(a) Context-aware adversary selection. At each simulation step, CARS scores surrounding agents by their kinematic relationship to the ADS and stabilizes the selected agent over time, allowing the threat assignment to track the evolving conflict.
(b) Adversarial trajectory generation. The selected agent is initialized from a Gaussian-mixture diffusion prior. Closed-loop reinforcement learning fine-tunes the agent to approach the ADS and create safety-critical interactions.
(c) Attribution-aware objective. CARS targets ADS-attributable collisions under a careful and competent driver (CCD) reference. A collision is attributable when it occurs for the ADS under test but would have been avoided by the CCD in the same encounter.

Evaluation Datasets

CARS is evaluated across three datasets spanning three continents, three road topologies, and contrasting driving cultures.

Three evaluation datasets spanning three continents and three road topologies.
(a) nuScenes front-facing camera frame at a Boston (US) intersection (1,000 urban scenes across Boston and Singapore).
(b) RounD overhead drone frame at the Neuweiler (Germany) roundabout (22 recordings of a four-arm yield-priority roundabout).
(c) AD4CHE overhead drone frame on a multi-lane highway (68 recordings across four Chinese cities).
(d) Geographic locations of the four data-collection countries; pins are coloured by dataset.

Demo Scenarios

Legend: adversary (CARS), target (AD system under test), other traffic agents.
(a) Rear-end approach. Adversary closes from behind; target fails to brake in time. FSM: Hard criticality.
(b) Adversary switch. The adversary role switches between agents mid-scenario via context-aware selection.
(c) Lateral cut-in. Adversary merges laterally into target's lane at high closing speed. FSM: Hard criticality.
(d) Rear-end collision. Adversary decelerates ahead of target, forcing a rear-end collision. FSM: Hard criticality.
(e) Front braking. Adversary brakes sharply in front of target with a slight angular offset.
(f) Adversary switch. The adversary role switches between agents mid-scenario via context-aware selection.

Cross-dataset Transfer

The same frozen CARS policy, trained only on nuScenes urban traffic, is applied to two naturalistic drone datasets with contrasting scene geometries. The diffusion generator and target-model checkpoints are held fixed; only a lightweight agent-selection classifier is refitted on each dataset's labelled target–adversary pairs. Both videos are rendered on the original drone photo backgrounds provided with each dataset.

AD4CHE highway traffic
Frozen nuScenes policy applied to a Chinese highway drone recording. The context-aware module continues to identify the most threatening surrounding vehicle, and 76.4% of the generated collisions on AD4CHE pass the FSM responsibility check.
RounD roundabout
Frozen nuScenes policy applied to a four-arm roundabout drone recording at Neuweiler, Germany, a scene geometry absent from the training distribution. 57.5% of the generated collisions on RounD still pass the FSM responsibility check.

Context-aware Adversary Selection

A histogram gradient-boosting classifier re-evaluates all surrounding vehicles at every simulation step; a temporal confirmation gate (Kconf=5) suppresses transient ranking errors so the adversary role tracks the evolving scene. The same selection mechanism transfers to unseen scene geometries when retrained on each dataset's labelled target–adversary pairs.
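The effect of the confirmation gate can be illustrated with a minimal sketch. It assumes the gate hands over the adversary role only after a challenger has been top-ranked for K_conf consecutive steps; the source states only that K_conf = 5 and that the gate suppresses transient ranking errors, so the exact switching rule, the `ConfirmationGate` class, and the probabilities used below are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConfirmationGate:
    """Hypothetical temporal gate: switch only after k_conf consecutive wins."""
    k_conf: int = 5
    active: Optional[str] = None      # currently selected adversary id
    challenger: Optional[str] = None  # candidate trying to take over the role
    streak: int = 0                   # consecutive steps the challenger ranked first

    def step(self, p_adv: Dict[str, float]) -> Optional[str]:
        """Consume one step of per-agent adversarial probabilities."""
        if not p_adv:
            return self.active
        top = max(p_adv, key=p_adv.get)
        if self.active is None:
            # Assumption: the very first selection is accepted immediately.
            self.active = top
        if top == self.active:
            # The active adversary still leads, so any pending challenge is dropped.
            self.challenger, self.streak = None, 0
        elif top == self.challenger:
            self.streak += 1
        else:
            self.challenger, self.streak = top, 1
        if self.challenger is not None and self.streak >= self.k_conf:
            self.active, self.challenger, self.streak = self.challenger, None, 0
        return self.active
```

Under this rule a two-step ranking blip leaves the adversary unchanged, while five consecutive steps in which another candidate leads trigger the switch on the fifth step.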

Context-aware adversary re-selection across datasets.
(a) nuScenes.
(b) AD4CHE.
(c) RounD.
Each row shows three bird's-eye-view snapshots from one rollout scenario at t=0, the adversary-switch frame, and the collision frame, with a heatmap strip beneath giving the per-step adversarial probability Padv(t) for candidates A1 (upper) and A2 (lower); darker red indicates higher probability. The target is shown in blue, the currently active adversary in red, and other agents in grey.

Responsibility Attribution

Each generated collision is replayed with a reference driver model controlling the target while the adversary trajectory is held fixed; the scenario is retained as attributable only if the reference driver avoids the impact that the ADS under test did not. Attribution is cross-validated under three reference models: FSM (UN R157 primary CCDM), CC-JP (Japanese careful-driver reference), and RSS (formal safety envelope).
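The replay check can be sketched as a simple filter, following the definition in the method overview: a collision is attributable when the ADS under test collides but the reference driver would have avoided the impact in the same encounter. The dictionary-based scenario record, the `ads_collided` field, and the callable interface for reference drivers are assumptions made for illustration; a real pipeline would re-simulate each scenario rather than call a toy predicate.

```python
from typing import Callable, Dict, Iterable, List

# Assumed interface: a reference driver is a callable that replays a scenario
# with the adversary trajectory held fixed and returns True if the target
# still collides under that driver.
ReplayFn = Callable[[dict], bool]


def is_attributable(scenario: dict, replay_collides: ReplayFn) -> bool:
    """True when the ADS collided but the reference driver avoids the impact."""
    return scenario["ads_collided"] and not replay_collides(scenario)


def validity_rates(scenarios: Iterable[dict],
                   references: Dict[str, ReplayFn]) -> Dict[str, float]:
    """Percentage of collision scenarios attributable under each reference."""
    collisions: List[dict] = [s for s in scenarios if s["ads_collided"]]
    if not collisions:
        return {name: 0.0 for name in references}
    return {
        name: 100.0 * sum(is_attributable(s, fn) for s in collisions) / len(collisions)
        for name, fn in references.items()
    }
```

Reporting one rate per reference model, rather than a single pooled number, keeps disagreements between reference drivers visible.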

[Figure: responsibility attribution panels. (a) FSM fails; (b) CC-JP fails; (c) RSS fails; (d) max braking deficit by FSM criticality tier; (e) adversary speed at collision; (f) longitudinal closing speed at collision. Legend: adversary, target, other agents, FSM, CC-JP, RSS.]

Responsibility attribution and collision kinematics on CARS-generated nuScenes scenarios.
(a–c) Representative attribution disagreements in which FSM fails (a), CC-JP fails (b), and RSS fails (c). Adversary trajectory in red; target trajectory in blue; the target's stop position under each reference model is shown as a front-edge line (purple, FSM; orange, CC-JP; green, RSS).
(d) Max braking deficit BD, box-and-whisker per FSM criticality tier (Easy / Medium / Hard); box spans the interquartile range, whiskers extend to the 5th–95th percentiles, points are outliers, and median is annotated. BD > 0 indicates that required stopping distance exceeded the available gap.
(e) Adversary speed at collision, raincloud plot per FSM criticality tier (median annotated).
(f) Longitudinal closing speed at collision, kernel-density estimate per criticality tier (median annotated).
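The sign convention for the braking deficit in panel (d) can be made concrete with a small sketch. The source states only that BD > 0 means the required stopping distance exceeded the available gap; the constant-deceleration stopping model, the reaction time, and the deceleration limit below are assumptions chosen for illustration and are not taken from the paper.

```python
from typing import Sequence


def stopping_distance(closing_speed: float, reaction_time: float, max_decel: float) -> float:
    """Distance covered during the reaction delay plus constant-deceleration braking."""
    v = max(closing_speed, 0.0)  # an opening gap needs no stopping distance
    return v * reaction_time + v * v / (2.0 * max_decel)


def max_braking_deficit(closing_speeds: Sequence[float],
                        gaps: Sequence[float],
                        reaction_time: float = 0.75,  # hypothetical value, in seconds
                        max_decel: float = 7.0) -> float:  # hypothetical value, in m/s^2
    """Largest (required stopping distance - available gap) over the encounter."""
    if len(closing_speeds) != len(gaps) or len(gaps) == 0:
        raise ValueError("closing_speeds and gaps must be non-empty and equally long")
    return max(
        stopping_distance(v, reaction_time, max_decel) - g
        for v, g in zip(closing_speeds, gaps)
    )
```

With these placeholder parameters, a closing speed of 10 m/s against a 5 m gap gives a positive deficit, while the same speed against a 30 m gap gives a negative one.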

Main Results

Benchmark comparison against existing adversarial generators and ADS-planner robustness tests on the nuScenes validation split.

Benchmark comparison with existing adversarial generation methods

| Method | FSM validity (%) ↑ | CC-JP validity (%) ↑ | RSS validity (%) ↑ | Diversity Hcrit ↑ | Kinematic risk BD+% ↓ | Feasibility IP% ↓ |
|---|---|---|---|---|---|---|
| Adversarial methods on nuScenes | | | | | | |
| STRIVE | 7.3 | 5.8 | 6.1 | 0.528 | 53.8 | 36.39 |
| SafeSim | 44.8 | 44.8 | 44.8 | 0.628 | 48.3 | 12.94 |
| Bezier-CAT | 21.1 | 36.4 | 15.0 | 0.260 | 84.1 | 73.08 |
| CARS (K=1 adv) | 45.2 | 35.5 | 53.2 | 0.797 | 66.1 | 27.40 |
| CARS (Ours) | 88.7 | 79.7 | 97.1 | 0.798 | 22.5 | 0.04 |
| ADS-planner robustness (fixed CARS adv) | | | | | | |
| One-component diffusion planner | 87.8 | 80.0 | 96.7 | 0.834 | 21.7 | 0.04 |
| CTG planner | 86.0 | 80.0 | 92.4 | 0.745 | 29.2 | 0.02 |

Validity columns report the percentage of each method's collision scenarios classified as attributable to the target under the three reference models. Hcrit is the normalised Shannon entropy of the FSM Hard/Medium/Easy distribution (higher is more balanced). BD+% is the fraction of scenarios with a positive braking deficit during the encounter. IP% is the scenario-averaged fraction of time steps exceeding any UN R157 feasibility bound (|a|>7 m/s², |jerk|>12.65 m/s³, |alat|>3.0 m/s²). CARS simultaneously achieves the highest attribution (88.7% FSM), balanced severity coverage (Hcrit=0.798), and near-zero kinematic infeasibility (IP=0.04%). Each baseline fails on a different axis: Bezier-CAT and STRIVE drive the adversary beyond physical limits; SafeSim concentrates severity into a narrower tier band; STRIVE also loses most of its collisions under FSM as unattributable. The ADS-planner rows fix the CARS adversary and only replace the target planner, showing that CARS does not depend on the planner architecture used during training.
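Two of the metrics defined above can be sketched in a few lines. The sketch assumes that Hcrit divides the Shannon entropy of the tier counts by the logarithm of the number of tiers, which is the usual normalisation but is not spelled out in the source, and it uses the three feasibility bounds quoted above as defaults. The function names and input layout are hypothetical.

```python
import math
from typing import Sequence


def normalised_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy of tier counts divided by log(number of tiers)."""
    total = sum(counts)
    k = len(counts)
    if total == 0 or k < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(k)


def infeasible_percent(accel: Sequence[float],
                       jerk: Sequence[float],
                       lat_accel: Sequence[float],
                       a_max: float = 7.0,
                       j_max: float = 12.65,
                       alat_max: float = 3.0) -> float:
    """Percentage of time steps in one scenario that exceed any bound."""
    steps = list(zip(accel, jerk, lat_accel))
    if not steps:
        return 0.0
    bad = sum(
        abs(a) > a_max or abs(j) > j_max or abs(al) > alat_max
        for a, j, al in steps
    )
    return 100.0 * bad / len(steps)


def scenario_averaged_ip(per_scenario: Sequence[float]) -> float:
    """Mean of the per-scenario percentages, matching a scenario-averaged IP%."""
    return sum(per_scenario) / len(per_scenario) if per_scenario else 0.0
```

A perfectly balanced Hard/Medium/Easy split yields a normalised entropy of 1, and a distribution concentrated in a single tier yields 0.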

Cross-dataset Responsibility Attribution

The same frozen CARS policy is evaluated under three careful and competent driver models (CCDMs: FSM, CC-JP, RSS) on three datasets with contrasting scene geometries.

| Dataset | Scene geometry | Episodes | FSM valid% ↑ | CC-JP valid% ↑ | RSS valid% ↑ |
|---|---|---|---|---|---|
| nuScenes (training) | urban intersections | 408 | 88.7 | 79.7 | 97.1 |
| AD4CHE | multi-lane highway | 470 | 76.4 | 63.8 | 80.9 |
| RounD | four-arm roundabout | 927 | 57.5 | 52.0 | 70.7 |

Valid% = fraction of collisions classified as attributable to the target by the reference driver model (higher is better). AD4CHE and RounD numbers are from the same frozen CARS generator, with only the lightweight agent-selection classifier refitted to each dataset's labelled target–adversary pairs.

362/408 nuScenes collisions attributable to target
76.4% cross-dataset attribution on the AD4CHE highway
57.5% cross-dataset attribution on the RounD roundabout
Citation

@article{xiao2026cars,
  title   = {Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles},
  author  = {Xiao, Yizhuo and Yan, Haotian and Wang, Ying and Zhu, Zhongpan and Zhang, Yuxin and Yan, Xintao and Erden, Mustafa Suphi and Wang, Cheng},
  journal = {Under review},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.13751},
  note    = {\url{https://github.com/RoboSafe-Lab/CARS-code.git}}
}