Learning Responsibility-Attributed Adversarial Scenarios
for Testing Autonomous Vehicles

Yizhuo Xiao¹ · Haotian Yan² · Ying Wang³ · Zhongpan Zhu²,⁴ · Yuxin Zhang⁵ · Xintao Yan⁶ · Mustafa Suphi Erden¹ · Cheng Wang¹,*

¹ School of Engineering and Physical Sciences, Heriot-Watt University, Edinburgh, U.K.
² State Key Laboratory of Autonomous Intelligent Unmanned Systems, Tongji University, Shanghai, China
³ College of Computer Science and Technology, Jilin University, Changchun, China
⁴ University of Shanghai for Science and Technology, Shanghai, China
⁵ National Key Laboratory of Automotive Chassis Integration and Bionics, Jilin University, Changchun, China
⁶ Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
* Corresponding author: cheng.wang@hw.ac.uk


Abstract

Establishing trustworthy safety assurance for autonomous driving systems (ADSs) requires evidence that failures arise from avoidable system deficiencies rather than unavoidable traffic conflicts. Current adversarial simulation methods can efficiently expose collisions, but generally lack mechanisms to distinguish these fundamentally different failure modes.

Here we present CARS (Context-Aware, Responsibility-attributed Scenario generation), a framework that integrates responsibility attribution directly into adversarial scenario generation. CARS combines context-aware adversary selection with a generative adversarial policy optimized in closed-loop simulation to construct collision scenarios that are both physically feasible and diagnostically attributable.

Across benchmark datasets spanning heterogeneous national traffic environments, CARS consistently discovers feasible collision scenarios with high attribution rates under multiple regulation-prescribed careful and competent driver models. By coupling adversarial generation with normative responsibility assessment, CARS moves simulation testing beyond collision discovery toward the construction of interpretable, regulation-aligned safety evidence for scalable ADS validation.

Method Overview

Evaluation Datasets

CARS is evaluated across three datasets spanning three continents, three road topologies, and contrasting driving cultures.

Demo Scenarios

Legend: adversary (CARS); target (AD system under test); other traffic agents.

(a) Rear-end approach
Adversary closes from behind; target fails to brake in time. FSM: Hard criticality.

(b) Adversary switch
The adversary role switches between agents mid-scenario via context-aware selection.

(c) Lateral cut-in
Adversary merges laterally into target's lane at high closing speed. FSM: Hard criticality.

(d) Rear-end collision
Adversary decelerates ahead of target, forcing a rear-end collision. FSM: Hard criticality.

(e) Front braking
Adversary brakes sharply in front of target with a slight angular offset.

(f) Adversary switch
The adversary role switches between agents mid-scenario via context-aware selection.

Cross-dataset Transfer

The same frozen CARS policy, trained only on nuScenes urban traffic, is applied to two naturalistic drone datasets with contrasting scene geometries. The diffusion generator and target-model checkpoints are held fixed; only a lightweight agent-selection classifier is refitted on each dataset's labelled target–adversary pairs. Both videos are rendered on the original drone photo backgrounds provided with each dataset.

AD4CHE highway traffic
Frozen nuScenes policy applied to a Chinese highway drone recording. The context-aware module continues to identify the most threatening surrounding vehicle, and 76.4% of the generated collisions on AD4CHE pass the FSM responsibility check.

RounD roundabout
Frozen nuScenes policy applied to a four-arm roundabout drone recording at Neuweiler, Germany, a scene geometry absent from the training distribution. 57.5% of the generated collisions on RounD still pass the FSM responsibility check.

Context-aware Adversary Selection

A histogram gradient-boosting classifier re-evaluates all surrounding vehicles at every simulation step; a temporal confirmation gate (K_conf=5) suppresses transient ranking errors so the adversary role tracks the evolving scene. The same selection mechanism transfers to unseen scene geometries when retrained on each dataset's labelled target–adversary pairs.

Context-aware adversary re-selection across datasets
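The temporal confirmation gate described above can be illustrated with a minimal sketch. It assumes the gate hands over the adversary role only after a different agent has ranked first for K_conf consecutive steps; the page does not spell out the exact rule, so the class `ConfirmationGate`, its fields, and the per-agent score dictionary are hypothetical names chosen for illustration. The scores stand in for the classifier output and are supplied by the caller.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConfirmationGate:
    """Hypothetical gate: switch adversary only after k_conf consecutive top rankings."""
    k_conf: int = 5
    current: Optional[str] = None      # agent currently holding the adversary role
    challenger: Optional[str] = None   # top-ranked agent that differs from `current`
    streak: int = 0                    # consecutive steps the challenger has ranked first

    def step(self, scores: Dict[str, float]) -> Optional[str]:
        """Consume one step of per-agent threat scores and return the adversary id."""
        if not scores:
            return self.current
        top = max(scores, key=scores.get)
        if self.current is None:
            # Assumption: the very first step adopts the top-ranked agent immediately.
            self.current = top
        if top == self.current:
            # The incumbent is still ranked first, so any pending challenge is dropped.
            self.challenger, self.streak = None, 0
            return self.current
        if top == self.challenger:
            self.streak += 1
        else:
            self.challenger, self.streak = top, 1
        if self.streak >= self.k_conf:
            self.current, self.challenger, self.streak = top, None, 0
        return self.current
```

With this rule, a ranking flip that lasts fewer than K_conf steps leaves the adversary unchanged, while a sustained flip triggers a switch on the K_conf-th consecutive step.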

Responsibility Attribution

Each generated collision is replayed with a reference driver model controlling the target while the adversary trajectory is held fixed; the scenario is retained only if the reference driver still fails to avoid impact. Attribution is cross-validated under three reference models: FSM (UN R157 primary CCDM), CC-JP (Japanese careful-driver reference), and RSS (formal safety envelope).

Main Results

Benchmark comparison against existing adversarial generators and ADS-planner robustness tests on the nuScenes validation split.

Benchmark comparison with existing adversarial generation methods

Method	FSM valid% ↑	CC-JP valid% ↑	RSS valid% ↑	Diversity H_crit ↑	Kinematic risk BD⁺% ↓	Infeasibility IP% ↓
Adversarial methods on nuScenes
STRIVE	7.3	5.8	6.1	0.528	53.8	36.39
SafeSim	44.8	44.8	44.8	0.628	48.3	12.94
Bezier-CAT	21.1	36.4	15.0	0.260	84.1	73.08
CARS (K=1 adv)	45.2	35.5	53.2	0.797	66.1	27.40
CARS (Ours)	88.7	79.7	97.1	0.798	22.5	0.04
ADS-planner robustness (fixed CARS adv)
One-component diffusion planner	87.8	80.0	96.7	0.834	21.7	0.04
CTG planner	86.0	80.0	92.4	0.745	29.2	0.02

Validity columns report the percentage of each method's collision scenarios classified as attributable to the target under the three reference models. H_crit is the normalised Shannon entropy of the FSM Hard/Medium/Easy distribution (higher is more balanced). BD⁺% is the fraction of scenarios with a positive braking deficit during the encounter. IP% is the scenario-averaged fraction of time steps exceeding any UN R157 feasibility bound (|a|>7 m/s², |jerk|>12.65 m/s³, |a_lat|>3.0 m/s²). CARS simultaneously achieves the highest attribution (88.7% FSM), balanced severity coverage (H_crit=0.798), and near-zero kinematic infeasibility (IP=0.04%). Each baseline fails on a different axis: Bezier-CAT and STRIVE drive the adversary beyond physical limits; SafeSim concentrates severity into a narrower tier band; STRIVE also loses most of its collisions under FSM as unattributable. The ADS-planner rows fix the CARS adversary and only replace the target planner, showing that CARS does not depend on the planner architecture used during training.
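Two of the metrics defined above can be computed as in the following sketch. It assumes that |a| refers to longitudinal acceleration, that jerk is the finite difference of that signal, and that each scenario is supplied as a pair of equally sampled acceleration sequences; the function names and the input layout are illustrative, not taken from the CARS code.

```python
import math
from typing import Sequence, Tuple

# Feasibility bounds quoted in the table note (UN R157).
A_MAX, JERK_MAX, A_LAT_MAX = 7.0, 12.65, 3.0


def h_crit(counts: Sequence[int]) -> float:
    """Shannon entropy of tier counts (e.g. Hard/Medium/Easy), normalised to [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


def infeasible_fraction(a_lon: Sequence[float], a_lat: Sequence[float], dt: float) -> float:
    """Fraction of time steps that exceed any of the three bounds."""
    n = len(a_lon)
    if n == 0:
        return 0.0
    bad = 0
    for k in range(n):
        # Assumption: jerk is the backward difference of longitudinal acceleration.
        jerk = (a_lon[k] - a_lon[k - 1]) / dt if k > 0 else 0.0
        if abs(a_lon[k]) > A_MAX or abs(jerk) > JERK_MAX or abs(a_lat[k]) > A_LAT_MAX:
            bad += 1
    return bad / n


def ip_percent(scenarios: Sequence[Tuple[Sequence[float], Sequence[float]]], dt: float = 0.1) -> float:
    """Scenario-averaged infeasible fraction, expressed in percent."""
    fracs = [infeasible_fraction(a_lon, a_lat, dt) for a_lon, a_lat in scenarios]
    return 100.0 * sum(fracs) / len(fracs) if fracs else 0.0
```

A perfectly balanced tier distribution gives `h_crit` of 1, and a distribution concentrated in a single tier gives 0, which matches the "higher is more balanced" reading of the column.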

Cross-dataset Responsibility Attribution

The same frozen CARS policy is evaluated under three careful and competent driver models (CCDMs: FSM, CC-JP, RSS) on three datasets with contrasting scene geometries.

Dataset	Scene geometry	Episodes	FSM valid% ↑	CC-JP valid% ↑	RSS valid% ↑
nuScenes (training domain)	urban intersections	408	88.7	79.7	97.1
AD4CHE	multi-lane highway	470	76.4	63.8	80.9
RounD	four-arm roundabout	927	57.5	52.0	70.7

Valid% = fraction of collisions classified as attributable to the target by the reference driver model (higher is better). AD4CHE and RounD numbers are from the same frozen CARS generator, with only the lightweight agent-selection classifier refitted to each dataset's labelled target–adversary pairs.

362/408: nuScenes collisions attributable to target
76.4%: AD4CHE highway, cross-dataset attribution
57.5%: RounD roundabout, cross-dataset attribution

Citation

@article{xiao2026cars,
  title   = {Learning Responsibility-Attributed Adversarial Scenarios for Testing Autonomous Vehicles},
  author  = {Xiao, Yizhuo and Yan, Haotian and Wang, Ying and Zhu, Zhongpan and Zhang, Yuxin and Yan, Xintao and Erden, Mustafa Suphi and Wang, Cheng},
  journal = {Under review},
  year    = {2026},
  url     = {https://arxiv.org/abs/2605.13751},
  note    = {\url{https://github.com/RoboSafe-Lab/CARS-code.git}}
}