Kshitij Goel Robotics Researcher

This page is kept for archival purposes only. The actual project webpage is at rislab.org/projects/catchrl.html.

Learning Agile Intruder Interception using Differentiable Quadrotor Dynamics

arXiv · 2026

How can a quadrotor intercept an agile intruder using only a monocular direction measurement, without knowing the intruder's position or distance?

This paper presents a methodology for learning a control policy that intercepts an intruder using only the 3D unit direction vector to the intruder and the interceptor's own state. Prior deep reinforcement learning approaches assume that either the relative position of the intruder or the distance to it is available, but this information is not readily accessible in real-world applications that employ passive monocular camera sensors. Instead, we propose a solution that leverages an analytical policy gradient (APG) method with differentiable quadrotor dynamics to learn agile interception at speeds up to 10 m/s. The proposed approach outperforms baseline methods that use simplified point mass dynamics by an average of 30%.
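An analytical policy gradient differentiates the training loss through the simulated dynamics instead of estimating the gradient from sampled returns, as PPO does. The sketch below shows that mechanism in its smallest form. It is only a sketch: it assumes a hypothetical 1D point mass driven by a single constant acceleration command `u`, explicit Euler integration, and a squared terminal-distance loss, none of which are the paper's model or objective. The adjoints are backpropagated through each step by hand and checked against a central finite difference.

```python
def rollout(u, x0=0.0, v0=0.0, dt=0.1, steps=20):
    """Explicit-Euler rollout of a 1D point mass under a constant acceleration u."""
    x, v = x0, v0
    for _ in range(steps):
        x, v = x + dt * v, v + dt * u
    return x


def loss(u, target=5.0, **kw):
    """Squared distance between the final position and a hypothetical target."""
    return (rollout(u, **kw) - target) ** 2


def analytic_grad(u, target=5.0, x0=0.0, v0=0.0, dt=0.1, steps=20):
    """dL/du obtained by backpropagating adjoints through every Euler step."""
    x_final = rollout(u, x0, v0, dt, steps)
    gx = 2.0 * (x_final - target)  # dL/dx at the final step
    gv = 0.0                       # dL/dv at the final step (v_T is not in the loss)
    gu = 0.0
    for _ in range(steps):
        # Forward step was x' = x + dt*v, v' = v + dt*u; pull the adjoints back.
        gu += dt * gv              # u reaches the loss only through v'
        gv = gv + dt * gx          # v feeds x' (factor dt) and v' (factor 1)
        # gx is unchanged because x' depends on x with factor 1.
    return gu


u, eps = 1.0, 1e-6
fd = (loss(u + eps) - loss(u - eps)) / (2 * eps)
print(f"analytic {analytic_grad(u):.6f}, finite difference {fd:.6f}")

for _ in range(50):                # plain gradient descent on the single control
    u -= 0.1 * analytic_grad(u)
print(f"optimized u = {u:.4f}, final x = {rollout(u):.4f}")
```

Because the gradient is exact rather than sampled, a few plain descent steps drive the final position onto the target. Automatic differentiation frameworks perform the same reverse pass mechanically when the dynamics and the policy are larger.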

Figures

Interception Rollout
Plot of a rollout in simulation using the proposed control policy. The interceptor (blue) aggressively pursues the intruder (red), which flies at 5.8 m/s, and collides with it at the yellow star labeled interception. The acceleration along each agent's trajectory is colored from blue (low acceleration) to yellow (high acceleration). Both vehicles are assumed to operate in open-air environments without obstacles.
Policy Architecture
Overview of the network architecture for the interception control policy. The 3D direction vector from the interceptor to the intruder is one input to the policy. The other inputs are the interceptor's linear velocity and its rotation matrix, flattened to a 9-dimensional vector. Separate MLP encoders process the intruder and interceptor data to produce 192-dimensional embeddings. These embeddings are summed and passed to a GRU with 192 hidden units. The GRU output is fed through a single hidden layer and then a linear head to generate the interceptor control commands.
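The caption fixes the input sizes (3 for the direction, 3 + 9 for the interceptor state) and the 192-unit widths. It does not specify the encoder depth, the activations, the GRU variant, or the output size. The NumPy sketch below shows one forward step and fills those gaps with assumptions:

- two tanh layers per encoder;
- a standard GRU cell;
- random, untrained weights;
- a 4-dimensional output, read as a mass-normalized thrust vector plus yaw (hypothetical).

The helper `unit_direction` computes, from simulator positions, the quantity that a camera bearing would supply on hardware.

```python
import numpy as np

H = 192                       # embedding width and GRU hidden size from the caption
rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Random weights and zero bias for one affine layer (untrained, illustrative)."""
    return rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_out, n_in)), np.zeros(n_out)


def mlp(layers, x):
    """Encoder MLP; two tanh layers is an assumption, not the paper's spec."""
    for W, b in layers:
        x = np.tanh(W @ x + b)
    return x


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def unit_direction(p_intruder, p_interceptor):
    """3D unit vector from interceptor to intruder (a camera would supply this)."""
    d = p_intruder - p_interceptor
    return d / np.linalg.norm(d)


class InterceptionPolicy:
    def __init__(self, n_out=4):                         # n_out=4 is hypothetical
        self.enc_intruder = [dense(3, H), dense(H, H)]   # unit direction (3)
        self.enc_self = [dense(12, H), dense(H, H)]      # velocity (3) + flattened R (9)
        self.gru_z = dense(2 * H, H)                     # update gate
        self.gru_r = dense(2 * H, H)                     # reset gate
        self.gru_n = dense(2 * H, H)                     # candidate state
        self.hidden = dense(H, H)                        # single hidden layer after the GRU
        self.head = dense(H, n_out)                      # linear output head

    def step(self, direction, velocity, R, h):
        """One control step: returns (command, new GRU hidden state)."""
        own_state = np.concatenate([velocity, R.reshape(-1)])
        # Encode each input stream separately, then sum the 192-d embeddings.
        x = mlp(self.enc_intruder, direction) + mlp(self.enc_self, own_state)
        xh = np.concatenate([x, h])
        z = sigmoid(self.gru_z[0] @ xh + self.gru_z[1])
        r = sigmoid(self.gru_r[0] @ xh + self.gru_r[1])
        n = np.tanh(self.gru_n[0] @ np.concatenate([x, r * h]) + self.gru_n[1])
        h_new = (1.0 - z) * n + z * h
        y = np.tanh(self.hidden[0] @ h_new + self.hidden[1])
        return self.head[0] @ y + self.head[1], h_new


policy = InterceptionPolicy()
h = np.zeros(H)
d = unit_direction(np.array([10.0, 2.0, 3.0]), np.zeros(3))
cmd, h = policy.step(d, np.zeros(3), np.eye(3), h)
print(cmd.shape)  # (4,)
```

Carrying `h` from one call to the next is what lets the recurrent state accumulate the history of bearing directions. Without that history, a single unit vector carries no range or velocity information.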
Intruder Trajectories
Example (a) ellipse, (b) spiral, and (c) lemniscate intruder trajectories. The ellipse trajectories are used for training, while all three types are used in evaluation. The trajectory parameters are randomly sampled (Appendix A.1). A subset of these parameters, r_axis, ρ_ar, and z_rate, is visualized; they denote the semi-axis, aspect ratio, and vertical ascent rate for the spiral, respectively. The trajectories are colorized according to acceleration magnitude. Panels (a)–(c) present increasingly challenging interception tasks.
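The exact parametric forms are not stated here, and the real sampling ranges are in Appendix A.1. The hypothetical sketch below writes the three families using the captioned parameters r_axis, ρ_ar (`rho_ar`), and z_rate, plus an assumed phase rate `omega`. Three further choices are placeholders:

- The lemniscate is the lemniscate of Bernoulli, which may differ from the paper's curve.
- The sampling ranges are invented.
- A constant phase rate does not give a constant flight speed.

```python
import numpy as np


def ellipse(t, r_axis, rho_ar, omega, z0=0.0):
    """Planar ellipse with semi-axes r_axis and rho_ar * r_axis at constant phase rate omega."""
    th = omega * np.asarray(t, dtype=float)
    return np.stack([r_axis * np.cos(th),
                     rho_ar * r_axis * np.sin(th),
                     np.full_like(th, z0)], axis=-1)


def spiral(t, r_axis, rho_ar, omega, z_rate, z0=0.0):
    """Ellipse in the horizontal plane that climbs at z_rate (m/s)."""
    p = ellipse(t, r_axis, rho_ar, omega, z0)
    p[..., 2] += z_rate * np.asarray(t, dtype=float)
    return p


def lemniscate(t, r_axis, omega, z0=0.0):
    """Lemniscate of Bernoulli (figure eight) with half-width r_axis (assumed form)."""
    th = omega * np.asarray(t, dtype=float)
    s = 1.0 + np.sin(th) ** 2
    return np.stack([r_axis * np.cos(th) / s,
                     r_axis * np.sin(th) * np.cos(th) / s,
                     np.full_like(th, z0)], axis=-1)


def sample_params(rng):
    """Hypothetical sampling ranges; the paper's actual ranges are in its Appendix A.1."""
    return dict(r_axis=rng.uniform(5.0, 15.0),
                rho_ar=rng.uniform(0.5, 1.0),
                omega=rng.uniform(0.2, 0.6))


rng = np.random.default_rng(0)
params = sample_params(rng)
t = np.linspace(0.0, 20.0, 2001)
train_traj = ellipse(t, **params)                    # training distribution
spiral_traj = spiral(t, **params, z_rate=0.5)        # evaluation only
fig8_traj = lemniscate(t, params["r_axis"], params["omega"])
```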
APG vs. PPO Training
Training success rate and episode length versus environment steps show that APG is more sample-efficient than PPO.
APG vs. PPO Success Rates
The success rates obtained while varying intruder speed during evaluation show that APG achieves higher interception success than PPO for all three types of intruder trajectories.
Dynamics Model Training
Training success rate and episode length versus environment steps show a similar rate of convergence when the simplified point mass dynamics and the nonlinear quadrotor dynamics models are used with APG.
Dynamics Model Success Rates
The success rates obtained while varying intruder speed during evaluation show that the nonlinear quadrotor dynamics model yields higher interception success than the simplified point mass dynamics for all three types of intruder trajectories.
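The two dynamics models differ mainly in how a command turns into acceleration:

- **Point mass:** a commanded mass-normalized thrust vector acts instantly.
- **Quadrotor:** thrust acts only along the current body z-axis, so the vehicle must first rotate.

The sketch below contrasts one step of each. It is a sketch under assumptions, not the paper's simulator:

- explicit Euler integration;
- unit inertia;
- hypothetical gains `k_R` and `k_w`;
- the policy output interpreted as a mass-normalized thrust vector plus yaw.

The attitude error uses the standard geometric form on SO(3). The text above states only that a PD attitude controller is used.

```python
import numpy as np

G = np.array([0.0, 0.0, -9.81])   # gravity (m/s^2)
E3 = np.array([0.0, 0.0, 1.0])


def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])


def vee(M):
    """Inverse of hat for a skew-symmetric matrix."""
    return np.array([M[2, 1], M[0, 2], M[1, 0]])


def so3_exp(w):
    """Rodrigues' formula: rotation matrix for the rotation vector w."""
    th = np.linalg.norm(w)
    if th < 1e-9:
        return np.eye(3) + hat(w)
    K = hat(w / th)
    return np.eye(3) + np.sin(th) * K + (1.0 - np.cos(th)) * (K @ K)


def point_mass_step(p, v, f_cmd, dt):
    """PMD: the commanded mass-normalized thrust vector acts instantly."""
    return p + dt * v, v + dt * (f_cmd + G)


def desired_attitude(f_cmd, yaw):
    """Rotation whose body z-axis points along f_cmd, with the given yaw."""
    b3 = f_cmd / np.linalg.norm(f_cmd)
    c = np.array([np.cos(yaw), np.sin(yaw), 0.0])
    b2 = np.cross(b3, c)
    b2 /= np.linalg.norm(b2)
    return np.column_stack([np.cross(b2, b3), b2, b3])


def quad_step(p, v, R, w, f_cmd, yaw, dt, k_R=50.0, k_w=10.0):
    """Quad: thrust acts only along the current body z-axis; attitude follows a PD law.

    Unit inertia is assumed, so the gyroscopic term w x (J w) vanishes.
    """
    R_des = desired_attitude(f_cmd, yaw)
    thrust = max(f_cmd @ (R @ E3), 0.0)                 # collective thrust, projected
    e_R = 0.5 * vee(R_des.T @ R - R.T @ R_des)          # geometric attitude error
    alpha = -k_R * e_R - k_w * w                        # PD angular acceleration
    return (p + dt * v,
            v + dt * (thrust * (R @ E3) + G),
            R @ so3_exp(dt * w),
            w + dt * alpha)


dt = 0.01
f_cmd = np.array([5.0, 0.0, 9.81])     # accelerate along +x while holding altitude
p_pm, v_pm = np.zeros(3), np.zeros(3)
p_q, v_q, R_q, w_q = np.zeros(3), np.zeros(3), np.eye(3), np.zeros(3)
for _ in range(200):                   # 2 s of simulated flight
    p_pm, v_pm = point_mass_step(p_pm, v_pm, f_cmd, dt)
    p_q, v_q, R_q, w_q = quad_step(p_q, v_q, R_q, w_q, f_cmd, 0.0, dt)
print(v_pm, v_q)
```

Starting from level hover, the point mass accelerates along +x on the first step. The quadrotor gains no horizontal speed until it has tilted, so it lags behind. A policy trained on point mass dynamics never has to plan for this rotation delay.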
Quad APG Rollouts
Example rollouts of the proposed Quad APG policy, with acceleration heatmaps. Collision between the intruder (red) and the interceptor (blue) occurs at the point marked with a star.
Control Effort — APG vs. PPO
Average acceleration (top row) and jerk (bottom row) across intruder speeds for successful rollouts using APG and PPO. The PPO-trained policy produces higher interceptor acceleration and jerk than the APG-trained policy.
Control Effort — Quad vs. Point Mass
Average acceleration (top row) and jerk (bottom row) across intruder speeds for successful rollouts using the simplified point mass dynamics (PMD) and nonlinear quadrotor dynamics (Quad) models with APG. The Quad dynamics model achieves interception with lower control effort than the PMD model.
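The text does not specify how acceleration and jerk are extracted for these plots. One plausible sketch assumes positions logged at a fixed time step `dt` and uses forward finite differences to compute per-rollout averages. It then groups the successful rollouts by intruder speed. The rollout dictionary keys (`success`, `speed`, `positions`) are hypothetical.

```python
from collections import defaultdict

import numpy as np


def control_effort(positions, dt):
    """Mean acceleration and jerk magnitudes from positions sampled every dt seconds."""
    p = np.asarray(positions, dtype=float)
    v = np.diff(p, axis=0) / dt          # forward differences
    a = np.diff(v, axis=0) / dt
    j = np.diff(a, axis=0) / dt
    return np.linalg.norm(a, axis=1).mean(), np.linalg.norm(j, axis=1).mean()


def effort_by_speed(rollouts, dt):
    """Average (acceleration, jerk) per intruder speed over successful rollouts only."""
    groups = defaultdict(list)
    for r in rollouts:
        if r["success"]:
            groups[r["speed"]].append(control_effort(r["positions"], dt))
    return {s: tuple(np.mean(vals, axis=0)) for s, vals in sorted(groups.items())}


# Sanity check on a circle of radius 2 m at 1.5 rad/s: |a| = r w^2, |j| = r w^3.
dt = 0.01
t = np.arange(0.0, 10.0, dt)
circle = np.stack([2.0 * np.cos(1.5 * t), 2.0 * np.sin(1.5 * t), np.zeros_like(t)], axis=1)
print(control_effort(circle, dt))
```

On the circle, the finite-difference estimates land very close to the analytic values of 4.5 m/s² and 6.75 m/s³. In real logs, jerk estimated from third differences amplifies noise, so filtering or logging commanded quantities may be preferable.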

The training objective decomposes classical parallel-navigation guidance into two terms:

- a line-of-sight (LOS) alignment loss that forces the relative velocity to point along the line of sight;
- a closing-velocity loss that drives the interceptor to aggressively close the gap.

Intruder state details are privileged information. They are used only inside the loss during training and never at inference, where the policy sees only its own state and the 3D unit direction to the intruder. A GRU-based policy network implicitly estimates the target’s velocity and acceleration from temporal sequences of unit directions. It outputs mass-normalized thrust and yaw commands, which an onboard PD attitude controller executes. The learned policy generalizes from the ellipse trajectories seen in training to out-of-distribution spiral and lemniscate targets.
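The exact loss expressions and weights are not given on this page. One way to write the two terms as a per-step loss is sketched below, under these assumptions:

- a squared penalty on the relative-velocity component perpendicular to the LOS, which is zero under parallel navigation;
- the negative closing speed for the closing term;
- hypothetical weights `w_los` and `w_close`.

The intruder position and velocity enter only here. The LOS unit vector is the same quantity the policy observes.

```python
import numpy as np


def interception_loss(p_int, v_int, p_tgt, v_tgt, w_los=1.0, w_close=0.1):
    """Hypothetical per-step loss using privileged intruder state (p_tgt, v_tgt).

    Returns (total, los_alignment, closing) for one time step.
    """
    los = (p_tgt - p_int) / np.linalg.norm(p_tgt - p_int)  # same vector the policy observes
    v_rel = v_int - v_tgt                  # interceptor velocity relative to the intruder
    v_close = v_rel @ los                  # closing speed along the LOS (> 0 when closing)
    v_perp = v_rel - v_close * los         # component that rotates the LOS
    loss_los = float(v_perp @ v_perp)      # zero under parallel navigation
    loss_close = -float(v_close)           # lower is better: rewards fast closure
    return w_los * loss_los + w_close * loss_close, loss_los, loss_close


# Tail chase along x: relative velocity already lies on the LOS.
print(interception_loss(np.zeros(3), np.array([10.0, 0, 0]),
                        np.array([10.0, 0, 0]), np.array([5.0, 0, 0])))  # (-0.5, 0.0, -5.0)
```

The alignment term alone would also be satisfied by an interceptor receding straight away from the intruder. The closing term breaks that symmetry by rewarding positive closing speed.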

Acknowledgments

This research was supported in part by an AI2C Seed grant and the NVIDIA Academic Grant Program.

BibTeX

@article{intruder-interception-2026,
  title={Learning Agile Intruder Interception using Differentiable Quadrotor Dynamics},
  author={Michael Anoruo and Xiaoyu Tian and Abhishek Rathod and Timothy Naudet and Thomas Canchola and Eric Sturzinger and Kshitij Goel and Wennie Tabib},
  journal={arXiv preprint arXiv:2607.02472},
  year={2026}
}