This project explores reinforcement learning for continuous control in a foosball simulation built in MuJoCo. Our goal was to build a simulator grounded in real gameplay video, then train agents with Truncated Quantile Critics (TQC) that outperform a Soft Actor-Critic (SAC) baseline in interception rate and control stability.
Both SAC and TQC are continuous-control reinforcement learning algorithms. However, TQC replaces scalar Q-value estimates with quantile distributions and discards the highest quantiles when computing targets, statistically filtering out overestimated samples to produce smoother, more conservative value predictions. This matters for foosball, where the ball's motion is stochastic and the environment is noisy, so Q-value overestimation can easily destabilize training.
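To make the truncation step concrete, here is a minimal NumPy sketch of how TQC forms its TD target. Shapes and names are illustrative; the full algorithm also subtracts the SAC entropy term from the atoms and trains the critics with a quantile Huber loss, both omitted here.

```python
import numpy as np

def tqc_target(next_quantiles, reward, done, gamma=0.99, drop_per_net=2):
    """Truncated-quantile TD target (sketch of the core TQC idea).

    next_quantiles: (n_nets, n_quantiles) atoms from each critic at (s', a').
    drop_per_net:   how many of the largest pooled atoms to discard per
                    critic, filtering out overestimated samples.
    """
    pooled = np.sort(next_quantiles.reshape(-1))  # pool atoms from all critics, sort ascending
    n_drop = drop_per_net * next_quantiles.shape[0]
    kept = pooled[: len(pooled) - n_drop]         # truncate the top (most optimistic) atoms
    return reward + gamma * (1.0 - done) * kept   # one target per kept atom
```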
We extracted ball trajectories from overhead gameplay video to obtain ground-truth (x, y, t) data for validating and tuning the simulator, calibrating the camera (intrinsic matrix K, rvec/tvec) with OpenCV.
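As a sketch of how a calibrated detection maps to table coordinates: for points on the table plane (z = 0), the projection reduces to a homography built from K, the first two columns of the rotation matrix, and tvec. The function below assumes K, rvec, and tvec were obtained from `cv2.solvePnP` on known table-corner correspondences; the function name is ours, not from our codebase.

```python
import cv2
import numpy as np

def pixel_to_table(u, v, K, rvec, tvec):
    """Back-project a ball detection (u, v) onto the table plane z = 0.

    Returns (x, y) in the table's world units.
    """
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    # For planar points, projection is the homography H = K [r1 r2 t].
    H = K @ np.column_stack((R[:, 0], R[:, 1], tvec.reshape(3)))
    x, y, w = np.linalg.inv(H) @ np.array([u, v, 1.0])
    return x / w, y / w  # dehomogenize
```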
We built a custom MuJoCo environment to match our physical table and support training TQC and SAC agents. The simulation is calibrated so that ball dynamics and rod control transfer meaningfully from sim to real.
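A stripped-down sketch of what such an environment's interface looks like, using the Gymnasium API and the official `mujoco` bindings. The XML path, observation layout, and reward hook here are illustrative placeholders, not our exact setup.

```python
import gymnasium as gym
import mujoco
import numpy as np

class FoosballEnv(gym.Env):
    """Minimal MuJoCo foosball environment sketch (names are illustrative)."""

    def __init__(self, xml_path="foosball.xml"):
        self.model = mujoco.MjModel.from_xml_path(xml_path)
        self.data = mujoco.MjData(self.model)
        n_act = self.model.nu                    # slide + rotate actuators per rod
        self.action_space = gym.spaces.Box(-1.0, 1.0, (n_act,), np.float32)
        obs_dim = self.model.nq + self.model.nv  # joint positions + velocities
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, (obs_dim,), np.float32)

    def _obs(self):
        # Ball pose and rod joint states all live in qpos/qvel.
        return np.concatenate([self.data.qpos, self.data.qvel]).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        mujoco.mj_resetData(self.model, self.data)
        return self._obs(), {}

    def step(self, action):
        self.data.ctrl[:] = action
        mujoco.mj_step(self.model, self.data)
        reward, terminated = 0.0, False          # reward shaping discussed below
        return self._obs(), reward, terminated, False, {}
```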
We found that without reward shaping, the sparse “goal only” signal wasn’t enough to learn reasonable play within our time budget.
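As a hedged sketch of the kind of dense shaping this implies: a small penalty on the distance between the controlled rod's nearest player and the ball, plus bonuses for interceptions and goals. The terms and weights below are placeholders for illustration, not our tuned values.

```python
import numpy as np

def shaped_reward(ball_xy, player_xy, intercepted, goal_scored,
                  w_dist=0.1, r_intercept=1.0, r_goal=10.0):
    """Illustrative dense shaping layered on the sparse goal signal."""
    dist = np.linalg.norm(np.asarray(ball_xy) - np.asarray(player_xy))
    return (-w_dist * dist                     # pull the rod toward the ball
            + r_intercept * float(intercepted) # bonus for blocking/touching the ball
            + r_goal * float(goal_scored))     # the original sparse signal
```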
This project was completed for Computational Aspects of Robotics at Columbia University; the paper below is our final group report.