Reinforcement Learning for Foosball

Position: Backend Engineer, Software Engineer, Project Manager
Type: Team Project
Duration: September - December 2025
Stack: Python, C++, MuJoCo, TQC, SAC, OpenCV, NumPy, PyTorch, GCP

Overview

This project explores reinforcement learning for continuous control in a foosball simulation built in MuJoCo. We trained agents with Truncated Quantile Critics (TQC) and benchmarked them against Soft Actor-Critic (SAC) to study policy robustness. Our goal was to ground the simulation in real gameplay video, then train TQC agents that outperform the SAC baseline in interception rate and control stability.

Summary

Algorithm integration: Implemented TQC and revived SAC from the baseline; trained and compared both on the foosball environment.
Vision & trajectory pipeline: OpenCV + ArUco calibration and ball tracking to extract (x, y, t) trajectories from real gameplay as ground truth for the sim.
Custom MuJoCo environment: Table modeled from CAD, rods with sliding and rotation joints, ball constrained to the play plane; tuned mass, friction, and damping to match real trajectories.
Reward design & training: Shaped rewards and termination rules; compared SAC vs. TQC on stability, goal rate, and episode length; full metrics in the paper.

Why TQC?

Both SAC and TQC are continuous-control reinforcement learning algorithms. However, TQC replaces scalar Q-value estimates with quantile distributions and discards the highest quantiles when computing targets, statistically filtering out overestimated samples to produce smoother, more conservative value predictions. This matters in foosball, where ball motion is stochastic and observations are noisy, so conservative value estimates keep the policy from chasing overestimated actions.
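The truncation step above can be sketched in a few lines. This is an illustrative NumPy sketch of the core TQC target computation, not our training code: `truncated_quantile_target`, the drop count, and the reward/discount values are all hypothetical names and numbers chosen for the example.

```python
import numpy as np

def truncated_quantile_target(critic_quantiles, drop_per_critic, reward, gamma=0.99):
    """Pool quantile estimates from all critics, drop the largest ones,
    and build a conservative TD target (the core idea behind TQC)."""
    # critic_quantiles: one 1-D array of quantile atoms per critic
    pooled = np.sort(np.concatenate(critic_quantiles))          # ascending order
    keep = pooled.size - drop_per_critic * len(critic_quantiles)
    truncated = pooled[:keep]                                   # discard top quantiles
    return reward + gamma * truncated                           # target atoms
```

Dropping the top atoms from the pooled, sorted set is what filters out overestimation: an outlier estimate from any one critic (e.g. a spurious 10 among values near 3) never reaches the target.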

Data Pipeline Details

Ball Trajectory from OpenCV

We extracted ball trajectories from overhead gameplay video to obtain ground-truth (x,y,t) data for validating and tuning the simulator.

OpenCV ball tracking: red ball detected and tracked frame-by-frame on foosball table
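Once the tracker emits per-frame (x, y, t) samples, simple finite differences recover the ball's velocity and speed for comparison against the simulator. This is a minimal sketch of that post-processing step; `trajectory_kinematics` is a hypothetical helper name, not a function from our pipeline.

```python
import numpy as np

def trajectory_kinematics(xyt):
    """Estimate per-sample velocity and speed from tracked (x, y, t) samples
    using finite differences (np.gradient handles uneven frame times)."""
    xyt = np.asarray(xyt, dtype=float)   # shape (n, 3): columns x, y, t
    dt = np.gradient(xyt[:, 2])          # per-sample time step
    vx = np.gradient(xyt[:, 0]) / dt
    vy = np.gradient(xyt[:, 1]) / dt
    speed = np.hypot(vx, vy)
    return vx, vy, speed
```

Matching these speed profiles between real and simulated rollouts is one concrete way to tune the simulator's friction and damping parameters.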

Simulation Environment

We built a custom MuJoCo environment to match our physical table and support training TQC and SAC agents. The simulation is calibrated so that ball dynamics and rod control transfer meaningfully from sim to real.

MuJoCo foosball simulation: table with rods, players, and ball in motion
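The rod modeling described above (sliding plus rotation) maps naturally onto MuJoCo's joint types. The MJCF fragment below is an illustrative sketch, not our actual model file: all names, dimensions, masses, and friction values are placeholder assumptions chosen for the example.

```xml
<mujoco model="foosball_rod_sketch">
  <worldbody>
    <body name="rod" pos="0 0 0.1">
      <!-- Linear travel along the rod axis -->
      <joint name="rod_slide" type="slide" axis="0 1 0" range="-0.1 0.1" damping="2"/>
      <!-- Spin about the same axis for kicking -->
      <joint name="rod_spin" type="hinge" axis="0 1 0" damping="0.05"/>
      <geom type="cylinder" size="0.008 0.3" euler="90 0 0"/>
    </body>
    <body name="ball" pos="0 0 0.02">
      <freejoint/>
      <geom type="sphere" size="0.017" mass="0.02" friction="0.4 0.005 0.0001"/>
    </body>
  </worldbody>
</mujoco>
```

Giving each rod exactly two degrees of freedom (one slide, one hinge sharing an axis) keeps the action space compact, which helps both SAC and TQC converge.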

Reward & Episode Termination

We found that without shaping, the sparse "goal only" signal was not enough to learn reasonable play within our training budget.
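To make the shaping idea concrete, here is a minimal sketch of what a shaped reward with termination can look like. This is an illustrative example, not our exact reward: the function name, weights, and bonus magnitudes are all hypothetical.

```python
def shaped_reward(ball_to_player_dist, scored, conceded,
                  w_block=0.5, w_time=0.01):
    """Illustrative shaped reward: sparse goal terms plus dense shaping.

    Returns (reward, done); episodes terminate on either goal outcome.
    """
    r = 0.0
    if scored:
        r += 10.0                        # sparse success signal
    if conceded:
        r -= 10.0                        # sparse failure signal
    r -= w_block * ball_to_player_dist   # dense term: keep a player near the ball
    r -= w_time                          # small per-step penalty against stalling
    done = scored or conceded
    return r, done
```

The dense distance term gives the agent a gradient to follow long before it ever scores, which is what the sparse goal-only signal lacks.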

Results

Course

This project was completed for Computational Aspects of Robotics at Columbia University. The paper below is our final group report.

Research Paper (PDF)