Reinforcement Learning for Foosball

Position: Backend Engineer, Software Engineer, Project Manager
Type: Team project (in progress)
Duration: September 2025 - Present
Stack: Python, C++, MuJoCo, TQC, SAC, OpenCV, YOLOv11, NumPy, PyTorch, GCP

GitHub Repository

Summary

Algorithm integration: implemented TQC (distributional critics with quantile truncation) and revived SAC (Soft Actor-Critic) from an existing baseline codebase.
Data pipeline: camera calibration, undistortion, and ball-state extraction (OpenCV) from real gameplay footage; see the sketch under Data Pipeline Details.

Objective

This project explores reinforcement learning for continuous control in a foosball simulation built with MuJoCo (Multi-Joint dynamics with Contact). The goal is to build a simulation grounded in real gameplay video, then train agents with Truncated Quantile Critics (TQC) and benchmark them against a Soft Actor-Critic (SAC) baseline from an external codebase, aiming to outperform SAC in interception rate and control stability while studying policy robustness.
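
For illustration, a minimal MuJoCo stepping loop of the kind such an environment is built around; the model path, control mapping, and observation layout here are placeholders, not the project's actual environment:

```python
import mujoco
import numpy as np

# Hypothetical MJCF model of the table; the project's real model differs.
model = mujoco.MjModel.from_xml_path("foosball.xml")
data = mujoco.MjData(model)

def step(action):
    """Apply rod slide/rotation commands and advance the physics one step."""
    data.ctrl[:] = np.clip(action, -1.0, 1.0)  # normalized actuator targets
    mujoco.mj_step(model, data)
    # Observation: rod joint positions/velocities plus the ball's state.
    return np.concatenate([data.qpos, data.qvel])
```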

Why TQC?

Both SAC and TQC are off-policy reinforcement learning algorithms for continuous control. TQC, however, replaces SAC's scalar Q-value estimates with quantile distributions and discards the highest quantiles when computing targets, filtering out overestimated values to produce smoother, more conservative value predictions. This matters in foosball, where the ball's motion is stochastic and the environment is noisy, so an overestimating critic can easily destabilize training.
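
A minimal sketch of that truncation step, assuming N distributional critics each predicting M quantiles; the tensor shapes and the `drop_per_critic` parameter are illustrative rather than the project's actual code, and the SAC entropy bonus is omitted for brevity:

```python
import torch

def truncated_target(next_quantiles, reward, done, gamma=0.99, drop_per_critic=2):
    """TQC-style target: pool all critics' quantiles, sort them, drop the
    largest ones, then apply the Bellman backup.

    next_quantiles: (batch, n_critics, n_quantiles) quantile estimates for
                    the next state-action pair from the target critics.
    reward, done:   (batch,) tensors.
    """
    batch, n_critics, n_quantiles = next_quantiles.shape

    # Pool every critic's quantiles into one distribution per sample.
    pooled = next_quantiles.reshape(batch, n_critics * n_quantiles)

    # Sort ascending and discard the top quantiles, which are the ones most
    # likely to carry overestimation bias.
    sorted_q, _ = torch.sort(pooled, dim=1)
    kept = sorted_q[:, : (n_quantiles - drop_per_critic) * n_critics]

    # Distributional Bellman backup on the remaining (truncated) quantiles.
    return reward.unsqueeze(1) + gamma * (1.0 - done.unsqueeze(1)) * kept
```

Each critic is then regressed toward these truncated atoms with a quantile Huber loss, which is what yields the more conservative value estimates described above.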

Data Pipeline Details
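
A minimal sketch of the calibration, undistortion, and ball-state extraction steps summarized above, assuming checkerboard calibration frames and simple HSV thresholding for the ball; the board size, color range, and frame rate are illustrative, not the project's actual values:

```python
import cv2
import numpy as np

def calibrate(image_paths, board_size=(9, 6)):
    """Estimate camera intrinsics from checkerboard images (illustrative)."""
    objp = np.zeros((board_size[0] * board_size[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:board_size[0], 0:board_size[1]].T.reshape(-1, 2)
    obj_points, img_points, shape = [], [], None
    for path in image_paths:
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        shape = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, board_size)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
    _, K, dist, _, _ = cv2.calibrateCamera(obj_points, img_points, shape, None, None)
    return K, dist

def ball_state(frame, K, dist, prev_center=None, dt=1 / 60):
    """Undistort a frame and extract a rough ball position and velocity."""
    frame = cv2.undistort(frame, K, dist)
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Hypothetical HSV range for a white ball; tuned per footage in practice.
    mask = cv2.inRange(hsv, (0, 0, 200), (180, 40, 255))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    (x, y), _ = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
    center = np.array([x, y])
    velocity = (center - prev_center) / dt if prev_center is not None else np.zeros(2)
    return center, velocity
```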

Implementation Progress

Preliminary Results

Next Steps