This project explores reinforcement learning for continuous control in a foosball simulation built in MuJoCo. Our goal was to build a foosball simulation grounded in real gameplay video, then train Truncated Quantile Critics (TQC) agents that outperform a Soft Actor-Critic (SAC) baseline in interception rate and control stability.
Algorithm integration
Implemented TQC and revived the SAC baseline; trained and compared both on the foosball environment.
Vision & trajectory pipeline
OpenCV + ArUco calibration and ball tracking for (x,y,t) trajectories from real gameplay as ground truth for the sim.
Custom MuJoCo environment
Table from CAD, rods with sliding/rotation, ball in plane; tuned mass, friction, and damping to match real trajectories.
Reward design & training
Shaped rewards and termination rules; compared SAC vs TQC (stability, goal rate, episode length); full metrics in the paper.
Why TQC?
Both SAC and TQC are off-policy reinforcement learning algorithms for continuous control. TQC, however, replaces scalar Q-value estimates with quantile distributions and discards the highest quantiles when computing targets, statistically filtering out overestimated samples to produce smoother, more conservative value predictions. This matters in foosball, where the ball's motion is stochastic and the environment is noisy, so unchecked value overestimation can destabilize training.
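The truncation step above can be sketched in a few lines of NumPy. This is an illustration of the idea, not our training code (we trained with a full TQC implementation); the shapes and the `drop_per_critic` parameter are assumptions for the example.

```python
import numpy as np

def truncated_target(next_quantiles, reward, gamma, drop_per_critic):
    """TQC-style target: pool quantiles from all critics, sort them,
    drop the highest few, and back up the rest through the Bellman update.

    next_quantiles: (n_critics, n_quantiles) array for the next state.
    """
    pooled = np.sort(next_quantiles.flatten())
    n_drop = drop_per_critic * next_quantiles.shape[0]
    kept = pooled[: len(pooled) - n_drop]   # discard the most optimistic quantiles
    return reward + gamma * kept            # distributional target for each critic

# Two critics, three quantiles each; 9.0 is an overestimated outlier.
critic_q = np.array([[1.0, 2.0, 9.0],
                     [1.5, 2.5, 8.0]])
target = truncated_target(critic_q, reward=0.5, gamma=0.99, drop_per_critic=1)
```

Dropping the top quantiles removes the outlier before it can inflate the target, which is exactly the conservatism that helps in a noisy environment.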
Data Pipeline Details
Calibration: Collected checkerboard frames; estimated camera intrinsics/extrinsics (K, rvec/tvec) with OpenCV.
Undistortion: Applied lens undistortion to gameplay video using estimated parameters.
Ball state extraction: Computed per‑frame ball position and velocity from recordings.
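The last step, turning per-frame pixel detections into table-plane positions and velocities, can be sketched as below. The homography `H` stands in for the calibration result; its values here are hypothetical, and the finite-difference velocity is the simplest estimator, not necessarily the filtering we used.

```python
import numpy as np

def pixel_to_table(H, uv):
    """Map a pixel (u, v) to table-plane (x, y) via a planar homography H."""
    p = H @ np.array([uv[0], uv[1], 1.0])
    return p[:2] / p[2]                     # homogeneous divide

def ball_states(H, pixels, fps):
    """Per-frame (x, y) positions and finite-difference velocities."""
    xy = np.array([pixel_to_table(H, uv) for uv in pixels])
    v = np.diff(xy, axis=0) * fps           # (m / frame) * (frames / s) = m/s
    return xy, v

# Hypothetical pixels-to-meters homography and three detections at 60 fps.
H = np.diag([0.001, 0.001, 1.0])
xy, v = ball_states(H, [(0, 0), (120, 0), (240, 0)], fps=60)
```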
Ball Trajectory from OpenCV
We extracted ball trajectories from overhead gameplay video to obtain ground-truth (x,y,t) data for validating and tuning the simulator.
Monocular calibration: Implemented calibration using OpenCV with ArUco to detect the table corners and establish the playing-plane coordinate frame.
Ball tracking: Tracked the red ball frame-by-frame using color + contour detection to get (x,y,t) trajectories.
Ground truth for sim: Used this trajectory dataset as ground truth for tuning the simulator’s dynamics.
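A minimal stand-in for the color-based detection step is sketched below. Our pipeline used HSV thresholding plus contour detection in OpenCV; this NumPy-only version uses a plain RGB threshold and a mask centroid, and the threshold values are illustrative assumptions.

```python
import numpy as np

def red_ball_centroid(frame_rgb, r_min=150, gb_max=100):
    """Centroid of red pixels in an (H, W, 3) RGB frame.

    Simple threshold stand-in for the HSV + contour pipeline;
    returns None when no red pixels are found.
    """
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    mask = (r > r_min) & (g < gb_max) & (b < gb_max)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return float(xs.mean()), float(ys.mean())

# Synthetic frame with a 5x5 red blob centered at pixel (62, 42).
frame = np.zeros((100, 100, 3), dtype=np.uint8)
frame[40:45, 60:65, 0] = 255
cx, cy = red_ball_centroid(frame)
```

Running the detector once per frame and stamping each centroid with the frame time yields the (x,y,t) trajectories used as ground truth.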
Simulation Environment
We built a custom MuJoCo environment to match our physical table and support training TQC and SAC agents. The simulation is calibrated so that ball dynamics and rod control transfer meaningfully from sim to real.
Custom MuJoCo environment: Built with a CAD model for the table; calibrated simulation to match table dimensions and coordinate frame.
Rod and ball modeling: Modeled each rod with sliding + rotational joints; ball moves freely in the plane with contact against players and walls.
Physics tuning: Tuned ball mass, friction, and damping so simulated passes and rebounds have similar speed and travel distance to real trajectories.
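The joint layout described above can be sketched as a minimal MJCF fragment. This is an illustration, not our full CAD-derived model; all dimensions, masses, and friction/damping values here are placeholder assumptions, since the real ones were tuned against the recorded trajectories.

```xml
<!-- Minimal MJCF sketch: one rod with slide + hinge joints and a free ball.
     All numeric values are illustrative, not the tuned parameters. -->
<mujoco model="foosball-sketch">
  <worldbody>
    <body name="rod" pos="0 0 0.1">
      <!-- translation along the rod axis, plus rotation about it -->
      <joint name="rod_slide" type="slide" axis="0 1 0" range="-0.1 0.1" damping="0.5"/>
      <joint name="rod_spin" type="hinge" axis="0 1 0" damping="0.1"/>
      <geom type="cylinder" size="0.008 0.3" euler="90 0 0" mass="0.3"/>
    </body>
    <body name="ball" pos="0.2 0 0.02">
      <freejoint/>
      <geom type="sphere" size="0.017" mass="0.024" friction="0.4 0.005 0.0001"/>
    </body>
  </worldbody>
</mujoco>
```

In our environment the slide and hinge joints are the action space, while the ball's mass, friction, and damping are the parameters tuned to match real passes and rebounds.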
Reward & Episode Termination
We found that without shaping, the sparse “goal only” signal wasn’t enough to learn reasonable play in our time budget.
Penalties (per step): control effort (0.001 × squared action magnitude, discouraging violent rod rotation) and a −0.1 time penalty to discourage stalling.
Terminal events: goal scored (+10000, or −10000 for a self-goal); ball stuck (y-axis speed below 0.15 per frame for 40 consecutive steps); max episode length reached (3000 steps).
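The shaping above can be written out directly. The coefficients are the ones listed; the function and argument names are hypothetical, and this is a sketch of the reward logic rather than our exact environment code.

```python
import numpy as np

GOAL_BONUS = 10000.0   # terminal goal reward (negated for a self-goal)
TIME_PENALTY = 0.1     # per-step stalling penalty
CTRL_COEF = 0.001      # penalty coefficient on squared action magnitude

def step_reward(action, scored, self_goal):
    """Per-step reward: time and control penalties plus terminal goal bonuses."""
    r = -TIME_PENALTY
    r -= CTRL_COEF * float(np.sum(np.square(action)))  # discourage violent rod motion
    if scored:
        r += GOAL_BONUS
    if self_goal:
        r -= GOAL_BONUS
    return r

def ball_stuck(vy_history, thresh=0.15, window=40):
    """Terminate when |y-velocity| stays below thresh for `window` consecutive frames."""
    return len(vy_history) >= window and all(abs(v) < thresh for v in vy_history[-window:])

# Example: a non-terminal step with action [1.0, 2.0] costs 0.1 + 0.001 * 5.
r = step_reward(np.array([1.0, 2.0]), scored=False, self_goal=False)
```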
Results
SAC: Converges to a stable return plateau and a much higher goal rate, but episode length collapses (often tens of steps), so episodes end quickly.
TQC: Reaches higher return peaks at times but is less stable; maintains much longer episodes (often thousands of steps), keeping the ball in play longer but converting to goals less often than SAC.
Takeaway: SAC gives stable, consistent, goal-heavy play; TQC can achieve higher returns but is more sensitive to reward/termination design and tends toward longer rallies or timeouts rather than quick goals. Full learning curves and metrics are in the paper.
Course
This project was completed for Computational Aspects of Robotics at Columbia University. The paper below is our final report, written as a group.