Exploiting Failure in Evolution
Position: Undergraduate Researcher under Professor Lisa Soros
Institute: Soros Lab, Summer Research Institute
Duration: May 2024 - May 2025
Tools: Python (NumPy, multiprocessing, heapq), Matplotlib/Seaborn, Gymnasium (successor to OpenAI Gym), Git
Overview
Algorithms typically try to minimize errors and failures. This research challenges that paradigm by investigating the potential advantages of failure. We explore failure-preserving evolutionary algorithms in the Lunar Lander game environment, where the purple lander is the agent. We test variants of FI-2Pop in Lunar Lander to determine whether preserving "failed" individuals improves evolutionary search, and the experiments reveal reward-hacking behaviors.
Summary
Scalable experiment framework
Modular runners for RS, ES, MAP-Elites (+mortality), FI-2Pop, and FI-2Pop with MAP-Elites.
Performance
Python multiprocessing with batched rollouts (≈6.7× trial-throughput speedup).
Algorithmic engineering
Mortality in MAP-Elites; FI-2Pop with MAP-Elites (novel variant).
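The batched-rollout idea can be sketched with the standard library's `multiprocessing.Pool`. This is a minimal illustration, not the project's code: `rollout_fitness` is a hypothetical stand-in (a toy quadratic) for a Gymnasium Lunar Lander episode, and `evaluate_batch`, `evaluate_population`, `processes`, and `batch_size` are assumed names and defaults. Sending whole batches to each worker amortizes the cost of pickling and inter-process messaging across many agents; the sketch makes no claim about the speedup it would reach.

```python
import multiprocessing as mp

import numpy as np


def rollout_fitness(params: np.ndarray) -> float:
    """Hypothetical stand-in for one Lunar Lander episode.

    A real version would run a Gymnasium episode with a policy built from
    `params` and return the total reward; a toy quadratic keeps this runnable.
    """
    return float(-np.sum(params ** 2))


def evaluate_batch(batch: list) -> list:
    # One task per batch: pickling and messaging costs are paid once per batch.
    return [rollout_fitness(p) for p in batch]


def evaluate_population(population: list, processes: int = 4, batch_size: int = 32) -> list:
    """Evaluate every agent in parallel, returning fitnesses in population order."""
    batches = [population[i:i + batch_size] for i in range(0, len(population), batch_size)]
    # "fork" (Linux/macOS) avoids re-importing this module in each worker;
    # "spawn" is the portable fallback and then requires a __main__ guard.
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    with mp.get_context(method).Pool(processes) as pool:
        per_batch = pool.map(evaluate_batch, batches)  # map preserves batch order
    return [f for batch in per_batch for f in batch]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    population = [rng.normal(size=8) for _ in range(1000)]
    fitnesses = evaluate_population(population)
    print(len(fitnesses), max(fitnesses))
```

Because `Pool.map` preserves input order, the flattened fitness list lines up index-for-index with the population, so selection code downstream does not need to track which worker evaluated which agent.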

Methods
We ran each of the following algorithms with 100,000 agents and selected the highest-fitness agents:
- Random Search (RS): (baseline 1) Evaluate a randomly initialized population of agents.
- Evolution Strategy (ES): (baseline 2) Over successive generations, regenerate the entire population by mutating the previous generation’s fittest agents.
- MAP-Elites: Maintain “elites”, the fittest agent found in each feature niche of a 2D map. With mortality, old agents are removed from the map.
- FI-2Pop: Modify ES by maintaining two populations (feasible/infeasible), split by a fitness threshold.
- MAP-Elites with FI-2Pop: Modify MAP-Elites by maintaining two feature maps (feasible/infeasible).
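The last two methods can be combined into a small sketch: two MAP-Elites maps, one for feasible and one for infeasible agents, each with mortality. It is a hypothetical illustration under stated assumptions, not the lab's implementation: feasibility is taken to mean fitness at or above `threshold`, the 2D feature descriptor is assumed to lie in [-1, 1] and is binned into a 10×10 grid, "old" means an elite older than `max_age` generations, parents are drawn uniformly from both maps, and mutation is Gaussian with step `sigma`. `Archive`, `fi2pop_map_elites`, and `toy_evaluate` are illustrative names, and `toy_evaluate` stands in for a Lunar Lander rollout.

```python
import numpy as np


class Archive:
    """Hypothetical 2D MAP-Elites map: one (params, fitness, birth_gen) per cell."""

    def __init__(self, bins: int = 10, low: float = -1.0, high: float = 1.0):
        self.bins, self.low, self.high = bins, low, high
        self.cells = {}  # (row, col) -> (params, fitness, birth_generation)

    def cell_of(self, features) -> tuple:
        # Scale each feature to [0, 1], then bin it; out-of-range values are clipped.
        scaled = (np.asarray(features, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip((scaled * self.bins).astype(int), 0, self.bins - 1)
        return int(idx[0]), int(idx[1])

    def try_insert(self, params, fitness, features, generation) -> bool:
        # Standard MAP-Elites rule: a child replaces the cell's elite only if fitter.
        key = self.cell_of(features)
        current = self.cells.get(key)
        if current is None or fitness > current[1]:
            self.cells[key] = (params, fitness, generation)
            return True
        return False

    def apply_mortality(self, generation: int, max_age: int) -> None:
        # Mortality (assumed age-based): remove elites older than max_age generations.
        self.cells = {k: v for k, v in self.cells.items() if generation - v[2] <= max_age}


def fi2pop_map_elites(evaluate, dim, generations=50, batch=64, threshold=0.0,
                      max_age=10, sigma=0.1, seed=0):
    """Evolve with separate feasible and infeasible maps (FI-2Pop + MAP-Elites)."""
    rng = np.random.default_rng(seed)
    feasible, infeasible = Archive(), Archive()
    for gen in range(generations):
        # Parents come from both maps, so "failed" agents can still reproduce.
        parents = list(feasible.cells.values()) + list(infeasible.cells.values())
        for _ in range(batch):
            if parents:
                base = parents[rng.integers(len(parents))][0]
                child = base + sigma * rng.normal(size=dim)
            else:
                child = rng.normal(size=dim)  # bootstrap (or restart) with random agents
            fitness, features = evaluate(child)
            # The fitness threshold decides which map the child competes in.
            target = feasible if fitness >= threshold else infeasible
            target.try_insert(child, fitness, features, gen)
        feasible.apply_mortality(gen, max_age)
        infeasible.apply_mortality(gen, max_age)
    return feasible, infeasible


def toy_evaluate(params):
    # Hypothetical stand-in for a Lunar Lander rollout: returns (fitness, 2D features).
    fitness = float(-np.sum((params - 0.5) ** 2))
    return fitness, np.tanh(params[:2])


if __name__ == "__main__":
    feasible, infeasible = fi2pop_map_elites(toy_evaluate, dim=4, threshold=-1.0)
    print(len(feasible.cells), "feasible elites,", len(infeasible.cells), "infeasible elites")
```

Keeping the infeasible map as a full archive, rather than discarding low-fitness agents, is what lets "failed" lineages keep exploring niches that the feasible map has not reached yet.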
Findings
- Best agents routinely exceeded fitness 200; a characteristic “tap-dancing” reward hack maximized the score without landing between the flags.
- The top-performing agent achieved a fitness score of 237.278 with endpoint features (0.591, −0.215).
- Aggregated heatmaps of elites across 20 trials revealed distinct coverage patterns for each evolutionary method (see poster figure set).
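One plausible way to aggregate elite maps across trials is to stack the per-trial grids and compute, for each cell, how often it was filled and the mean fitness of its elites. This is a sketch under assumptions, not the poster's pipeline: each trial is assumed to be a 2D NumPy array of elite fitness with `NaN` marking empty cells, `aggregate_elite_maps` is a hypothetical name, and the plotting demo uses synthetic random data rather than experimental results.

```python
import numpy as np


def aggregate_elite_maps(trial_maps):
    """Combine per-trial elite grids (NaN = empty cell) into coverage and mean-fitness maps."""
    stacked = np.stack(trial_maps)                # shape: (trials, rows, cols)
    filled = ~np.isnan(stacked)
    coverage = filled.mean(axis=0)                # fraction of trials that filled each cell
    counts = filled.sum(axis=0)
    totals = np.where(filled, stacked, 0.0).sum(axis=0)
    # Cells never filled in any trial stay NaN instead of dividing by zero.
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_fitness = np.where(counts > 0, totals / counts, np.nan)
    return coverage, mean_fitness


if __name__ == "__main__":
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    # Synthetic stand-in for 20 trials of 10x10 elite maps (about 60% of cells filled).
    trials = [np.where(rng.random((10, 10)) < 0.6, rng.normal(100, 50, (10, 10)), np.nan)
              for _ in range(20)]
    coverage, mean_fitness = aggregate_elite_maps(trials)
    plt.imshow(mean_fitness, origin="lower", cmap="viridis")
    plt.colorbar(label="mean elite fitness")
    plt.savefig("elite_heatmap.png")
```

Reporting coverage alongside mean fitness separates two effects a single heatmap can blur: a method may reach many niches with mediocre elites, or few niches with strong ones.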
Key Contributions
- Added mortality to MAP-Elites; implemented heap-based parent selection for ES; built reproducible visualization pipelines.
- Parallelized evaluations with Python multiprocessing, achieving approximately 6.7× faster trial throughput.
- Validated RS/ES/MAP-Elites baselines against prior results to ensure correctness.
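Heap-based parent selection for ES can be illustrated with Python's `heapq`: a size-k min-heap keeps the k fittest agents seen so far, so each new agent is compared only against the weakest current parent. This is a sketch under assumptions, not the project's code; `select_parents`, `es_generation`, `k`, and `sigma` are hypothetical names, and Gaussian mutation with round-robin parent assignment is an assumed operator. A counter breaks fitness ties so that NumPy arrays are never compared.

```python
import heapq
import itertools

import numpy as np


def select_parents(scored, k):
    """Return the k fittest params from an iterable of (fitness, params), best first."""
    heap = []  # min-heap of (fitness, tiebreak, params); weakest parent sits at heap[0]
    tiebreak = itertools.count()  # unique ints, so params arrays are never compared
    for fitness, params in scored:
        entry = (fitness, next(tiebreak), params)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif fitness > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the weakest parent in O(log k)
    return [params for _, _, params in sorted(heap, key=lambda e: (-e[0], e[1]))]


def es_generation(population, evaluate, k=10, sigma=0.1, rng=None):
    """One ES step: keep the k fittest agents, then refill the population by mutation."""
    if rng is None:
        rng = np.random.default_rng()
    parents = select_parents(((evaluate(p), p) for p in population), k)
    children = []
    for i in range(len(population)):
        parent = parents[i % len(parents)]  # round-robin over the selected parents
        children.append(parent + sigma * rng.normal(size=parent.shape))  # assumed Gaussian mutation
    return children


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    population = [rng.normal(size=4) for _ in range(100)]
    for _ in range(20):
        population = es_generation(population, lambda p: float(-np.sum(p ** 2)), rng=rng)
```

For a single finished batch, `heapq.nlargest(k, scored, key=lambda t: t[0])` makes the same selection in one call; the explicit loop shows the streaming form, which suits fitness results arriving one at a time from parallel workers.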
Poster
Presented a research poster at the Lida Orzeck '68 Poster Session at Barnard College on July 31, 2024, and at the Barnard Computer Science Senior Research Symposium on April 25, 2025.