Exploiting Failure in Evolution
Position: Undergraduate Researcher under Professor Lisa Soros
Institute: Soros Lab, Summer Research Institute
Duration: May 2024 - May 2025
Tools: Python (NumPy, multiprocessing, heapq), Matplotlib/Seaborn, OpenAI Gymnasium, Git
Summary
Scalable experiment framework
Modular runners for RS, ES, MAP-Elites (+mortality), FI-2Pop, and FI-2Pop with MAP-Elites.
Performance
Python multiprocessing, batched rollouts (≈6.7× output speedup).
Algorithmic engineering
Mortality in MAP-Elites; FI-2Pop with MAP-Elites (novel variant).
Overview
Algorithms typically try to minimize errors and failures. However, this research seeks to challenge this paradigm by investigating the
potential advantages of failure. We explore failure-preserving evolutionary algorithms in the
context of the Lunar Lander game environment, where the purple lander is the agent.
Variants of FI-2Pop in the Lunar Lander environment are tested whether preserving “failed”
individuals improves evolutionary search, revealing reward-hacking behaviors.
Methods
We ran these algorithms with 100,000 agents and selected the highest fitness agents:
- Random Search (RS): (baseline 1) Search a randomly-initialized group of agents.
- Evolution Strategy (ES): (baseline 2) Over successive generations, regenerate the entire population by mutating the previous generation’s fittest agents.
- MAP-Elites: Maintain “elites”, which represent the fittest agents of a feature niche in a 2D map. + mortality: old agents are removed from the map
- FI-2Pop: Modify ES by maintaining two populations (feasible/infeasible) based on some fitness-based threshold.
-
- MAP-Elites with FI-2Pop: Modify MAP-Elites by maintaining two feature maps (feasible/infeasible).
-
Findings
- Best agents routinely exceeded fitness 200; a characteristic “tap-dancing” reward hack maximized score without landing between flags.
- Top-performing agent achieved a fitness score of 237.278 with endpoint features
(0.591, −0.215).
- Aggregated heatmaps of elites across 20 trials revealed distinct coverage patterns for each evolutionary method (see poster figure set).
Key Contributions
- Added mortality to MAP-Elites; implemented heap-based parent selection for ES; built reproducible visualization pipelines.
- Parallelized evaluations with Python
multiprocessing, achieving approximately 6.7× faster trial throughput.
- Validated RS/ES/MAP-Elites baselines against prior results to ensure correctness.
Poster
Presented a research poster at the Lida Orzeck '68 Poster Session at Barnard College on July 31, 2024 and
at the Barnard Computer Science Senior Research Symposium on April 25, 2025.
View Research Poster (PDF)