Silicon Valley Races to Build New Training Grounds for Smarter AI Agents

Last Updated: September 22, 2025

Silicon Valley is witnessing a surge of interest in creating simulated “environments” to train artificial intelligence (AI) agents, a technique increasingly seen as vital to advancing the next generation of AI systems.

For years, tech leaders have envisioned AI agents that can independently navigate software, carry out tasks, and streamline daily work. Yet today’s consumer-facing agents, such as OpenAI’s ChatGPT Agent or Perplexity’s Comet, still fall short of those ambitions. Experts believe that progress hinges on reinforcement learning (RL) environments: virtual workspaces designed to help AI agents practice multi-step tasks.

According to Jennifer Li, general partner at Andreessen Horowitz, major AI labs are developing their own RL environments but are also turning to specialized startups. “Creating these datasets is very complex, so AI labs are also looking at third-party vendors that can create high-quality environments and evaluations. Everyone is looking at this space,” she explained.

The demand has already spurred the rise of startups like Mechanize and Prime Intellect, while established data-labeling firms such as Surge and Mercor are pivoting toward RL environments. Reports suggest that Anthropic, one of the leading AI labs, has even discussed investing over $1 billion in this field within the next year.

At their core, RL environments act as training simulations. They let an AI agent practice a task such as using a browser to purchase socks on Amazon, then score whether it navigates menus, avoids errors, and makes sound decisions. Unlike static datasets, these environments must account for unexpected agent behavior, which makes them significantly harder to design.
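
To make the idea concrete, here is a minimal Python sketch of such an environment. It is an illustration only, not how any lab or vendor mentioned here builds theirs: the page graph, the action names, the `ToyShopEnv` class, and the reward values are all hypothetical. It shows the two properties described above: the agent is scored on whether it completes a multi-step purchase, and the environment has to tolerate actions its designer did not plan for.

```python
from dataclasses import dataclass

# Hypothetical page graph for a toy shop: page -> {action: next page}.
TRANSITIONS = {
    "home": {"search socks": "results"},
    "results": {"open product": "product", "back": "home"},
    "product": {"add to cart": "cart", "back": "results"},
    "cart": {"checkout": "done", "back": "product"},
}


@dataclass
class ToyShopEnv:
    """Toy multi-step task: reach the 'done' page within max_steps."""
    max_steps: int = 10
    page: str = "home"
    steps: int = 0

    def reset(self) -> str:
        """Start a new episode and return the first observation."""
        self.page, self.steps = "home", 0
        return self.page

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply one action; return (observation, reward, done)."""
        self.steps += 1
        next_page = TRANSITIONS.get(self.page, {}).get(action)
        if next_page is None:
            # Unexpected action: small penalty, agent stays on the page.
            reward = -0.1
        else:
            self.page = next_page
            # Assumed reward scheme: 1.0 only for completing checkout.
            reward = 1.0 if next_page == "done" else 0.0
        done = self.page == "done" or self.steps >= self.max_steps
        return self.page, reward, done


def run_episode(env: ToyShopEnv, actions: list[str]) -> float:
    """Replay a fixed list of actions and return the total reward."""
    env.reset()
    total = 0.0
    for action in actions:
        _, reward, done = env.step(action)
        total += reward
        if done:
            break
    return total
```

In this sketch, an agent that issues "search socks", "open product", "add to cart", and "checkout" in order earns the full reward, while one that first tries an action the page does not support is penalized but can still recover. Real environments face the same design problem at far larger scale, since a real browser exposes vastly more states and actions than a four-page graph.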

This concept is not entirely new. OpenAI experimented with “RL Gyms” as early as 2016, while Google DeepMind’s AlphaGo relied on reinforcement learning within a simulated environment to beat a world champion. However, today’s efforts differ in scale and ambition. Modern environments aim to prepare transformer-based models for more general tasks across multiple applications, rather than specialized, closed systems.

The competition is heating up. Surge, which generated an estimated $1.2 billion in revenue last year working with OpenAI, Google, Anthropic, and Meta, has launched a dedicated division for RL environments. Mercor, valued at $10 billion, is pitching investors on industry-specific environments tailored to coding, healthcare, and law. “Few understand how large the opportunity around RL environments truly is,” said Mercor CEO Brendan Foody.

Even Scale AI, once dominant in data labeling, is trying to regain ground by investing in environments after losing major contracts with Google and OpenAI. Despite setbacks, the company remains active in this evolving race.

For investors and innovators, the ultimate question remains: will RL environments unlock the breakthroughs needed to push AI closer to human-level capability, or will they prove to be just another step in an ongoing, uncertain journey?

Source: TechCrunch
