Capability Evaluations

Modulo Research has partnered with Hidden Variable Limited on projects supporting Anthropic’s Frontier Red Team. This collaborative work includes developing an eval that tests how well models can build reinforcement learning pipelines; we understand that it is in use at Anthropic and that Anthropic has shared it with GDM. Hidden Variable and Modulo have also investigated, on Anthropic’s behalf, AI models’ capacity to generate economically valuable products, focusing specifically on simple computer games.

Additionally, we led the analysis team on a recent collaborative study investigating LLM persuasive capabilities (link), and are conducting other investigations of LLM capabilities to be announced in the coming months.

Although it is not an evaluation of dangerous capabilities per se, our recent preprint, FindTheFlaws: Annotated Errors for Detecting Flawed Reasoning and Scalable Oversight Research, evaluates frontier models' ability to detect and correctly characterize errors in long-form solutions to difficult problems.