Meta’s AI Benchmarks May Not Tell the Full Story
Benchmarking Controversy: Experimental vs. Public Models
Meta deployed an “experimental chat version” of Maverick, optimized specifically for conversational tasks, to achieve high scores on LM Arena. This version was not the same as the one released to developers, leading to concerns about the validity of the benchmark results. LM Arena acknowledged the issue, stating that