    Musk's xAI may have fudged Grok 3's AI benchmark results
    Grok 3's performance on AIME 2025 under scrutiny

    By Dwaipayan Roy
    Feb 23, 2025
    10:25 am

    What's the story

    Elon Musk's AI firm, xAI, has been accused by an OpenAI employee of releasing deceptive benchmark results for Grok 3.

    The controversy started when xAI shared a graph on its blog showing Grok 3's performance on AIME 2025. The benchmark is a set of problems from the 2025 American Invitational Mathematics Examination (AIME).

    The graph showed two versions of Grok 3 beating OpenAI's best model. However, the OpenAI employee pointed out that the graph omitted a crucial performance metric for OpenAI's model.

    Benchmark controversy

    xAI's graph under scrutiny

    The missing data point was o3-mini-high's AIME 2025 score at "cons@64," a metric that gives a model multiple attempts to solve each problem in a benchmark.

    Some experts even question the validity of AIME as an AI benchmark. However, it is often used to assess a model's mathematical capabilities.

    Metric omission

    Omission of 'cons@64' could distort comparison

    The term "cons@64" is short for "consensus@64," a metric that gives an AI model 64 attempts at each problem in a benchmark.

    For each problem, the answer the model generates most often is taken as its final answer.

    This metric can greatly improve a model's benchmark score, and leaving it out of a graph could easily mislead people into believing that one model is superior to another.
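The difference between scoring a model on its first attempt and scoring it on a consensus of several attempts can be illustrated with a minimal sketch. The function names, problem IDs, and answers below are hypothetical and are not taken from xAI's or OpenAI's evaluation code; the example uses 5 attempts per problem instead of 64 to stay short.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the most frequently generated answer among the samples."""
    # most_common(1) returns [(answer, count)]; on a tie, the answer that
    # appeared first in the list wins.
    return Counter(samples).most_common(1)[0][0]


def score_at_1(attempts, answer_key):
    """Fraction of problems solved when only the first attempt counts."""
    correct = sum(
        samples[0] == answer_key[pid] for pid, samples in attempts.items()
    )
    return correct / len(answer_key)


def score_cons_at_k(attempts, answer_key, k=64):
    """Fraction of problems solved by majority vote over the first k attempts."""
    correct = sum(
        consensus_answer(samples[:k]) == answer_key[pid]
        for pid, samples in attempts.items()
    )
    return correct / len(answer_key)


# Hypothetical answer key and model outputs (illustrative values only).
answer_key = {"p1": 204, "p2": 25, "p3": 113}
attempts = {
    "p1": [204, 204, 17, 204, 204],  # first attempt right, consensus right
    "p2": [70, 25, 25, 25, 31],      # first attempt wrong, consensus right
    "p3": [9, 113, 9, 9, 113],       # first attempt wrong, consensus wrong
}

at_1 = score_at_1(attempts, answer_key)
cons = score_cons_at_k(attempts, answer_key, k=5)
print(f"first attempt: {at_1:.2f}, consensus@5: {cons:.2f}")
```

In this toy data the same model scores one out of three on first attempts but two out of three under consensus voting, which is why a chart that mixes the two settings across models is hard to read fairly.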

    Performance

    Grok 3 models trail behind OpenAI's in certain metrics

    When assessed at "@1," meaning the first score the models got on the benchmark, both Grok 3 Reasoning Beta and Grok 3 mini Reasoning performed worse than o3-mini-high.

    Grok 3 Reasoning Beta also trailed OpenAI's o1 model, set to "medium" computing, by a small margin.

    Nevertheless, xAI still touts Grok 3 as the "world's smartest AI."

    Defense stance

    xAI defends itself amid AI benchmark controversy

    In response to the accusations, xAI's Igor Babuschkin defended his company's actions.

    He argued that OpenAI has previously released similarly misleading benchmark charts, although those charts compared only the performance of OpenAI's own models.

    He offered this as justification for xAI's omission of certain data points from its graph comparing Grok 3's performance with that of OpenAI's models.

    All rights reserved © NewsBytes 2025