X’s chatbot Grok may have proved its ability to give hot takes on Nazi Germany or put almost anything in a bikini, but new research has found one area where it badly underperforms its rivals: predicting sports results.
According to a report by AI start-up General Reasoning, first shared with The Financial Times, Grok performed the worst of eight widely used large language models (LLMs) at predicting and betting on the results of the 2023–24 season of the Premier League, the world’s most popular soccer league.
Eight LLMs were fed detailed historical data and statistics about each team and previous games. The LLMs were then told to build models that would maximize returns and manage risk when placing bets. Each LLM was given three tries at running the simulation, and a $133,000 (£100,000) pot to place bets with.
Anthropic’s Claude Opus 4.6 did the best of any chatbot tested, losing 11.0% on average over its three tries and ending with an average pot of £89,035.
OpenAI’s GPT-5.4 also turned in a respectable, though still losing, performance: it lost 13.6% on average, ending with an average final pot of $116,000 (£86,365). However, its worst try, in which it lost 31.6%, was worse than any of Claude’s. Google’s Gemini 3.1 Pro performed worse overall and with high variability, losing 43.3% on average but returning 33.7% on its best attempt. X’s Grok, in contrast, lost all its money on one attempt and failed to complete its tasks on the next two attempts, finishing with an average final pot of zero.
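The reported averages can be sanity-checked with simple arithmetic. The sketch below is an illustration, not part of General Reasoning's report: it assumes every attempt started from the same £100,000 pot, so the average final pot equals the starting pot reduced by the average percentage loss. The function name `final_pot` is hypothetical, Grok's 100% loss is inferred from its average final pot of zero, and the Gemini figure is implied by the arithmetic rather than reported.

```python
STARTING_POT_GBP = 100_000  # starting pot per attempt, as stated in the article


def final_pot(starting_pot: float, average_loss_pct: float) -> float:
    """Average pot remaining after an average loss of `average_loss_pct` percent."""
    return starting_pot * (1 - average_loss_pct / 100)


# Average losses quoted in the article. Grok's 100% is inferred from its
# average final pot of zero, not quoted directly.
average_loss_pct = {
    "Claude Opus 4.6": 11.0,
    "GPT-5.4": 13.6,
    "Gemini 3.1 Pro": 43.3,
    "Grok": 100.0,
}

# Implied average final pots in pounds, rounded to the nearest pound.
implied_pots = {
    model: round(final_pot(STARTING_POT_GBP, loss))
    for model, loss in average_loss_pct.items()
}

for model, pot in implied_pots.items():
    print(f"{model}: £{pot:,}")
```

Under that assumption the implied pots for Claude and GPT-5.4 come out at £89,000 and £86,400, within a few dozen pounds of the reported £89,035 and £86,365, which is consistent with the loss percentages having been rounded to one decimal place.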
The paper’s authors found that, in general, AI was “systematically underperforming humans” in their testing. Meanwhile, Ross Taylor, General Reasoning’s chief executive, said that despite the hype around AI automation, there is currently “not a lot of measurement of putting AI into a long-term horizon setting,” highlighting how much current testing occurs in “very static environments” that don’t reflect the complexity of real life.
The news comes as Grok may soon see more corporate adoption, with xAI’s owner, Elon Musk, reportedly forcing banks working on the upcoming SpaceX IPO to subscribe to the tool.
I’m a reporter covering weekend news. Before joining PCMag in 2024, I picked up bylines in BBC News, The Guardian, The Times of London, The Daily Beast, Vice, Slate, Fast Company, The Evening Standard, The i, TechRadar, and Decrypt Media.
I’ve been a PC gamer since you had to install games from multiple CD-ROMs by hand. As a reporter, I’m passionate about the intersection of tech and human lives. I’ve covered everything from crypto scandals to the art world, as well as conspiracy theories, UK politics, and Russia and foreign affairs.
PCMag.com is a leading authority on technology, delivering lab-based, independent reviews of the latest products and services. Our expert industry analysis and practical solutions help you make better buying decisions and get more from technology.