Welcome, AI Insiders!

This week, OpenAI demonstrated its AI prowess with a Math Olympiad gold medalist and a near-win against the world's best coder. However, its new ChatGPT Agent comically failed at the simple task of buying a lamp.

📌 In today’s Generative AI Newsletter:

  • OpenAI’s math model aces Olympiad with 35/42 score

  • Only one human coder beat OpenAI’s model in Tokyo

  • ChatGPT Agent goes live, but early tests show real flaws

Special highlight from our network

The easiest way to design an app? Just describe it.

App Alchemy is the AI-powered design tool built for non-designers, solo founders, and fast-moving teams.

Just describe your app idea in plain English, and App Alchemy instantly turns it into beautiful, editable screens.

Want to match the look of your favorite app?
Upload it as a style guide.
Need changes?
Just chat.
Update buttons, layouts, colors: no Figma skills required.

You can clone templates, collaborate with your team, and export production-ready designs anytime.

It’s the simplest way to bring your app idea to life, without hiring a designer or touching a single line of code.

Try App Alchemy for free. No credit card needed.
Start building today.

🏅 OpenAI’s Math Model Crushes Olympiad-Level Problems

Image source: OpenAI

OpenAI’s latest model has stunned the math world by scoring 35 out of 42 points on the International Math Olympiad, a result that puts it on par with gold medal–winning human contestants. Tested under the same conditions as real participants, the model worked through two 4.5-hour exams using only natural language reasoning, with no tools or internet access.

Here’s what the model just achieved:

  • Solved 5 of 6 problems from the IMO, the world’s toughest high-school math contest

  • Matched top human scores, evaluated by three former IMO medalists who reviewed the work

  • Used no external tools; only long-form reasoning and proof-writing in natural language

  • Remains unreleased for now, with OpenAI saying it will take months before public access

The score is historic, but it also signals something bigger: AI is entering a new phase of abstract, theoretical reasoning. While critics like Gary Marcus warn about transparency and cost, the result puts OpenAI at the frontier of what it means to “think” in mathematics and who, or what, gets to compete.

đŸ„ˆ One Human Still Stands Between OpenAI and Coding Supremacy

Image source: Psyho (@FakePsyho on X)

OpenAI’s autonomous coding model just competed at the AtCoder World Tour Finals in Tokyo and nearly won. In a historic first, the model faced off live against the world’s top human coders with no human help. Only one person beat it: PrzemysƂaw Dębiak, known as Psyho, who powered through exhaustion to claim victory.

Here’s what went down in Tokyo:

  • The contest lasted 10 hours, with challenges based on optimization and robot-guided mazes

  • OpenAI’s model came in second, just behind Psyho, who beat it by under 10 percent

  • It was the first AI to compete live, solving all problems autonomously with no manual input

  • Psyho celebrated online, saying “Humanity has prevailed (for now!)”

OpenAI CEO Sam Altman congratulated the winner, while reaffirming the company’s prediction that its models will become the world’s top programmers. That moment may not be far off. This time, a human won. The next time, the leaderboard might look very different.

🧠 ChatGPT Agent Wows on Paper, Stumbles in Practice

Image source: OpenAI

OpenAI’s new Agent tool transforms ChatGPT into a hands-on digital assistant, capable of completing complex tasks using its own virtual computer. It can shop, plan, browse, analyze, and generate docs — all from a single prompt. But early users say the tool still feels clunky, with real-world glitches slowing it down.

Here’s what reviewers found:

  • Agent uses its own virtual computer, combining browsing, code execution, file editing, and analysis

  • It merges Operator and Deep Research, enabling end-to-end task execution with context preserved throughout

  • Early reviews called it ambitious but buggy, with The Verge comparing it to a “day-one intern”

  • It’s live for Pro, Plus, and Team users, but access is already delayed due to overwhelming demand

The system can read emails, prep slide decks, and browse the web, but early reports show it freezing on basic tasks and missing key steps. The vision is strong, but execution still lags. The real test will be how quickly OpenAI turns this early draft into a reliable digital worker that can truly handle your day.

🚀 Boost your business with us. Advertise where 12M+ AI leaders engage

🌟 Sign up for the first AI Hub in the world.

đŸ“Č Our Socials
