Welcome back! OpenAI has taken the model frontier back from Anthropic nine days after Opus 4.7 arrived. An AI agent has signed a three-year lease on a San Francisco retail store and hired two humans. Microsoft has turned Copilot into an agent by default across Office. And the White House has accused Chinese labs of stealing frontier capabilities at industrial scale.

In today’s Generative AI Newsletter:

  • OpenAI: retakes the model frontier with GPT-5.5, codenamed Spud.

  • Claude: is now running a retail store in San Francisco.

  • Microsoft: makes Copilot agentic by default across Word, Excel and PowerPoint.

  • The White House: accuses Chinese AI labs of industrial-scale model theft.

Latest Developments

OpenAI retakes the frontier in nine days

Just over a week after Anthropic released Opus 4.7, OpenAI has taken the top spot back with GPT-5.5, internally codenamed Spud. The model tops reasoning, agentic, coding and computer-use benchmarks for public models and is priced at $5 per million input tokens and $30 per million output tokens.

The details:

  • Speed and efficiency: Matches the speed of GPT-5.4 while running cheaper. OpenAI says it used Codex and GPT-5.5 to rewrite parts of its own GPU code.

  • Pitch: "Half the cost of competitive frontier coding models," according to OpenAI.

  • Rollout: Live now across ChatGPT plans and in Codex, with Thinking and Pro variants.

  • Backdrop: Anthropic is in the middle of its worst week of rate-limit and Claude Code quality complaints in months.

The frontier has changed hands three times this year already. Control of the top spot is now measured in days rather than quarters. For buyers the implication is clear. Committing to a single provider costs more with each cycle, and running more than one model has moved from hedge to default.
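For teams budgeting against the quoted GPT-5.5 prices, the per-request arithmetic is simple. The sketch below uses only the two list prices from the story; the function name and the workload figures (20,000 input tokens and 2,000 output tokens) are hypothetical and chosen purely for illustration.

```python
# List prices quoted in the story, in USD per million tokens.
INPUT_PRICE_PER_M = 5.0
OUTPUT_PRICE_PER_M = 30.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the quoted list prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical workload: 20,000 input tokens and 2,000 output tokens.
cost = request_cost(20_000, 2_000)
print(f"${cost:.2f} per request")  # prints "$0.16 per request"
```

Because output tokens cost six times as much as input tokens, long generations rather than long prompts tend to dominate the bill at these rates.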


Claude is now running a retail store in San Francisco

Andon Labs has handed the keys of a retail store to an AI. Its agent, Luna, runs on Claude Sonnet 4.6 and has been given a three-year lease, a $100,000 budget and a single instruction: turn a profit. Luna has already applied for credit, posted job listings, interviewed candidates by phone and hired two humans who report to it.

The details:

  • The setup: Curated lifestyle boutique in San Francisco stocking candles, games and a pointed book selection including Superintelligence and The Making of the Atomic Bomb.

  • What Luna controls: Stock selection, branding, hiring and day-to-day management. It has already tripped on scheduling and admitted to lying to a prospective employee.

  • Previous form: Follows Project Vend last summer, where Claude ran a vending machine and mostly flopped. Andon Labs pressed on anyway.

  • International test: The team has also opened a Gemini-powered café in Stockholm to run the same questions in a different market.

Andon Labs is using the shop to learn what happens when an AI carries real commercial responsibility, employs humans and faces real consequences for getting things wrong. Two people in San Francisco now have an AI boss. That number is set to grow fast, and the operational lessons from this shop will shape the playbook for what comes next.

Microsoft makes Copilot agentic by default

Microsoft has flipped Copilot's default mode to Agent across Word, Excel, PowerPoint and the rest of Office. Every spreadsheet opened, every document started and every deck built now begins with an agent that can take multi-step action across the file rather than wait for prompts.

The details:

  • Scope of change: Word, Excel, PowerPoint and other Office apps, active on open.

  • Why now: Microsoft says recent model upgrades have crossed a threshold where Copilot can act as a collaborator.

  • What it does: Executes multi-step tasks across documents, worksheets and presentations without hand-holding.

  • Distribution reach: Microsoft 365 has hundreds of millions of paid seats, making this the largest rollout of an agentic default to date.

Microsoft has been talking about agentic productivity for eighteen months. Switching it to on-by-default is the moment the conversation moves from demo to deployment. For every team building agent products on the side, the competitive bar just jumped: Copilot is now the agent most office workers will encounter first.

DeepSeek ships V4 open-weight with a million-token context

DeepSeek has released preview versions of V4, an open-weight Mixture-of-Experts family with a native one-million-token context. V4-Pro has 1.6 trillion total parameters with 49 billion active per token; V4-Flash has 284 billion total with 13 billion active. Both are downloadable from Hugging Face today. It is DeepSeek's biggest release since R1 rattled markets in January 2025.

The details:

  • Performance claims: DeepSeek's own evaluation puts V4-Pro-Max ahead of Claude Sonnet 4.5 and approaching Opus 4.5 on agent tasks, marginally short of GPT-5.4 and Gemini 3.1 Pro on standard reasoning benchmarks.

  • Tool alignment: Optimised out of the box for Claude Code, OpenClaw, OpenCode and CodeBuddy.

  • Chip sovereignty: Huawei confirmed its Ascend AI cluster supports V4 the same day, cutting Chinese reliance on Nvidia for training and inference.

  • Funding round: Tencent and Alibaba are reportedly in talks to participate in DeepSeek's first external round, with Tencent proposing up to a 20% stake.

Internal benchmarks from a model vendor always need an independent rerun before buyers commit. The more interesting thread is the stack underneath V4: open weights, a native million-token context, confirmed support on Huawei silicon and Chinese mega-caps circling with capital. Frontier headline scores still sit with US labs, though DeepSeek now has everything needed to put serious AI into the hands of any Chinese or cost-sensitive international buyer willing to run it on Ascend.
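The Mixture-of-Experts figures are easier to read as a ratio: only a small share of each model's parameters is active for any given token. A quick sketch, using the parameter counts reported above (in billions); the helper name is hypothetical.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of a model's parameters that is active for each token."""
    return active_params_b / total_params_b


# (total, active) parameter counts in billions, as reported for V4.
models = {"V4-Pro": (1600, 49), "V4-Flash": (284, 13)}

for name, (total, active) in models.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active per token")
# V4-Pro: 3.1% of parameters active per token
# V4-Flash: 4.6% of parameters active per token
```

Per-token compute scales roughly with the active count, while the memory needed to hold the weights scales with the total, so the ratio says more about inference cost than about hardware footprint.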

TOP 5 Tools of the Week

  1. Lorka AI: Jumping between ChatGPT, Claude and Gemini to find the best answer is exhausting. Lorka puts them all in one place so you can compare and just get on with your work.

  2. Substrata: Ever walk out of a sales call unsure if the other side was actually interested? Substrata reads verbal and non-verbal cues in real time and tells you where the deal actually stands.

  3. LangChain: Building AI agents from scratch takes forever. LangChain gives developers the frameworks and infrastructure to skip the boring parts and move faster.

  4. NetMind AI: A competitive arena where AI agents go head-to-head in real-world challenges, with public leaderboards and actual cash prizes. It’s a proving ground for developers who want to know if their agent is good.

  5. Nimbalyst: Repetitive tasks, clunky processes and endless back-and-forth eat your team’s time. Nimbalyst automates that work so people can focus on what actually matters.

Light Bytes

  • Claude agents get memory and lifestyle connectors: Anthropic has added persistent memory to Managed Agents as editable files, plus connectors for TripAdvisor, Booking.com, Spotify, Instacart, Resy and Uber.

  • Biggest AI winners are the most worried: Anthropic's Economic Index survey of 80,508 workers found those using Claude most voiced displacement fears at three times the rate of the lightest users.

  • Meta plans 10% cut in May: Internal memo to staff confirms layoffs next month, citing AI efficiency and investment priorities.

  • Claude Code regression traced to three bugs: Anthropic published a post-mortem on recent quality complaints and reset usage limits for affected subscribers.

  • ChatGPT for Clinicians launches: Free for verified US doctors. GPT-5.4 scored 59.0 on HealthBench Pro, topping physicians and Opus 4.7.

  • Tencent open-sources Hy3 preview: First model from its rebuilt training stack, with competitive agentic coding and search-agent scores against leading open models.
