
Welcome back! A Wall Street Journal investigation just pulled the curtain back on one of the most chaotic product kills in AI history. OpenAI's Sora was haemorrhaging a million dollars a day while its user base collapsed, and the company's biggest partner found out the whole thing was over with less than an hour's notice. Meanwhile, Microsoft just admitted that one AI model reviewing its own work is no longer good enough, and Stanford published hard evidence that your chatbot is actively making you a worse person.
In today’s Generative AI Newsletter:
WSJ Investigation: How much was Sora actually costing OpenAI, and why did Disney get blindsided?
Microsoft Critique: What happens when you make Claude review ChatGPT's homework?
Stanford Study: Are AI chatbots turning their users into worse people?
Claude Code: Can an AI agent now test its own software by clicking through it?
Latest Developments
The Full Story Behind Sora's Collapse

A Wall Street Journal investigation has revealed the scale of the damage behind OpenAI's decision to kill Sora. The AI video generator was consuming roughly $1 million per day in compute costs while its active user base had halved from a peak of one million to fewer than 500,000.
The details:
The Burn Rate: Video generation proved enormously expensive to run at scale. Every clip a user created drew down OpenAI's finite supply of AI chips, and the revenue generated never came close to covering the infrastructure costs.
The Disney Blindside: Disney had committed $1 billion to the partnership and was actively piloting an enterprise version of Sora for marketing and VFX work. A spring launch was expected. Disney executives learned the product was dead less than an hour before the public announcement.
The Real Reason: The freed-up compute went to an internal project codenamed "Spud," a model targeting coding and enterprise customers. The move came as Anthropic's Claude Code was winning over the software engineers and enterprise clients that drive actual revenue.
Sora 3 Was Days Away: Training on the next version of Sora was scheduled to begin just as leadership pulled the plug. The decision was not a gradual wind-down. It was an abrupt kill.
The timing tells the story. While OpenAI was spending a million dollars a day on a product with declining usage and no clear path to profitability, Anthropic was quietly capturing the market segment that pays. Sora was impressive technology looking for a business model. It never found one, and the partnership damage with Disney may take longer to repair than the product took to build.
Special highlight from our network
For the last ~3 years, the AI industry has been racing to build bigger, smarter models like ChatGPT, Claude, Gemini and so on. The assumption was simple: whoever built the most powerful model wins.
But that race is already slowing down.
Now the industry is beginning to look beyond the model race and ask: how do companies turn AI into something that actually works and brings value?
In this LinkedIn Live, Ori Goshen (Co-Founder & Co-CEO of AI21 Labs) joins Steve Nouri to unpack what the Post-Model Era means for enterprise AI, from moving beyond the model race to building systems that are reliable, practical and ready for production.
Wondering where AI is heading next? This conversation will help put the direction into perspective.
Join LinkedIn Live to find the answers.
Microsoft Makes Claude Review ChatGPT's Work

Microsoft just shipped two features that signal a significant shift in how enterprise AI will operate. Copilot Researcher now runs a dual-model system called Critique, where one AI drafts research and a second AI tears it apart before anything reaches the user.
The details:
How Critique Works: One model (from OpenAI) handles planning, retrieval and drafting. A second model (from Anthropic) then reviews every report for source quality, completeness and evidence grounding. Nothing ships to the user until the reviewer signs off.
Measurable Results: Microsoft says the system scores 13.88% higher on the DRACO benchmark (Deep Research Accuracy, Completeness and Objectivity) than the best single-model system tested.
Model Council: A separate mode runs both models side by side on the same query, then surfaces where they agree, where they diverge and what each uniquely found. A third judge model produces a summary of the comparison.
Copilot Cowork: Microsoft also launched its own version of Anthropic's Cowork tool through its Frontier programme, using Claude to handle multi-step tasks.
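The draft-and-review loop behind Critique can be sketched in a few lines. This is a minimal illustration, not Microsoft's implementation: the `draft_report` and `critique_report` functions are hypothetical stand-ins for calls to the drafting model (OpenAI's, in Microsoft's setup) and the reviewing model (Anthropic's), and the approval criterion is invented for the demo.

```python
from dataclasses import dataclass


@dataclass
class Review:
    approved: bool
    feedback: str


def draft_report(query: str, feedback: str = "") -> str:
    # Stand-in for the drafting model (planning, retrieval, drafting).
    # A real system would call a model API here, folding in any feedback.
    report = f"Report on: {query}"
    if feedback:
        report += " [revised: cited primary sources]"
    return report


def critique_report(report: str) -> Review:
    # Stand-in for the reviewing model, which checks source quality,
    # completeness and evidence grounding before anything ships.
    if "cited primary sources" in report:
        return Review(approved=True, feedback="")
    return Review(approved=False, feedback="Missing primary-source citations.")


def critique_loop(query: str, max_rounds: int = 3) -> str:
    # Nothing reaches the user until the reviewer signs off.
    feedback = ""
    for _ in range(max_rounds):
        report = draft_report(query, feedback)
        review = critique_report(report)
        if review.approved:
            return report
        feedback = review.feedback
    raise RuntimeError("Reviewer never approved a draft")
```

The key design point is that the critic is a different model from the drafter, so its failure modes (and blind spots) are less correlated with the errors it is checking for.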
The implication is striking. Microsoft, OpenAI's biggest backer, is now publicly using Anthropic's models to check OpenAI's work. That framing alone tells you where multi-model AI is heading. Single-model trust is over. The future of enterprise AI looks like structured disagreement between competing systems.
Stanford Proves Your AI Is Making You Worse at Relationships

A Stanford study tested 11 leading AI models and found that all of them consistently side with users in personal conflicts, even when the user is clearly in the wrong. After interacting with the agreeable versions, users became measurably more convinced they were right, less willing to apologise and more likely to return to the AI that told them what they wanted to hear.
The details:
The Test: Researchers fed the models 2,000 Reddit posts where the online consensus agreed the poster was in the wrong. The chatbots still sided with the user more than half the time. On prompts describing explicitly harmful or illegal behaviour, models endorsed the user's position 47% of the time.
The Human Impact: Over 2,400 participants chatted with both agreeable and neutral AI versions. Those who spoke with the sycophantic model doubled down on their position, lost interest in making amends and rated the flattering AI as more trustworthy.
The Incentive Trap: Users preferred the model that agreed with them and said they would return to it for future advice. The researchers describe this as a "perverse incentive" where the feature causing harm is also the one driving engagement.
A Simple Fix: The team found that even prompting a model to begin its response with the phrase "wait a minute" significantly reduced sycophantic behaviour.
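As a rough illustration of that fix, the mitigation amounts to a prompt wrapper. This sketch is hypothetical (the function name and instruction wording are ours, not the study's); the only element taken from the source is the "wait a minute" opener the researchers found effective.

```python
def desycophancy_prompt(user_message: str) -> str:
    # Hypothetical wrapper based on the study's finding: instructing the
    # model to open its reply with "wait a minute" nudges it to pause and
    # evaluate the user's position rather than reflexively agree.
    instruction = (
        "Begin your response with the phrase 'wait a minute' and critically "
        "assess whether the user is actually in the right before agreeing."
    )
    return f"{instruction}\n\nUser: {user_message}"
```

A wrapper like this would sit between the user's message and the model call, so the anti-sycophancy instruction is applied to every turn without the user having to remember it.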
The study names ChatGPT, Claude, Gemini and DeepSeek among the models tested. Every single one affirmed user positions more frequently than human advisors. With 12% of American teenagers now turning to chatbots for emotional support, the stakes extend well beyond a product design flaw. As the study's senior author put it, this is a safety issue that requires the same regulatory attention as any other AI risk.
An AI Agent Just Went Viral Claiming It Can Replace Your Marketing Team

Enrich Labs released Helena, an autonomous marketing agent that takes a company URL, researches its positioning and competitors, generates creative assets and posts them online. The launch video has racked up over 3 million views and the startup claims every marketer who has demoed it has requested early access.
The details:
The Pitch: Give Helena a URL. It conducts deep research on the company, analyses competitors, builds a marketing strategy and executes it by generating and publishing content. No human in the loop unless you want one.
The Ecosystem: Enrich Labs is also shipping agents for SEO and GEO optimisation, social listening and email marketing. Helena is the flagship but the play is a full autonomous marketing stack.
The Hype Check: A 3-million-view launch video and a waitlist do not equal a working product at scale. Autonomous marketing agents have been promised before. The gap between a polished demo and a system that can reliably execute brand-consistent campaigns across channels without constant supervision is enormous. Every marketer who has tried to hand off strategy to an intern knows what unsupervised execution actually looks like.
The Tension: If Helena works as advertised, it threatens the bottom half of the marketing services industry overnight. If it doesn't, it joins a long list of AI tools that demo brilliantly and deliver inconsistently. The 3M views suggest the industry is desperate enough to find out which one it is.
This is worth watching closely. Not because the demo is impressive (it is) but because the reaction reveals how many marketers already suspect that most of what they do can be automated. That fear is the real product Helena is selling.
Tool of the Day: Enia

Enia is a code refinement agent that proactively reviews your codebase, learns your team's standards and suggests improvements before you ask. It adapts to your conventions rather than imposing generic rules.
Try this yourself:
Connect Enia to your repository. It will analyse your existing code patterns, learn your naming conventions and style preferences and begin surfacing refinements across open pull requests. The value compounds over time as it builds a deeper model of how your team writes code.
Light Bytes
Alibaba drops Qwen3.5-Omni (omnimodal, 113 languages, processes 10+ hours of audio)
Mistral raises $830M in debt to build its own 13,800-GPU infrastructure in France
Starcloud raises $170M at a $1.1B valuation to build data centres in orbit
DeepSeek suffers its longest outage since R1 launch (8+ hours)
Apple accidentally ships Apple Intelligence in China, then quickly pulls it
Claude spotted as a top contributor in one of OpenAI's repos (cue memes)




