The image model that thinks for itself

Welcome back, {{firstname}}! OpenAI has pushed Nano Banana off the top with an image model that plans, searches the web and checks its own work before it hands back an image. Google has turned Gemini 3.1 Pro into a research agent that ships with PitchBook and S&P plugged in. Meta is recording its own employees to train AI agents. And SpaceX has put a $60 billion option on Cursor.

In today’s Generative AI Newsletter:

Images 2.0: Why has text inside AI-generated images suddenly started to work?
Gemini goes Max: What happens when a research agent arrives with PitchBook and FactSet already wired in?
Meta MCI: What exactly is your employer allowed to do with a month of your keystrokes?
SpaceX and Cursor: What does a $60 billion option actually buy?

Latest Developments

OpenAI's new image model plans before it paints

OpenAI released ChatGPT Images 2.0 today. The model plans, searches the web for reference material and checks its own output before returning an image. It also swept Arena's text-to-image leaderboard by a wide margin, ending the long run Google's Nano Banana 2 had held at the top.

The details:

Resolution and throughput: Up to 2K resolution, eight images from a single prompt, aspect ratios from 3:1 ultrawide to 1:3 tall.
Text rendering: Multilingual output handles non-Latin scripts and fine-grained small text, which previous models consistently mangled.
Thinking mode: Searches the web live for references before generating. Paid plans only.
Where to use it: ChatGPT, Codex and the gpt-image-2 API. Sam Altman compared the jump to going from GPT-3 to GPT-5 in one release.

The reasoning approach to image generation is new. Every leading model until now has been one-shot: prompt in, image out. Images 2.0 separates planning from rendering, which is why text inside images finally looks like text. For anyone producing visual creative at scale, this closes the last gap between AI output and production-ready design. Expect the Nano Banana era to end quickly.
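For the technically curious, here is a toy sketch of what "plan, render, check" can look like as a loop. It is not OpenAI's implementation, and nothing here calls the gpt-image-2 API. The names `make_plan`, `render`, `check` and `generate` are hypothetical, and the "renderer" is a stand-in that deliberately mangles text on its first attempt so the self-check has something to catch.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    prompt: str
    # Strings that must appear verbatim in the finished image.
    text_elements: list[str] = field(default_factory=list)


def make_plan(prompt: str) -> Plan:
    """Hypothetical planner: treat every double-quoted phrase as required text."""
    parts = prompt.split('"')
    return Plan(prompt=prompt, text_elements=parts[1::2])


def render(plan: Plan, attempt: int) -> list[str]:
    """Stand-in renderer: returns the text that 'ended up' in the image.

    The first attempt drops the last character of each string to mimic
    the mangled lettering of one-shot models.
    """
    if attempt == 0:
        return [text[:-1] for text in plan.text_elements]
    return list(plan.text_elements)


def check(plan: Plan, rendered: list[str]) -> bool:
    """Self-check: every planned string must come back exactly as planned."""
    return rendered == plan.text_elements


def generate(prompt: str, max_attempts: int = 3) -> tuple[list[str], int]:
    """Plan once, then render and re-render until the check passes."""
    plan = make_plan(prompt)
    for attempt in range(max_attempts):
        rendered = render(plan, attempt)
        if check(plan, rendered):
            return rendered, attempt + 1
    raise RuntimeError("no attempt passed the self-check")


rendered, attempts = generate('A poster that says "Grand Opening" and "Free Coffee"')
print(rendered, attempts)
```

The point of the split is that the plan gives the checker something concrete to compare against, which a single prompt-in, image-out pass never has.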

Special highlight from our network

Turn your next meeting into a slide deck in seconds

Supernormal captures every call automatically. No bot in the call, no complex setup.

Ready-to-use slide templates turn what was said into structured, client-ready decks:

Agency pitch deck built from your discovery call
Consulting pitch deck built from client discussion
Project kickoff presentation built from your planning meeting

Pick a template, connect your meeting as a reference, and prompt to tailor to your needs.

Try slide templates for free

Special highlight from our network

Did Voice AI just fix calls, finally?

Waiting on hold, repeating yourself, navigating menus – it’s a broken experience.

The good news is that voice agents fix this problem easily. They listen, understand and respond instantly, making it feel like a real conversation.

Tools like ElevenLabs make this easy to build: fast, scalable agents that don’t just talk. They book, they update, they act.

It just makes calling way less painful (for everyone involved).

Build your voice agent

Google's research agent prices the analyst by the prompt

Google launched Deep Research and Deep Research Max this morning. Both run on Gemini 3.1 Pro and sit inside NotebookLM, generating research reports from the open web, uploaded files or any Model Context Protocol (MCP) server. The benchmarks claim a meaningful jump in retrieval and reasoning over Opus 4.6 and GPT-5.4.

The details:

Same engine: The research backbone that powered NotebookLM is now an agent, replacing Google's December preview of Deep Research.
Max tier: Deep Research Max combines open-web search with MCP connectors and private file uploads. Charts and infographics generate natively in the report.
Paid data partners: PitchBook, S&P and FactSet are building MCP servers that pipe premium financial data straight into the research output.
Private-only mode: Users can turn off the open web entirely and run the agent across only their private corpus.

Research-heavy work is the softest target AI has in enterprise. Analysts at banks, consultants at strategy firms and junior lawyers at magic-circle outfits all bill hours that mostly involve reading and synthesising documents. Google has now turned that into an API call any developer can wire into a product. The next round of vertical-specific partnerships will tell us which industries move first.

Meta is logging employee keystrokes to train AI agents

Meta has begun recording screenshots, keystrokes, mouse movements and app activity on US employees' work laptops, according to internal memos seen by Business Insider. The programme is called the Model Capability Initiative. There is no opt-out.

The details:

Scope: Logs activity in VSCode, Google Chat, Gmail and Metamate, Meta's internal AI assistant. Developers are the heaviest targets.
No opt-out: CTO Andrew Bosworth reportedly told staff there is no option to decline participation.
Layoffs in scope: Around 8,000 employees are due to exit on 20 May. MCI began logging their activity a month before their end date.
Stated purpose: Meta framed the programme as a way for employees to help its models "get better simply by doing their daily work."

Robotics labs have spent years filming humans performing physical tasks to teach systems when to grasp or step. Meta is running the same playbook for software, except the demo subjects are its own staff and many are being captured a month out from losing their jobs. Expect lawsuits first, then union-style agreements about AI training data rights. This is the first enterprise-scale example of knowledge work being turned directly into training data.

SpaceX puts a $60 billion option on Cursor

SpaceX claims it has secured the rights to acquire Cursor for $60 billion later this year, or to pay $10 billion for ongoing joint work. The two are building SpaceXAI, Elon Musk's attempt to catch OpenAI's Codex and Anthropic's Claude Code in agentic coding. Cursor is separately raising a $2 billion round with a16z, Nvidia and Thrive Capital reportedly in the mix.

The details:

Stated goal: SpaceX framed the arrangement as a path to "the world's best coding and knowledge work AI."
Cursor's trajectory: Valuation has climbed sharply over the past year on the strength of its IDE and coding agent.
Independence path: The new round would give Cursor enough capital to remain independent if SpaceX does not exercise the buy option.
xAI reboot: Musk is using SpaceXAI to restart his developer story after ceding ground to Codex and Claude Code.

Musk's pattern is to buy reach when he cannot build it fast enough. He did it with Twitter. He did it with xAI and x.ai. This one would be bigger by an order of magnitude. The optionality matters more than the deal itself. Either Cursor stays independent with a nuclear war chest, or Musk ends up owning the sharpest coding interface on the market. Either way, the top of the coding-AI table gets a third serious player.

Tool of the Day: Exa Deep Max

Exa released Deep Max today. It is a state-of-the-art agentic search tool that tops rival deep-search agents on accuracy while running 20 times faster. The agent plans queries, runs multiple searches in parallel and delivers a synthesised answer with full source attribution. Built for analysts, researchers and anyone buried in browser tabs.

Try this yourself: Go to exa.ai and open Deep Max. Paste in a research brief that normally takes you an afternoon. Something like "Compare the five biggest differences between Gemini 3.1 Pro and Opus 4.7 for agentic coding." Check the output and sources against what you would usually put together yourself.
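If you want a feel for the shape of that workflow (plan the queries, search in parallel, synthesise with sources), here is a minimal sketch. It does not use Exa's API or reflect how Deep Max works internally. The corpus, the example.com URLs and the functions `plan_queries`, `search` and `deep_search` are all assumptions made for illustration.

```python
import asyncio

# Toy corpus standing in for the web. The URLs are placeholders.
CORPUS = {
    "https://example.com/a": "Gemini pricing and context window notes",
    "https://example.com/b": "Opus coding benchmark results",
    "https://example.com/c": "Gemini coding benchmark results",
}


def plan_queries(brief: str) -> list[str]:
    """Hypothetical planner: one sub-query per distinct word longer than 3 letters."""
    queries: list[str] = []
    for word in brief.lower().split():
        word = word.strip(".,?")
        if len(word) > 3 and word not in queries:
            queries.append(word)
    return queries


async def search(query: str) -> list[tuple[str, str]]:
    """Stand-in search: substring match over the toy corpus."""
    await asyncio.sleep(0)  # yields control, standing in for network I/O
    return [(url, text) for url, text in CORPUS.items() if query in text.lower()]


async def deep_search(brief: str) -> list[dict[str, str]]:
    queries = plan_queries(brief)
    # Run every sub-query concurrently rather than one after another.
    results = await asyncio.gather(*(search(q) for q in queries))
    findings: dict[str, str] = {}
    for hits in results:
        for url, text in hits:
            findings.setdefault(url, text)  # de-duplicate by source
    # Every finding keeps the URL it came from.
    return [{"claim": text, "source": url} for url, text in sorted(findings.items())]


answer = asyncio.run(deep_search("Compare Gemini and Opus coding"))
for item in answer:
    print(item["source"], "->", item["claim"])
```

A real agent would replace the keyword planner and substring match with a model and a search backend, but the attribution idea is the same: no claim reaches the answer without the source it came from.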

Light Bytes

Bitcoin: Killing Satoshi premieres at Cannes: The first studio-quality AI feature film debuts in May with Gal Gadot, Pete Davidson and Casey Affleck. 154 crew, 107 actors and 55 AI artists shot 200 locations from one soundstage in 20 days. $70M budget against a $300M conventional projection.
Genspark launches Build on Opus 4.7: A vibe-coding tool that generates apps and websites from text prompts. Goes up against Lovable, v0 and Replit Agent, with Claude's coding lead baked in.
Jerry Tworek starts Core Automation: OpenAI's former research VP has launched a new lab with founders poached from OpenAI, Anthropic and DeepMind. The stated mission is "an AI to build AI."
Meta takes three more from Thinking Machines Lab: Another round of departures from Mira Murati's lab brings the total to seven founding members now at Meta.
Google open-sources DESIGN.md: A portable config file from Stitch that teaches AI agents a project's colours, accessibility rules and brand system.
Deezer hits 75K AI tracks a day: That is 44% of all new uploads, yet those tracks draw between 1% and 3% of streams. Deezer says 85% are flagged as fraudulent.
