Welcome back! China is open-sourcing math engines that outperform humans, NVIDIA is proving small models can rival giants, and OpenAI is learning how a single weak link in its analytics pipeline can expose an entire ecosystem. The breakthroughs are impressive, yet the vulnerabilities look just as sophisticated. This is what maturity feels like in AI.

In today’s Generative AI Newsletter:

DeepSeek releases an open math system that challenges closed labs
NVIDIA shows small orchestrators can match frontier reasoning
OpenAI reports a security incident that exposed sensitive user metadata
Klariqo introduces an AI receptionist for calls and business leads

Latest Developments

A Chinese lab has released DeepSeek-Math-V2, an open-weight math model that scored 118 out of 120 on the Putnam, a US undergraduate math competition, and beat a recent International Mathematical Olympiad gold medalist. One model writes full proofs and another checks them because, as the team puts it, “correct answers don’t guarantee correct reasoning.” That line matters when you picture this system inside trading desks and research labs, approving or rejecting the math behind big decisions.

Here is what is behind the neat benchmark headlines:

  • Mechanism: One model makes several proofs and a verifier scores them, choosing the one with the clearest logic instead of trusting the first attempt.

  • Access: The weights are released under the Apache 2.0 license, which means anyone with enough storage and GPUs can download and run them.

  • Limits: Some benchmark problems may have leaked into the training data, so the headline scores could mix genuine reasoning with recall.

  • Context: OpenAI and DeepMind have achieved similar Olympiad results with closed models; DeepSeek is demonstrating comparable skill in the open.
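The generate-then-verify pattern in the Mechanism bullet can be sketched as below. The generator and verifier functions are hypothetical stand-ins for the two models, and the scoring heuristic is purely illustrative, not DeepSeek's actual implementation:

```python
def generate_proofs(problem, n=4):
    # Stand-in for the prover model: draft n candidate proofs.
    return [f"proof-{i} for {problem}" for i in range(n)]

def verify(proof):
    # Stand-in for the verifier model: score logical soundness in [0, 1].
    # A real verifier would judge each proof step; this is a placeholder.
    return 0.25 * (len(proof) % 5)

def best_verified_proof(problem, n=4):
    # Generate several attempts and keep the highest-scoring one,
    # rather than trusting the first proof produced.
    candidates = generate_proofs(problem, n)
    return max(candidates, key=verify)
```

The key design point is that selection happens after verification: the system never has to trust a single sample, which is what makes “correct answers don’t guarantee correct reasoning” actionable.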

The benchmark headline matters less than the question of who controls these tools. Self-verifying systems can reduce errors in science, finance and engineering, but the hardware cost means only a few institutions can operate both “the genius” and “the examiner.” People who call this “democratisation of AI and knowledge at its best” are right about the license, but the question of who pays the power bill remains open.

Special highlight from our network

What if slide formatting was no longer your team’s job?

Most teams spend hours turning content into presentations. It’s repetitive work that slows down client delivery and cuts into strategy time.

Plus AI flips that. It converts Word docs into ready-to-use slides using structured templates, layout rules, and built-in charts. No design work needed.

It supports auto-formatting, translations, and custom slide types. You can create internal briefings, client decks, or case studies in minutes.

It integrates with Google Slides and PowerPoint, and outputs clean files your team can edit or share.

Want to try it yourself? → Free trials are now open

NVIDIA and the University of Hong Kong released research showing that progress in AI may come from orchestration rather than scale. Their system, ToolOrchestra, trains a small model to choose when to think on its own and when to call the right tool for the problem. The result is a lightweight conductor that reaches frontier-level accuracy while keeping costs low.

What ToolOrchestra delivers:

  • Task-aware routing: A compact 8B orchestrator decides when internal reasoning is enough and when external tools are needed.

  • Frontier-level scores: It reached 37.1 percent on Humanity’s Last Exam, ahead of GPT-5 and Claude Opus 4.1, with a 2.5x efficiency gain.

  • Strong generalization: The orchestrator handled unseen tools and pricing setups without losing accuracy.

  • Selective tool use: Prior agents overused heavy models. ToolOrchestra kept usage tight and targeted, which reduced total cost.
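The routing behavior described in the bullets above can be sketched as a cost-aware dispatch rule. The tool names, costs, and the confidence threshold here are illustrative assumptions, not values from the paper:

```python
# Hypothetical tool registry: each tool covers one task kind at a cost.
TOOLS = {
    "calculator": {"cost": 0.01, "handles": "arithmetic"},
    "search":     {"cost": 0.05, "handles": "lookup"},
    "big_model":  {"cost": 1.00, "handles": "open_ended"},
}

def route(task_kind, self_confidence, budget):
    # Answer internally when the orchestrator is confident enough;
    # otherwise pick the cheapest tool that covers the task and
    # fits the budget. This keeps heavy-model calls rare and targeted.
    if self_confidence >= 0.8:
        return "internal"
    affordable = [
        name for name, tool in TOOLS.items()
        if tool["handles"] == task_kind and tool["cost"] <= budget
    ]
    if not affordable:
        return "internal"
    return min(affordable, key=lambda name: TOOLS[name]["cost"])
```

For example, a low-confidence arithmetic task routes to the cheap calculator, while a high-confidence lookup stays internal: the savings come from declining expensive calls, not from avoiding tools altogether.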

NVIDIA’s work arrives at a moment when the field is rethinking the value of scale. ToolOrchestra turns the focus to coordination and efficiency, and the early signals suggest this approach can accelerate reasoning without massive infrastructure. It hints at a future where the leading systems are not the biggest models but the smartest conductors guiding a growing library of tools.

Special highlight from our network

ToneUp was key in helping GenAI Works grow to millions reached and thousands engaged.

From ideas to scripts, images and polished social posts, it keeps tone, style and messaging unified, letting anyone create with the same consistency we used to scale.

We’re now taking ToneUp global, and you can join that journey.

Back GenAI Works and help fuel the future of ToneUp.

Reg CF offering via DealMaker Securities. Investing is risky.

OpenAI has disclosed a security incident that originated outside its organization. Attackers used fraudulent text messages posing as support to phish staff at Mixpanel, OpenAI’s analytics provider, then got into the system and exported a dataset containing names, email addresses and rough locations of some API users, along with information about their devices and the websites that referred them. OpenAI stated that “no ChatGPT conversations or API keys were exposed,” but the leaked context is enough to craft a convincing email appearing to come from OpenAI, and the company now warns people to “be cautious of unexpected communications and verify the sender before taking any action.”

Here’s how the incident unfolded:

  • Alert: Mixpanel notices anomalous access after its staff receive phishing texts, and notifies OpenAI.

  • Leak: The exported information connects real people to real businesses using the API, making a ready-made list for spear phishing.

  • Fix: OpenAI removes Mixpanel from production and raises the security standards its monitoring providers must meet.

  • Cost: Analytics dashboards stay convenient for users, but every third-party integration widens the attack surface.

In 2023, a ChatGPT bug in an open-source library briefly revealed chat titles and partial billing information. Seen alongside that, this looks less like an isolated incident and more like a trend: the AI supply chain itself is now a target. Supporters argue that each incident hardens that chain; critics counter that attackers are simply mapping it faster. Either way, the burden falls on developers to keep reassessing vendor dependencies rather than treating any single fix as final.

Klariqo takes care of all of your business chats and phone calls 24 hours a day, 7 days a week. It answers basic questions, qualifies callers and books real appointments in your calendar. You get a front desk without hiring or training anyone, and without checking your phone or email all the time.

Main functions (and how to use them):

  • 24/7 call pickup: Forward your main number to Klariqo for instant, friendly caller responses, avoiding voicemail or busy tones.

  • Website leads: Add the chat widget to your site to greet visitors, answer common questions and direct serious inquiries to your booking link.

  • Lead qualification: Create a few key questions and let Klariqo tag and prioritize leads that are worth your time.

  • Calendar booking: Connect Google or Outlook so the assistant can offer available time slots and schedule appointments directly in your calendar.

  • Call and chat review: Open transcripts in the dashboard to see exactly what people asked. You can adjust your script based on patterns you notice. 

Try this yourself: 

Select one problem area, such as missed calls after hours or delayed responses to online inquiries. Set up Klariqo, connect your calendar and create three qualification questions for new leads. After 2-3 days, review the transcripts, count how many conversations would previously have gone to voicemail or been missed, and decide which leads to follow up on personally.
