Welcome, AI Builders!

Today's edition touches both what you build and how you build it. Apple's new vision models run locally with near-instant results. Microsoft's VibeVoice turns a script into a multi-voice show. Tencent sets translation records and builds 3D worlds from a single photo. Oxford secures £118M to speed vaccine design with AI. What changes first for you: camera, mic, workflow, or health?

📌 In today’s Generative AI Newsletter:

  • Apple on-device vision models

  • Microsoft VibeVoice: 10B-parameter multi-speaker audio

  • Tencent wins WMT25 and builds 3D worlds

  • Oxford and EIT fund AI vaccine science

Special highlight from our network

Intro to Generative AI mini-course: from the University of Colorado Boulder to the GenAI community, free!

🚀 To kick off the GenAI Academy, GenAI.Works is offering free access to an exclusive University of Colorado Boulder course. 🔓

AI in 5: Intro to Generative AI mini-course

🗓️ Sept 8–12 | 9 AM PT

1h live session per day

Certificate + digital badge

$100 → now FREE

Learn from Prof. Tom Yeh (MIT PhD, CU Boulder) and Larissa Schwartz (UX researcher, PhD student).

Explore image, video, sound, research, and vibe coding.

For anyone starting their AI journey.

🍏 Apple Drops FastVLM and MobileCLIP2 Ahead of iPhone 17 Event

Image Credit: Apple

One week before its “Awe Dropping” event on September 9th, Apple released two new vision language models, FastVLM and MobileCLIP2, built to run locally on Apple devices with near real-time output. Both are available on Hugging Face and highlight Apple’s push to make powerful AI lightweight while keeping data private.

Here’s what Apple introduced:

  • FastVLM family: A vision-language model family built for high-resolution image processing, available in 0.5B, 1.5B and 7B parameter versions. The smallest variant runs directly in the browser (a loading sketch follows this list).

  • MobileCLIP2 performance: 85 times faster and 3.4 times smaller than earlier versions, tuned for Apple silicon to deliver instant captioning, object detection and scene analysis.

  • Local execution: Both models process on-device, reducing latency and keeping content secure without relying on cloud servers.

  • Everyday use cases: From video captioning to text recognition in images, the models act as modular tools for Apple’s expanding AI ecosystem.
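
For builders who want to poke at the smaller FastVLM checkpoints right away, here is a minimal sketch of how a Hugging Face download might be wired up. The repo id, processor behavior and prompt below are assumptions, not Apple's documented recipe, so follow the official model card for the exact loading steps.

```python
# Minimal sketch: captioning an image with one of the FastVLM checkpoints.
# The repo id, processor behavior and prompt format are assumptions;
# check Apple's Hugging Face model card for the exact loading recipe.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

MODEL_ID = "apple/FastVLM-0.5B"  # assumed repo id for the smallest variant

processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

image = Image.open("photo.jpg")
inputs = processor(images=image, text="Describe this image.", return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

On a Mac, the same call can be pushed to the GPU through PyTorch's mps backend, which is where the on-device latency story starts to show up in practice.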

With the iPhone 17 event just days away, these releases hint at a tighter integration of Apple silicon, AI and privacy-first design. While rivals chase scale, Apple is positioning its edge: models slim enough to run on your phone, but smart enough to see and understand the world in real time.

🎧 Microsoft Releases 10B-Parameter VibeVoice for Long-Form Speech

Image Credit: Microsoft

Microsoft has introduced a 10B-parameter version of VibeVoice, its open-source text-to-speech framework, built to generate multi-speaker, long-form audio. Available under the MIT license, the model can create podcast-style episodes up to 45 minutes long in just minutes, with support for context windows as large as 32K tokens.

What makes VibeVoice different?

  • Scalable speech: Synthesizes up to 90 minutes of audio with as many as 4 speakers, a leap over the 1–2 speaker cap of prior systems.

  • Core innovation: Uses continuous acoustic and semantic tokenizers running at 7.5 Hz, preserving fidelity while improving computational efficiency (see the back-of-envelope math after this list).

  • Hybrid design: Combines an LLM for dialogue flow with a diffusion head for high-fidelity acoustic detail, keeping conversations natural and consistent.

  • Safety measures: All generated files include an audible disclaimer and imperceptible watermark to prevent misuse and confirm provenance.
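
Those numbers are easy to sanity-check: at 7.5 frames per second, a 45-minute episode maps to roughly 20K acoustic tokens, which is why a 32K context can hold a full show plus its script. The snippet below is plain arithmetic on the figures quoted above, not VibeVoice code.

```python
# Back-of-envelope check on the figures quoted above: a 7.5 Hz tokenizer
# and a 32K context window. Plain arithmetic, not the VibeVoice API.
TOKENIZER_HZ = 7.5        # acoustic/semantic frames per second of audio
CONTEXT_TOKENS = 32_000   # context length cited for the 10B model

minutes = 45
audio_tokens = minutes * 60 * TOKENIZER_HZ   # 45 min -> 20,250 frames
headroom = CONTEXT_TOKENS - audio_tokens     # room left for the script text

print(f"{minutes} min of audio ≈ {audio_tokens:,.0f} tokens")
print(f"Headroom for text in a 32K context ≈ {headroom:,.0f} tokens")
```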

The release follows a growing push to extend AI speech beyond short clips into multi-voice, natural-sounding dialogue. By making VibeVoice open-source and research-focused, Microsoft is staking a claim in long-form generative audio while keeping commercial applications at bay, at least for now.

🌏 Tencent’s New Hunyuan Models for Translation and 3D Worlds

Image Credit: Tencent

Tencent has introduced two major additions to its Hunyuan AI series, one setting new records in global translation benchmarks and the other expanding 3D spatial modeling. Together, they showcase how China’s tech giant is building both linguistic and spatial intelligence into its stack.

This is what Tencent launched:

  • Hunyuan-MT-7B: A 7B-parameter translation model that ranked first in 30 of 31 categories at WMT25, the top global machine-translation competition (a usage sketch follows this list).

  • Chimera edition: An ensemble system that layers multiple translators into one pipeline for higher accuracy across 33 languages, including 5 Chinese minority languages.

  • HunyuanWorld-Voyager: Generates 3D-consistent scene reconstructions from a single photo, exports point clouds directly, and supports joystick-guided exploration of the generated spaces.

  • Scaled efficiency: Small enough to deploy widely, from heavy servers to lighter edge devices, while keeping performance high.
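
If you want to try the translation model yourself, the sketch below shows one plausible way to prompt it through Hugging Face transformers. The repo id and the plain-instruction prompt are assumptions; the official model card documents the expected prompt template and language tags.

```python
# Minimal sketch: prompting Hunyuan-MT-7B for a single translation.
# The repo id and prompt wording are assumptions; consult the official
# model card for the recommended prompt template.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "tencent/Hunyuan-MT-7B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

prompt = ("Translate the following English sentence into French:\n"
          "The demo runs on a single GPU.")
inputs = tokenizer(prompt, return_tensors="pt")

output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```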

Voyager currently leads Stanford’s WorldScore benchmark for 3D video generation, while MT-7B has redefined what a small model can achieve in translation. Tencent is positioning Hunyuan as both a linguistic and spatial platform, open-sourced and competitive on benchmarks that matter.

💉 Oxford Secures £118M to Merge AI and Vaccine Science

Samples from coronavirus vaccine trials are handled at an Oxford Vaccine Group laboratory © John Cairns/AP

The University of Oxford has launched a £118M programme with the Ellison Institute of Technology (EIT) to use AI and human challenge trials to fight antibiotic resistance. The initiative, called CoI-AI (Correlates of Immunity–Artificial Intelligence), will combine Oxford’s vaccine expertise with EIT’s advanced AI systems to accelerate vaccine design against hard-to-treat infections.

The program focuses on:

  • Major threats: Streptococcus pneumoniae, Staphylococcus aureus and E. coli, which drive antibiotic resistance worldwide.

  • Human challenge trials: Volunteers are safely exposed to bacteria under controlled conditions, allowing immune responses to be studied in real time.

  • AI integration: EIT’s AI models, supported by Oracle’s computing infrastructure, will analyze immune data to pinpoint the responses that predict protection.

  • Global impact: Backed by a £118M investment, the programme aims to create faster, smarter vaccines while training future leaders in infectious disease research.

Oxford’s Andrew Pollard called it a “new frontier in vaccine science,” while Larry Ellison framed it as laying the groundwork for faster discovery during outbreaks. With antibiotic resistance killing over a million people annually, CoI-AI could mark a turning point in how the world prepares for future health crises.

🚀 Boost your business with us. Advertise where 13M+ AI leaders engage!

🌟 Sign up for the first AI Hub in the world.
