NVIDIA just released a multimodal model that can caption exactly what you point to in images and videos — think clickable explainability for machines. UPS is exploring humanoid robots from Figure AI to automate logistics. Alibaba launched Qwen3, a suite of open-source language models that challenge top US labs. And Hugging Face unveiled a $100 robotic arm built for tinkerers and powered by reinforcement learning.
• NVIDIA’s new model captions regions with pinpoint precision
• UPS could deploy humanoid robots powered by Figure AI
• Alibaba’s Qwen3 outperforms top models and goes open source
• Hugging Face releases a cheap, printable AI-powered robot arm
Special highlight from our Network
Heading to RSA Conference? Here's where the real talk is happening.
Tomorrow: join Tumeryk (makers of the AI Trust Score™) for an off-the-record happy hour just 5 minutes from Moscone — with generous support from Redport, SandboxAQ, and SoftServe.
🧠 Expect real conversations around GenAI risk, trust, and accountability — with the people building the future of enterprise AI security.
🫱 Big thanks to our partners:
💼 PR Partner: SparkPR
📢 Media Partner: GenAI.Works
📍 April 30 | 4–6 PM | Hawthorn SF
🥂 Premium cocktails. Limited spots.
👉 Request your invite
(RSA badge and confirmed RSVP required)
Can’t make it?
No worries: you can still see what we’re building.
Image Credits: NVIDIA
NVIDIA released Describe Anything 3B, a new multimodal AI model built for generating detailed captions about specific regions in images and videos. Unlike typical captioning tools, this model was designed from the ground up to handle focused, localized descriptions using inputs like points, boxes, scribbles, or masks.
Details:
• Region targeting: Accepts custom visual inputs to specify what to describe
• Dual-view input: Merges the full image with a high-resolution crop of the region
• Efficient model design: Uses gated cross-attention to add the crop's detail without lengthening the token sequence
• Works on video: Tracks regions across time, handling motion and occlusion
• Custom training pipeline: Built from 1.5 million region-labeled examples using segmentation and web images
• New evaluation benchmark: Scores captions by accuracy of attributes, not wording
• Strong performance: Outperforms GPT-4o and VideoRefer across seven benchmarks and scores 67.3 percent on DLC-Bench
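To make the dual-view idea from the list above concrete, here is a minimal sketch of how a full frame and a padded crop around a user-selected box might be paired. It is an illustration only, not NVIDIA's code: the names `Box`, `context_crop`, and `dual_view`, and the 50 percent margin, are assumptions made for this example.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned pixel box: left/top inclusive, right/bottom exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def context_crop(region: Box, image_w: int, image_h: int, margin: float = 0.5) -> Box:
    """Grow `region` by `margin` of its size on each side, clamped to the image."""
    pad_x = int((region.right - region.left) * margin)
    pad_y = int((region.bottom - region.top) * margin)
    return Box(
        max(0, region.left - pad_x),
        max(0, region.top - pad_y),
        min(image_w, region.right + pad_x),
        min(image_h, region.bottom + pad_y),
    )


def dual_view(region: Box, image_w: int, image_h: int) -> dict:
    """Return the two views: the whole frame and a padded crop of the region."""
    return {
        "full": Box(0, 0, image_w, image_h),
        "crop": context_crop(region, image_w, image_h),
    }


# Hypothetical 640x480 frame with a user-drawn box around one object.
views = dual_view(Box(100, 50, 300, 150), image_w=640, image_h=480)
print(views["crop"])
```

The padding keeps some surrounding context in the crop, so a caption can mention what the object sits next to as well as the object itself.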
Describe Anything marks a shift toward more controllable and precise multimodal models. As AI systems are increasingly asked to explain the world around them, whether for self-driving cars, assistive tech, or robotics, being able to describe exactly what a user points to is critical.
Image Credits: Figure AI
United Parcel Service is in ongoing talks with California startup Figure AI to potentially deploy humanoid robots in parts of its logistics network, Bloomberg reported. While no formal deal has been announced, the discussions started last year and reflect UPS’s growing focus on advanced automation.
What’s happening:
• Ongoing talks: UPS and Figure have been discussing use cases since 2024
• Robot video: Figure posted a video of its robot sorting parcels near a conveyor, hinting at logistics applications
• UPS automation push: Company already uses robotic arms and AI sorting in its Velocity hubs
• Prior robotics partners: UPS has worked with Dexterity, Pickle Robot, and others for tasks like unloading and motor control
• Advanced AI model: Figure released Helix, a Vision-Language-Action system for natural-language tasking
• Human-like mobility: Figure 02 now walks with a learned, more natural gait, trained in simulation that compresses years of practice into hours
The race to automate physical labor is heating up. From Amazon to BMW, companies are eyeing humanoid robots to bridge labor gaps and reduce costs. Figure’s robots, powered by general-purpose AI and natural movement, represent a shift from fixed automation to flexible, adaptive machines.
Image Credits: Alibaba
Alibaba launched Qwen3, a new family of eight open-weight language models, available under the Apache 2.0 license. The models were released via platforms like Hugging Face and reflect China’s push to match US-led AI labs.
What’s new:
• Eight new models: Range from 600M to 235B parameters, trained on 36 trillion tokens
• Flagship performance: Qwen3-235B rivals OpenAI’s o1, Grok-3, and DeepSeek-R1 in benchmark tests
• Hybrid thinking: Switches between fast replies and deep reasoning to balance speed and accuracy
• MoE architecture: Mixture-of-experts design routes each token to a few specialized expert sub-networks, so only part of the model runs per token
• 119-language support: Designed for wide international reach
• Strong code and tool use: Posts strong results in tool calling, coding, and instruction following
• Public access: Largest public model is Qwen3-32B, already beating OpenAI’s o1 in coding benchmarks
• Cloud ready: Deployed on services like Fireworks AI and Hyperbolic
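For readers curious about the mixture-of-experts point above, the sketch below shows generic top-k routing: a router scores every expert, only the best k run, and their outputs are blended. This is a toy illustration, not Qwen3's implementation. The function names, the four stand-in experts, and the hard-coded router scores are all assumptions for this example; in a real model the scores come from a learned layer.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k=2):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_scores, k=2):
    """Weighted sum of the chosen experts' outputs; the rest are never called."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Four toy "experts" standing in for expert sub-networks (hypothetical).
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
router_scores = [0.1, 2.0, -1.0, 2.0]  # hard-coded here; learned in practice

chosen = route_top_k(router_scores, k=2)
output = moe_layer(3.0, experts, router_scores, k=2)
print(chosen, output)
```

Because only k experts run per input, compute per token stays well below what the total parameter count would suggest.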
Qwen3 is Alibaba’s boldest open-source move so far, and part of a broader shift toward state-backed AI labs competing globally. It comes as China faces chip export bans and increasing scrutiny from US policymakers. Open models like Qwen3 offer a powerful workaround, giving developers access to advanced systems outside of US companies.
Image Credits: Hugging Face
Hugging Face unveiled the SO-101, a 3D-printable robotic arm that starts at $100, developed in partnership with French robotics firms and hardware suppliers. This follows last year's SO-100 and signals Hugging Face's growing robotics ambitions.
Key Developments:
• SO-101 launch: Follow-up to SO-100, built with The Robot Studio, WowRobo, Seeed Studio, and PartaBot
• Low-cost robotics: Base price is $100, with assembled versions priced up to $500 depending on tariffs and suppliers
• Easier to build: Faster assembly with improved motors that reduce friction and support arm weight
• AI training: Uses reinforcement learning to perform tasks like sorting Lego blocks
• Camera equipped: Allows vision-based interaction and learning
• Robotics expansion: Hugging Face recently acquired Pollen Robotics and plans to sell Reachy 2 humanoids
• Open development: Developers can download, modify, and improve code for both arms and humanoids
This launch ties into a growing trend of low-cost, developer-friendly robotics powered by AI. With major labs focused on high-end humanoids, Hugging Face is betting on community-driven, affordable robotics as a new frontier.