News
Anthropic brings Artifacts to Claude Code, letting teams share live pages from coding sessions
6+ day, 2+ hour ago (197+ words) the-decoder.com Claude Code now supports artifacts, a feature already familiar from the Claude chat. It lets you turn the results of a Claude Code session…...
Nvidia research shows robots that train themselves through AI coding agents
1+ week, 6+ hour ago (420+ words) Researchers from Nvidia, Carnegie Mellon University, and UC Berkeley are using AI coding agents to teach robots dexterous grasping in the real world. A fleet of eight robots hits up to 99 percent success on tricky tasks. The core idea is…...
OpenAI researchers want to predict how often AI models will fail before launch
1+ week, 6+ hour ago (507+ words) OpenAI researchers propose a method for predicting how often a new AI model will make mistakes after release. It could fill gaps left by standard safety testing. Before an AI model ships, it goes through safety testing. These tests…...
AI coding agents find the right file but miss the exact lines that matter, study shows
1+ week, 3+ day ago (676+ words) A new benchmark separates code search from the actual fix and exposes a hidden weakness of AI coding agents. They land in the right neighborhood but miss the crucial spots. Until now, AI coding has mostly been judged by the…...
New AI model called "Count Anything" does exactly what it says, and that's harder than it sounds
1+ week, 4+ day ago (310+ words) Getting those counts right has real consequences, whether it's a doctor reading a scan, a farmer estimating crop yields, or a city planner analyzing traffic. Until now, each of these tasks has required its own specialized system. It's a familiar…...
Microsoft CEO Satya Nadella admits he's a token-maxer, too: "It's addictive"
1+ week, 4+ day ago (186+ words) the-decoder.com Microsoft CEO Satya Nadella is now warning against "token-maxing," the uncritical use of the most powerful AI models for every task. "The hard truth is that the…...
Google Research's Gemini-SQL2 tops text-to-SQL benchmarks by a wide margin
1+ week, 4+ day ago (187+ words) the-decoder.com Google Research unveiled Gemini-SQL2, a new text-to-SQL system built on Gemini 3.1 Pro. It translates natural language into executable SQL database queries. On the BIRD benchmark, which measures how…...
Microsoft's Skill Opt boosts GPT-5.5 by using nothing but a trained Markdown file
1+ week, 4+ day ago (752+ words) A simple Markdown file is apparently enough to boost GPT-5.5 by more than 20 points on procedural tasks. That's the promise of Skill Opt, a method from Microsoft and three Chinese universities that trains instruction documents for AI agents the same…...
Moonshot's open model Kimi K2.7 Code undercuts GPT-5.5 and Claude by up to 12x on price per token
1+ week, 4+ day ago (645+ words) Moonshot AI has released Kimi K2.7 Code, a new AI model built specifically for programming tasks and agent-based coding workflows. The model builds on its predecessor, Kimi K2.6, and is available as an open-weights version on Hugging Face. According to Moonshot AI,…...
The AI industry's platform trap is starting to look a lot like Microsoft's
1+ week, 5+ day ago (410+ words) the-decoder.com - Anthropic silently throttled its Claude Fable 5 model for users trying to train competing AI models, only partially reversing course after backlash. - The company also recruited…...