GPT‑5 is smarter across the board, providing more useful responses across math, science, finance, law, and more. It also produces high-quality code, generates front-end UI with minimal prompting, and shows improvements to personality, steerability, and executing long chains of tool calls.
Anthropic released Claude Opus 4.1, an upgrade with state-of-the-art performance in coding, reasoning, and agentic tasks. Available now for paid users and via the API, it offers notable gains for developers, with more updates coming soon.
OpenAI releases gpt-oss-120b and gpt-oss-20b: Apache-2.0 open-weight models with strong tool use and 128k context. 120b nears o4-mini and runs on one 80GB GPU; 20b matches o3-mini and fits 16GB devices. Weights (MXFP4), tokenizer, and tools ship with a safety-vetted model card.
Google DeepMind unveils Genie 3, a real-time world model that generates interactive 720p environments at 24 fps from text prompts, keeping them consistent for minutes. It adds promptable world events, supports embodied-agent research, and launches as a limited research preview.
xAI’s Grok Imagine rolls out on X’s iOS for SuperGrok and Premium+ users, generating images and 15-sec videos from prompts. A “spicy mode” allows NSFW with moderation and celebrity limits; results feel uncanny, but the UX is fast and slick.
At a Reddit AMA, Sam Altman said GPT-5 seemed “dumber” because the autoswitcher failed at launch. He promised fixes, clearer model transparency, doubled Plus rate limits, is considering restoring 4o for Plus, and called the benchmark chart a “mega screwup.”
X will place ads inside Grok’s answers, collapsing the line between utility and promotion. The piece defines AI Neutrality’s pillars (consent, separation, pluralism, transparency, sensitive-context limits), warns of manipulation and privacy risks, and urges user-controlled fetch-agent models.
OpenAI launches GPT-5 days after its open models and despite Altman calling it “the best,” it only slightly beats rivals on some benchmarks. That said, it's pricing ($1.25/M input, $10/M output, $0.125/M cached) pressures Google and undercuts Anthropic.