Anthropic ships Claude Opus 4.1, claims state-of-the-art on real-world coding
Opus 4.1 posts a 74.5% score on SWE-bench Verified and introduces a new “computer use” beta that lets agents click, type and scroll inside live applications.
Comet replaces tabs with tasks and ships with an always-on agent that can buy, book and research on your behalf. We took it for a spin.
A minimalist agent framework from Hugging Face is quietly becoming the default for teams who got tired of LangChain abstractions.
After 6 months of preview, LangGraph Platform is now generally available with autoscaling, durable runs, and a visual debugger that actually works.
A new SDK from Stripe lets AI agents create invoices, charge cards and issue refunds — with first-class safety controls.
Assign a GitHub issue to Copilot, walk away, come back to a working pull request. We tested it on 25 real bugs.
Type a sentence, get a deployed app. Replit’s newest agent revision is the closest thing yet to truly no-code AI development.