Anthropic ships Claude Opus 4.1, claims state-of-the-art on real-world coding
Opus 4.1 posts a 74.5% score on SWE-bench Verified and introduces a new “computer use” beta that lets agents click, type and scroll inside live applications.
Multi-file edits, terminal awareness and tight git integration are turning Cursor from a fancy IDE into something closer to a junior engineer that lives in your editor.
Assign a GitHub issue to Copilot, walk away, come back to a working pull request. We tested it on 25 real bugs.
A new “agent mode” lets Gemini take multi-step actions in your project, with surprisingly solid reasoning about codebases of any size.