AI-assisted development requires establishing a feedback flywheel to enhance team capabilities.
OpenAI retires the SWE-bench Verified ranking, shifting to the more challenging SWE-bench Pro.
The FACTS benchmark suite is released, offering multi-dimensional evaluation of LLM factual accuracy.
Memori enables AI agents to use SQL/MongoDB for long-term memory.
Vercel's login feature reaches GA, providing OAuth/OpenID authentication.
Evalite is released, offering a TypeScript testing framework for AI applications.
Nexla introduces a conversational AI platform for data engineering, enabling users to build data pipelines through natural language.
AWS releases an Agentic AI security framework.
KServe, an AI inference platform on Kubernetes, has officially been promoted to a CNCF incubating project.
Cursor 2.0 is released, focusing on agent-based coding.
Anthropic introduces Skills, emphasizing modular and auditable AI capability expansion.
HarmonyOS releases a smart home application template to boost development efficiency.
Experienced engineers are experimenting with programming using multiple AI agents in parallel.