Detecting and Fixing Tool Call Errors With `limbic-tool-use`

If you’re building AI agents that interact with external tools and APIs, you know that tool calls are both essential and error-prone. Even when your agent’s reasoning looks perfect, the actual API calls can fail in subtle ways: wrong endpoints, malformed parameters, or incorrect data types. These issues are particularly challenging because they’re often hidden behind seemingly reasonable responses. ...
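A minimal sketch of the failure modes described above (this is illustrative only, not the `limbic-tool-use` API): validating an agent's proposed tool call against a declared parameter schema catches unknown endpoints, missing or extra parameters, and wrong data types before the call is executed. The tool name and schema here are hypothetical.

```python
# Hypothetical tool registry mapping tool names to expected parameter types.
EXPECTED_TOOLS = {
    "get_weather": {"city": str, "units": str},
}

def validate_tool_call(name, args):
    """Return a list of problems with a proposed tool call (empty list = OK)."""
    problems = []
    schema = EXPECTED_TOOLS.get(name)
    if schema is None:
        # Wrong endpoint: the agent invented a tool that does not exist.
        problems.append(f"unknown tool: {name}")
        return problems
    for param, expected_type in schema.items():
        if param not in args:
            problems.append(f"missing parameter: {param}")
        elif not isinstance(args[param], expected_type):
            # Incorrect data type, e.g. an int where a string was expected.
            problems.append(
                f"wrong type for {param}: got {type(args[param]).__name__}"
            )
    for param in args:
        if param not in schema:
            # Malformed call: a parameter the tool never declared.
            problems.append(f"unexpected parameter: {param}")
    return problems

# A malformed call: wrong type for "city", missing "units", invented "zip".
issues = validate_tool_call("get_weather", {"city": 94107, "zip": "94107"})
```

Note that the call above would look perfectly reasonable in an agent trace; only checking it against the schema surfaces the three problems.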

October 10, 2025 · Quotient Team

Why Language Models Hallucinate: It Was the Context All Along

OpenAI’s recent paper on why language models hallucinate highlights how current evaluation methods create the wrong incentives, rewarding models for guessing rather than admitting uncertainty. ...

September 16, 2025 · Julia Neagu

Building Reliable AI Financial Agents with Automatic Trace Analysis

These systems are now used to parse SEC filings and earnings reports, extract and summarize risk factors, analyze client portfolios, flag potential compliance issues, and even generate draft reports for advisors and regulators. Some are embedded in client-facing experiences, where they answer questions about accounts, balances, and investment performance in real time. The outputs of these systems directly influence investment decisions, regulatory filings, and client communications. In this environment, the margin for error is effectively zero. ...

September 3, 2025 · Quotient Team

How to Detect Hallucinations in Retrieval Augmented Systems: A Primer

Hallucinations—model-generated outputs that appear confident yet contain fabricated or incorrect information—remain one of the peskiest issues facing AI engineers today. As Retrieval- and Search-Augmented Systems have proliferated, systematically identifying and mitigating hallucinations is now critical. ...

May 28, 2025 · Julia Neagu

What OpenAI’s Sycophancy Problem Teaches Us About Using User Data to Evaluate AI

On April 25th, OpenAI shared a surprising update: after introducing thumbs-up/down feedback from ChatGPT users into their GPT‑4o fine-tuning process, the model got noticeably worse. ...

May 2, 2025 · Julia Neagu

Eval-driven development for the AI Engineer

While there is currently an explosion of developers working in the AI space, only 5% of LLM-based projects lead to significant monetization of new product offerings. Many products remain stuck in the proof-of-concept or beta phase and never make their way to users. ...

August 6, 2024 · Julia Neagu