
Agents That Don't Hallucinate Numbers

LLMs are bad at arithmetic. Here is the pattern we use to make our agents reliable with financial data.

The Numeric Hallucination Problem

Language models are trained to predict plausible text, not to perform arithmetic. Ask an LLM to calculate 17% of 3,847 and it will give you a confident, plausible-sounding, and occasionally wrong answer. In a financial agent this is unacceptable.
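
For scale, the exact answer is trivial once you leave the model; a one-line check in plain Python (not part of the agent code):

# 17% of 3,847, computed exactly rather than estimated by a model
print(round(3847 * 0.17, 2))  # 653.99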

The Constrained Tool Pattern

Our solution: the LLM never does arithmetic. It calls a calculate(expression) tool that evaluates the expression in Python and returns the result. The LLM is only responsible for identifying which numbers to use and what operation to perform.

import ast

def calculate(expression: str) -> float:
    """Evaluate a plain arithmetic expression and return the exact result."""
    tree = ast.parse(expression, mode="eval")
    # Whitelist arithmetic nodes only; names, calls, attribute access,
    # and anything else are rejected before evaluation.
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.Mod,
               ast.USub, ast.UAdd)
    for node in ast.walk(tree):
        if not isinstance(node, allowed):
            raise ValueError(f"Unsafe expression: {expression}")
        # Constants must be numbers; reject strings, bytes, None, etc.
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError(f"Non-numeric constant in: {expression}")
    return float(eval(compile(tree, "<expression>", "eval")))
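
To show where this sits in the agent loop, here is a minimal sketch of how calculate might be exposed to the model, assuming an OpenAI-style function-calling schema (the post doesn't pin down a provider, so the schema and field names are illustrative):

# Illustrative tool definition (OpenAI-style function calling; assumed,
# not taken from production config). The model only fills in "expression";
# the runtime executes calculate() and feeds the result back verbatim.
CALCULATE_TOOL = {
    "type": "function",
    "function": {
        "name": "calculate",
        "description": "Evaluate a plain arithmetic expression exactly.",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Arithmetic only, e.g. '0.17 * 3847'",
                }
            },
            "required": ["expression"],
        },
    },
}

# The whitelist rejects anything that isn't arithmetic:
print(calculate("0.17 * 3847"))         # 653.99
calculate("__import__('os').getcwd()")  # raises ValueError: Unsafe expression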

Why Not Code Interpreter?

Code interpreter works but is slow (2–4 second cold start) and expensive. Our calculate() tool responds in under 5 ms. For an agent processing 200 financial queries per hour, this matters.
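
The latency figure is easy to sanity-check locally; a minimal timing sketch with the standard library's timeit (assumes calculate from the snippet above is in scope; absolute numbers depend on your machine):

import timeit

# Average over 1,000 calls to amortize timer overhead.
per_call_s = timeit.timeit(lambda: calculate("0.17 * 3847"), number=1000) / 1000
print(f"{per_call_s * 1000:.3f} ms per call")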

Results

After adding constrained tools, our financial agent's numeric accuracy went from 91% to 99.7% on our benchmark set. The remaining 0.3% of failures are all edge cases involving number formatting (lakhs vs. millions), not arithmetic errors.
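
Those failures are about unit words rather than arithmetic. A hypothetical pre-processing helper (the name, unit table, and parsing are ours for illustration, not the production fix) shows the kind of normalization that closes the gap:

# Hypothetical helper: expand unit words before numbers reach calculate().
UNIT_WORDS = {
    "thousand": 1_000,
    "lakh": 100_000,
    "million": 1_000_000,
    "crore": 10_000_000,
    "billion": 1_000_000_000,
}

def normalize_amount(text: str) -> float:
    """Turn strings like '3.5 lakh' or '2 million' into plain floats."""
    value, _, unit = text.strip().partition(" ")
    multiplier = UNIT_WORDS.get(unit.strip().lower().rstrip("s"), 1)
    return float(value.replace(",", "")) * multiplier

print(normalize_amount("3.5 lakh"))   # 350000.0
print(normalize_amount("2 million"))  # 2000000.0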
