Why our agents are versioned, not "latest"
Pinning model versions is the most boring infrastructure decision we ever made. It's also the single feature our customers thank us for the most. A note on why every Soufio agent ships immutable.
Architecture decisions, lessons from incidents, and the unglamorous engineering behind features that ship.
Pinning model versions is the most boring infrastructure decision we ever made. It's also the single feature our customers thank us for the most. A note on why every Soufio agent ships immutable.
How we reduced p50 on embedding endpoints by leaning into prediction-level caching with sticky session keys.
The hardest part of a sales-chat agent isn't the chat. It's knowing exactly when to step out of the conversation and into the CRM.
Cold starts kill latency. Here's how we keep enough hot containers in Almaty without burning capacity at 4am.
How an Almaty furniture business tripled lead volume after consolidating their AI stack into Soufio.
Retries are hard when the response is non-deterministic. We solved it with a single header. Inside the design.
How we evaluate and roll out third-party models behind the same versioned endpoint without breaking customer contracts.
One issue a month. Engineering deep dives, not marketing.