Product · April 10, 2026 · 11 min read

Legacy vs AI-Native Compliance: How We Went From 35 People to 5 + Agents

We rebuilt our compliance stack around AI agents. 35 people became 5 + agents doing the same output. Here is the architecture and what broke along the way.

Tomás Kenny

CTO & Co-founder

The short version: we rebuilt our compliance stack around AI agents. We went from 35 people manually reviewing alerts to 5 people plus agents doing the same output. Same coverage, same regulatory posture, a fraction of the headcount. This is an engineering post, not a sales post. I want to explain how the stack is wired, what legacy compliance actually is, and what we learned along the way.

I am writing this from the CTO chair at Gu1. We ship AI-native compliance infrastructure to LatAm fintechs. KYC, AML, and KYT in one API. 54 active clients across Brazil, Mexico, Argentina, and Colombia. Our compliance team used to look like every other compliance team in the industry. It does not anymore.

What legacy compliance actually is

If you have never worked inside a bank or a fintech, the word "compliance" can sound abstract. It is not abstract. It is a very specific operating model. Strip the jargon and it looks like this.

A team of analysts sits in front of a queue. The queue is filled by a rules engine. The rules engine reads every transaction, every new user, every document, and asks a set of questions someone wrote down in a spec doc. Questions like:

  • If transaction amount > X and country = Y and pattern = Z, flag it.
  • If user's document score < threshold, hold the onboarding.
  • If the counterparty name fuzzy-matches a sanctions list, escalate.
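
To make that concrete, here is a minimal sketch of what a rules engine like this boils down to. The thresholds, country codes, and field names are invented for the example; the shape is the real point: every condition is hand-written, static, and waiting for a human to update it.

```python
# A hand-maintained rules engine, reduced to its essence. Every threshold,
# country code, and list below is hypothetical; what matters is that a human
# wrote each condition and a human has to change it when fraud changes.

SANCTIONED_NAMES = {"acme trading ltd", "example holdings sa"}  # illustrative

def evaluate(txn: dict) -> list[str]:
    alerts = []
    if txn["amount_usd"] > 10_000 and txn["country"] in {"XX", "YY"}:
        alerts.append("HIGH_VALUE_CROSS_BORDER")
    if txn.get("document_score", 1.0) < 0.6:
        alerts.append("HOLD_ONBOARDING")
    if txn["counterparty"].lower() in SANCTIONED_NAMES:  # real systems fuzzy-match
        alerts.append("SANCTIONS_ESCALATION")
    return alerts  # every alert lands in a queue for a human analyst

print(evaluate({"amount_usd": 12_500, "country": "XX",
                "counterparty": "Normal Customer", "document_score": 0.9}))
```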

The rules are static. A human wrote them. A human maintains them. A new fraud pattern shows up, somebody opens a ticket, engineering pushes a new rule, QA reviews, deploy, the queue starts catching it. Or failing to catch it. The loop is slow because humans are in every step of it.

Each analyst clears 50 to 100 alerts a day. If the business grows, you hire more analysts. The math is linear. Double the traffic, double the team. There is no leverage in this model.

False-positive rates on legacy compliance systems reach up to 98% in some published industry studies.

Read that again. 98 out of 100 alerts a legacy system raises are noise. The analyst clicks through 98 false positives to catch 2 real ones. This is not a rumor. It is documented in multiple compliance-tech surveys through 2024 and 2025. The productivity tax is enormous and it compounds.

Onboarding under this model takes 3 to 7 days for higher-risk tiers. That is not because the verification itself takes days. It is because the alert sits in a queue waiting for a human to look at it. The user does not care about your queue. They churn.

What "AI-native" actually means (not marketing)

"AI-native" is a loaded phrase. Every legacy vendor in the space now has an AI page on their site. I want to be specific about what we mean.

AI-native, for us, means four things.

First, models learn patterns instead of humans writing rules. We do not maintain a 4,000-line rulebook. We train and re-train models on labeled regional data. When the distribution shifts, the model shifts. When it drifts too far, we catch it in monitoring and re-train.

Second, inference runs on every single transaction in real time. Not batch. Not a nightly report. The decision comes back in the same request that initiated it. If a user sends money at 2:03 AM from a new device in a new country, the decision lands before the UI finishes its loading spinner.
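
A hedged sketch of what that call path looks like. The function and field names are illustrative, not our actual API, and the scoring stub stands in for a real model inference. The point is that the decision gates the transfer inside the same request, not in a batch job.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    action: str   # "allow" | "review" | "block"
    score: float

def score_transaction(txn: dict) -> Decision:
    # Stand-in for a real model inference call.
    risky = txn["new_device"] and txn["new_country"]
    return Decision("review", 0.9) if risky else Decision("allow", 0.1)

def handle_transfer(txn: dict) -> dict:
    # The decision is computed synchronously, inside the request that
    # initiates the transfer -- not in a nightly batch, not in a side queue.
    decision = score_transaction(txn)
    if decision.action == "block":
        return {"status": "rejected"}
    if decision.action == "review":
        return {"status": "pending_review", "score": decision.score}
    return {"status": "completed"}

print(handle_transfer({"new_device": True, "new_country": True, "amount_usd": 250}))
```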

Third, false positives live below 5%. This is our empirical target, not a marketing claim. When we blow past it on a specific client, it is an incident and we treat it like one. Models that cry wolf are not cheaper than rules. They are just faster at wasting analyst time.
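
The check itself is not sophisticated. Something like the sketch below, run per client over a rolling window of resolved alerts, is enough to turn "false positives crept up" into a page. The window and wiring here are illustrative, not our actual incident tooling.

```python
FP_TARGET = 0.05  # the empirical target described above

def false_positive_rate(resolved_alerts: list[dict]) -> float:
    # resolved_alerts: alerts already reviewed, each {"confirmed_fraud": bool}.
    if not resolved_alerts:
        return 0.0
    false_positives = sum(1 for a in resolved_alerts if not a["confirmed_fraud"])
    return false_positives / len(resolved_alerts)

def check_client(client_id: str, resolved_alerts: list[dict]) -> None:
    rate = false_positive_rate(resolved_alerts)
    if rate > FP_TARGET:
        # In production this opens an incident; here it just prints.
        print(f"INCIDENT: {client_id} false-positive rate {rate:.0%} > {FP_TARGET:.0%}")

check_client("demo-client", [{"confirmed_fraud": False}] * 9 + [{"confirmed_fraud": True}])
```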

Fourth, and this is the one that matters for unit economics: the stack scales with compute, not headcount. We add capacity by provisioning GPUs and tuning batch sizes, not by posting job ads. If a client 10x's their traffic over a quarter, nothing about our team changes.

Onboarding, under this model, takes seconds to minutes. Most flows complete before the user would even reach for their phone to check what is taking so long.

The real cost of legacy compliance

Before we walk through our stack, I want to be honest about why this matters. Compliance is not a cost center people talk about at dinner, but the numbers are large.

Compliance spend sits at 15 to 20 percent of fintech operational budget as of 2026.

That is one fifth of your operating cost going to an activity that does not differentiate your product. No user signs up for your bank because your sanctions screening is thorough. They sign up because onboarding is fast and the app does not reject them for no reason.

Analyst throughput is the bottleneck. Queues back up. SLAs slip. Users churn. Customer support escalates. The cost is not only the salary line for the analysts. It is also the churn from users who never finished signing up. It is the opportunity cost of the engineers who keep maintaining the rules engine instead of shipping product.

At scale, the economics stop working. A mid-size LatAm fintech running legacy compliance on a serious transaction volume is bleeding money on a per-user basis. We have seen it at clients before they migrated to us.

How the Gu1 stack is structured

Let me walk through what we actually ship. Three layers, with agents sitting on top of all three.

1. KYC layer

This is the front door. Identity verification, liveness check, document OCR, biometrics, UBO lookup.

The identity piece has to be tuned for LatAm IDs. That is not a minor detail. CPF in Brazil, CURP in Mexico, DNI in Argentina, Cédula in Colombia. Each one has its own format, its own check digits, its own issuing authority, its own failure modes. A model trained on US driver's licenses will get this wrong in ways you will not notice until your fraud rate spikes.
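
The check-digit rules for these documents are public, and each country's are different. Here is the Brazilian CPF validation as one example; CURP, DNI, and Cédula each need their own equivalent, and this is before you get to OCR at all.

```python
def cpf_is_valid(cpf: str) -> bool:
    """Validate a Brazilian CPF via its two public check digits (mod-11 scheme)."""
    digits = [int(c) for c in cpf if c.isdigit()]
    if len(digits) != 11 or len(set(digits)) == 1:     # reject e.g. 111.111.111-11
        return False

    def check_digit(partial: list[int]) -> int:
        weights = range(len(partial) + 1, 1, -1)       # 10..2, then 11..2
        remainder = sum(d * w for d, w in zip(partial, weights)) % 11
        return 0 if remainder < 2 else 11 - remainder

    return (digits[9] == check_digit(digits[:9])
            and digits[10] == check_digit(digits[:10]))

print(cpf_is_valid("529.982.247-25"))  # a commonly used valid test CPF -> True
```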

Our OCR pipeline is trained on regional document distributions. We collect and label at scale, we validate on hold-out sets per country, and we keep country-specific heads on the model. The liveness check is a 3D active check that catches the current generation of deepfake attacks. That is not a forever-fix. It is a fix for what is being attempted in 2026.

The KYC flow is tiered. Low-risk users get a basic check in seconds. Higher-risk users get pulled into enhanced verification, with additional documents, additional sources, and a risk-weighted path. The tiering is driven by a risk score, which is driven by signals we collect at the same moment as onboarding. Device, IP, behavior, time of day, referral source, velocity.
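
A sketch of that routing, with made-up weights and cutoffs. The real score comes out of a model, not a hand-weighted sum, but the shape of the decision is the same.

```python
def kyc_tier(signals: dict) -> str:
    """Route an onboarding into a verification tier. Weights and cutoffs are illustrative."""
    score = 0.0
    score += 0.3 if signals["new_device"] else 0.0
    score += 0.3 if signals["ip_country"] != signals["declared_country"] else 0.0
    score += 0.2 if signals["signup_hour"] in range(1, 5) else 0.0    # 1-4 AM local
    score += 0.2 if signals["signups_from_device_24h"] > 1 else 0.0   # velocity

    if score < 0.3:
        return "basic"      # document OCR + liveness, done in seconds
    if score < 0.6:
        return "enhanced"   # extra documents, extra data sources
    return "manual_review"  # the risk-weighted slow path

print(kyc_tier({"new_device": True, "ip_country": "AR", "declared_country": "AR",
                "signup_hour": 14, "signups_from_device_24h": 1}))   # -> "enhanced"
```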

2. AML layer

The AML layer does transaction monitoring, pattern detection, and sanctions screening.

Pattern detection is where the ML actually earns its keep. Classical AML was built on thresholds. "More than X dollars in 24 hours across more than Y counterparties." Those rules exist for a reason, and we still run a handful of them as guardrails. But they miss almost everything that matters. Sophisticated laundering structures money specifically to stay under the thresholds.

The models we train look at graph structure, temporal patterns, counterparty clusters, account tenure, and dozens of derived features. They catch structuring and layering patterns that a threshold rule cannot catch by definition.
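
A rough illustration of the kind of derived features involved, using invented names and a 24-hour window. None of these are visible to a rule that only looks at one transaction at a time.

```python
from collections import Counter
from datetime import timedelta

def structuring_features(txns: list[dict], threshold: float = 10_000.0) -> dict:
    """Features a single-transaction threshold rule cannot see.

    txns: recent transactions for one account, each
          {"amount_usd": float, "ts": datetime, "counterparty": str}.
    """
    if not txns:
        return {}
    window_start = max(t["ts"] for t in txns) - timedelta(hours=24)
    recent = [t for t in txns if t["ts"] >= window_start]
    near_threshold = [t for t in recent if 0.7 * threshold <= t["amount_usd"] < threshold]
    counterparties = Counter(t["counterparty"] for t in recent)
    return {
        "sub_threshold_count_24h": len(near_threshold),        # classic structuring tell
        "sub_threshold_total_24h": sum(t["amount_usd"] for t in near_threshold),
        "counterparty_fanout_24h": len(counterparties),        # layering fan-out
        "max_counterparty_share": max(counterparties.values()) / len(recent),
    }
```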

Screening covers OFAC, UN, EU, and country-specific PEP and sanctions lists. The fuzzy matching is language-aware, which matters in LatAm because a Spanish or Portuguese rendering of a name can confuse a matcher built for English conventions.
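
A trivial illustration of why language-aware normalization matters, using only the standard library. Production matching does much more than this (transliteration, nicknames, compound surnames), but accent stripping and name-order handling alone kill a lot of spurious mismatches.

```python
import unicodedata
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    # Strip diacritics ("José" -> "jose"), drop punctuation, collapse whitespace.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii").lower()
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in ascii_only)
    return " ".join(cleaned.split())

def name_similarity(a: str, b: str) -> float:
    # Token-sorted comparison so "Da Silva, Joao" and "João da Silva" line up.
    ta = " ".join(sorted(normalize(a).split()))
    tb = " ".join(sorted(normalize(b).split()))
    return SequenceMatcher(None, ta, tb).ratio()

print(name_similarity("João da Silva", "Da Silva, Joao"))   # -> 1.0
print(name_similarity("João da Silva", "Pedro Gonçalves"))  # low
```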

3. KYT layer

KYT (Know Your Transaction) is real-time analysis of each transaction as it happens.

Risk scoring happens per transaction, not per user. This is an important distinction. A low-risk user can make a high-risk transaction. A high-risk user can make a low-risk transaction. Scoring the user once at onboarding and then trusting that score for a year is how you miss account takeovers.

Behavioral context feeds the scoring: the device the transaction is coming from, the network, the velocity of recent activity, the time of day, whether any of it looks like the user's baseline. The decision comes back in tens of milliseconds.
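
Directionally, the score is a comparison of the transaction against that user's own baseline. A hand-weighted sketch, with illustrative feature names; the production model learns the weights rather than having them written down like this.

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    usual_devices: set
    usual_countries: set
    median_amount_usd: float
    txns_per_day: float

def transaction_risk(txn: dict, baseline: Baseline) -> float:
    """Score one transaction against one user's baseline; 0.0 means it looks like them."""
    risk = 0.0
    risk += 0.35 if txn["device_id"] not in baseline.usual_devices else 0.0
    risk += 0.25 if txn["country"] not in baseline.usual_countries else 0.0
    risk += 0.25 if txn["amount_usd"] > 5 * baseline.median_amount_usd else 0.0
    risk += 0.15 if txn["txns_last_hour"] > baseline.txns_per_day else 0.0   # velocity spike
    return risk

me = Baseline({"dev_a"}, {"BR"}, 40.0, 3.0)
print(transaction_risk({"device_id": "dev_z", "country": "MX",
                        "amount_usd": 600.0, "txns_last_hour": 8}, me))  # high: nothing matches
```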

Agents on top

This is the part that changed our ops.

Agents read every alert the three layers produce. They triage. They resolve clear cases. They escalate ambiguous ones to a human with a full context packet: what triggered the alert, what the user's baseline looks like, what similar cases resolved as in the past 30 days. The human does not have to reconstruct the situation. They read a brief, make the call, and move on.
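
The context packet is just structured data. An illustrative shape, with hypothetical field names:

```python
from dataclasses import dataclass, field

@dataclass
class ContextPacket:
    """What the human reviewer gets with an escalated alert. Fields are illustrative."""
    alert_id: str
    trigger: str                     # which layer fired and why
    user_baseline: dict              # devices, countries, typical amounts
    deviation_summary: str           # plain-language diff against that baseline
    similar_cases_30d: list = field(default_factory=list)   # and how they resolved
    agent_recommendation: str = ""   # "clear", "escalate", "draft SAR"

packet = ContextPacket(
    alert_id="alrt_0001",
    trigger="KYT: new device, 6x median amount, 02:03 local",
    user_baseline={"devices": 1, "countries": ["BR"], "median_amount_usd": 42.0},
    deviation_summary="First transaction from this device; amount is 6x the user's median.",
    similar_cases_30d=[{"case": "alrt_0802", "resolution": "confirmed_account_takeover"}],
    agent_recommendation="escalate",
)
```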

Agents draft SAR and regulatory reports automatically. A human reviews and submits. The drafting was historically the most tedious part of a compliance analyst's day and it is the part that LLMs are genuinely good at.

The split ends up roughly 90/10. Agents handle about 90 percent of routine work. Humans handle the 10 percent of edge cases where judgment actually matters. That 10 percent is not small. It is where regulators care most, and we staff it with people who can read a regulation and make a call.

What this changed for the team

We did not just fire people and call it productivity. The shape of the team changed.

The engineering team grew. We invested heavily in the platform, the model training pipeline, the observability, the drift detection. Ops shrank. Those are different skill sets. We shifted hiring accordingly.

The job description for "compliance analyst" changed at Gu1. They do not review alerts one at a time. They run agents, tune policies, review escalations, and own the relationship with the regulator. It is a higher-leverage role, and the people who stayed took a step up in what they were doing day to day.

The ratio of engineers to ops people inverted. If you had drawn our org chart two years ago, you would have seen a broad compliance team with a small platform team next to it. Now it is a broader platform team with a smaller, senior compliance group. The work got harder and more interesting for everyone who stayed.

Money aside, I think this is the more important change. Compliance work used to be the role where bright people burned out clicking through queues. That is not a good use of anyone's time.

What's still hard

I do not want to give the impression that any of this is done.

Regulations change in each country. Brazil's BCB publishes new rules. Mexico's CNBV updates their guidance. Argentina's BCRA shifts. Colombia's SFC moves. The agents need to adapt. We maintain per-country policy packs and we version them. When a rule changes, it is an engineering event, not a compliance event.
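
As a rough picture of what "per-country policy packs, versioned" means in practice: the version strings, thresholds, and deadlines below are invented, only the regulators are real.

```python
# Hypothetical contents. A regulatory change becomes a reviewed, versioned diff
# to the relevant pack, deployed like any other engineering change.
POLICY_PACKS = {
    "BR": {"version": "2026.03", "regulator": "BCB",
           "sar_deadline_days": 1, "enhanced_kyc_threshold_usd": 10_000},
    "MX": {"version": "2026.02", "regulator": "CNBV",
           "sar_deadline_days": 3, "enhanced_kyc_threshold_usd": 7_500},
    "AR": {"version": "2026.01", "regulator": "BCRA",
           "sar_deadline_days": 2, "enhanced_kyc_threshold_usd": 5_000},
    "CO": {"version": "2025.12", "regulator": "SFC",
           "sar_deadline_days": 2, "enhanced_kyc_threshold_usd": 6_000},
}

def active_policy(country: str) -> dict:
    return POLICY_PACKS[country]   # agents read the active pack at decision time
```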

Edge cases in informal-economy flows do not look like Europe-trained models expect. A lot of LatAm payment activity runs through channels that a model trained on European SEPA data will score as suspicious because it has never seen anything like it. We train regionally. We have to. Off-the-shelf models from international vendors fail here in predictable ways. This is not a complaint about those vendors. It is a statement about what happens when you deploy a model outside its training distribution.

Model drift is the other hard one. Fraud patterns change. User behavior changes. The model that worked six months ago is not the model we want running in production today. We re-train on a monthly cadence, and we have drift monitors that will page us if the distribution shifts faster than that. Drift detection is a whole sub-problem. We have an internal team that owns it.
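
One common way to quantify "the distribution shifted" is a population stability index over the model's score distribution, training-time versus live. A self-contained sketch; the bin count and alert thresholds are the usual rule of thumb, not our exact paging config.

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between two score distributions."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0

    def histogram(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[max(idx, 0)] += 1
        return [max(c / len(values), 1e-6) for c in counts]   # avoid log(0)

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Rule of thumb: < 0.1 stable, 0.1-0.2 keep watching, > 0.2 page someone and re-train.
```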

And then there is the obvious point. 50 percent of fraud now uses AI on the attacker side, per Feedzai's 2025 work. The floor is rising. The tools being used against us are getting better. The defense has to keep moving, and it has to be AI-native because you cannot fight inference-time attacks with a nightly batch job.

Why we are sharing this

There are 2,800+ LatAm fintechs operating in the region right now, per Finnovista's most recent count. A lot of them are running legacy compliance stacks that are bleeding them on a per-user basis. The LatAm fraud detection market is projected to go from $1.74B in 2025 to $9.14B in 2034, a 20.2% CAGR. The pressure on compliance is going to get more intense, not less.

If you are building in this space, you do not need to be us. You do need to be honest about whether your current stack is pattern-matching against 2020's threat model or 2026's. The answer for most teams is 2020. That is fixable.

If you want the full picture on how we handle KYC specifically, the KYC LatAm Complete Guide walks through the country-by-country reality. If you are knee-deep in AML problems, AML Challenges in LatAm Fintechs has the specifics. For broader fraud patterns we see, read Fraud Prevention in Emerging Markets. And if you want the intro to what we do here, start at Welcome to Gu1.

We ship an API. You can try it. That is the pitch.

