What a code review actually tells you about an AI company - and what it does not

A codebase review is necessary but not sufficient in AI due diligence. The most important signals are often found in the relationship between code, data, team process and operating discipline.

A code review is one of the most useful parts of technical due diligence. It is also one of the easiest to over-interpret.

Code can reveal architecture, discipline, technical debt, security habits and engineering maturity. It can also hide the most important risks in an AI company: weak data provenance, brittle model evaluation, manual operational workarounds, poor product judgement and team dependencies.

The codebase matters. It is not the whole company.

What code review can show

A strong review can identify whether the system has a coherent architecture or has grown by accident. It can reveal whether the team separates concerns, tests important paths, manages dependencies, protects secrets, handles errors and monitors production behaviour.

It can also show how hard the company will find the next stage of growth. If every customer requires a bespoke branch, onboarding will be painful. If data pipelines are fragile, model quality is likely to degrade without anyone noticing. If deployment is manual, reliability will depend on individuals rather than process.

These are real investment risks.

What code review cannot show alone

Code does not prove that the training data is legitimate, representative or durable. It does not prove that model performance is stable across customer segments. It does not show whether users trust the workflow. It does not reveal whether the commercial proposition is strong enough to support the technical cost base.
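
The point about customer segments is one a reviewer can test directly if the company shares its evaluation records. The sketch below is illustrative only: the record layout (segment, predicted, actual), the function names and the 10-point gap threshold are assumptions, not a standard method. It compares accuracy per segment against the overall figure and flags segments that trail it.

```python
from collections import defaultdict


def accuracy_by_segment(records):
    """records: iterable of (segment, predicted, actual) tuples (assumed layout)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for segment, predicted, actual in records:
        total[segment] += 1
        if predicted == actual:
            correct[segment] += 1
    return {seg: correct[seg] / total[seg] for seg in total}


def weak_segments(records, max_gap=0.10):
    """Return segments whose accuracy trails the overall figure by more than max_gap.

    max_gap is an arbitrary illustrative threshold, not a recommended value.
    """
    records = list(records)
    if not records:
        return {}
    overall = sum(1 for _, p, a in records if p == a) / len(records)
    per_segment = accuracy_by_segment(records)
    return {seg: acc for seg, acc in per_segment.items() if overall - acc > max_gap}
```

A headline accuracy figure can look healthy while one segment performs far worse. A breakdown like this shows whether the average is hiding that.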

It also does not always show how much human effort sits behind the product. Some AI products appear automated but rely heavily on manual review, spreadsheet operations or founder intervention. That may be acceptable at an early stage, but the reviewer needs to know how much of it there is and what it costs.

AI codebases have specific diligence questions

For AI-native companies, reviewers should look beyond application code. How are datasets versioned? How are experiments tracked? How are models evaluated before release? What monitoring exists after deployment? Can the team reproduce results? How are prompts, embeddings, model versions and third-party dependencies controlled?
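
One way to make these questions concrete is to ask whether a release can be described by a single reproducible record. The sketch below is a minimal illustration, not a description of any particular tool: the `ReleaseManifest` fields, the `build_manifest` and `fingerprint` helpers and all example values are hypothetical. It hashes the dataset, the prompt text and the pinned versions together, so that changing any input changes the release fingerprint.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


def sha256_text(text: str) -> str:
    """Return the hex SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ReleaseManifest:
    # All field names are hypothetical; adapt them to the stack under review.
    dataset_sha256: str
    prompt_sha256: str
    model_version: str
    embedding_model: str
    dependencies: tuple  # pinned "name==version" strings


def build_manifest(dataset_text, prompt_text, model_version,
                   embedding_model, dependencies):
    """Record content hashes and pinned versions for one release."""
    return ReleaseManifest(
        dataset_sha256=sha256_text(dataset_text),
        prompt_sha256=sha256_text(prompt_text),
        model_version=model_version,
        embedding_model=embedding_model,
        dependencies=tuple(dependencies),
    )


def fingerprint(manifest: ReleaseManifest) -> str:
    """Hash the manifest deterministically (sorted keys, sorted dependencies)."""
    payload = asdict(manifest)
    payload["dependencies"] = sorted(payload["dependencies"])
    return sha256_text(json.dumps(payload, sort_keys=True))
```

If the team cannot produce something equivalent for the model currently in production, that suggests results may not be reproducible, whatever the quality of the application code.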

The question is not whether the code is elegant. The question is whether the company can learn, deploy and operate safely.

Technical debt needs commercial context

Not all technical debt is bad. A young company should not be punished for making pragmatic decisions. The issue is whether the debt matches the stage and whether the team understands the consequences.

Debt becomes dangerous when it prevents onboarding, blocks enterprise security requirements, makes performance unpredictable, or means only one person can safely change the system.
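
The "only one person" risk can be estimated roughly from commit history. The sketch below assumes the history has been exported from version control as (author, path) pairs, one per file touched in each commit. The function name, the 90% threshold and the minimum of five changes are illustrative assumptions. It flags files where a single author accounts for nearly all changes.

```python
from collections import Counter, defaultdict


def single_owner_files(commits, threshold=0.9, min_commits=5):
    """commits: iterable of (author, path) pairs (assumed export format).

    Returns {path: (top_author, share)} for files where one author made at
    least `threshold` of the changes and the file has at least `min_commits`.
    """
    by_file = defaultdict(Counter)
    for author, path in commits:
        by_file[path][author] += 1

    flagged = {}
    for path, authors in by_file.items():
        total = sum(authors.values())
        if total < min_commits:
            continue  # too little history to judge
        top_author, top_count = authors.most_common(1)[0]
        share = top_count / total
        if share >= threshold:
            flagged[path] = (top_author, share)
    return flagged
```

Commit counts are a crude proxy. A flagged file is a prompt for a conversation with the team, not a finding in itself.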

Good diligence distinguishes rational shortcuts from structural weakness.

Team process is part of the evidence

The way a team explains its code is often as revealing as the code itself. Strong teams can describe trade-offs, known weaknesses and the order in which they would improve things. Weak teams deny obvious issues or describe every problem as a quick fix.

Reviewers should ask how decisions are made, how incidents are handled, how customer issues become product work, and how technical priorities are balanced against sales pressure.

The right conclusion

A code review should feed into a wider diligence view. It should connect architecture, data, model governance, security, product operations and team capability.

Used properly, it can identify where investment will be needed after close and whether the company has the maturity to absorb it. Used narrowly, it can create false confidence.

In AI diligence, code is evidence. It is not the verdict.
