
AI Due Diligence for Investors: The Complete Technical Assessment Guide

The pitch deck is impressive. The demo is slick. But you don't know what's behind the API endpoint. Here's the technical due diligence framework that separates real AI from vaporware.

March 12, 2026 · 11 min read · Maryna Vyshnyvetska


You're looking at an AI company. The pitch deck is impressive. The demo is slick. The market size slide says $200B. The founder drops "transformer architecture" and "proprietary model" in every other sentence.

But you don't know what's behind the API endpoint. You don't know if the "proprietary AI" is a GPT-4 wrapper with a system prompt. You don't know if the training data was scraped from Reddit without a license. And neither does the CTO who joined six months ago.

That's where AI technical due diligence comes in -- and it's fundamentally different from evaluating a standard SaaS company.


Why AI Due Diligence Is Different

Standard software technical due diligence evaluates code quality, architecture, scalability, and team capability. These matter for AI companies too, but they're not sufficient. AI introduces categories of risk that don't exist in traditional software.

Model risk. Is this actually AI, or is it a decision tree wearing a trench coat? We've seen companies raise Series A rounds on what turned out to be elaborate rule engines with a chatbot UI bolted on. The inverse also happens -- genuine ML capability that's so poorly implemented it delivers worse results than a simple heuristic would.

Data dependency. Traditional software companies own their code. AI companies depend on their data, and the relationship between data and value is far more complex. Does the company actually own or control its training data? Is the data legally obtained? Is there a data flywheel that improves the product over time, or is the model frozen at whatever quality it launched with?

Technical moat erosion. In 2023, a custom NLP model was a genuine competitive advantage. In 2025, a college student can replicate similar capability in an afternoon with a foundation model API. The question isn't whether the technology works -- it's whether a competitor can reproduce it in weeks using off-the-shelf tools.

Scalability economics. Software scales predictably. AI inference costs can scale unpredictably. A product that's profitable at 1,000 users might hemorrhage money at 100,000 users if nobody modeled the GPU compute costs at scale.


The Assessment Framework

At Kenaz, our technical due diligence for AI companies evaluates five dimensions. Each one can independently kill a deal.

1. Model Quality

This is where most investors want to start, and where the most creative obfuscation happens.

What models are actually used? We classify AI companies into a spectrum:

  • Foundation model API wrappers -- the company calls OpenAI/Anthropic/Google APIs with custom prompts. Low technical moat, high vendor dependency, but potentially valid if the value is in the workflow and data, not the model.
  • Fine-tuned models -- the company has taken a foundation model and trained it on proprietary data. Moderate moat, requires real ML capability to maintain.
  • Custom-trained models -- the company has trained models from scratch or from open-source bases on proprietary architectures. Highest potential moat, but also highest risk of technical debt and team dependency.

None of these are inherently good or bad. What matters is whether the company's positioning and valuation match their actual technical reality. A wrapper valued like a custom model is a problem.

Evaluation methodology. How does the company know their AI works? "The customers seem happy" is not an evaluation framework. We look for defined metrics, systematic benchmarking, A/B testing infrastructure, and honest accounting of failure rates. Companies that can't quantify their model's error rate don't know their model's error rate.
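What "defined metrics on a defined benchmark" means in practice can be as simple as this sketch: a frozen, labeled evaluation set and a function that turns model outputs into a number. The eval set and predictions here are hypothetical placeholders, not a real benchmark.

```python
# Minimal sketch of a quantified evaluation: score model outputs against a
# labeled benchmark set and report an error rate. The eval set and the
# predictions are illustrative placeholders.

def error_rate(predictions, labels):
    """Fraction of predictions that disagree with ground-truth labels."""
    assert len(predictions) == len(labels) and labels
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)

# Hypothetical frozen benchmark: (input_id, expected_label) pairs.
eval_set = [("doc_001", "approve"), ("doc_002", "reject"),
            ("doc_003", "approve"), ("doc_004", "approve")]

preds = ["approve", "reject", "reject", "approve"]  # model outputs
labels = [y for _, y in eval_set]
print(f"error rate: {error_rate(preds, labels):.2%}")  # → error rate: 25.00%
```

A company that can produce a table like this per release, on a benchmark that doesn't change between releases, can answer "is the model getting better?" with a number instead of an anecdote.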

Failure modes. Every AI system fails. The question is how. Does the system fail gracefully? Is there human-in-the-loop fallback for critical decisions? Has the team characterized when and how their models underperform? A company that says their AI "works great" without being able to describe where it doesn't is either dishonest or dangerously naive.

Retraining pipeline. Can the company systematically improve its models? This means data collection, labeling infrastructure, training pipelines, evaluation suites, and deployment processes. A company that can't retrain its own model is one key employee departure away from a frozen product.

2. Data Assets

Data is the most durable competitive advantage in AI, and the most common source of hidden liability.

Provenance and licensing. Where did the training data come from? Is it licensed? Was it scraped? Post-2024, this isn't academic -- lawsuits from content creators, publishers, and data subjects are real and accelerating. We trace data lineage and assess legal exposure.

Data quality processes. Raw data is not an asset. Curated, cleaned, labeled data is. We evaluate data pipelines, quality controls, labeling processes, and how the company handles data drift over time.

Data moat assessment. Is the data defensible? The strongest AI companies have data flywheels -- the product generates data that improves the model that improves the product. A company sitting on a static dataset has a depreciating asset.

Privacy and regulatory compliance. GDPR, CCPA, HIPAA, and sector-specific regulations impose real constraints on how data can be collected, stored, and used for training. We assess compliance not just of the current state, but of the training data history. A model trained on non-compliant data doesn't become compliant when you fix the data pipeline -- the model itself may need to be retrained. Our AI safety and compliance audit covers this in depth.

3. Architecture & Infrastructure

The architecture tells you whether the company can scale, and at what cost.

System design review. We evaluate the end-to-end architecture from data ingestion through model serving. How are models deployed? Is there separation between experimentation and production? Is the system designed for reliability or held together with scripts and prayer?

Scalability assessment. Can the architecture handle 10x current load? What about 100x? We look at horizontal scaling capability, bottleneck identification, and whether the team has actually load-tested their system or just assumes it'll scale because it's on Kubernetes.

Cost structure analysis. Inference costs are the silent killer of AI business models. We model compute costs at scale -- what does it cost to serve each customer request, and how does that scale? Companies relying on large model API calls for every interaction often discover their unit economics are underwater at scale.
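The cost modeling we do here is not exotic: multiply requests by per-request token costs and see what happens at scale. The token counts and per-1K prices below are illustrative assumptions, not any real vendor's rates.

```python
# Back-of-envelope inference cost model. Token counts and per-1K-token
# prices are illustrative placeholders, not real vendor pricing.

def monthly_inference_cost(users, requests_per_user, input_tokens,
                           output_tokens, price_in_per_1k, price_out_per_1k):
    """Monthly API spend, assuming every request hits a paid model API."""
    requests = users * requests_per_user
    cost_per_request = (input_tokens / 1000) * price_in_per_1k \
                     + (output_tokens / 1000) * price_out_per_1k
    return requests * cost_per_request

# Same product at two scale points: API spend grows linearly with usage,
# which flat-rate subscription pricing may not survive.
for users in (1_000, 100_000):
    cost = monthly_inference_cost(users, requests_per_user=200,
                                  input_tokens=1_500, output_tokens=500,
                                  price_in_per_1k=0.01, price_out_per_1k=0.03)
    print(f"{users:>7} users -> ${cost:,.0f}/month in inference")
```

Run this with the target company's actual traffic profile and pricing, and compare the per-user cost against their per-user revenue. If the margin only works assuming future API price cuts, that assumption belongs in the risk memo.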

Vendor lock-in risk. Heavy dependency on a single cloud provider, a single model API, or a single data source creates existential risk. If OpenAI changes pricing by 3x (which has happened), can the company adapt? If AWS deprecates a service, is there a migration path? We assess concentration risk across the entire technology stack.

4. Team & Process

AI systems are only as good as the team maintaining them. And AI teams are harder to evaluate than traditional engineering teams.

ML engineering capability. Does the team actually understand the models they're running? We've seen companies where the original ML engineer left and nobody remaining can explain the model architecture, retrain the model, or diagnose novel failure modes.

MLOps maturity. Is there a systematic process for model versioning, experiment tracking, deployment, and rollback? Or does deployment mean "someone SSHs into the production server and runs a script"? We assess MLOps maturity on a scale from ad-hoc to fully automated CI/CD for models.

Deployment frequency and reliability. How often does the team ship model updates? How often do deployments cause incidents? A team that deploys quarterly because they're afraid of breaking production has a process problem. A team that deploys daily and breaks production weekly has a different process problem.

Monitoring and incident response. Is anyone watching the model in production? AI systems degrade silently -- model performance drifts as input distributions change, and without monitoring, nobody notices until customers leave. We look for real-time performance monitoring, drift detection, and defined incident response procedures.
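Drift detection doesn't require heavy tooling. One common approach is the Population Stability Index (PSI): bin a feature's training-time distribution, bin the live distribution the same way, and measure how far they diverge. This is a minimal sketch with synthetic data; the bin count and the conventional 0.25 alert threshold are assumptions, not universal constants.

```python
# Sketch of a population-drift check using the Population Stability Index
# (PSI). Data is synthetic; the 0.25 alert threshold is a common
# convention, not a universal rule.
import math
import random

def psi(baseline, live, bins=10):
    """PSI between a baseline sample and a live sample of one feature."""
    lo, hi = min(baseline), max(baseline)
    step = (hi - lo) / bins

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            i = int((x - lo) / step)          # live values may fall outside
            counts[min(max(i, 0), bins - 1)] += 1  # the baseline range
        return [(c + 1e-6) / len(sample) for c in counts]  # avoid log(0)

    return sum((l - b) * math.log(l / b)
               for b, l in zip(bin_fractions(baseline), bin_fractions(live)))

random.seed(0)
train = [random.gauss(0, 1) for _ in range(5_000)]          # training baseline
live_ok = [random.gauss(0, 1) for _ in range(5_000)]        # same distribution
live_shifted = [random.gauss(0.8, 1) for _ in range(5_000)]  # input drift

print(f"PSI, stable inputs : {psi(train, live_ok):.3f}")       # near zero
print(f"PSI, shifted inputs: {psi(train, live_shifted):.3f}")  # above 0.25
```

A team running even a check this simple on a schedule, with an alert wired to it, is ahead of the "evaluated once before launch" companies described above.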

5. Regulatory Risk

AI regulation is moving fast, and companies that ignore it are accumulating liability.

AI Act classification. The EU AI Act categorizes AI systems by risk level. High-risk classifications (credit scoring, hiring, medical devices) trigger extensive compliance requirements. We assess where the company's products fall and what compliance work remains. Companies that haven't even started this analysis are behind.

GDPR compliance. Beyond standard data privacy, AI-specific GDPR considerations include the right to explanation for automated decisions, data minimization in training, and the ability to honor deletion requests in ways that actually affect model behavior.

Industry-specific requirements. Financial services, healthcare, insurance, and other regulated industries impose their own AI requirements. We map the company's exposure and assess compliance maturity.

Liability exposure. When the AI makes a wrong decision, who's liable? We evaluate the company's terms of service, insurance coverage, and contractual protections. The AI readiness assessment helps quantify this exposure before it becomes a problem.


Red Flags We See

These patterns appear repeatedly across engagements. Any one of them should trigger deeper investigation.

"Our proprietary model" is GPT-4 with a system prompt. There's nothing wrong with building on foundation model APIs, but misrepresenting this to investors is a valuation problem. If the company's technical moat is prompt engineering, the valuation should reflect that reality.

Training data with no provenance documentation. If the company can't tell you where their training data came from, they either don't know (incompetence) or don't want to tell you (liability). Both are problems.

No evaluation framework beyond "it seems to work." Companies that can't show you quantified performance metrics on defined benchmarks are flying blind. This is especially dangerous in high-stakes domains like healthcare or finance.

Single point of failure architecture. One server, one model, one deployment pipeline controlled by one person. We see this more often than you'd expect, even at companies raising eight-figure rounds.

No monitoring in production. The model was evaluated once before deployment and nobody has measured its performance since. Meanwhile, the input distribution has shifted and accuracy has degraded by 15%. Nobody knows.

The team can't explain their model's limitations. If the ML lead can't clearly articulate where the model fails and why, they don't understand their own system well enough to maintain it.


What Good Looks Like

The AI companies worth investing in share certain characteristics.

A clear data flywheel. The product generates data that improves the model that improves the product. This is the most defensible competitive advantage in AI, and it compounds over time.

Systematic evaluation and improvement. Defined metrics, regular benchmarking, A/B testing, and a clear process for translating evaluation results into model improvements. Not just "we retrain occasionally."

Production monitoring with defined SLOs. The team knows how their models are performing right now, not how they performed during the last evaluation cycle. Service-level objectives are defined, measured, and enforced.

IP that goes beyond prompt engineering. Proprietary data assets, custom model architectures, novel training methodologies, or unique system designs that create genuine technical barriers to competition. The value should be difficult to replicate with a weekend and an API key.

Regulatory awareness baked into architecture. Privacy-by-design principles, audit trails, explainability capabilities, and compliance documentation that reflect genuine engagement with regulatory requirements, not a last-minute checkbox exercise.


Before You Sign the Term Sheet

AI due diligence isn't optional anymore. The gap between what AI companies claim and what they've actually built has never been wider. The market is flooded with repackaged API calls positioned as proprietary AI, trained on data that may trigger lawsuits, built by teams that can't maintain what they've shipped.

The good news: the companies that are doing it right stand out clearly under examination. A rigorous technical assessment doesn't just protect you from bad investments -- it gives you conviction on the good ones.

Kenaz provides independent technical due diligence for investors evaluating AI companies. Swiss-based, no conflicts of interest, no portfolio companies to protect. We also build the systems we evaluate -- our custom AI agent development work means we know the difference between genuine innovation and a demo that falls apart in production.

If you're evaluating an AI investment and want a technical assessment you can trust, get in touch.
