Somewhere in those two million documents is the one email that proves your client was set up. You have six weeks to find it. Opposing counsel already has a team of twenty people and a forensic vendor on retainer. You have yourself, a junior associate, and whatever software you've been using to upload PDFs.
This is not a hypothetical. It is the actual situation in complex litigation — and it plays out every day at law firms that are still treating a terabyte of discovery data like it's a filing cabinet.
The Two Types of eDiscovery
Not all discovery is the same. The industry has quietly split into two very different problems — and the tools built for one don't work for the other.
Drag-and-Drop Discovery
Upload your documents, run keyword searches, tag and review. Fast to set up, works well for contained matters — a contract dispute with a few hundred emails, a simple employment case, a small vendor disagreement. You know roughly what you're looking for. You upload, you search, you find it.
Massive-Volume Discovery
You're looking at millions of documents. Years of communications across multiple systems — email, Slack, Teams, internal databases, call logs. You don't know exactly what you're looking for yet. You know there's a pattern. You need to find it. Keyword search misses too much. You need every relevant signal from every document — not just the ones that happen to contain the right phrase.
The eDiscovery market has spent the last decade solving problem one. Dozens of tools have emerged to make drag-and-drop document review fast, affordable, and accessible for small to mid-size matters. And they do it well.
Problem two is harder. It's also where the cases are won or lost.
What "Massive Volume" Actually Means
The Enron litigation involved over 500,000 documents. The Epstein files released by the DOJ ran to hundreds of thousands of pages across multiple document sets. Major securities fraud cases routinely produce terabytes of electronically stored information — emails, spreadsheets, trading logs, internal communications — that must be analyzed for patterns that no individual reviewer could spot manually.
Industry data from the Winter 2026 eDiscovery Pricing Survey reflects the scale of the problem. Data processing at ingestion runs $25–$75 per gigabyte for most matters. By the time data is processed to a reviewable state — after culling, deduplication, and filtering — completion-stage costs can exceed $150/GB. Hosting with analytics adds another $15–$25 per gigabyte per month. Project management for complex matters runs $100–$200+ per hour, with 26% of survey respondents reporting senior-level rates above $200/hour.
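For a 500GB matter, which is not large by corporate litigation standards, those rates mean $12,500 to $37,500 for processing alone, plus $7,500 to $12,500 for every month of hosting. With completion-stage costs and project management on top, the bill can reach six figures in infrastructure and vendor costs before a single attorney has reviewed a single document. And that assumes you found the right documents to review.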
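To make that arithmetic concrete, here is a rough sketch that applies the per-gigabyte ranges quoted above to a 500GB matter. The six-month hosting period is an illustrative assumption, not a survey figure. The sketch also assumes the $150/GB completion-stage rate replaces the ingestion rate at the top of the range (to avoid double-counting) and applies to the full, unculled volume. Project management hours are left out.

```python
# Rough vendor-cost range for a matter, using the per-GB rates quoted above.
# ASSUMPTIONS (not from the survey): hosting lasts `hosting_months`, and the
# $150/GB completion-stage figure stands in for processing at the high end,
# applied to the full volume with no culling.

def estimate_costs(size_gb, hosting_months):
    processing_low = 25 * size_gb              # $25/GB at ingestion (low end)
    processing_high = 75 * size_gb             # $75/GB at ingestion (high end)
    completion_upper = 150 * size_gb           # "can exceed $150/GB"
    hosting_low = 15 * size_gb * hosting_months    # $15/GB/month
    hosting_high = 25 * size_gb * hosting_months   # $25/GB/month
    return {
        "processing": (processing_low, processing_high),
        "hosting": (hosting_low, hosting_high),
        "completion_upper": completion_upper,
        "low": processing_low + hosting_low,
        "high": completion_upper + hosting_high,
    }

costs = estimate_costs(size_gb=500, hosting_months=6)
print(f"Low estimate:  ${costs['low']:,}")
print(f"High estimate: ${costs['high']:,}")
```

Under these assumptions the range is $57,500 to $150,000 before any project management time. Longer hosting or a larger corpus pushes it up quickly.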
The hidden cost of inadequate discovery isn't the vendor bill. It's the evidence you never found because your tools weren't built to find it.
Why Keyword Search Fails When It Matters Most
In straightforward matters, keyword search works. You know the names, the dates, the terms. You search for them. You find them.
In complex litigation — fraud, securities violations, whistleblower retaliation, antitrust — the misconduct is rarely labeled. People don't write emails that say "here is our plan to retaliate against the whistleblower." They use coded language. They have conversations that only make sense in context. The pattern is visible when you step back and look at the full body of communications. It's invisible if you're searching for specific phrases.
Consider a realistic scenario: a senior analyst at a trading firm, Marcus Webb, notices a pattern — trades being placed just before material announcements, consistently profitable, across multiple accounts. He reports it internally. Within weeks, his performance reviews change tone. He's excluded from meetings. A paper trail starts to build against him.
Overstand in Action
Who
Sarah Kim, a senior litigator at a boutique employment law firm representing Marcus Webb in a whistleblower retaliation matter
The Data
40,000 emails spanning two years: internal communications between Marcus, his direct managers, HR, compliance, and executive leadership at the trading firm
The Problem
Sarah knows there's a pattern of retaliation starting after Marcus filed his internal complaint. The misconduct is never labeled as such. It's buried in tone shifts, changed characterizations, and behind-the-scenes coordination across departments. A keyword search for "retaliation" returns four documents, none of them relevant.
The Query
"Show me all communications between Marcus's direct managers and HR in the six months following his internal complaint — and flag any that show a change in tone or characterization compared to prior periods."
A keyword search for "retaliation" returns four documents. None of them are relevant. The actual pattern — managers discussing how to "manage the Marcus situation," sudden changes in how his work was described, coordination between HR and compliance before he was ever formally reviewed — is invisible to keyword search because none of those emails use the word "retaliation."
The query Sarah needs to run isn't a keyword. It's a question. And it's the kind of question that Overstand can actually answer.
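A toy sketch shows why the two approaches diverge. It is not Overstand's implementation, and every email, role name, and date in it is invented for illustration. It only models the structured half of Sarah's question (who wrote to whom, and when); detecting a change in tone is a harder problem that this sketch does not attempt.

```python
from datetime import date, timedelta

# Hypothetical toy corpus. All senders, dates, and text are invented.
EMAILS = [
    {"sender": "manager", "to": "hr", "sent": date(2023, 3, 10),
     "body": "We need to talk about how to manage the Marcus situation."},
    {"sender": "manager", "to": "hr", "sent": date(2022, 11, 2),
     "body": "Marcus did excellent work on the Q3 audit."},
    {"sender": "hr", "to": "all-staff", "sent": date(2023, 1, 15),
     "body": "Reminder: annual anti-retaliation policy training is due."},
    {"sender": "manager", "to": "compliance", "sent": date(2023, 4, 2),
     "body": "Marcus's recent work has been inconsistent."},
]
COMPLAINT_DATE = date(2023, 2, 1)  # assumed date of the internal complaint


def keyword_search(emails, term):
    """Return emails whose body contains the term (case-insensitive)."""
    return [e for e in emails if term.lower() in e["body"].lower()]


def manager_hr_window(emails, start, days=183):
    """Return manager<->HR emails sent within roughly six months of `start`."""
    end = start + timedelta(days=days)
    return [
        e for e in emails
        if {e["sender"], e["to"]} == {"manager", "hr"} and start <= e["sent"] < end
    ]


keyword_hits = keyword_search(EMAILS, "retaliation")
window_hits = manager_hr_window(EMAILS, COMPLAINT_DATE)

print("Keyword hits:", [e["body"] for e in keyword_hits])
print("Window hits: ", [e["body"] for e in window_hits])
```

In this toy data the keyword search surfaces only a policy training reminder, while the participant-and-date filter surfaces the email about "the Marcus situation," which never uses the word "retaliation."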
The proof is almost always there. The question is whether your tools can find it — or whether you're searching for it with a flashlight when you need a floodlight.
What the Market Looks Like Today
The eDiscovery market has two distinct tiers.
At the lighter end, you have platforms built around document upload, keyword search, and collaborative review. They're designed for legal teams that need to move quickly through a contained set of documents. They're fast, relatively affordable, and accessible. For smaller matters, they're exactly the right tool.
At the heavier end, you have enterprise platforms — Relativity, Everlaw, and their peers — built for large law firms and corporations managing massive document corpora. These platforms are powerful, but they're built around the assumption that you have a team of reviewers, a project manager, and significant infrastructure to support them. They're also priced accordingly, and they still fundamentally rely on reviewers doing the work of finding what matters.
Neither tier solves the core problem: making a large, messy corpus of documents actually queryable in the way a skilled investigator would want to query it.
That's where large-scale discovery has been stuck. Until now.
How Overstand Handles the Heavy End
Overstand doesn't replace document review. It changes what review means.
The process starts with integration. Overstand builds what we call a unified data foundation — pulling together every source involved in the matter: emails, messages, call transcripts, internal databases, HR records, whatever the corpus contains. That data is then made queryable in natural language, with reasoning across the full body of information rather than keyword matching against individual documents.
- Thorough, not targeted. Every document is analyzed. Nothing is skipped because it didn't match a keyword. If a document is relevant to what you're investigating, Overstand can surface it — even if it doesn't contain the terms you thought to search for.
- Patterns, not just documents. Overstand can identify how communication patterns changed over time, who was talking to whom and when, and what changed before and after key events — the kind of analysis that takes teams of reviewers weeks to piece together manually.
- No vendor intermediary. Traditional large-scale discovery has typically required a forensic vendor to process and structure the data before attorneys could even begin reviewing it. Overstand handles the integration directly. Your team starts working with the data in minutes, not weeks.
- Legal judgment, amplified. Overstand surfaces evidence. Attorneys interpret it. The tool doesn't replace legal expertise — it makes that expertise dramatically more effective by ensuring the relevant evidence actually gets in front of the people who know what to do with it.
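The "patterns, not just documents" idea can be illustrated with a very small sketch: count how often each pair of people communicated before and after a key event, then look at what shifted. This is an assumption-laden illustration, not a description of how Overstand works internally. The message log, role names, and event date are all hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical message log: (sender, recipient, date sent). Invented data.
MESSAGES = [
    ("manager", "marcus", date(2023, 1, 5)),
    ("manager", "marcus", date(2023, 1, 12)),
    ("manager", "marcus", date(2023, 1, 20)),
    ("manager", "hr", date(2023, 1, 25)),
    ("manager", "marcus", date(2023, 2, 10)),
    ("manager", "hr", date(2023, 2, 12)),
    ("manager", "hr", date(2023, 2, 20)),
    ("hr", "compliance", date(2023, 3, 1)),
]
EVENT = date(2023, 2, 1)  # assumed date of the key event


def pair_counts(messages, event):
    """Count messages per unordered pair of people, split at the event date."""
    before, after = Counter(), Counter()
    for sender, recipient, sent in messages:
        pair = tuple(sorted((sender, recipient)))  # ignore direction
        if sent < event:
            before[pair] += 1
        else:
            after[pair] += 1
    return before, after


def volume_shift(messages, event):
    """Return after-minus-before message counts for every pair seen."""
    before, after = pair_counts(messages, event)
    return {pair: after[pair] - before[pair] for pair in set(before) | set(after)}


shift = volume_shift(MESSAGES, EVENT)
for pair, delta in sorted(shift.items(), key=lambda item: item[1]):
    print(pair, f"{delta:+d}")
```

These are raw counts and are not normalized for the length of each period, so a real analysis would need to account for that. Even so, the shape of the result is the point: direct contact with Marcus drops while manager-to-HR and HR-to-compliance traffic rises, which is the kind of signal an attorney would then read in context.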
We built the Epstein files into a searchable corpus. Hundreds of thousands of pages, integrated and queryable in natural language, with reasoning across the full document set. What previously required manual review teams and specialized vendors became something anyone with a legal question could investigate directly.
That's the capability. Applied to litigation — to a whistleblower case, a securities fraud matter, an antitrust investigation — it changes the calculus entirely.
Which Type of Discovery Do You Have?
If you're dealing with a few hundred documents, a focused dispute, and a clear set of terms to search for — drag-and-drop tools will serve you well. They exist for exactly this scenario.
If you're dealing with a large corporation's worth of communications, years of data across multiple systems, a case where the misconduct is buried in context rather than labeled in keywords — that's a different problem. The tools matter. The comprehensiveness of your search matters. Whether you find what's actually there matters.
The cases that hinge on large-volume discovery are also the ones that matter most. They involve fraud, retaliation, systemic wrongdoing. The evidence exists. The question is whether you can find it.
Overstand is built for the cases where you can't afford to miss something. Where the evidence is in the data, and your job is to bring it to the surface.