Case Study

The Smaller Mountain: Domain and data analysis with predictive coding to identify required materials

A corporation asked Buckley to handle a broad state investigation focused on specific customer communications. More than 70 million emails had to be searched in less than a month to locate the required materials.


  • Type

    State Investigation

  • Duration

    4 Weeks

  1. Challenge


    A state attorney general issued a broad subpoena seeking the production of all email communications with select consumers. The company collected emails from over 100 employees who held similar positions within the company, resulting in over 2 TB of data (6 TB after decompression), or over 70 million documents. The company had less than a month to collect, search, review, and produce relevant documents.

  2. Solution


    Processing and searching the emails of 100+ custodians for relevant correspondence would have come at a tremendous cost to the client. Instead, the FORTÉ team, in close collaboration with the case team and the client, identified three key, representative custodians and used their emails as an initial sample set. After deduplication, the population for the three sample employees consisted of well over two million documents, which were then segregated into internal and external communications based on email domain. Isolating the external domains allowed us to identify potential consumer email addresses beyond those the company already had on file for the relevant consumers. This process resulted in roughly 20,000 potentially relevant documents.

    We then used predictive coding, based on a review of statistical samples, to identify relevant and key communications within that document set. In the end, the team reviewed under 4,000 documents, of which it produced approximately 2,000. The attorney general eventually agreed to forgo entirely the processing and review of additional employees’ emails.

  3. Result


    A tiered sampling approach, which first identified representative custodians and then applied predictive coding to the emails collected from them, resulted in significant cost savings for the client. Domain name analysis further narrowed the focus to potential communications with consumers, including those involving email addresses not already known to the company.
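The domain-based segregation described in the case study can be illustrated with a short sketch. This is not the FORTÉ team's actual tooling, which the case study does not describe. It is a minimal Python example under stated assumptions: messages are simple dictionaries with `from` and `to` fields, the company domain `corp.example` is hypothetical, and "addresses on file" is a plain set of strings. It splits messages into internal and external groups by email domain, then tallies external addresses that are not already on file so a reviewer could check whether any belong to the relevant consumers.

```python
from collections import Counter
from email.utils import parseaddr

# Hypothetical company domain; a real matter would list every internal domain.
INTERNAL_DOMAINS = {"corp.example"}


def address_of(raw):
    """Return the bare, lowercased address from a header value like 'Name <a@b>'."""
    return parseaddr(raw)[1].lower()


def domain_of(raw):
    """Return the domain part of an address, or '' if none can be parsed."""
    addr = address_of(raw)
    return addr.rpartition("@")[2] if "@" in addr else ""


def participants(msg):
    """Sender plus all recipients of a message."""
    return [msg["from"], *msg["to"]]


def split_by_domain(messages, internal_domains=INTERNAL_DOMAINS):
    """Separate messages into (internal, external) by participant domains."""
    internal, external = [], []
    for msg in messages:
        domains = {domain_of(p) for p in participants(msg)}
        domains.discard("")
        # Any participant outside the internal domains makes the message external.
        (external if domains - internal_domains else internal).append(msg)
    return internal, external


def unknown_external_addresses(external, known, internal_domains=INTERNAL_DOMAINS):
    """Count external addresses that are not already on file."""
    counts = Counter()
    for msg in external:
        for p in participants(msg):
            addr, dom = address_of(p), domain_of(p)
            if dom and dom not in internal_domains and addr not in known:
                counts[addr] += 1
    return counts


# Illustrative data only; none of these addresses come from the case study.
messages = [
    {"from": "Alice <alice@corp.example>", "to": ["bob@corp.example"]},
    {"from": "alice@corp.example", "to": ["Jane Doe <jane.doe@mail.example>"]},
    {"from": "jdoe@other.example", "to": ["alice@corp.example"]},
]
known = {"jane.doe@mail.example"}

internal, external = split_by_domain(messages)
candidates = unknown_external_addresses(external, known)
```

In this sketch the tally of unknown external addresses is only a starting point for human review. Deciding whether an address actually belongs to a relevant consumer is a judgment the code does not attempt.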

Through sampling and analysis, the case team had to focus on only 2.8 percent of the documents collected and processed.
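The scale of the reduction at each stage can be checked with simple arithmetic. The sketch below uses the rounded counts quoted in the case study ("over 70 million", "well over two million", "roughly 20,000", "under 4,000", "approximately 2,000"), so its outputs are approximations. It also assumes that the 2.8 percent figure compares the three-custodian sample population with the total collection; the case study does not state this explicitly, and the exact counts behind that figure are not given.

```python
# Rounded counts from the case study; the exact figures are not published.
FUNNEL = [
    ("collected from 100+ custodians", 70_000_000),
    ("three sample custodians, deduplicated", 2_000_000),
    ("after domain analysis", 20_000),
    ("reviewed", 4_000),
    ("produced", 2_000),
]


def share_of_total(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


total = FUNNEL[0][1]
shares = {label: share_of_total(n, total) for label, n in FUNNEL}

for label, n in FUNNEL:
    print(f"{label:40s} {n:>12,d} {shares[label]:9.4f}%")
```

With these round numbers the sample population comes to just under 2.9 percent of the collection, which is in line with the reported 2.8 percent given that both counts are approximate.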

