Predictive coding: Is technology the answer to disclosure?

Dominic Tucker, Senior Consultant at Anexsys Ltd, considers how technology can assist and improve the disclosure process, where keyword searching and linear review were previously adopted as the default approach.

E-discovery

The objective of review in e-discovery is to identify as many relevant documents as possible, while reviewing as few non-relevant documents as possible (Da Silva Moore). This is known as achieving the highest possible recall (the proportion of all relevant documents that are identified during a review) and precision (the proportion of reviewed documents that are relevant).
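As a rough illustration of these two measures, the short Python sketch below computes recall and precision from three counts. The function name and the figures are hypothetical and are used only to show the arithmetic; in practice the total number of relevant documents in a collection is never known exactly and has to be estimated, for example from a random sample.

```python
def recall_and_precision(reviewed_relevant, reviewed_total, relevant_total):
    """Return (recall, precision) for a review.

    reviewed_relevant: relevant documents found among those reviewed
    reviewed_total:    all documents reviewed
    relevant_total:    all relevant documents in the collection
                       (an estimate in any real project)
    """
    recall = reviewed_relevant / relevant_total
    precision = reviewed_relevant / reviewed_total
    return recall, precision


# Hypothetical figures: 10,000 documents reviewed, 4,000 of them relevant,
# out of an estimated 5,000 relevant documents in the whole collection.
recall, precision = recall_and_precision(4_000, 10_000, 5_000)
print(f"recall={recall:.0%}, precision={precision:.0%}")
```

On these assumed figures the review finds four fifths of the relevant material, but six out of every ten documents read are irrelevant. That is the trade-off a review methodology tries to improve.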

Keywords are inherently biased: they either exclude a proportion of relevant documents or necessitate the review of increasing volumes of irrelevant documents. Even so, lawyers have been relatively slow to adopt alternative approaches.

However, predictive coding, a technology which automates portions of an e-disclosure document review, is now starting to gain popularity in the UK as an approach to disclosure.

What is predictive coding and how does it work?

Predictive techniques are commonly applied to analyse data in order to assess risk and make future predictions. They are not unique to the legal world. Common everyday uses include credit scoring, fraud identification and risk underwriting, and they have been widely adopted in a variety of industries including accountancy, insurance, banking, financial services, pharmaceuticals and healthcare.

Predictive coding systems apply complex algorithms which, based upon their analysis of review decisions, identify similar documents which are prioritised for review.

In doing so, they aim to limit the review of irrelevant documents and enable relevant documents to be captured as efficiently as possible, thereby improving recall and precision.

A predictive coding exercise typically begins with a senior lawyer training an algorithm by reviewing a ‘seed set’ of example documents.

The algorithm analyses the characteristics of these documents, learns from the lawyer’s decision making and thereafter seeks to identify similar documents and rank them by their likelihood of relevance.
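The train-then-rank idea can be sketched in a few lines of Python. This is a deliberately simplistic toy, not the algorithm used by any real predictive coding product: it assumes that relevance can be approximated by counting words that appear in documents the lawyer marked relevant or not relevant. All function names and sample documents are hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Split a document into lower-case words."""
    return text.lower().split()


def train(seed_set):
    """Learn a weight per word from (text, is_relevant) review decisions."""
    weights = Counter()
    for text, is_relevant in seed_set:
        for word in set(tokenize(text)):
            weights[word] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    """Average word weight: higher means more like the relevant examples."""
    words = set(tokenize(text))
    return sum(weights[w] for w in words) / max(len(words), 1)


def rank(weights, documents):
    """Order documents from most to least likely to be relevant."""
    return sorted(documents, key=lambda d: score(weights, d), reverse=True)


# Hypothetical seed set coded by the senior lawyer.
seed_set = [
    ("loan agreement breach notice", True),
    ("breach of loan covenant", True),
    ("office party invitation", False),
    ("lunch menu for friday", False),
]
unreviewed = [
    "friday party lunch",
    "notice of covenant breach",
    "quarterly newsletter",
]

weights = train(seed_set)
ranked = rank(weights, unreviewed)
for doc in ranked:
    print(f"{score(weights, doc):+.2f}  {doc}")
```

The document that shares vocabulary with the relevant examples rises to the top, the one resembling the irrelevant examples sinks to the bottom, and a document with no known words sits in between. Commercial systems use far more sophisticated models, but the learn-from-decisions and rank-by-likelihood steps follow the same pattern.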

The most highly ranked documents can then be prioritised for review. This review continues until the system fails to return any further relevant documents, or until the proportion of relevant documents becomes so low that continuing the review would be disproportionate.
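The stopping point described above can be illustrated with a minimal Python sketch. It assumes the documents have already been ranked, that review proceeds in fixed-size batches, and that the review stops once the share of relevant documents in a batch falls below a chosen threshold. The batch size, the threshold and the document labels are all hypothetical, and a real system would normally also re-rank the remaining documents as new review decisions come in.

```python
def prioritised_review(ranked_docs, is_relevant, batch_size=3, min_rate=0.2):
    """Review ranked documents batch by batch.

    Stops after the first batch whose proportion of relevant documents
    is below min_rate. Returns (relevant documents found, number reviewed).
    """
    found = []
    reviewed = 0
    for start in range(0, len(ranked_docs), batch_size):
        batch = ranked_docs[start:start + batch_size]
        hits = [doc for doc in batch if is_relevant(doc)]
        found.extend(hits)
        reviewed += len(batch)
        if len(hits) / len(batch) < min_rate:
            break  # continuing is treated as disproportionate
    return found, reviewed


# Hypothetical ranked list: "R" marks a relevant document, "N" an irrelevant one.
ranked_docs = ["R1", "R2", "R3", "R4", "N1", "R5",
               "N2", "N3", "N4", "N5", "R6", "N6"]
found, reviewed = prioritised_review(ranked_docs, lambda d: d.startswith("R"))
print(f"reviewed {reviewed} of {len(ranked_docs)}, found {len(found)} relevant")
```

In this made-up example the review stops after nine documents, having found five of the six relevant ones. The relevant document left behind shows why the choice of stopping point needs to be justified and validated: it is a trade-off between cost and recall.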

As in any disclosure exercise, a predictive coding methodology should be supported by an appropriate validation and quality checking regime so that decision making can be justified and each stage of a project independently verified.

Composing the seed set

How the seed set should be composed is up for debate, as is the length of time that should be taken to train the algorithm and the extent of any quality control regime that should be adopted in order to validate the process.

In its most straightforward form, predictive coding builds the seed set from a randomly generated set of documents. No keywords are run and the system is left to present example documents to the senior lawyer unhindered by bias.

Relying on a randomly generated set of documents as the starting point is a step too far for many lawyers and could be seen as a blind leap of faith in the algorithm. The risk is that it appears more difficult to validate how the algorithm has been trained, and its ability to stand up to scrutiny from project to project is uncertain.

As such, some lawyers prefer to adopt a middle-ground ‘hybrid approach’ where the seed set is made up of a mixture of keyword-responsive documents, documents returned by other searches and randomly selected documents.
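One way to picture a hybrid seed set is the Python sketch below. It is only an assumption about how such a set might be assembled: it draws a fixed number of keyword-responsive documents, then tops the set up with documents chosen at random from the rest of the collection. The function name, the keyword, the proportions and the sample documents are hypothetical.

```python
import random


def hybrid_seed_set(documents, keywords, n_keyword, n_random, seed=0):
    """Build a seed set from keyword hits plus a random sample.

    A fixed seed makes the random selection repeatable, which helps if the
    composition of the seed set later has to be explained or verified.
    """
    rng = random.Random(seed)
    keyword_hits = [d for d in documents
                    if any(k in d.lower() for k in keywords)]
    chosen_keyword = rng.sample(keyword_hits, min(n_keyword, len(keyword_hits)))
    # The random portion is drawn from everything not already chosen,
    # so it can include documents that no keyword would have found.
    remaining = [d for d in documents if d not in chosen_keyword]
    chosen_random = rng.sample(remaining, min(n_random, len(remaining)))
    return chosen_keyword + chosen_random


# Hypothetical collection: 20 documents about lunch, 10 about a loan.
documents = [f"email {i} about lunch" for i in range(20)]
documents += [f"email {i} about the loan" for i in range(20, 30)]

seed_docs = hybrid_seed_set(documents, ["loan"], n_keyword=3, n_random=5)
print(len(seed_docs), "seed documents selected")
```

The keyword portion gives the lawyer examples that are likely to be relevant from the outset, while the random portion exposes the algorithm to material that the chosen keywords would have missed.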

Advantages of Predictive Coding

  • Senior lawyer engagement at an early stage in the process
  • Improved accuracy
  • Fewer irrelevant documents reviewed
  • Higher proportion of relevant documents identified
  • Faster access to the most relevant documents
  • Lower costs

Disadvantages of Predictive Coding

  • Questionable ability to deal with multiple issues or degrees of relevance
  • Training by underconfident or technophobic users may undermine the process
  • Questionable ability to cope with the evolution of relevance throughout a review
  • Won't eliminate the problem of a rogue reviewer
  • Questionable ability to deal with documents containing little or no text
  • Questionable ability to cope with foreign language documents

Further Information

Subscribers to Lexis®PSL Dispute Resolution can read Dominic Tucker’s full, more in-depth analysis of predictive coding, including the hybrid approach recently endorsed by the Irish High Court in Irish Bank Resolution Corporation Ltd & ors v Quinn & ors [2015] IEHC 175, here.

Subscribers can also read his analysis of the ways in which the Swiss investigators may seek to manage the data collected in their investigations into FIFA’s alleged corruption, and the challenges they may face in doing so.

Sign up for a free trial here if you are not a subscriber and would like to read that full analysis.

Dominic Tucker is a Senior Consultant at Anexsys Ltd, a leading provider of outsourced eDisclosure and litigation support services to law firms, corporations and government departments.

 
