Large Scale Legal Document Classification

Darts-ip, an intellectual property (IP) legal information platform, provides information about IP cases from around the world. The company's core internal activity is processing, labelling and extracting information from legal documents related to IP.

Photo by lilartsy on Unsplash

Goal

Create a machine learning engine that automatically classifies each legal document across several dimensions, based on information extracted from the document.

Impact

Significant automation of the company's most labour-intensive activity. The model improved on the human benchmark for classification metrics while reducing the time to classify documents by orders of magnitude.

prophecy labs infographic

Every month the model would free up 1 additional FTE.


Problems

Correctly classify each document on 4 dimensions, including case type and winner.

Documents have undergone OCR and are machine-readable. They can be written in multiple languages and come from a variety of courts and legal systems across the world.

Solution

A model factory that trains one model per combination of language, court and classification dimension, so that each model is fine-tuned to a specific legal system, language, etc.
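To make the idea concrete, here is a minimal sketch of such a factory in Python. It is not the production implementation: the `Document` fields, the `(language, court, dimension)` key layout, and the `MajorityClassifier` placeholder are all assumptions made for illustration. A real text classifier would be passed in through `make_model`.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical key layout: one model per (language, court, dimension).
Key = Tuple[str, str, str]


@dataclass
class Document:
    text: str
    language: str
    court: str
    labels: Dict[str, str]  # dimension name -> label, e.g. {"winner": "claimant"}


class MajorityClassifier:
    """Placeholder model: always predicts the most frequent training label."""

    def fit(self, texts: List[str], labels: List[str]) -> "MajorityClassifier":
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts: List[str]) -> List[str]:
        return [self.label_ for _ in texts]


def train_model_factory(
    documents: List[Document],
    make_model: Callable[[], MajorityClassifier] = MajorityClassifier,
) -> Dict[Key, MajorityClassifier]:
    """Group training data by key and fit one independent model per group."""
    buckets: Dict[Key, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for doc in documents:
        for dimension, label in doc.labels.items():
            texts, labels = buckets[(doc.language, doc.court, dimension)]
            texts.append(doc.text)
            labels.append(label)
    return {
        key: make_model().fit(texts, labels)
        for key, (texts, labels) in buckets.items()
    }
```

At prediction time, a document would be routed to the model stored under its own language, court and dimension, for example `models[("fr", "Cour de cassation", "winner")].predict([text])`.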

Technical Highlights

The model factory automatically optimizes the hyperparameters of the whole ML pipeline, estimates performance and performance stability, adjusts the manual-review flag thresholds to ensure that a minimum level of performance is achieved, trains the model and prepares it for deployment.
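The threshold adjustment step could look roughly like the sketch below. It assumes, beyond what is stated above, that each model outputs a confidence score per prediction, that predictions at or above the threshold are accepted automatically, and that everything below it is flagged for manual review. The function name `pick_review_threshold` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple


def pick_review_threshold(
    confidences: List[float],
    correct: List[bool],
    min_accuracy: float,
) -> Optional[Tuple[float, float]]:
    """Return (threshold, coverage) for held-out validation predictions.

    The threshold is the lowest confidence value such that predictions with
    confidence >= threshold reach at least `min_accuracy`. Coverage is the
    share of documents accepted automatically. Returns None if no threshold
    meets the target, i.e. every document would need manual review.
    """
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best: Optional[Tuple[float, float]] = None
    hits = 0
    for i, (conf, ok) in enumerate(ranked, start=1):
        hits += ok
        # Only evaluate between distinct confidence values, so that tied
        # predictions are always accepted or flagged together.
        is_boundary = i == len(ranked) or ranked[i][0] != conf
        if is_boundary and hits / i >= min_accuracy:
            best = (conf, i / len(ranked))
    return best
```

Running this once per model in the factory would give each language, court and dimension combination its own threshold, trading automation rate against the required accuracy.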