Large Scale Legal Document Classification
Significant automation of the company's most labour-intensive activity.
Goal
Create a machine learning engine that automatically classifies each legal document across several dimensions, based on information extracted from the document.
Impact
Significant automation of the company's most labour-intensive activity. The model improved on the classification metrics of the human benchmark while reducing the time to classify documents by orders of magnitude.
Every month the model would free up 1 additional FTE (full-time equivalent).
Problems
Correctly classify a document on four dimensions, including case type and winner.
Documents have undergone OCR and are machine readable. The documents can be written in multiple languages and come from a variety of courts and legal systems across the world.
Solution
A model factory that trains one model per language/court/class dimension, each fine-tuned to a specific combination of legal system, language, etc.
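The model-factory idea can be illustrated with a minimal sketch. It is not the project's actual implementation: the `Document` fields, the `build_models` and `classify` helpers, and the placeholder `train_majority` trainer are all hypothetical names chosen for illustration. The sketch only shows how training data could be grouped so that each (language, court, dimension) combination gets its own model.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]      # (language, court, dimension)
Model = Callable[[str], str]    # maps document text to a predicted label


@dataclass
class Document:
    text: str
    language: str
    court: str
    labels: Dict[str, str]      # dimension name -> gold label


def train_majority(texts: List[str], labels: List[str]) -> Model:
    """Placeholder trainer (assumption): always predicts the most common label.

    A real factory would fit a text classifier here instead.
    """
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda text: most_common


def build_models(
    docs: List[Document],
    trainer: Callable[[List[str], List[str]], Model] = train_majority,
) -> Dict[Key, Model]:
    """Train one model per (language, court, dimension) combination."""
    grouped: Dict[Key, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for doc in docs:
        for dimension, label in doc.labels.items():
            texts, labels = grouped[(doc.language, doc.court, dimension)]
            texts.append(doc.text)
            labels.append(label)
    return {key: trainer(texts, labels) for key, (texts, labels) in grouped.items()}


def classify(models: Dict[Key, Model], text: str,
             language: str, court: str, dimension: str) -> str:
    """Route a document to the model trained for its combination."""
    return models[(language, court, dimension)](text)


docs = [
    Document("text a", "en", "court_x", {"case_type": "civil", "winner": "plaintiff"}),
    Document("text b", "en", "court_x", {"case_type": "civil", "winner": "defendant"}),
    Document("text c", "en", "court_x", {"case_type": "criminal", "winner": "defendant"}),
    Document("texte d", "fr", "court_y", {"case_type": "criminal", "winner": "plaintiff"}),
]
models = build_models(docs)
```

Passing the trainer as a parameter keeps the grouping and routing logic independent of whichever classifier is fitted for each combination.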
Technical Highlights
For each model, the factory automatically optimizes the hyperparameters of the whole ML pipeline, estimates performance and performance stability, adjusts the manual-review flag thresholds to ensure that a minimum level of performance is achieved, trains the model, and prepares it for deployment.
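One way the manual-review threshold adjustment could work is sketched below. This is an assumption, not the documented method: given held-out predictions with confidence scores, pick the lowest confidence threshold at which the automatically accepted predictions still meet a required accuracy, and flag everything below it for manual review. The function names `pick_review_threshold` and `needs_review` are hypothetical.

```python
from typing import List, Optional


def pick_review_threshold(
    confidences: List[float],
    correct: List[bool],
    min_accuracy: float,
) -> Optional[float]:
    """Return the lowest threshold t such that predictions with
    confidence >= t reach at least min_accuracy on the validation set.

    Returns None if no threshold satisfies the requirement.
    """
    pairs = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    best = None
    hits = 0
    for i, (conf, ok) in enumerate(pairs, start=1):
        hits += ok
        # Only evaluate once all predictions sharing this confidence are counted.
        if i < len(pairs) and pairs[i][0] == conf:
            continue
        if hits / i >= min_accuracy:
            best = conf  # keep going: a lower threshold may also qualify
    return best


def needs_review(confidence: float, threshold: Optional[float]) -> bool:
    """Flag a prediction for manual review when it falls below the threshold.

    With no valid threshold, every prediction is sent to review.
    """
    return threshold is None or confidence < threshold


# Hypothetical validation results for one model.
val_conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5]
val_correct = [True, True, True, False, True, False]
threshold = pick_review_threshold(val_conf, val_correct, min_accuracy=0.75)
```

Lowering the threshold sends fewer documents to manual review, so choosing the lowest qualifying value maximizes automation while keeping the accepted predictions at or above the required accuracy on the validation data.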