Legal work often involves searching through large quantities of text for specific information; such dependence on textual research is a common feature of many disciplines. Artificial intelligence is a promising tool for conducting such large-scale searches, but many existing AI-based tools are highly specialised, and are only intended for use with particular types of content, such as contracts. Moreover, as documents change over time, a tool adapted for use with contemporary data may be less useful for working with older material. One possible avenue for improvement involves building on insights from the discipline of theoretical linguistics, which not only makes frequent use of large textual data sources but uses the data obtained to understand the mapping between form and meaning, and how such relationships change over time.

This project attempts to build on previous research in both legal tech and linguistics to develop a system for Retrieval Augmented Generation (RAG), a technique that allows AI models to extend beyond their initial training in order to search and interpret new data. The aim is to develop a system that will be flexible enough to handle data from a variety of time periods and allow multiple aspects of the data to be queried using natural language. Such a system would build in new ways upon previously developed specialised tools, such as Legal-BERT, and integrate them so that users could make use of their multiple specialisms through a single, unified interface.
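For readers unfamiliar with RAG, the general idea can be illustrated with a minimal sketch. It is not the project's implementation: the passages are invented placeholders rather than real trial records, the function names (`retrieve`, `build_prompt`) are hypothetical, and simple word-overlap scoring stands in for the learned representations that a specialised model such as Legal-BERT would provide. The generation step is left out; the sketch stops at the prompt that would be passed to a language model.

```python
import math
import re
from collections import Counter

# Placeholder passages, invented for illustration only (not real records).
DOCS = [
    {"year": 1750, "text": "The prisoner was indicted for stealing a silver watch."},
    {"year": 1850, "text": "The witness stated that the contract was signed in London."},
    {"year": 1900, "text": "The defendant was charged with forgery of a bank note."},
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=2):
    """Return up to k documents most similar to the query (retrieval step)."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d["text"]))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score > 0]


def build_prompt(query, passages):
    """Combine the retrieved passages with the question (augmentation step)."""
    context = "\n".join(f"[{p['year']}] {p['text']}" for p in passages)
    return (
        "Answer using only the passages below.\n\n"
        f"{context}\n\nQuestion: {query}"
    )


query = "Who was indicted for stealing a watch?"
passages = retrieve(query, DOCS, k=2)
prompt = build_prompt(query, passages)
print(prompt)
```

In a full system the prompt would be sent to a language model, which answers from the retrieved passages rather than from its training data alone. Keeping the year alongside each passage is one simple way to let queries take the period of the source into account.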

The system envisaged will be able to query data from multiple sources and in multiple formats. One rich data source currently being used for testing purposes is the Old Bailey Online, which contains proceedings of trials from 1674 to 1913. This corpus spans more than two centuries, a period during which there were numerous changes both in the nature of legal proceedings and in the language used to represent them, and it has previously been used to study topics ranging from grammatical change to the evolution of concepts and ideologies. At a later stage other data sources will be added to provide a more diverse range of topics, potentially including material such as law reports and contractual data.

The ultimate aim of this project is a tool that will be of use both to practising lawyers, for research in areas such as case law, and to academic researchers in fields ranging from legal history to cultural history and linguistics. To this end, the development process will incorporate feedback from users in different disciplines and ensure that relevance and ease of use are prioritised throughout.

Project research team