Beyond Words: The Risks of Generative Interpretation
- Jonathan Scher*
- AI, Artificial Intelligence, Large Language Models, Technology
- Postscript

Judges are beginning to use large language models (LLMs) like ChatGPT to interpret legal texts. This Note examines whether they should. Prior studies testing LLMs as legal interpreters have used survey responses as performance benchmarks; I offer the first study comparing LLM interpretations to real-world judicial decisions. Across eight Ninth Circuit cases, I test whether GPT-4 Turbo (a model available through ChatGPT) correctly identifies legal text as ambiguous or unambiguous. ChatGPT’s assessments diverged from the court’s determinations 50% of the time. I then advance a novel argument: judicial reliance on LLMs may constitute improper ex parte communication under current judicial ethics rules.