AI Models on Law School Exams
The question of how well AI can do on law school exams is one that interests me, since I give exams and want them to measure how much my students have learned (as opposed to their skill at using AI -- although I want them to learn that too). Others appear to be interested as well -- just look at the SSRN downloads for papers on this topic. Caveat: I can't pretend that I have more than the shallowest understanding of AI models. But this cool new paper I came across might be of interest to folks.
The paper is from a group of scholars at ETH Zurich (a place long known for its excellent research). As I understand the draft (and, to repeat, I don't understand a lot of this stuff), it finds that large language models (LLMs) don't do that well as the level of reasoning required on the exam increases. I was also intrigued to read (I think) that LLMs are not necessarily better on multiple-choice exams than on essay-type ones. Here is a sentence from the abstract that stood out: "Our evaluation on both open-ended and multiple-choice questions present significant challenges for current LLMs; in particular, they notably struggle with open questions that require structured, multi-step legal reasoning."
The paper is "LEXam: Benchmarking Legal Reasoning on 340 Legal Exams"
Among the other things I find cool about this paper is how collaborative it is -- students, professors, and even judges. That degree of collaboration reflects well on the culture of the institution.