Empirical Caution: A Lesson from Auto Title Loans
A few weeks ago there was some nice discussion about Jim Hawkins's article on fringe banking. Natalie questioned whether Jim's assumptions about payday lending correspond with empirical reality. Similarly, it's worth pointing out that the data Jim relies on regarding auto title lending aren't what he or even his source thought they represented.
I make this observation not to ding Jim's paper, but to raise a really troubling problem for all academics: how to deal with data from other scholars' empirical work?
I don't think that Jim's paper is particularly problematic in this regard. Jim wrote a very provocative, ballsy piece. I disagree with some of his conclusions (mainly because of how he conceives of financial distress--I think it has to relate to consumption), but academically, his article is a success, as it got people thinking, and anyone writing about fringe banking will have to consider Jim's arguments. But because Jim's paper is a critical synthesis of a lot of existing, and often feuding, empirical scholarship (especially on payday), it puts the question of whose numbers you can believe front and center.
To illustrate: Jim's paper cited figures from another scholar's paper that indicate there are very low default and repossession rates on auto title loans. These low default rate figures gave Jim comfort that auto title lending isn't predatory loaning-to-owning, but is instead a reasonable fringe banking product.
These default and repo figures seemed surprisingly low to me (although I only caught this on a second reading of the article), so I asked the author of the source publication how the rates were calculated. The author explained that the figures were provided by industry sources and made inquiries about their calculation. What I learned was that the default rates reported by the auto title lending industry are the average of the percentage of loans that default in a given month. So the default rate for each month is the number of defaults in that month divided by the number of loans outstanding in that month.
The problem with this is that an auto title loan is typically a one-month loan that can then be rolled over into a new loan. The result is that the denominator (number of loans outstanding) is inflated relative to the number of actual borrowers. If one were to look at rolled-over auto title loans as single loans, rather than a stream of one-month loans, the annual default rate would be significantly higher (many loans roll over 4-5 times).
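To make the arithmetic concrete, here is a rough sketch in Python. Every number in it is hypothetical (1,000 borrowers, a 2% monthly default rate, loan chains that run five months), and it assumes the monthly default rate is constant from month to month; none of these figures comes from the industry data discussed above. The function name `default_rates` is my own. It simply counts defaults two ways: against every one-month loan written, and against the borrowers who started.

```python
def default_rates(borrowers, monthly_default_rate, months_per_chain):
    """Compare two ways of counting defaults on rolled-over one-month loans.

    All inputs are hypothetical. Assumes a constant monthly default rate and
    that every non-defaulting borrower rolls over until the chain ends.
    """
    surviving = float(borrowers)   # borrowers still paying
    one_month_loans = 0.0          # every rollover counted as a new loan
    defaults = 0.0

    for _ in range(months_per_chain):
        one_month_loans += surviving
        defaulted = surviving * monthly_default_rate
        defaults += defaulted
        surviving -= defaulted

    per_loan = defaults / one_month_loans   # denominator inflated by rollovers
    per_borrower = defaults / borrowers     # each chain treated as one loan
    return per_loan, per_borrower


per_loan, per_borrower = default_rates(1000, 0.02, 5)
print(f"per one-month loan: {per_loan:.1%}")      # 2.0%
print(f"per borrower:       {per_borrower:.1%}")  # 9.6%
```

Under these made-up assumptions, the same set of defaults looks like a 2% rate when each rollover counts as a separate loan and nearly 10% when each borrower's chain counts as a single loan. The real gap depends on how long chains actually run and how default risk changes over time, neither of which I know.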
So here's our situation: one scholar was founding a theoretical enterprise on figures that came from another scholar, that in turn came from industry without clear explanation of what they represented. Is it reasonable to expect the scholarly end-user to trace back the source to origination? Perhaps on critical issues, yes. But it isn't realistic to ask scholars to do forensic histories of every figure they cite. The end-user scholar has to be able to rely on the primary source scholar. (Sort of a good faith purchaser defense.) If the primary source scholar is too trusting of his source, there's a problem. The auto title lending figures aren't a critical issue in Jim's paper. I only looked into them because they seemed weird to me, but that was only on a second read of the article.
Without working through the raw data and duplicating the manipulations and analysis, one is left to rely on other scholars' presentations of their data, which even when done in good faith can lack clarity, competence, or critical analysis. While trust but verify might be the right approach, real diligence just isn't practical. Unless a number seems so off or without foundation as to be ridiculous or patently controversial, it's rare for scholars to really dig into the details of other scholars' empirical claims. But without doing so, we can end up drawing conclusions from data points we might not fully understand.
I really don't know how we are supposed to address this situation as scholars, but I worry that I, like many others, am ultimately taking a reputational gamble every time I cite a statistic without an enormous caveat about its quality.