Sh*t In, Sh*t Out? The Problem of Mortgage Data Corruption & Empirical Analysis
Empirical economic analysis is a powerful tool. It can elucidate correlations and sometimes even get us to causal explanations. But it has a serious weak spot: its value depends entirely on the integrity of the data analyzed. To put the problem succinctly: sh*t in, sh*t out.
This brings us to analyses of the housing bubble. There's a sizeable academic literature on the housing bubble (and, relatedly, a body of expert witness reports on loss causation in MBS litigation) that relies on loan-level data. The problem is that a lot of that loan-level data is suspect. That should hardly be a surprise: the industry even referred to some products as "liar loans". And there were also FBI Mortgage Fraud reports indicating an uptick in mortgage fraud. But it was easy for economists to ignore the data integrity problem as long as the evidence was merely anecdotal (e.g., the mariachi musician with the six-figure income) and the problem could be blissfully assumed to affect only a small number of loans.
No longer. It's hard to show mortgage fraud empirically, but there's a growing empirical literature on it. There are now several academic studies demonstrating significant inflation of borrower income on loan applications (here and here and here and here and here). (To be clear, this does not mean that the income was inflated by the borrowers. It could have been inflated by either borrowers or lenders, including loan brokers.) There's also a Fitch Ratings report from late 2007 that shows questionable stated income, employment, FICO scores, property occupancy status, and appraisals on a large percentage of a small sample of subprime loans.
I want to emphasize that this literature does not undermine all empirical work on the housing market during the bubble years. But it should give us pause when considering any analysis that relies on either loan-level or pool-level loan characteristics such as income, DTI, FICO, occupancy status, and LTV/CLTV. I suspect that the empirical mortgage fraud literature will not deter many economists from plowing ahead whenever their data produces a regression with statistical significance. And the studies might well be right in the end. But the fraud literature should tell the rest of us to take those studies with a grain of salt.