Identifying Earmarks in Congressional Bills
Vrushank Vora*, Data Science for Social Good; Joe Walsh, Data Science for Social Good; Madian Khabasa, Microsoft; Ellery Wulczyn, Wikimedia Foundation; Matthew Heston, Northwestern University; Rayid Ghani, University of Chicago; Chris Berry, University of Chicago
Earmarks are legislative provisions that direct federal funds to specific projects, circumventing the competitive grant-making process of federal agencies. Identifying and cataloging earmarks is a tedious, time-consuming process carried out by experts from public interest groups. In this paper, we present a machine learning system for automatically extracting earmarks from congressional bills and reports. We first describe a table-parsing algorithm for extracting budget allocations from appropriations tables in congressional bills. We then use machine learning classifiers to identify which budget allocations are earmarks, achieving an out-of-sample ROC AUC score of 0.89. Using this system, we construct the first publicly available database of earmarks dating back to 1995. Our machine learning approach adds transparency, accuracy, and speed to the congressional appropriations process.
Filed under: Classification