The United States Senate does not maintain a centralized repository for information about its committee proceedings that includes links to the videos. Our new website, https://www.senatecommitteehearings.com/, is our effort to address this issue and surface video from committee proceedings.
Background
The Senate publishes basic information about all upcoming meetings on this webpage. It is only a forward-looking website, so it does not contain an archive of information about meetings that have already occurred. Nor does it contain detailed information about the proceedings, such as the names of witnesses, the written testimony they have submitted, and, crucially for our purposes, links to videos of the proceedings. Summary information is also published in the Congressional Record, including the names of witnesses, but it is not in a web-friendly format and does not include witness statements or links to videos of the proceedings.
The Library of Congress is now publishing a weekly committee schedule across the whole of Congress, but it also does not include the names of witnesses, testimony for the record, or links to video of the proceedings.
Each individual Senate committee publishes information about its ongoing proceedings. The information generally is more complete: it usually includes witness names, written testimony and video of the proceedings. However, information is spread out across more than 20 committees, the information is not published in a standard format, and it is difficult to search for prior committee proceedings or across committees.
Our Approach
We built a new website, https://www.senatecommitteehearings.com/, that merges information we scraped from each Senate committee page with data from a Government Publishing Office (GPO) API that contains committee proceedings information going back to the late 1990s or early 2000s, depending on the committee. The code for the scrapers is available here.
This process is not perfect, however. We are unaware of a consistent unique identifier for events that is shared across GPO’s API, the committee proceeding information published on individual Senate committee pages, the central Senate committee schedule, and Congress.gov’s website. So we attempted to match proceedings by the name of the hearing. This is an inherently lossy and unreliable process, limited by the fact that the names of hearings on the Senate websites and within GPO do not always match. Only about 25% of the hearings collected from Senate committee websites could be connected to GPO data (3,495 out of the 14,185 hearings collected initially in early September). Mismatches arose from differences in punctuation, additional text on the GPO side, and GPO titles that embed the hearing name inside a longer sentence. As a result, we had significant difficulty combining the authoritative GPO data with the Senate data. Despite these challenges, we completed the more important task: scraping the various Senate committee sites to create a central website for searching hearings.
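To illustrate the name-matching problem described above, here is a minimal sketch in Python. It is not the code used for the website. The function names and the sample titles are hypothetical, and it assumes only that each source provides a plain list of hearing titles. It lowercases titles, strips punctuation, and then accepts either an exact match or a GPO title that contains the Senate title as a run of whole words.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase a title and reduce it to letters, digits, and single spaces."""
    text = unicodedata.normalize("NFKD", title).lower()
    # Replace punctuation (straight or curly) with spaces so both sides agree.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def match_hearings(senate_titles, gpo_titles):
    """Map each Senate title to the first GPO title that equals or contains it.

    Containment is checked on whole words, which covers GPO titles that carry
    extra text or wrap the hearing name in a longer sentence.
    """
    gpo_normalized = [(normalize_title(t), t) for t in gpo_titles]
    matches = {}
    for title in senate_titles:
        key = normalize_title(title)
        if not key:
            continue  # nothing left to compare after normalization
        for norm, original in gpo_normalized:
            if key == norm or f" {key} " in f" {norm} ":
                matches[title] = original
                break
    return matches


# Hypothetical titles, made up for illustration only.
senate = ["Protecting Consumers: Data Privacy", "Oversight of Rural Broadband"]
gpo = ["S. Hrg. - PROTECTING CONSUMERS, DATA PRIVACY; hearing before the committee"]
matched = match_hearings(senate, gpo)
```

A rule like this still misses titles that were reworded rather than merely repunctuated, and short titles can match the wrong record, which is why name-based matching remains lossy without a shared identifier.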
The end result is a searchable database that allows you to search by committee, hearing name, or witness and find video of the proceedings. Aggregating the data also allowed us to provide basic statistics that answer questions such as “How many hearings occurred last week?” or “What is the trend in hearings per year for a particular committee?”
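Once the hearings are in one place, statistics like these reduce to simple counting. The following is a rough sketch in Python, not the site's implementation. It assumes each hearing is a dictionary with hypothetical "committee" and "date" fields, and it treats "last week" as the previous Monday-to-Sunday week.

```python
from collections import Counter
from datetime import date, timedelta


def hearings_per_year(hearings, committee):
    """Count one committee's hearings by calendar year, in year order."""
    counts = Counter(
        h["date"].year for h in hearings if h["committee"] == committee
    )
    return dict(sorted(counts.items()))


def hearings_last_week(hearings, today):
    """Count hearings held in the Monday-to-Sunday week before today's week."""
    this_monday = today - timedelta(days=today.weekday())
    last_monday = this_monday - timedelta(days=7)
    return sum(1 for h in hearings if last_monday <= h["date"] < this_monday)


# Hypothetical records, made up for illustration only.
sample = [
    {"committee": "Judiciary", "date": date(2022, 3, 1)},
    {"committee": "Judiciary", "date": date(2023, 9, 5)},
    {"committee": "Finance", "date": date(2023, 9, 7)},
    {"committee": "Judiciary", "date": date(2023, 9, 12)},
]
per_year = hearings_per_year(sample, "Judiciary")
last_week = hearings_last_week(sample, today=date(2023, 9, 13))
```

Passing the reference date in as a parameter, rather than reading the clock inside the function, keeps the count reproducible for any week in the archive.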
It is our hope that this new dataset will inspire the Library of Congress, which maintains the calendar of committee proceedings on Congress.gov, to add links to videos of Senate committee proceedings, as it already does for the House. The Library of Congress should serve as an authoritative repository of committee hearing video and re-publish the video in a more user-friendly form.
We also hope the Senate will consider deepening the functionality of its committee calendar website to include links to the videos of committee proceedings, to publish information online about past Senate proceedings, and to gather witness statements and other documents from ongoing and previous proceedings. The House’s website, docs.house.gov, which maintains a non-partisan central repository for committee information, may be a useful model to consider.
Finally, we are hopeful that third-party republishers of legislative information, commercial and non-commercial, will use and improve upon these datasets to make video of congressional proceedings more widely available to the public, to researchers, and to advocates.