DocumentCloud

Via ProPublica

Goal

To create a public, easily searchable index of original source documents on the Web

Source documents are the foundation of investigative journalism. Yet once they have been used, the documents are often locked away or thrown away, and they are lost to further inquiry. In a few cases, source documents are placed on the Web, but they come in a variety of formats and are difficult to search.

This grant will create a new way for news organizations and individuals to find, search, annotate and share documents. It will be tested in New York in a partnership with the Gotham Gazette. The grantee will create an open standard for describing and sharing information about source documents. The project will take source documents beyond the limited search capabilities of the PDF format and make them an intrinsic part of the searchable Web. The software will be released as public, open-source code for use by others.

We expect this project to make sharing and finding source documents easier through “fielded metadata,” indexing and other notes about the documents’ content contributed by Web users.

This grant is being made to DocumentCloud, a new nonprofit formed by employees of ProPublica and The New York Times. Because DocumentCloud did not yet have official 501(c)(3) status when the grant was made, ProPublica, the nation’s largest new nonprofit investigative reporting organization, agreed to serve as fiscal agent. Ultimately, we hope the project allows more people to mine more documents than before, encouraging new stories, citizen participation in the investigative process and greater government openness.

Project Team

ProPublica