About this project

This project aims to digitise and index all American Caving Accident reports published by the National Speleological Society.

Project goals

History

ACA reports date back over 50 years and have historically been available only in journal format; in recent years the journal has been published online as a PDF. The PDFs are not searchable and the data is not indexed in any way, which makes it difficult to find information on specific incidents or to analyse the data.

Before the creation of this website, NSS volunteers generously donated their time to compile a spreadsheet containing an index of every incident report ever published. This spreadsheet was then used as the basis for the database which powers this website.

The primary function of this website is to provide a user-friendly interface for volunteers to validate and format the data extracted from the spreadsheet and PDFs and make it ready for publication online. This involves entering data in specific metadata fields attached to each incident record and reviewing the data to ensure it is consistent and properly formatted.

Workflow

The process of digitising the collection of incident reports requires a significant amount of human intervention. This website has been designed to allow volunteers to complete the various tasks required quickly and easily, with minimal time commitment. If a team of people can each spend just ten minutes a day working on this project, we can make significant progress towards our goal.

When this website was created, incidents were added manually and processed almost entirely by hand. We soon upgraded to a system that does the bulk of the processing using artificial intelligence, vastly reducing the amount of volunteer time required to process each incident. The current workflow is as follows:

  1. Data is extracted from the ACA Journal PDFs using OCR software.
  2. AI analysis is performed on the OCR data to extract the incident report and metadata.
  3. The incident is published on the website with a "data incomplete" warning.
  4. Volunteers review the AI analysis and correct any errors.
  5. The incident is published on the website with an "approved" tag.
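
The workflow above can be pictured as a simple sequence of states that each incident passes through. The following is a minimal illustrative sketch in Python and is not taken from the project's source code: the names Status, Incident, advance and is_published are hypothetical, and the real implementation may model the workflow differently.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    """Hypothetical states mirroring the five workflow steps."""
    OCR_EXTRACTED = "ocr extracted"      # step 1: text pulled from the PDF
    AI_ANALYSED = "ai analysed"          # step 2: report and metadata proposed
    DATA_INCOMPLETE = "data incomplete"  # step 3: published with a warning
    APPROVED = "approved"                # step 5: published after review


# Each state has exactly one successor. The move from DATA_INCOMPLETE
# to APPROVED stands for the volunteer review in step 4.
NEXT_STATUS = {
    Status.OCR_EXTRACTED: Status.AI_ANALYSED,
    Status.AI_ANALYSED: Status.DATA_INCOMPLETE,
    Status.DATA_INCOMPLETE: Status.APPROVED,
}


@dataclass
class Incident:
    ocr_text: str
    metadata: dict = field(default_factory=dict)
    status: Status = Status.OCR_EXTRACTED

    def advance(self) -> Status:
        """Move the incident to the next workflow state."""
        if self.status not in NEXT_STATUS:
            raise ValueError(f"No step follows {self.status.value!r}")
        self.status = NEXT_STATUS[self.status]
        return self.status

    @property
    def is_published(self) -> bool:
        """Incidents appear on the site from step 3 onwards."""
        return self.status in (Status.DATA_INCOMPLETE, Status.APPROVED)


if __name__ == "__main__":
    incident = Incident(ocr_text="Sample OCR text for one report")
    while incident.status is not Status.APPROVED:
        print(incident.status.value, "| published:", incident.is_published)
        incident.advance()
    print(incident.status.value, "| published:", incident.is_published)
```

In this sketch an incident becomes visible as soon as the AI analysis is complete, and the volunteer review only changes the label shown alongside it, which matches steps 3 to 5 above.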

Source code

This project is open source. Anyone interested in contributing to the development of this website can find everything they need at https://github.com/anorthall/incident-db.