Previously, one could access the records contained within SCOS only in Scotland, and only through arcane name and subject indexes. SCOS, partnering with the University of Edinburgh, is digitizing and describing the entire collection of 58 linear feet of documents housed at the University of Virginia Law Library and the Law Library of Congress.
To do so, the UVA Law Library uses Drupal, an open-source content management platform that is easily customizable and is supported by a community of over one million users who establish best practices and keep Drupal compatible with cutting-edge Web technologies. To add content to our database, we run two concurrent workflows, description and digitization. Because description is a slower process than digitization, keeping the two workflows separate allows scanning to proceed unhindered by description. With this method, authorized users need only an internet connection, which will allow us in the future to expand the database to include Court of Session Papers held in collaborating archives around the world and to create an increasingly comprehensive finding aid for discovering these rich materials. Just as important, these users contribute through a web-based interface that most feel comfortable with immediately.
The descriptive phase of this project creates a comprehensive item-level catalog of this collection. We customized all metadata fields to capture the elements of these Court documents that most directly serve the needs of researchers and make these documents discoverable, especially biographical and geographic information. This work included building local fields for geographic data and working with the Gazetteer for Scotland online encyclopedia to incorporate place name information from that resource. For individuals named in case documents, we have searched name authority files, namely those of the Library of Congress, and we have included this linked data in our metadata. We are also partnering with the Social Networks and Archival Context cooperative (SNAC), from UVA’s Institute for Advanced Technology in the Humanities and the National Archives. SNAC is a name-authority tool for archiving biographical or socio-historical contexts within and across archival collections. Within our own database we use Drupal’s “linked entities” capabilities: we create nodes for individual people and places and then link these entities to the cases and documents in which they appear, thus increasing researchers’ ability to see patterns and find materials in this collection. Project metadata meets the standards of Qualified Dublin Core (QDC). Combined with specialized vocabularies and custom metadata fields, QDC covers all of our metadata description needs and offers relatively straightforward semantic interoperability, which allows our digital objects and their corresponding metadata to be accessed by other websites and web-based data aggregation tools.
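As a rough illustration of the entity linking described above, the sketch below models people and places as separate records and builds a reverse index from each entity to the documents that mention it. It is not the project's Drupal configuration: the class names, field names, identifiers, and sample data are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """A person or place node; all field names here are hypothetical."""
    entity_id: str
    label: str
    kind: str  # "person" or "place"


@dataclass
class Document:
    """A case document carrying references to the entities it mentions."""
    doc_id: str
    title: str
    entity_ids: list[str] = field(default_factory=list)


def build_entity_index(documents):
    """Map each entity id to the ids of the documents that reference it."""
    index = defaultdict(list)
    for doc in documents:
        for entity_id in doc.entity_ids:
            index[entity_id].append(doc.doc_id)
    return dict(index)


# Invented sample data, for illustration only.
entities = [
    Entity("person-1", "Sample Litigant", "person"),
    Entity("place-1", "Sample Parish", "place"),
]
docs = [
    Document("doc-1", "Petition", ["person-1", "place-1"]),
    Document("doc-2", "Answers", ["person-1"]),
]
entity_index = build_entity_index(docs)
```

Looking up `entity_index["person-1"]` returns every document in which that person appears, which is the kind of cross-document pattern the linked nodes are meant to surface.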
This project produces high-resolution digital images of all Court of Session Papers in the UVA Law Library collection and links these images, along with OCR-created text files and searchable PDFs, to descriptive metadata. All manuscript pages are scanned and preserved as 400 dpi TIFF image files, and all oversize maps and architectural drawings are scanned and preserved as 600 dpi TIFF image files. [click] Master files are stored in UVA’s preservation repository. After scanning, project staff run the images through Optical Character Recognition (OCR) software that has been trained to recognize the documents’ contemporary fonts and ligatures. Project staff then review the recognized text alongside the scanned image and do a quick, diplomatic cleaning of the OCR with a focus on names, places, and legal terms. Text files created as part of the digitization process are entered into the project’s Solr index to build a robust search feature for the entire collection. The digitized records are then entered into Drupal, where they are matched with the metadata produced by the descriptive workflow I laid out earlier.
The project team leveraged its consortium partnerships to make its Session Papers digitally available and fully text-searchable. SCOS relies on International Image Interoperability Framework (IIIF) technology to display its documents. Although digitized at UVA, the images are hosted on a server at the University of Edinburgh’s Centre for Research Collections (CRC) and sent to SCOS using IIIF. Permitting the CRC to host the images enables the CRC’s digital research team to apply a newly developed Optical Character Recognition (OCR) algorithm to each digital page and read it with a high degree of accuracy. This enhances SCOS’s search functionality and makes it possible for researchers to conduct large-scale text analysis on the corpus. IIIF is at the heart of the exchange of images and metadata between the University of Edinburgh and SCOS. The consortium's use of IIIF demonstrates the power of this framework. We use the underlying data created through the submission to LUNA, which is essential for getting the pixels onto a powerful, flexible image viewer. In SCOS’s case, however, we supplement the metadata presented through IIIF with metadata provided by our interpretive process and localized to our project’s needs.
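To make the image exchange concrete, here is a minimal sketch of how a client might build a request URL following the URI pattern in the IIIF Image API specification (version 3 parameter values are assumed). The server address and image identifier are hypothetical placeholders, not the CRC's actual endpoints.

```python
from urllib.parse import quote


def iiif_image_url(base_url, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL.

    Follows the URI pattern from the IIIF Image API specification:
    {base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}
    """
    # Identifiers must be percent-encoded, including any slashes.
    encoded_id = quote(identifier, safe="")
    return (f"{base_url.rstrip('/')}/{encoded_id}/"
            f"{region}/{size}/{rotation}/{quality}.{fmt}")


# Hypothetical server and identifier, for illustration only.
BASE = "https://images.example.org/iiif"
full_page = iiif_image_url(BASE, "session-papers/page-0001")
detail = iiif_image_url(BASE, "session-papers/page-0001",
                        region="0,0,1000,1000", size="500,")
```

Because the region and size are encoded in the URL itself, a viewer can request a full page or a cropped, scaled detail from the same hosted master without a second copy of the image being stored.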
Additionally, project digitization and OCR scanning create a corpus of text files derived from the entire collection. The corpus can be downloaded in its entirety or filtered by criteria such as case, date, or document type for selective download according to user needs. This feature will allow the collection to be run through computer-assisted language analysis.
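The selective download described above amounts to filtering the corpus on a few metadata fields. The sketch below shows one way that filtering could work; the record structure, field names, and sample entries are invented for illustration and do not reflect the project's actual data model.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class TextFile:
    """One OCR-derived text file; the field names are hypothetical."""
    path: str
    case: str
    doc_date: date
    doc_type: str


def filter_corpus(files, case=None, start=None, end=None, doc_type=None):
    """Return the files matching every criterion that was supplied."""
    selected = []
    for f in files:
        if case is not None and f.case != case:
            continue
        if start is not None and f.doc_date < start:
            continue
        if end is not None and f.doc_date > end:
            continue
        if doc_type is not None and f.doc_type != doc_type:
            continue
        selected.append(f)
    return selected


# Invented sample records, for illustration only.
corpus = [
    TextFile("a.txt", "Case A", date(1760, 1, 15), "petition"),
    TextFile("b.txt", "Case A", date(1761, 6, 2), "answers"),
    TextFile("c.txt", "Case B", date(1775, 3, 9), "petition"),
]
petitions = filter_corpus(corpus, doc_type="petition")
early_case_a = filter_corpus(corpus, case="Case A", end=date(1760, 12, 31))
```

Criteria left as `None` are ignored, so the same function covers both the full download and any narrower selection.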
In summary, aggregating these digital files with project metadata and uploading them to the project database will allow researchers to discover these documents through multi-modal entry points such as name, date, location, and full-text searches, as well as through interpretive pieces such as blog posts, essays, and lesson plans. Project materials will be discoverable through a number of additional portals, including the online catalog for all UVA libraries (Virgo), WorldCat, ArchivesGrid, and web search engines like Google. Through a new collaboration recently arranged with the British and Irish Legal Information Institute (BAILII.org), a web portal for open access to British, Irish, and Scottish case law, the BAILII website will include links to our project and, in the future, direct users to digital copies of case documents available on our website.