Ed. Note: This blog was originally posted on the Biodiversity Heritage Library blog.
We’ve spent a fun-filled week exploring the history, art, and science of gardening with our Garden Stories event. Seed and nursery catalogs and lists played a starring role in our campaign, allowing us to explore the world of gardening through the instruments that informed, documented, shaped, and transformed the industry.
As our journey this week has demonstrated, seed and nursery catalogs and lists allow us to trace the development of the seed industry, agriculture, and the home garden, documenting the rise, decline, and development of new plant varieties and prices; changing agricultural and printing technologies; the individuals who shaped the industry; the evolution of garden fashion and landscape design; the introduction of chemical agents for insect and weed control; early methods of cleaning, preserving, and shipping seeds; and cultural and social dynamics such as the effects of and reactions to scientific advancement, global wars, and the shifting roles of women in society and business.
Because of their cultural, historic, and scientific importance, many BHL partners are engaged in a variety of projects to digitize and improve access to the seed and nursery catalogs and lists in their collections. As we wind things down in our Garden Stories event, we invite you to explore the exciting world of seed catalogs in the Biodiversity Heritage Library consortium.
Digitizing One of the Largest Seed Catalog Collections in America
Started in 1904 by USDA’s first economic botanist, Percy Leroy Ricker, the National Agricultural Library’s (NAL) Henry G. Gilbert Nursery and Seed Trade Catalog Collection consists of over 200,000 American and foreign catalogs. The earliest catalogs date from the late 1700s, but the collection is strongest from the 1890s to the present.
As one of NAL’s most frequently used collections, with an appeal to a wide-ranging audience, the Nursery and Seed Trade Catalog Collection was a natural candidate for digitization. In 2013, NAL began digitizing the collection with Internet Archive, which operates a scanning center at NAL. As of February 2015, NAL has cataloged all of its U.S. seed catalogs through 1923 and digitized over 13,000 seed catalogs, including all of its U.S. catalogs through 1906 as well as its entire collection of catalogs from long-established firms such as Peter Henderson & Co. and woman-owned firms such as Miss Ella V. Baines. The Nursery and Seed Trade Catalog Collection will remain a focus of NAL’s digitization work with Internet Archive for the foreseeable future.
Soon after NAL’s catalogs became available in Internet Archive, BHL added them to its own Seed and Nursery Catalogs Collection. In 2014, NAL formally became a BHL affiliate and the two institutions began working on a standardized process for ingest of NAL seed catalogs (and other relevant digitized materials) into BHL.
Digitizing to Improve Access and Discoverability
In 2013, the Biodiversity Heritage Library engaged in an ambitious project to explore the applicability of purposeful gaming to tackle a significant challenge for digital libraries today: poor output from Optical Character Recognition (OCR) software. OCR allows a computer to “read” the text on a digitized page and produce a searchable text file for each page image that allows users to more easily discover content relevant to their needs.
Led by the Missouri Botanical Garden’s Center for Biodiversity Informatics (CBI) and in partnership with Harvard University, Cornell University, and the New York Botanical Garden, the Institute of Museum and Library Services (IMLS)-funded project, Purposeful Gaming, will demonstrate whether digital games are a successful tool for analyzing and improving outputs from OCR and transcription activities. The premise is that presenting the task as a game allows large numbers of users to be harnessed quickly and efficiently to review and correct particularly problematic words.
As part of Purposeful Gaming, project participants are digitizing seed and nursery catalogs and lists because these documents are great examples of materials that are notoriously difficult subjects for OCR to parse. The picturesque fonts and elaborate page layouts so endearingly characteristic of seed catalogs cause the resulting OCR output to be error prone and less than optimal. By identifying unique catalogs and lists in their collections and integrating them into the BHL Seed and Nursery Catalog Collection, transcription sites, and purposeful games, participating institutions are helping us enhance our OCR and improve access to not only these catalogs and lists but the entire BHL collection as well.
Seed Catalogs and Index Semina – What’s the Difference and Why Do We Care?
As described, Purposeful Gaming involves digitization of historic seed catalogs and seed lists, or index semina. What is the difference? Beautifully illustrated seed catalogs were issued regularly by seed companies to list their current selection available for sale. The catalogs occasionally included plants that were not only new to the garden, but also new to science. Similarly, the far less colorful seed lists were issued and exchanged by botanical gardens to facilitate the free exchange of new seed acquisitions and also included plant species new to science.
The seed lists were published and circulated in limited numbers and were often considered ephemeral, so they were not generally deposited in libraries. Today no library in the world has a complete set. BHL partners have joined forces to digitize their collections, forming a virtual set that is nearly complete and far more accessible to botanists and everyone else around the world.
Nierembergia frutescens is just one example of a plant that was first named and described in a seed exchange list. This beautiful flowering herb is a member of the Solanaceae, or nightshade, family. It was first named in 1866 by the French botanist Michel Charles Durieu de Maisonneuve. He described the plant in great detail while advertising the availability of seeds of this new species to his colleagues in Catalogue des graines récoltées en 1866, issued by Jardin-des-plantes de la ville de Bordeaux.
Identifying the Unique
With many Purposeful Gaming-affiliated institutions involved in digitizing seed and nursery catalogs, alongside the significant digitization underway at the National Agricultural Library (NAL), it can be difficult to ensure that the same catalog is not digitized by multiple libraries. Recognizing this challenge, Cornell University’s Mann Library developed a process to identify and digitize the unique seed catalogs in their collection.
The first step in this process was to collect metadata about the seed catalogs held by BHL institutions currently digitizing these works. This was done using Excel spreadsheets with matching columns. The merging of the metadata was complicated by differing cataloging processes among various institutions. For instance, NAL catalogs each seed catalog as a monograph whereas Cornell catalogs them as serials based on the firm name. To further complicate the situation, Cornell also cataloged the firms for which they have only a handful of catalogs in alphabetic ranges by firm name.
After the metadata for the seed catalogs in applicable institutions was merged into one large spreadsheet, it was necessary to try to match up the varying firm names. For example, one institution might have a firm cataloged as John W. Adams, whereas another institution may have JW Adams, Adams JW, or even John W Adams & Sons. Additionally, many firms changed names over time.
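To make the matching problem concrete, here is a minimal sketch of one common way to group name variants: a "fingerprint" key-collision approach, in which names that reduce to the same normalized key are treated as candidates for the same firm. The function names `fingerprint` and `cluster` are illustrative assumptions, not part of any tool described in this post, and the normalization steps are a simplified approximation rather than the exact process Mann Library used.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(name: str) -> str:
    """Build a simplified key-collision fingerprint for a firm name."""
    # Fold accented characters to their closest ASCII form.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", ascii_name.lower())
    # Split into tokens, drop duplicates, sort, and rejoin.
    return " ".join(sorted(set(cleaned.split())))


def cluster(names):
    """Group names that share the same fingerprint."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return dict(groups)


# The firm name variants mentioned in the text above.
variants = ["John W. Adams", "JW Adams", "Adams JW", "John W Adams & Sons"]
clusters = cluster(variants)

for key, members in clusters.items():
    print(f"{key!r}: {members}")
```

In this sketch, JW Adams and Adams JW collapse to the same key because only case, punctuation, and word order are ignored. John W. Adams and John W Adams & Sons still land in separate groups, which illustrates why fuzzier matching methods or manual review are usually needed as well.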
Mann decided to use Google Refine in an attempt to standardize firm names. Using Google Refine’s Cluster option, they were able to match up various firm names used for the same firm. (For more on clustering methods, please see https://github.com/OpenRefine/OpenRefine/wiki/Clustering-In-Depth). If requested, Google Refine will change the firm names in the spreadsheet to use a common firm name for each. This allowed the resulting spreadsheet to be sorted by firm so that all metadata for one firm appeared together in the spreadsheet. Mann Library then reviewed the spreadsheet manually to see which firms or seed catalog publication years it uniquely holds, avoiding the scanning of material already digitized by other BHL institutions.
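The final review step was done by hand, but the underlying idea can be sketched as a set difference over (firm, year) pairs once firm names have been standardized. The example below is only an illustration under that assumption: the function name `unique_holdings` is hypothetical, and the records are made-up placeholders, not real holdings data.

```python
def unique_holdings(own, others):
    """Return (firm, year) pairs in `own` that appear in none of `others`."""
    already_held = set()
    for holdings in others:
        already_held.update(holdings)
    # Sort so the result is stable and easy to review.
    return sorted(set(own) - already_held)


# Placeholder records: (standardized firm name, catalog year).
mann = [("Firm A", 1898), ("Firm A", 1899), ("Firm B", 1901)]
partner_one = [("Firm A", 1898)]
partner_two = [("Firm B", 1901)]

to_scan = unique_holdings(mann, [partner_one, partner_two])
print(to_scan)
```

A comparison like this is only as good as the name standardization that precedes it, which is one reason a manual pass over the sorted spreadsheet remains useful.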
Gaming to Enhance Collections
So we’ve covered digitizing the catalog and list collections. How then will Purposeful Gaming use video games to decipher difficult-to-read texts–such as seed and nursery catalogs–that cannot easily be read by OCR software?
Here’s how it works: an original catalog is scanned, and the image uploaded to BHL. The image is then uploaded to a transcription portal, where volunteers type out the text that would be too difficult for a computer to read (thanks to all who have helped us transcribe seed catalogs this week!). Multiple transcriptions of the same text are then incorporated into the video game, which identifies discrepancies between them. The task of the player is to correctly transcribe the text in question through a creative video game interface. Eventually, games like this could help create searchable versions of seed and nursery catalogs, increasing their value to historians and horticulturalists alike.
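As a rough illustration of what "identifying discrepancies" between transcriptions can mean, the sketch below compares two transcriptions word by word using Python's standard `difflib` module and reports only the spans where they disagree. This is an assumption about one possible approach, not a description of how the Purposeful Gaming software works; the function name `word_discrepancies` and the sample text are invented for the example.

```python
from difflib import SequenceMatcher


def word_discrepancies(first: str, second: str):
    """Return pairs of differing word spans between two transcriptions."""
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep only the spans where the two transcriptions disagree.
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# Invented sample transcriptions of the same line of text.
volunteer_one = "Choice Flower Seeds for the Garden"
volunteer_two = "Choice Flovver Seeds for the Garden"

print(word_discrepancies(volunteer_one, volunteer_two))
```

Each disagreement found this way is a candidate word that a game could present to players for a decision, while text on which the transcriptions already agree needs no further attention.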
Beta versions of the games are being tested right now, and we hope to release them this summer. Stay tuned to our blog and social media for more updates. In the meantime, you can help inform the game development by transcribing seed and nursery catalogs today!
We’re so glad you joined us this week for Garden Stories. You can explore all our great posts by following #BHLinbloom on Twitter and Facebook and diving into our Garden Stories blog series. Be sure to check out the over 14,000 seed and nursery catalogs in BHL and enjoy over 2,500 seed catalog images in Flickr, with a selection also available in Pinterest. Find great online gardening resources, facts, and tips on the BHL Gardening Resources Page.
Manager, Business Development, National Agricultural Library
Director, Missouri Botanical Garden Library
Information Technology Project Manager, Cornell University’s Mann Library
Contracts Librarian, National Agricultural Library
Marketing Intern, Ernst Mayr Library, Museum of Comparative Zoology, Harvard University
Judith A. Warnement
Librarian of Harvard University Botany Libraries