


We plan to demonstrate the generality of the procedures developed by Lu et al. (2008), and to extend them, by applying them to the Biodiversity Heritage Library scans of selected journal volumes held at the Natural History Museum.

The metadata we extract will focus on proper nouns (taxon, personal and place names) and dates. We will make those terms easier to search by combining associative techniques from Natural Language Processing (NLP) with knowledge of likely Optical Character Recognition (OCR) errors. For example, a search for Pica should also recover the misread form Pioa, provided the context of Pioa indicates a bird, ideally a magpie.
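The Pica/Pioa example can be illustrated with a minimal Python sketch. This is not the project's implementation or the method of Lu et al. (2008). The confusion table, the context terms, the window size and the function names `ocr_equivalent` and `search` are all assumptions made up for illustration. The sketch treats a token as a candidate match when it differs from the query only by characters that OCR plausibly confuses, and accepts an inexact candidate only if a supporting context term appears nearby.

```python
import re

# Hypothetical table of character pairs that OCR might confuse (illustrative only).
CONFUSABLE = {
    frozenset("co"),  # c read as o, as in Pica -> Pioa
    frozenset("ce"),
    frozenset("il"),
    frozenset("hb"),
    frozenset("un"),
}


def ocr_equivalent(query: str, token: str) -> bool:
    """Return True if token equals query up to single-character OCR confusions."""
    q, t = query.lower(), token.lower()
    if len(q) != len(t):
        return False
    return all(a == b or frozenset((a, b)) in CONFUSABLE for a, b in zip(q, t))


def search(query: str, text: str, context_terms: set, window: int = 5) -> list:
    """Return (position, token) pairs that match query.

    Exact matches are always accepted. OCR variants are accepted only when
    one of context_terms occurs within `window` tokens of the candidate.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    hits = []
    for i, tok in enumerate(tokens):
        if not ocr_equivalent(query, tok):
            continue
        nearby = {t.lower() for t in tokens[max(0, i - window): i + window + 1]}
        if tok.lower() == query.lower() or nearby & context_terms:
            hits.append((i, tok))
    return hits


# Assumed context vocabulary for the bird genus Pica.
BIRD_TERMS = {"magpie", "bird", "corvid", "plumage"}

page = "The magpie Pioa pioa builds a domed nest."
print(search("Pica", page, BIRD_TERMS))
```

With the sample sentence, both occurrences of the misread name are recovered because "magpie" appears nearby. The same string in a sentence about a village, with no bird-related terms around it, would be rejected. A real system would presumably weight confusions by probability and use a richer notion of context than a fixed word list.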


Lu, X., Kahle, B., Wang, J. Z. & Giles, C. L. (2008). A metadata generation system for scanned scientific volumes. Proceedings of the 8th ACM/IEEE Joint Conference on Digital Libraries, Pittsburgh, PA.