How OCR Helped a Herbarium Disseminate Knowledge

Published on Oct 4, 2014 | Data Entry Services, Outsourcing Services

Optical character recognition (OCR) is often an indispensable tool in data entry and transcription. The Royal Botanic Garden Edinburgh (RBGE) has used it to expand its digital archive, as part of a digitization drive that it is pursuing with great enthusiasm.

What a Herbarium is All About

A herbarium preserves plant specimens from bygone eras in their original form; though dead, they remain recognizable. Each specimen is dried by pressing it between sheets of paper and is then mounted on card. Preserved this way, specimens can be kept for centuries, and together they form a reference collection of the flora that populates the earth. The RBGE houses one such herbarium, and it is looking to digitize these specimens and store them as virtual records.

Specimens in the RBGE herbarium date back to 1697, and the nearly three million specimens housed there represent more than half of all flora in the world. The collection also grows by around 10,000 to 20,000 specimens every year.

The Need for Online Documentation

However, the difficulty of sharing the specimens had held the herbarium back. Physically sharing specimens always carries a risk of damage or loss, and even setting that risk aside, access is limited to people who visit the herbarium unless the specimens are sent out on loan. Online documentation of the specimens increases access to the herbarium’s collection and also contributes to research.

As part of this process, RBGE embarked upon imaging the specimens but ran into problems capturing the text on the specimen labels, which appears in various languages and fonts. Many of the defining features of a plant cannot be seen on the specimen itself; details such as its habitat, its scent and the color of its flowers have to be described in text. This information, found on the specimen labels, was entered manually in a time-consuming procedure, which left many database records incomplete. That’s where the need for text recognition was felt.

The Need for OCR Text Recognition

A smart technique was needed to capture the text on specimen labels even when the text is complex and the label is in poor condition. Most importantly, all of the information needed to be captured without any of it being lost. The technique would also need to integrate efficiently with RBGE’s Image Management System.

RBGE’s Text Capture and Image Management

That’s where OCR came in. This technology, an integral element of document conversion services, can capture the label text and convert it into editable digital information. RBGE used it to convert scanned images into text documents so that the information could be classified, searched and exported to RBGE’s internal system for document storage and management.

In this workflow, the Recognition Server accesses the TIFF images (the format in which RBGE stores its images) held in one of the folders of the herbarium’s Image Management System and processes them. For each image it creates two output files: an image PDF for backup, and a plain text file that is saved in a folder on the RBGE server. RBGE’s workflow picks up the text file and enters it into the MySQL database. From there, researchers worldwide can easily access it through the RBGE website or other reputable online botany resources.
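The folder-based flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not RBGE’s actual implementation: the `ocr` callable stands in for whatever engine the Recognition Server runs, the function names, table name and column names are hypothetical, the image PDF backup step is omitted, and SQLite is used in place of MySQL so the sketch runs with the standard library alone.

```python
import sqlite3
from pathlib import Path
from typing import Callable, List


def process_folder(image_dir: Path, text_dir: Path,
                   ocr: Callable[[Path], str]) -> List[Path]:
    """Run OCR on every TIFF in image_dir and write one .txt file per image.

    `ocr` is a placeholder for the real recognition engine (an assumption).
    """
    text_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(image_dir.iterdir()):
        # Only TIFF files are processed; anything else in the folder is skipped.
        if image.suffix.lower() not in {".tif", ".tiff"}:
            continue
        out = text_dir / (image.stem + ".txt")
        out.write_text(ocr(image), encoding="utf-8")
        written.append(out)
    return written


def load_texts(text_dir: Path, conn: sqlite3.Connection) -> int:
    """Pick up the plain text files and store them in a database table.

    The table and column names are hypothetical; SQLite stands in for MySQL.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS label_text ("
        "specimen_id TEXT PRIMARY KEY, label TEXT NOT NULL)"
    )
    rows = [(path.stem, path.read_text(encoding="utf-8"))
            for path in sorted(text_dir.glob("*.txt"))]
    # Re-running the import replaces the stored text for the same specimen.
    conn.executemany(
        "INSERT OR REPLACE INTO label_text (specimen_id, label) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Passing the OCR step in as a function keeps the folder handling and the database import independent of any particular recognition product, so either side can be swapped without touching the other.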

That’s job done and the mission of the RBGE fulfilled, all thanks to OCR.
