D5.1orig-Appendix G - The Terminology Systems for Getty Images

Warning: this is a copy of the D5.1 document as originally submitted to the EC.




When we had been working with Hulton Deutsch for a year or so, the library was acquired by Getty Images. Getty Images' business philosophy was to turn a disjointed and fragmented stock photography market into a thriving, modernized industry. As the twin motors of the web and imaging technologies gained momentum in the late 1990s, it was clear that the business challenges would include making the variety of legacy systems already in place in the libraries acquired by Getty Images work together harmoniously in a corporate systems environment. A primary requirement was the ability to search across all Getty Images libraries from a single interface.

Building on our experience with the Hulton thesaurus, our wider work with large-scale public sector image repositories and the use of the Art and Architecture Thesaurus in museum applications, we began working with Getty Images IT staff to develop and implement a terminology-based search system to support the corporate strategy. It comprised the following components:

COMPASS

Originally known as the Internal Picture Research Tool, this was a client/server database system with a Windows client and a Unix server. Its Index+ database holds image metadata, which was originally uploaded from corporate Oracle databases and is now updated nightly by the Index+ system. The image data in Compass is read-only; the only way it can be changed is through the WIX (Windows Index+) client interface. Compass provides search across the main Getty Images libraries.

Pilot

Almost identical to the Compass system, Pilot is designed to take flat data files from the Getty Images libraries and map them onto an agreed central database structure, adding fields as required. Example libraries include the Hulton Archive, EyeWire, PhotoDisc, The Image Bank, Illustration Works and The Bridgeman Art Library. Pilot supports keyword editing and workflow control flags.
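The mapping step can be sketched as follows. This sketch assumes that each library's flat file is read as records of field-name/value pairs and that a per-library field map translates local field names into central ones. The names `FIELD_MAPS` and `map_record`, and all field names, are hypothetical and are not taken from the Pilot source.

```python
# Hypothetical per-library maps from local field names to central field names.
FIELD_MAPS = {
    "Hulton Archive": {"CAPTION": "caption", "KEYWORDS": "keywords", "PHOTOG": "photographer"},
    "PhotoDisc": {"Title": "caption", "Keys": "keywords", "Artist": "photographer"},
}


def map_record(library, record, central_fields):
    """Map one flat-file record onto the central structure.

    Fields with no mapping keep their original name and are appended to
    central_fields, one reading of "adding fields as required" (an assumption).
    """
    field_map = FIELD_MAPS.get(library, {})
    mapped = {}
    for name, value in record.items():
        central_name = field_map.get(name, name)
        if central_name not in central_fields:
            central_fields.append(central_name)  # extend the shared schema
        mapped[central_name] = value
    return mapped
```

For example, a PhotoDisc record with `Title`, `Keys` and an unmapped `Batch` field comes out with `caption` and `keywords`, plus a new `Batch` column added to the central field list.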

Merlin

Merlin provides the vocabulary management capability. The main elements of the vocabularies are:

Tree Words

A hierarchical tree-structured set of keywords, currently about 10,000 items.
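One plausible in-memory representation of the Tree Words is a parent map keyed by keyword ID, from which a term's path and narrower terms can be derived. This assumes each term carries a keyword ID and a parent ID, as in the translation file format later in this appendix. The `TreeWords` class is hypothetical. Using `narrower` to widen a search is an illustrative assumption, not documented Merlin behaviour.

```python
from collections import defaultdict


class TreeWords:
    """Hypothetical keyword tree: each keyword ID has text and an optional parent ID."""

    def __init__(self):
        self.term = {}                     # keyword ID -> keyword text
        self.parent = {}                   # keyword ID -> parent ID (None for roots)
        self.children = defaultdict(list)  # parent ID -> child keyword IDs

    def add(self, kw_id, text, parent_id=None):
        self.term[kw_id] = text
        self.parent[kw_id] = parent_id
        if parent_id is not None:
            self.children[parent_id].append(kw_id)

    def path(self, kw_id):
        """Return the terms from the root down to kw_id."""
        out = []
        while kw_id is not None:
            out.append(self.term[kw_id])
            kw_id = self.parent[kw_id]
        return out[::-1]

    def narrower(self, kw_id):
        """Return kw_id's term followed by every term beneath it, in pre-order."""
        result, stack = [], [kw_id]
        while stack:
            current = stack.pop()
            result.append(self.term[current])
            # Push children reversed so they are visited in insertion order.
            stack.extend(reversed(self.children[current]))
        return result
```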

Exacts

A further 16,000 keywords that are not incorporated into the Tree Words hierarchy.

Synonyms

Also referred to as ‘the thesaurus’: a series of alternative terms for other items in the vocabulary.
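At its simplest, a synonym lookup is a many-to-one map from alternative terms to vocabulary keywords. The sketch below assumes case-insensitive matching and that each synonym points to exactly one keyword. The table entries and the `resolve` helper are hypothetical.

```python
# Hypothetical synonym table: alternative term -> vocabulary keyword.
SYNONYMS = {
    "automobile": "car",
    "motor car": "car",
    "shore": "coast",
}


def resolve(term):
    """Return the vocabulary keyword for term, or the normalized term if it is not a synonym."""
    key = term.strip().lower()
    return SYNONYMS.get(key, key)
```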

Roof Terms

A number of shortcuts for commonly used combinations of other vocabulary terms, which are expanded into their definitions when typed (e.g. "safari animals").
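Roof-term expansion can be sketched as a lookup applied to the typed query terms before searching. The definition given for "safari animals" is invented for illustration, and so is the `expand_query` helper. The original does not say how the expanded terms are then combined in the search (for example, whether they are ORed together).

```python
# Hypothetical roof-term definitions: shortcut -> component vocabulary terms.
ROOF_TERMS = {
    "safari animals": ["lion", "elephant", "giraffe", "zebra", "rhinoceros"],
}


def expand_query(terms):
    """Replace each roof term by its component terms, keeping order and dropping duplicates."""
    expanded = []
    for term in terms:
        for part in ROOF_TERMS.get(term.lower(), [term]):
            if part not in expanded:
                expanded.append(part)
    return expanded
```

Expansion here is single-level. A roof term defined in terms of another roof term would need a recursive version.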

Macros

A number of additional keywords which are combinations of other vocabulary terms and which are added during the indexing phase.
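Macros differ from roof terms in that they are applied at indexing time rather than at query time. The sketch below takes one possible reading, which is an assumption: a macro keyword is added to an image's keywords whenever all of its component terms are already present. The `MACROS` table and `apply_macros` function are hypothetical.

```python
# Hypothetical macro definitions: macro keyword -> set of component vocabulary terms.
MACROS = {
    "mother and child": {"mother", "child"},
    "beach holiday": {"beach", "vacation"},
}


def apply_macros(keywords):
    """Return keywords plus any macro keywords whose components are all present."""
    present = set(keywords)
    added = [
        macro
        for macro, parts in MACROS.items()
        if parts <= present and macro not in present  # subset test, no duplicates
    ]
    return list(keywords) + added
```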

Translations

Translation of UK vocabulary terms and synonyms is carried out off-line, with translated data coming back in tab-separated files in the following format:

  • Foreign Keyword
  • Foreign Synonyms
  • UK Keyword
  • UK Parent/UK Keyword
  • Keyword ID
  • Parent ID
  • UK Synonyms
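A reader for these files can be sketched with Python's standard `csv` module in tab-delimited mode, with quote handling turned off. The column order follows the list above. Several details are assumptions: that synonym fields hold several terms separated by semicolons, that there is no header row, and the `TranslationRow` and `read_translations` names themselves.

```python
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class TranslationRow:
    foreign_keyword: str
    foreign_synonyms: List[str]
    uk_keyword: str
    uk_path: str  # the "UK Parent/UK Keyword" column
    keyword_id: str
    parent_id: str
    uk_synonyms: List[str]


def split_list(value):
    """Split an (assumed) semicolon-separated synonym field, ignoring blanks."""
    return [s.strip() for s in value.split(";") if s.strip()]


def read_translations(lines):
    """Parse tab-separated translation records from an iterable of lines (e.g. an open file)."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not fields:
            continue  # skip blank lines
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {fields!r}")
        f_kw, f_syn, uk_kw, uk_path, kw_id, parent_id, uk_syn = fields
        rows.append(TranslationRow(f_kw, split_list(f_syn), uk_kw, uk_path,
                                   kw_id, parent_id, split_list(uk_syn)))
    return rows
```

A real file would be opened with `open(path, newline="", encoding=...)`. The encoding actually used for the returned files is not stated.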

The languages required were:

  • Brazilian
  • Portuguese
  • French
  • German
  • Italian
  • Dutch
  • Spanish
  • Swedish

Operation of the Compass/Pilot/Merlin application

The three components share a common platform based on SSL’s Index+ database and the terminology client. Compass and Pilot share 95% of the source code, Merlin uses the same infrastructure, and the database structure is common to all three applications. Because every application exercises the same component base, that base is robust and reliable.

Use follows a 24-hour cycle based on the main servers at corporate headquarters in Seattle. Each night the Compass/Pilot/Merlin system dumps the existing keywords to an archive, retrieves new image data and keywords, updates the current image data and vocabularies, and creates files of image metadata, vocabularies and their translations.
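The order of the nightly steps can be summarized as a small driver. Only the sequence of steps comes from the description above; everything else is hypothetical. The in-memory `store` dictionary stands in for the Index+ database, the `translate` callable stands in for translated vocabulary data, and the function name is invented.

```python
import copy


def nightly_cycle(store, new_images, new_keywords, translate):
    """Run one nightly cycle over an in-memory stand-in for the database.

    store: dict with "images" (dict), "keywords" (set) and "archive" (list).
    translate: callable mapping a UK keyword to {language: term}.
    Returns the export files as a dict of name -> contents.
    """
    # 1. Dump the existing keywords to an archive.
    store["archive"].append(copy.deepcopy(store["keywords"]))
    # 2-3. Retrieve new image data and keywords, and update the current data.
    store["images"].update(new_images)
    store["keywords"].update(new_keywords)
    # 4. Create files of image metadata, vocabularies and their translations.
    vocabulary = sorted(store["keywords"])
    return {
        "images": dict(store["images"]),
        "vocabulary": vocabulary,
        "translations": {kw: translate(kw) for kw in vocabulary},
    }
```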

Configuration details

The system is a configuration of the following components:

For the client software:

  • Index for Windows
  • Application library DLLs
  • Index+ TCP/IP network DLL
  • Application extension DLLs
  • Client startup system
  • Optional bitmaps

For the server:

  • Index+ TCP/IP server
  • Configured database
  • Image arena
  • Application tools and extensions

All source code (C, C++, applications and shell scripts) was maintained using SCCS at SSL. Mirror copies of the code, not the data, were also maintained at SSL.
