Metadata quality has many aspects, but one that receives little attention is accurate spelling.

According to Beall (2006), typographical errors fall into nine common categories.

| Error category | Definition | Example |
| --- | --- | --- |
| Misstrokes | errors traced to inaccurate motion of the finger | [None given] |
| Transposition errors | two consecutive letters in a word are interchanged | typing iknd for kind |
| Interchange errors | two non-consecutive letters are interchanged | typing jamor for major |
| Migration errors | one letter moves (“migrates”) to a new position | typing atht for that |
| Omissions | a letter in a word is left out | typing omt for omit |
| Insertions | an extra letter is inserted into a text | typing asnd for and |
| Substitutions | the wrong letter is typed in place of the correct letter | [None given] |
| Doubling errors | a word containing a repeated letter is typed so that the wrong letter is doubled | typing bokk for book |
| Alternation errors | a letter alternates with another, but the wrong alternation sequence is produced | typing thses for these |

Table from Beall (2006)
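To make a few of the categories in the table concrete, here is a minimal Python sketch that labels a typo by comparing the typed word with the intended one. The function name `classify_typo` and its labels are my own illustration, not something taken from Beall. It only recognizes transpositions, interchanges, omissions, insertions, and single-letter substitutions. It does not try to detect misstrokes, migrations, doublings, or alternations, so "bokk" for "book" is reported as a plain substitution and "thses" for "these" falls through to "other".

```python
def classify_typo(typed: str, intended: str) -> str:
    """Label a typo with one of a few categories from the table.

    Hypothetical helper for illustration; handles only a subset of categories.
    """
    if typed == intended:
        return "no error"

    # Same length: inspect the positions where the two words differ.
    if len(typed) == len(intended):
        diffs = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        if len(diffs) == 2:
            i, j = diffs
            swapped = typed[i] == intended[j] and typed[j] == intended[i]
            if swapped:
                # Adjacent swap is a transposition; otherwise an interchange.
                return "transposition" if j == i + 1 else "interchange"
        return "other"

    # Typed word is one letter short: a letter was left out.
    if len(typed) + 1 == len(intended):
        for i in range(len(intended)):
            if intended[:i] + intended[i + 1:] == typed:
                return "omission"
        return "other"

    # Typed word is one letter long: an extra letter was inserted.
    if len(typed) == len(intended) + 1:
        for i in range(len(typed)):
            if typed[:i] + typed[i + 1:] == intended:
                return "insertion"
        return "other"

    return "other"
```

For example, `classify_typo("iknd", "kind")` gives "transposition" and `classify_typo("jamor", "major")` gives "interchange", matching the examples in the table.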

Typographical errors can hinder the findability of the assets in your digital collection. Because metadata acts as a surrogate for an object, metadata typos impede retrieval: when the terms users enter in a search field do not match the terms used to catalog assets in your system, nothing is retrieved. The ease with which metadata can be duplicated compounds the problem. Copy a file whose embedded metadata contains typos and you’re left with double the trouble.

It has been argued that user-created metadata (aka social tagging) is scalable (crowdsourcing), informative (Amazon), and an aid to discoverability. As important as these advantages are, uncontrolled tagging can also introduce gross misspellings that are then duplicated across your retrieval system.

Automatically generated metadata, as attractive as it may be, is not the answer. Most computer programs still have difficulty disambiguating terms from one another.

How can you manage the fallout from misspelled metadata values? Here are a few ideas for you to consider.

  • Use a controlled vocabulary (CV)
  • Ensure the use of the CV (e.g., pick-lists, autofill)
  • Add variant terms (e.g., common misspellings) to your CV
  • Spellcheck (but review suggested spellings for inaccuracies!)
  • Clean your metadata
  • Enable fuzzy searching
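Several of the ideas above (a controlled vocabulary, variant terms, and fuzzy matching) can be combined in a few lines. The following is only a sketch under stated assumptions: the vocabulary, the variant spellings, the name `normalize_term`, and the 0.8 similarity cutoff are all made up for illustration, and a real DAM system would have its own vocabulary store and search engine. The fuzzy step uses `difflib.get_close_matches` from the Python standard library.

```python
from difflib import get_close_matches

# Hypothetical controlled vocabulary: preferred term -> known variants,
# including common misspellings added on purpose.
CONTROLLED_VOCABULARY = {
    "photograph": ["photo", "photgraph", "phtograph"],
    "illustration": ["ilustration", "illustraton"],
    "logo": ["logotype"],
}

# Map every accepted spelling (preferred or variant) to its preferred term.
VARIANT_INDEX = {term: term for term in CONTROLLED_VOCABULARY}
for preferred, variants in CONTROLLED_VOCABULARY.items():
    for variant in variants:
        VARIANT_INDEX[variant] = preferred


def normalize_term(value, cutoff=0.8):
    """Return the preferred term for a value, or None if nothing is close.

    Tries an exact lookup first, then a fuzzy match against all known
    spellings. The cutoff is an assumed value and should be tuned.
    """
    key = value.strip().lower()
    if key in VARIANT_INDEX:
        return VARIANT_INDEX[key]
    matches = get_close_matches(key, list(VARIANT_INDEX), n=1, cutoff=cutoff)
    return VARIANT_INDEX[matches[0]] if matches else None
```

With this sketch, `normalize_term("photgraph")` resolves through the variant list, while an unlisted typo such as "photograhp" is caught by the fuzzy step. A value with no close match returns `None`, which is a natural point to route the term to a person for review rather than accept a guess.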

References:

Beall, J. (2006). Metadata and Data Quality Problems in the Digital Library. Journal of Digital Information, 6(3). Retrieved from http://journals.tdl.org/jodi/index.php/jodi/article/view/65/68

McIlroy, T. (2012, July 3). Spelling and Findability. Future of Publishing. Retrieved September 25, 2013, from http://thefutureofpublishing.com/2012/07/spelling-and-findability/

I’m Ian Matzen

Welcome to Tame Your Assets: a blog about digital asset management. I am a Senior Manager (Automation Programs) with a Master of Library and Information Science degree and experience working in higher education, marketing, and publishing. Before working in DAM I post-produced commercials, episodic television, and corporate videos. Recently I wrapped up an automation project for Coca-Cola.