Metadata quality has many aspects, but one that receives little attention is accurate spelling.
According to Beall (2006), there are nine common types of typographical error.

| Error type | Description | Example |
|---|---|---|
| Misstrokes | errors traced to inaccurate motion of the finger | [None given] |
| Transposition errors | two consecutive letters in a word are interchanged | typing iknd for kind |
| Interchange errors | two non-consecutive letters are interchanged | typing jamor for major |
| Migration errors | one letter moves (“migrates”) to a new position | typing atht for that |
| Omissions | a letter in a word is left out | typing omt for omit |
| Insertions | an extra letter is inserted into a text | typing asnd for and |
| Substitutions | the wrong letter is typed in place of the correct letter | [None given] |
| Doubling errors | a word containing a repeated letter is typed so that the wrong letter is doubled | typing bokk for book |
| Alternation errors | a letter alternates with another, but the wrong alternation sequence is produced | typing thses for these |

Table from Beall (2006)
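To make a few of these categories concrete, here is a minimal Python sketch that labels a typed word against the intended word. The function name `classify_typo` and its rules are assumptions made for illustration and are not taken from Beall. It only recognizes substitutions, transpositions, interchanges, omissions, and insertions. Everything else, including migration and alternation errors, falls into "other", and a doubling error such as bokk for book is reported as a substitution because only one position differs.

```python
def classify_typo(typed: str, intended: str) -> str:
    """Label a typo with a simplified subset of the categories in the table."""
    if typed == intended:
        return "correct"

    if len(typed) == len(intended):
        # Positions where the two words disagree.
        diffs = [i for i, (a, b) in enumerate(zip(typed, intended)) if a != b]
        if len(diffs) == 1:
            return "substitution"
        if len(diffs) == 2:
            i, j = diffs
            swapped = typed[i] == intended[j] and typed[j] == intended[i]
            if swapped:
                # Adjacent swap = transposition; non-adjacent swap = interchange.
                return "transposition" if j == i + 1 else "interchange"

    if len(typed) == len(intended) - 1:
        # Deleting one letter from the intended word yields the typed word.
        if any(intended[:i] + intended[i + 1:] == typed for i in range(len(intended))):
            return "omission"

    if len(typed) == len(intended) + 1:
        # Deleting one letter from the typed word yields the intended word.
        if any(typed[:i] + typed[i + 1:] == intended for i in range(len(typed))):
            return "insertion"

    return "other"


# Examples taken from the table.
for typed, intended in [("iknd", "kind"), ("jamor", "major"), ("omt", "omit"), ("asnd", "and")]:
    print(typed, "->", classify_typo(typed, intended))
```

A classifier like this is mainly useful for auditing: counting which error types dominate in a collection can suggest whether pick-lists, spellchecking, or fuzzy search will help most.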
Typographical errors can hinder the findability of the assets in your digital collection. Because metadata acts as a surrogate for an object, a typo in the metadata impedes retrieval: a search returns nothing when the term a user enters does not match the term used to catalog the asset. The ease with which metadata can be duplicated compounds the problem. Copy a file whose embedded metadata contains typos and you’re left with double the trouble.
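The retrieval failure is easy to reproduce. The sketch below uses two made-up catalog records and a hypothetical `exact_search` helper that matches whole words only, which is an assumption about how a simple search field might behave rather than a description of any particular system.

```python
# Hypothetical catalog records; the second title contains a doubling error.
records = [
    {"id": 1, "title": "History of the book"},
    {"id": 2, "title": "History of the bokk"},
]


def exact_search(records, term):
    """Return ids of records whose title contains the term as a whole word."""
    term = term.lower()
    return [r["id"] for r in records if term in r["title"].lower().split()]


print(exact_search(records, "book"))  # prints [1]; record 2 is silently missed
```

The user gets no hint that a second relevant record exists, which is why misspelled metadata tends to go unnoticed until someone audits the collection.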
It has been argued that user-created metadata, also known as social tagging, is scalable (crowdsourcing), informative (Amazon), and an aid to discoverability. Important as these advantages are, uncontrolled tags can also introduce gross misspellings that are then duplicated across your retrieval system.
Automatically generated metadata, as attractive as it may be, is not the answer. Most computer programs still have difficulty disambiguating terms from one another.
How can you manage the fallout from misspelled metadata values? Here are a few ideas for you to consider.
- Use a controlled vocabulary (CV)
- Ensure the use of the CV (e.g., pick-lists, autofill)
- Add variant terms (e.g., common misspellings) to your CV
- Spellcheck (but review suggested spellings for inaccuracies!)
- Clean your metadata
- Enable fuzzy searching
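Several of these ideas can be combined in a few lines. The following Python sketch is only an illustration under stated assumptions: the vocabulary, its variant spellings, and the function names `build_lookup` and `normalize_term` are all hypothetical, and the fuzzy step uses `difflib.get_close_matches` from the Python standard library with an arbitrarily chosen similarity cutoff of 0.8. A term is first checked against the controlled vocabulary and its known variants, and only then matched fuzzily against the preferred terms.

```python
import difflib

# Hypothetical controlled vocabulary: preferred term -> known variants and misspellings.
CONTROLLED_VOCABULARY = {
    "photograph": ["photo", "photgraph", "phtograph"],
    "manuscript": ["manuscirpt", "manuscrip"],
    "newspaper": ["newpaper", "newspapper"],
}


def build_lookup(vocabulary):
    """Map every preferred term and every variant to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def normalize_term(term, vocabulary, cutoff=0.8):
    """Return the preferred term for a value, or None if nothing is close enough."""
    key = term.strip().lower()
    lookup = build_lookup(vocabulary)
    if key in lookup:
        # Exact hit on a preferred term or a listed variant.
        return lookup[key]
    # Fuzzy fallback: closest preferred term at or above the similarity cutoff.
    close = difflib.get_close_matches(key, list(vocabulary), n=1, cutoff=cutoff)
    return close[0] if close else None


# Cleaning a batch of metadata values; unmatched values are flagged for review.
for value in ["Photgraph", "manuscrpit", "xylophone"]:
    print(value, "->", normalize_term(value, CONTROLLED_VOCABULARY))
```

Values that come back as `None` should go to a person for review rather than being corrected automatically, for the same reason that spellchecker suggestions need checking. The cutoff is a trade-off: a lower value catches more typos but also produces more false matches.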
Beall, J. (2006). Metadata and Data Quality Problems in the Digital Library. Journal of Digital Information, 6(3). Retrieved from http://journals.tdl.org/jodi/index.php/jodi/article/view/65/68
McIlroy, T. (2012, July 3). Spelling and Findability. Future of Publishing. Retrieved September 25, 2013, from http://thefutureofpublishing.com/2012/07/spelling-and-findability/