The metadata challenge
What exactly is the metadata challenge?
As discussed in our insight “the language of metadata”, using precise and consistent language as a business tool is a vastly different concept to using it as an expression of social behaviour, that is, as a means of differentiating between social groups.
When language, in the form of words or terms, is used in information management, its purpose is to describe information resources so that we can discover, use and manage information over time.
Terms are applied to information objects as metadata: data which describes other data or content. Metadata is a core component of information systems. Essentially, metadata puts structure into unstructured information.
Metadata delivers the best results when it is standardised and controlled at a user, enterprise or global level – ensuring that everyone is using the same language. Seldom will two people describe the same thing in exactly the same manner. Without defined rules, descriptive metadata becomes subjective and results in ambiguous language, as highlighted in the following example:
This is a fireant.
In Australia we spend millions of dollars trying to rid ourselves of this pest. Because the fireant is a notifiable pest, sightings must be reported in all states.
However, our simple fireant is also known as:
- Solenopsis invicta
- fierce ant
- ginger ant
- tropical fire ant
- red ant
- red imported fire ant
- rifa
- R.I.F.A
It can be spelled as:
- fireant
- fireants
- fire ant
- fire ants
- fire-ant
- fire-ants
All of these terms have been used to title files and documents in government agencies around Australia. So when scientists need to find a file or collate reports on sightings, it may take multiple searches to ensure that they have the complete record!
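A minimal sketch of the problem, assuming a plain literal (case-insensitive substring) search: the seven file titles below are hypothetical, each using one of the names or spellings listed above, and `literal_search` is an illustrative helper rather than the search function of any real records system.

```python
# Hypothetical file titles, each using one of the fireant names or spellings above.
TITLES = [
    "Fire ant sightings, Brisbane 2019",
    "Fireants: eradication budget",
    "Red imported fire ant response plan",
    "RIFA surveillance report",
    "Solenopsis invicta genetic study",
    "Ginger ant complaints register",
    "Fire-ant baiting trial results",
]


def literal_search(titles: list[str], query: str) -> list[str]:
    """Return the titles that contain the query as a case-insensitive substring."""
    q = query.lower()
    return [t for t in titles if q in t.lower()]


found: set[str] = set()
for query in ["fire ant", "fireant", "fire-ant"]:
    hits = literal_search(TITLES, query)
    found.update(hits)
    print(f"{query!r}: {len(hits)} of {len(TITLES)} titles")

# Every title is about the same pest, yet three spelling searches still miss some.
missed = [t for t in TITLES if t not in found]
print(f"Missed after three searches: {len(missed)}")
```

Each spelling query finds only a fraction of the seven relevant titles (2, 1 and 1 respectively), and even after all three searches the RIFA, Solenopsis invicta and ginger ant files are never found, because literal matching knows nothing about synonyms.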
This problem is not confined to Australia; imprecise description in information management is common throughout the English-speaking world.
So what’s the big deal about using different terms and spellings to describe the same thing? Can’t we just do a Google-type search and find it?
Yes, but you may also get thousands of hits and still be unsure whether you have actually found everything. And do you really want to scroll through all those results when consistent and precise description would give you complete, accurate and targeted results? Search effectiveness is usually measured in three ways:
Precision: A measure of search effectiveness expressed as the ratio of relevant records or documents retrieved from a database to the total number retrieved in response to the query.
Recall: A measure of the effectiveness of a search expressed as the ratio of the number of relevant records or documents retrieved in response to the query to the total number of relevant records or documents in the database.
Fall-out: The proportion of non-relevant documents that are retrieved, out of all non-relevant documents available.
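Each of the three definitions above is a simple ratio over four counts. The sketch below computes them for a single query; the `SearchOutcome` class, the function names and the document counts are all hypothetical, chosen only to make the arithmetic concrete, and returning 0.0 when a denominator is zero is an assumed convention rather than part of the definitions.

```python
from dataclasses import dataclass


@dataclass
class SearchOutcome:
    """Counts describing how one query performed against a collection."""
    relevant_retrieved: int         # relevant documents the query returned
    nonrelevant_retrieved: int      # non-relevant documents the query returned
    relevant_missed: int            # relevant documents the query failed to return
    nonrelevant_not_retrieved: int  # non-relevant documents correctly left out


def _ratio(numerator: int, denominator: int) -> float:
    # Convention assumed here: report 0.0 rather than dividing by zero.
    return numerator / denominator if denominator else 0.0


def precision(o: SearchOutcome) -> float:
    """Relevant retrieved / total retrieved."""
    return _ratio(o.relevant_retrieved, o.relevant_retrieved + o.nonrelevant_retrieved)


def recall(o: SearchOutcome) -> float:
    """Relevant retrieved / total relevant in the collection."""
    return _ratio(o.relevant_retrieved, o.relevant_retrieved + o.relevant_missed)


def fall_out(o: SearchOutcome) -> float:
    """Non-relevant retrieved / total non-relevant in the collection."""
    return _ratio(o.nonrelevant_retrieved,
                  o.nonrelevant_retrieved + o.nonrelevant_not_retrieved)


# Hypothetical collection: 1,000 documents, 50 of them relevant.
# The query returns 40 documents, 30 of which are relevant.
outcome = SearchOutcome(
    relevant_retrieved=30,
    nonrelevant_retrieved=10,
    relevant_missed=20,
    nonrelevant_not_retrieved=940,
)
print(f"precision = {precision(outcome):.3f}")  # 30 / 40
print(f"recall    = {recall(outcome):.3f}")     # 30 / 50
print(f"fall-out  = {fall_out(outcome):.3f}")   # 10 / 950
```

With these counts the script reports a precision of 0.750, a recall of 0.600 and a fall-out of 0.011. Recall is the figure that answers a records manager's question "have I found everything?": here two in every five relevant documents were never retrieved.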
Imprecise search results are a major records management issue: Do I have all the information I need to make a decision? Do I have the latest version? Are my records complete?
Incomplete records can cause major dysfunction for businesses, as documented in more than 90 published cases on our website Lest We Forget.
Is there a solution? How can we overcome the language barrier and create quality metadata to apply to our information resources? We’ll look into this further in our next post.