Why you need ontologies to automate records appraisal and classification
Even information professionals need a helping hand to appraise and classify the growing document stockpile. It is not surprising, then, that most organisations we work with have reached the point where autoclassification is the only viable way to appraise, capture and classify their information holdings in accordance with business and governance requirements.
What do we mean by autoclassification?
Autoclassification is the process by which documents are classified (tagged with metadata) by machine. If you read this Webopedia definition, you could easily think that the machine can make informed decisions about classification just by reading the document. In reality, the classification depends on the knowledge that you build into the autoclassification engine.
Classification that supports information governance is significantly different to classification for search. There is more complexity because there are more aspects to consider.
- Certainly users want to be able to capture and classify their documents so they can find, use and share them
- But information professionals have to classify with metadata that governs access and enables data protection (e.g. GDPR), retention and disposal, all in accordance with contemporary standards and legislation
- ICT professionals want to be able to manage information infrastructure more effectively
- And the C-suite wants everyone to be a more efficient information manager and to get on with their core business, but without exposure to unnecessary risks
Is it possible to achieve all this through autoclassification?
Yes, it is, but we need to develop fit-for-purpose, machine-readable data models, such as ontologies, that convey the requisite knowledge to the autoclassification platform.
Ontologies are linked data models for describing a domain that list the types of objects and their instances, the relationships that connect them, and the constraints on the ways in which objects and relationships can be combined.
The term dictionary refers to an electronic vocabulary or lexicon, as used for example in spelling checkers. If a dictionary is arranged in a subtype-supertype hierarchy of concepts (or terms), it is called a taxonomy. If it also contains other relations between the concepts, it is called an ontology.
Unlike file plans, ontologies enable us to combine multiple concepts and define multiple types of relationships. In an ontology we can accommodate several taxonomies within the same scheme, so we can address the needs of all stakeholders. And, because they are built for machine application, scale is not an issue.
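To make the distinction between a dictionary, a taxonomy and an ontology concrete, here is a minimal sketch in Python. The terms and the "documents" relation are invented for illustration, and a real ontology would normally be held in a dedicated modelling tool rather than in plain Python structures.

```python
# Hypothetical terms and relations, for illustration only.

# Dictionary: a flat vocabulary of terms.
dictionary = {"Record", "Contract", "Invoice", "Procurement"}

# Taxonomy: each term points to its broader (supertype) term.
broader = {"Contract": "Record", "Invoice": "Record"}

# Ontology: adds other typed relations, stored as
# (subject, predicate, object) triples.
relations = {
    ("Contract", "documents", "Procurement"),
    ("Invoice", "documents", "Procurement"),
}


def ancestors(term):
    """Walk up the subtype-supertype hierarchy from a term."""
    chain = []
    while term in broader:
        term = broader[term]
        chain.append(term)
    return chain


def related(term, predicate):
    """Return the objects linked to a term by a non-hierarchical relation."""
    return sorted(o for s, p, o in relations if s == term and p == predicate)
```

With this in place, `ancestors("Contract")` follows the taxonomy, while `related("Contract", "documents")` follows a relationship that a pure hierarchy could not express.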
Ontologies hold the key to automated appraisal
Ontologies enable the automation of records appraisal. If we extract the knowledge built into contemporary disposal authorities, we can create data models which enable the autoclassifier to recognise significant concepts and then tag documents with the appropriate records class ID.
It’s all about the linked data model: with a business classification scheme (BCS) and retention schedule incorporated into an ontology, content can also be tagged with recordkeeping metadata.
The same outcome could never be achieved with a file plan model.
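The tagging described in this section can be sketched roughly as follows. This is only an illustration under stated assumptions: the class IDs, trigger terms, retention periods and disposal actions are made up, and simple substring matching stands in for whatever concept recognition a real autoclassification platform performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordsClass:
    """One records class from a hypothetical disposal authority."""
    class_id: str
    function: str          # linked BCS term
    retention_years: int   # invented value, not from any real schedule
    disposal_action: str
    trigger_terms: tuple   # concepts the classifier looks for


# Hypothetical ontology extract linking concepts to recordkeeping metadata.
ONTOLOGY = [
    RecordsClass("RC-001", "Procurement", 7, "Destroy",
                 ("tender", "purchase order")),
    RecordsClass("RC-002", "Personnel", 75, "Destroy",
                 ("employment contract", "payroll")),
]


def classify(text):
    """Tag a document with every records class whose concepts it mentions."""
    lowered = text.lower()
    tags = []
    for rc in ONTOLOGY:
        if any(term in lowered for term in rc.trigger_terms):
            tags.append({
                "records_class_id": rc.class_id,
                "function": rc.function,
                "retention_years": rc.retention_years,
                "disposal_action": rc.disposal_action,
            })
    return tags
```

Because the records class, the BCS function and the retention rule are linked in one model, a single match yields all of the recordkeeping metadata at once.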
Are ontologies difficult to build?
Not in my experience. In fact, I find ontologies far easier to build than file plans because the logic is more explicit.
You start by defining your data model (i.e. what metadata you want to tag with) and the relationships between your concepts. Then you slice and dice existing controls, metadata libraries, business classification schemes and disposal authorities to populate it.
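As a rough sketch of that first step, the snippet below declares a small data model and rejects relationships that the model does not allow. The concept types, relationship names and sample entries are all hypothetical, and this is not how any particular ontology tool stores its model.

```python
# Hypothetical data model: concept types and the relationships
# that are allowed between them.
CONCEPT_TYPES = {"Function", "Activity", "RecordsClass"}
ALLOWED = {
    ("Function", "hasActivity", "Activity"),
    ("Activity", "coveredBy", "RecordsClass"),
}

concepts = {}  # concept name -> concept type
links = []     # (subject, predicate, object) triples


def add_concept(name, ctype):
    """Register a concept, checking its type against the data model."""
    if ctype not in CONCEPT_TYPES:
        raise ValueError(f"unknown concept type: {ctype}")
    concepts[name] = ctype


def add_link(subject, predicate, obj):
    """Link two registered concepts if the data model allows it."""
    signature = (concepts[subject], predicate, concepts[obj])
    if signature not in ALLOWED:
        raise ValueError(f"relationship not allowed: {signature}")
    links.append((subject, predicate, obj))


# Entries of the kind you might carve out of a BCS and a disposal authority.
add_concept("Procurement", "Function")
add_concept("Tendering", "Activity")
add_concept("RC-001", "RecordsClass")
add_link("Procurement", "hasActivity", "Tendering")
add_link("Tendering", "coveredBy", "RC-001")
```

Making the allowed relationships explicit is what keeps the logic easy to check: a link from a function straight to a records class, for example, is rejected because the model does not define one.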
Lastly, you need an environment where you can use the outputs of autoclassification. We’ve loaded the ontology into the SharePoint Term Store and are using all of the SharePoint functionality for exploiting managed metadata – creating views, setting up filters, refining searches and redirecting documents using workflows.
Of course, you need purpose-built tools to achieve all this. We used our own a.k.a. software to build the ontology and an autoclassification platform to apply the ontology and tag the documents in our SharePoint library.
Where to get training on building ontologies?
Synercon offers a training course: Metadata, Taxonomy and Ontology Design for Automating Information Governance.
We’ve created this workshop to share the knowledge of how to develop your metadata and classification schemes into ontologies for autoclassification and auto-appraisal.
Don’t hesitate to contact us if you would like to host or attend one of our training courses.