Barriers to auto-classification for information governance and how to overcome them
After my last post on classification, I got into conversations with colleagues about why we haven’t been able to leverage auto-classification more for information governance (IG), given that it’s no longer possible for us humans to process the vast quantities of information we create and receive.
So why can’t we take advantage of the tremendous advances made in the field of data analytics and auto-classification?
Let me explain.
What you see in the above model is a digital transformation process, where 1) a body of knowledge is translated into 2) schemes, rules and processes, which are then 3) parsed into data elements, which can then be 4) built into algorithms enabling automation of the process.
This is the means by which modern accounting systems (like Xero) have evolved, where computer-based processing is achieved through the creation of rules based on data recognition. The same goes for ERP and other business systems.
But not with information governance. Because we have failed to convert the information governance body of knowledge into a digital framework.
Over the last 20 years, the IG body of knowledge has been well documented in standards such as ISO 15489 Records management, ISO 23081 Recordkeeping Metadata, and various government standards relating to security, privacy, data protection, recordkeeping etc. But the controls which support these standards, like retention schedules, are still written as documents for human interpretation and application.
Take this description from the Records Authority 2014/00247391 for records which are to be retained as national archives, i.e. permanently:
Records relating to arrangements, agreements, Memorandums of Understanding (MOUs) and contracts relating to the management, conservation and use of significant NCA managed land and assets such as National Memorials, diplomatic sites and estates, heritage value buildings and Lake Burley Griffin. Includes water abstraction agreements, Crown Leases for diplomatic sites and estates, agreements with external parties to undertake major works activities and projects, including those that do not proceed.
And this for records which can be destroyed 12 years after completion or termination of the agreement:
Agreements, Memorandums of Understanding (MOUs) and contracts relating to the management, conservation, maintenance and use of NCA managed land and assets, other than those covered by class 61534.
Based on these rules, a human appraising records would interpret them by knowing which document types, sites and buildings, and business activities to look out for.
But if we specified which document types, sites/buildings and activities were significant, the same work could be undertaken using a search engine.
To enable automation we need to transform our governance controls into lexicons, taxonomies and data models from which we can build algorithms to feed into the search engines, enabling recognition of the terms or term sets which indicate significance. The same applies to access and security controls.
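To make this concrete, here is a rough sketch in Python of how the two retention classes quoted above might look once they are expressed as term sets rather than prose. Everything in it is an assumption for illustration: the rule IDs, the lexicon contents, and the names RetentionRule and classify are hypothetical, and a real lexicon would need to be developed and validated with the agency concerned.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionRule:
    rule_id: str          # hypothetical identifier, not a real class number
    disposal: str         # disposal action, paraphrased from the authority
    doc_types: frozenset  # terms that indicate a relevant document type
    sites: frozenset      # terms that indicate significance; empty = any site


# Hypothetical lexicons drawn loosely from the wording of the two classes.
AGREEMENT_TYPES = frozenset({
    "agreement", "contract", "crown lease", "mou",
    "memorandum of understanding", "memorandums of understanding",
})
SIGNIFICANT_SITES = frozenset({
    "national memorial", "diplomatic", "heritage", "lake burley griffin",
})

# Order matters: the more specific (permanent) rule is tested first.
RULES = [
    RetentionRule("significant-assets", "retain as national archives",
                  AGREEMENT_TYPES, SIGNIFICANT_SITES),
    RetentionRule("other-assets",
                  "destroy 12 years after completion or termination",
                  AGREEMENT_TYPES, frozenset()),
]


def contains_any(text, terms):
    """True if any term occurs as a whole word (optional plural 's')."""
    return any(
        re.search(r"\b" + re.escape(term) + r"s?\b", text) for term in terms
    )


def classify(text):
    """Return the first rule whose term sets match, or None."""
    lowered = text.lower()
    for rule in RULES:
        if not contains_any(lowered, rule.doc_types):
            continue
        if rule.sites and not contains_any(lowered, rule.sites):
            continue
        return rule
    return None


if __name__ == "__main__":
    for title in ("Crown Lease for a diplomatic site",
                  "Contract for mowing services at the depot",
                  "Staff newsletter, March edition"):
        rule = classify(title)
        print(title, "->", rule.disposal if rule else "no rule matched")
```

Simple term matching like this is only a starting point, but it shows the shift: once the significant document types and sites are listed explicitly, the rule can be tested, corrected and handed to a search engine instead of living in an appraiser's head.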
So how can we break through?
Oddly enough we have an effective model for converting information governance requirements into a digital framework. It is called the DIRKS methodology and you’ll find it in ISO 15489 Standard for Records Management. Over the last 15 years consultants like myself working in Australian government have followed this methodology to develop classification schemes and retention schedules. I developed much of my body of knowledge from following this methodology over and over again and then by developing a.k.a.® software to make the process a whole lot easier.
DIRKS involves a lot of work, but by following the methodology we are able to develop a logical model which puts the information governance rules in context. And it provides an extensible framework from which to build the data/metadata structures which will underpin the automation of information governance processes.
What’s holding us back?
DIRKS fell from grace in 2007 because the process mandated by the National Archives of Australia was highly prescriptive and costly. A bit like the parent of Baby Huey, NAA struggled to control their creation, and there was a backlash from government agencies. But in the intervening years no one has come up with an effective alternative, so it’s overdue for a reprise.
I predict that once we re-embrace DIRKS as a methodology for developing our classification and data models, automation is going to become a whole lot easier.