Why do we need canonical data models for information governance?
A canonical data model is a logical data model that is the accepted standard within a business or industry for a given process or system.
In programming, canonical means “according to the rules.” The term canonical is the adjective for canon, literally a ‘rule’, and has also come to mean standard, authorised, recognised, or accepted.
Take accounting systems. Essentially all accounting systems are the same or very similar. The entities and relationships are the same, processes are the same, and data flows in the same predictable ways. This is because accounting systems are built to the same canonical data model.
Functional requirements are all very well, but they only express the features an organization desires in a software program. Models and diagrams bridge the gap between the business’ desires and the system produced by the software developers.
Logical data models are much more explicit than functional requirements. They list all entities and relationships and specify the attributes of each entity, describing the data in as much detail as possible.
The canonical data model is the accepted standard for logical data models and it provides software developers with a standard template to build to.
In the absence of industry standards, developers will devise their own data models.
Given that most developers are unfamiliar with the detail of business processes, a lot of assumptions are made along the way, many of them erroneous. This is what has happened with recordkeeping systems.
Here are a few issues that could have been avoided if we had industry standard data models:
- A bizarre multiplicity of names for industry standard terms
  e.g. retention schedule to describe a records class, expiration to describe disposition
- Inbuilt processes which have no real world equivalent
  e.g. records declaration
- Real world processes which are not captured in the data model
- The loss of metadata when new data is imported into system tables and overwrites data linked to records
- The inability to apply retention rules to records because the system doesn’t support the events that trigger disposition changes
- The loss of metadata when records are migrated from one system to another with an incompatible data model
That’s just for starters.
Given how often information systems are replaced, one of our great industry challenges is to achieve cleaner information exchanges between systems.
As yet, we don’t have agreed data definitions to describe some of the most common elements of recordkeeping, such as terms, classes, disposition events, and disposition actions.
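To make the idea of agreed definitions more concrete, here is a minimal Python sketch of what shared definitions for terms, classes, disposition events, and disposition actions might look like. Every name in it (RetentionTerm, RecordsClass, trigger_event, the "contract_expired" event, the FIN-01 class code, and so on) is an assumption made for illustration and is not taken from any published standard. It also shows why a system that cannot record the triggering event cannot apply the retention rule: without the event there is no date to count from.

```python
import calendar
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Mapping, Optional


class DispositionAction(Enum):
    """What happens to a record at the end of its retention term (illustrative values)."""
    DESTROY = "destroy"
    TRANSFER = "transfer"
    REVIEW = "review"


@dataclass(frozen=True)
class RetentionTerm:
    """How long a record is kept after its trigger event."""
    years: int = 0
    months: int = 0


@dataclass(frozen=True)
class RecordsClass:
    """A class of records that share one retention rule."""
    code: str
    title: str
    trigger_event: str  # name of the disposition event that starts the clock
    term: RetentionTerm
    action: DispositionAction


def add_term(start: date, term: RetentionTerm) -> date:
    """Add a retention term to a date, clamping to the last day of the month."""
    total_months = (start.month - 1) + term.months + 12 * term.years
    year = start.year + total_months // 12
    month = total_months % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def disposition_date(
    records_class: RecordsClass, events: Mapping[str, date]
) -> Optional[date]:
    """Return the date the disposition action falls due, or None if the
    trigger event has not been recorded for this record."""
    occurred = events.get(records_class.trigger_event)
    if occurred is None:
        return None
    return add_term(occurred, records_class.term)


# Hypothetical example: contracts are destroyed 7 years after they expire.
contracts = RecordsClass(
    code="FIN-01",
    title="Contracts",
    trigger_event="contract_expired",
    term=RetentionTerm(years=7),
    action=DispositionAction.DESTROY,
)

due = disposition_date(contracts, {"contract_expired": date(2020, 6, 30)})
# due is date(2027, 6, 30); with no recorded event the result is None
```

If two systems agreed on definitions along these lines, a retention rule could be loaded into either one without translation. The shape of the definitions matters less than the fact that they are shared.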
In an ideal world, there would be no need for complex mappings when performing simple tasks like uploading retention rules into systems or migrating records from one system to another.
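The kind of mapping that has to be written today can be sketched in a few lines of Python. The vendor field names below are hypothetical, loosely based on the naming problems described earlier (a "retention schedule" that is really a records class, an "expiration" that is really a disposition), and the canonical names on the right are equally assumed. The point of the sketch is the second return value: any field with no agreed equivalent is exactly where metadata gets lost in a migration.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical vendor field names mapped to hypothetical canonical names.
VENDOR_TO_CANONICAL: Dict[str, str] = {
    "retention_schedule": "records_class",
    "expiration": "disposition_date",
    "doc_title": "title",
}


def to_canonical(
    vendor_record: Dict[str, Any], mapping: Dict[str, str]
) -> Tuple[Dict[str, Any], List[str]]:
    """Rename vendor fields to canonical names.

    Returns the translated record plus a sorted list of vendor fields that
    have no canonical equivalent, so they can be reviewed rather than
    silently dropped.
    """
    canonical: Dict[str, Any] = {}
    unmapped: List[str] = []
    for field, value in vendor_record.items():
        target = mapping.get(field)
        if target is None:
            unmapped.append(field)
        else:
            canonical[target] = value
    return canonical, sorted(unmapped)


# Hypothetical record exported from a vendor system.
exported = {
    "retention_schedule": "FIN-01",
    "expiration": "2027-06-30",
    "doc_title": "Invoice 1001",
    "declared": True,  # system-specific flag with no agreed equivalent
}

record, lost = to_canonical(exported, VENDOR_TO_CANONICAL)
# lost is ["declared"]
```

With a shared canonical model, the mapping table would be unnecessary, because both systems would already use the same names.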