Introduction
In my 20 years’ experience of managing data within the asset management industry, I have noticed many similarities between data sets. Almost every system has ingested its data sets through interfaces tailored to each individual data set. It seemed a practical idea: since the data sets are loaded into a relational database, we create a similarly shaped table as the target for each interface. Often, the requirement provided to the development team, such as the example below, maps the source data directly to a target table, which then drives the creation of a bespoke target data shape.
The Problem
So why is this an issue? Surely, if it has been used as a load pattern for so long, it must work? Indeed, it does. The problem, however, is that this approach is starting to show its age. In an era where data sources are growing in both breadth and depth, the legacy method has increasingly revealed its drawbacks:
- Time to market. A bespoke load per data set means coding the interface and then testing it, which extends the time before the data is available to consumers.
- Repetitive development effort. Development resources are directed at these business change requirements rather than at development that directly adds insight value.
- Maintenance. Each bespoke interface is a distinct set of artefacts that must all be supported. Over time, this support overhead grows.
- Release overhead. Each newly on-boarded data set brings a new cycle of test and release, increasing the operational risk of code releases to Production.
The Solution
So, what’s the solution? There is a plethora of possible implementations, but they all tend to boil down to the use of meta data. With meta data we can describe a data set. Looking at the mapping document above, the meta data sits in the two right-most columns: the description of each source data set’s attributes, so that the data points can be interpreted in context. I am simplifying here; the actual meta data would contain additional tags to define the attributes precisely. There is also coding required to align the source data points to the meta data describing them, but once we look at things this way, we are free of the constraints of bespoke target tables. This is essentially a unified piece of code that underpins the scalability of the generic load approach.
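To make this concrete, here is a minimal sketch of the idea in Python. The `DATASET_META` record and the `load_dataset` function are illustrative assumptions rather than part of any particular platform; the point is that the same code serves every data set and only the meta data changes.

```python
import csv
from pathlib import Path

# Illustrative meta data for one source data set: each attribute is described
# with the tags the loader needs to interpret the data points in context.
DATASET_META = {
    "dataset": "broker_trades",
    "attributes": [
        {"source": "TradeDate", "target": "trade_date", "type": "date"},
        {"source": "Ticker",    "target": "ticker",     "type": "string"},
        {"source": "Quantity",  "target": "quantity",   "type": "decimal"},
    ],
}

def load_dataset(file_path: str, meta: dict) -> list:
    """Generic load: align each source column to its meta data description.

    The same function serves every data set; only the meta data changes.
    """
    mapping = {a["source"]: a["target"] for a in meta["attributes"]}
    rows = []
    with Path(file_path).open(newline="") as handle:
        for record in csv.DictReader(handle):
            # Keep only the attributes the meta data describes, renamed to
            # their target names; typing and validation could be layered on here.
            rows.append({mapping[col]: value
                         for col, value in record.items()
                         if col in mapping})
    return rows
```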
By adopting a meta data-driven load pattern we can access several value-add scenarios that counteract the above problems:
- Time to market. Instead of spending development cycles on a new load, we can use a moderately sized story to implement the meta data that describes the new data set (a sketch of such a meta data entry follows this list). This greatly reduces on-boarding time and can even move ownership of the process to the data stewards.
- Repetitive development effort. A single, generic pipeline becomes the unified piece of code that manages the load, leveraging the meta data to provide the context of each new data set.
- Maintenance. With a single pipeline to maintain, the maintenance effort no longer grows linearly with the number of data sets.
- Release overhead. Releases become more the domain of the business than of technology, scoped to the introduction of new meta data into Production.
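As a hedged illustration of that first point, on-boarding a new data set could be as small as the meta data entry below (the data set and attribute names are hypothetical); the generic loader sketched earlier would pick it up without a code release.

```python
# Hypothetical meta data entry for a newly on-boarded data set. On-boarding
# becomes a data change, not a code change: the generic load_dataset()
# sketched earlier consumes it without a new release.
NEW_DATASET_META = {
    "dataset": "esg_scores",
    "attributes": [
        {"source": "ISIN",      "target": "isin",      "type": "string"},
        {"source": "Provider",  "target": "provider",  "type": "string"},
        {"source": "ESG_Score", "target": "esg_score", "type": "decimal"},
    ],
}
```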
Beyond counteracting these legacy issues, we also gain an extensible way to manage data after the load. We can extend the meta data to describe whether and how the loaded data should be typed, validated, and routed to data schemas. By extending the meta data model, those schemas can interface with the meta data to create their own dynamic abstractions of the data points. Furthermore, governance and reporting of load cycles naturally become a consistent experience in the UI (User Interface). This enables a convenient data model for BI (Business Intelligence) analysis of the loads, driving data rationalisation.
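A minimal sketch of what such an extension might look like, again in Python and with illustrative field names (`route_to`, `validate`) that are assumptions rather than a defined standard:

```python
from decimal import Decimal, InvalidOperation

# Illustrative extension of the meta data model: typing, validation and
# routing tags sit alongside the basic attribute mapping.
EXTENDED_META = {
    "dataset": "broker_trades",
    "route_to": "trading_schema",  # which data schema receives the rows
    "attributes": [
        {"source": "Quantity", "target": "quantity", "type": "decimal",
         "validate": {"required": True, "min": 0}},
    ],
}

def validate_decimal(raw, rules: dict):
    """Apply the meta data validation rules to one decimal-typed value."""
    if raw in ("", None):
        if rules.get("required"):
            return None, "missing required value"
        return None, None
    try:
        value = Decimal(raw)
    except InvalidOperation:
        return None, f"not a decimal: {raw!r}"
    if "min" in rules and value < rules["min"]:
        return None, f"below minimum {rules['min']}"
    return value, None
```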
There are further considerations too, such as the regression testing effort when the unified pipeline undergoes a maintenance change. This effort can be mitigated with continuous testing built into the pipeline. A distributed, event-based architecture would work well here, where the unified code is instantiated and runs on its own resources.
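For example, a small regression suite could run on every change to the unified pipeline; the pytest-style test below assumes the `load_dataset` and `DATASET_META` names from the earlier sketch are importable.

```python
# A regression test sketch (pytest style). Running tests like this on every
# change to the unified pipeline gives the continuous testing described above.
def test_generic_load_aligns_columns(tmp_path):
    sample = tmp_path / "broker_trades.csv"
    sample.write_text("TradeDate,Ticker,Quantity\n2024-01-31,ABC,100\n")

    rows = load_dataset(str(sample), DATASET_META)

    # Untyped at this stage, so values remain strings; typing is meta data-driven.
    assert rows == [{"trade_date": "2024-01-31", "ticker": "ABC", "quantity": "100"}]
```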
In real-world usage the UX (User Experience) would need attention to present a digestible view of the many data sets ingested daily. Judicious use of dashboards that roll up to insightful metrics can help here. The maintenance of the meta data would be managed by the business, whose staff would appreciate a less onerous experience; the system could generate an inferred set of meta data to lighten the task.
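One way that inference might work, sketched with a hypothetical `infer_meta` helper that drafts meta data from a sample file for a steward to review and correct:

```python
import csv
from pathlib import Path

def infer_meta(file_path: str, dataset_name: str) -> dict:
    """Draft a meta data record from a sample file for a steward to review."""
    with Path(file_path).open(newline="") as handle:
        reader = csv.DictReader(handle)
        first_row = next(reader, {})
        columns = reader.fieldnames or []
    attributes = []
    for column in columns:
        value = first_row.get(column) or ""
        # Crude type guess from the first data row; the steward corrects it.
        inferred_type = "decimal" if value.replace(".", "", 1).isdigit() else "string"
        attributes.append({"source": column, "target": column.lower(),
                           "type": inferred_type})
    return {"dataset": dataset_name, "attributes": attributes}
```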
The Scope
There is almost no limit to the scope of source data sets. Provided the data is structured or semi-structured, it should be in scope, whether it arrives via CSV/flat files, XML/JSON messages, streams, or indeed other tables. The data sets do not even have to be flat in structure: a data set of key-value pairs (as is often the case with the new breed of sustainable investment data) can also be modelled with meta data. As this is a conceptual design, it can be implemented on most integration platforms, from SSIS to ADF to EDM, and beyond.
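As an illustration of the key-value case, the meta data could simply name the entity, key, and value columns so the generic pipeline can pivot the pairs into attributes; the column and field names below are assumptions for the sketch.

```python
# Illustrative meta data for a non-flat, key-value shaped data set (common in
# sustainable investment feeds): the meta data names the entity, key and value
# columns so the generic pipeline knows how to pivot the pairs.
KEY_VALUE_META = {
    "dataset": "sustainability_metrics",
    "shape": "key_value",
    "entity_column": "ISIN",
    "key_column": "MetricName",
    "value_column": "MetricValue",
}

def pivot_key_values(rows: list, meta: dict) -> dict:
    """Collapse key-value rows into one record per entity."""
    entities = {}
    for row in rows:
        record = entities.setdefault(row[meta["entity_column"]], {})
        record[row[meta["key_column"]]] = row[meta["value_column"]]
    return entities
```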
In other industries the take-up is already widespread. Perhaps it is time for asset management to do the same – and leverage some quick wins?