Search

Exploring historical data changes

Business Problem

Enterprise data is becoming increasingly cloud-native and scattered across different systems of records and storage. This makes it challenging to keep track of how its history changes, especially once it is transformed and loaded into reporting-ready data warehouses and marts. This presents a host of business and governance challenges when auditing is required, as well as limiting factors when historical context is eventually needed for machine learning and data science applications.


Components

Requirements

  • Access to an Infor CloudSuite (or any data in Datalake)
  • User privileges for Data Fabric (DATAFABRIC-SuperAdmin)
  • Optional Infor U Campus courses:
    • Infor OS: Foundation for Multi-Tenant – Part 1
    • Infor OS: Foundation for Multi-Tenant – Part 2

Tutorial

Difficulty: Easy

Estimated Completion Time: 10 minutes

The Infor Data Lake, as a central repository for enterprise data, and by virtue of being an immutable object storage architecture, can retain the full history of records ingested into it. With its built-in data versioning abilities and synthetic functions to expose previous or deleted versions of records stored in it, the Data Lake allows building queries that travel back in time to investigate data changes, deleted records from systems of records, and unlocks a host of use cases for data science and machine learning applications.

As data is ingested in the Data Lake, data objects are indexed for future retrieval and querying. A number of properties are added and are specifically used to create what we commonly refer to as synthetic columns . These columns exist as queryable platform metadata and can be useful in data processing and exploration. Particularly, in this scenario, these can be used to effectively query across time and review version changes in data replicated from systems of records.

In this tutorial, we’ll focus on tracking historical changes for specific items in the MITBAL table which comes from Infor M3. The table contains item details per warehouse. This table is part of a provisioned replication set for CloudSuite tenants where changes to the table are replicated to Data Lake at preset intervals.

1. Explore the various object properties in Atlas

Before we begin, navigate to Data Fabric/Datalake/Atlas:

Navigate to Atlas and select the table MITBAL

As you can see below, the table consists of multiple data objects that were incrementally loaded into Data Lake. Selecting any one of these objects allows you to preview its contents on the panel to the right or explore its properties.

Data Object properties in Atlas

2. Familiarize yourself with query hints and synthetic functions in Compass

Exposing these properties in a query-setting can be done in Compass through the use of a SQL query hint accompanying a SELECT statement:

Using the expression “ –includeInSelectAll=s/g/p/l” before the SELECT statement, will expose and append the columns in the following image to the result set. Note that the syntax starts with two hyphens followed by an asterisk.

Query results with –*includeInSelectAll=s/g/p/l

Alternatively, and specific to our time-based analysis requirement, you can run a simplified SELECT statement and append the infor.lastmodified() function to the list of columns you’d like returned:

Query to show infor.lastmodified column

3. Use the infor.lastmodified function to visualize historical changes to records

We now introduce infor.allvariations() as another synthetic function to our query. Infor.allvariations() enables you to select all variations, including lower variations and variations marked as deleted. We can also add a WHERE clause to focus our auditing on a specific item in the table, and order our results by their lastmodified date:

Visualizing historical variations for specific records

4. Filter queries using infor.lastmodified()

The lastmodified column and function can also be used to filter while querying your data resulting in a much more efficient query performance.

For example, if I’m only interested in querying across data lake objects that landed in the data lake since yesterday, I can run my query with a WHERE statement limiting the scope of lastmodified to dates starting on that date. Keep in mind that the while the lastmodified property is returned as a timestamp in UTC in ISO8601 format with three milliseconds, you can use a date or a timestamp that does not include milliseconds in your WHERE clause against it:

Filtering query results with infor.lastmodified()

Additional Resources:

This video provides an overview of the infor.lastmodified() function covering most of the concepts included in this tutorial:

Overview of infor.lastmodified() query function

Best Practices