Version: 2.3

2.1 Data Engineering & Data Integration

The core idea of Data Engineering and Data Integration in Data Context Hub is to transform incoming data, delivered by Data Pumps from different sources, into a common Data Context Hub Model based on Target Entities and Relationships. Together, these components form the Staging Layer. Based on the Model and the data present in the Staging Layer, the knowledge graph can then be generated in the Processing step.

"data integration overview"

Staging Layer

The Staging Layer in the Data Context Hub consists of Data Pumps, Target Entities, and Relationship objects. It enables the transformation of incoming data from various sources and its integration by applying the Model, which is based on the Target Entities and Relationships.

In the Data Context Hub, Data Integration means saving the data in the defined Target Entities and generating the relationships between Target Entities based on the defined Relationship objects.

The Staging Layer provides an environment in which the data can be shaped into the form required to generate a knowledge graph efficiently.

Data Pumps

Data Pumps are pluggable components in Data Context Hub that extract data from different sources and transform it into tabular form. They can be installed on a particular Data Context Hub instance.

More details on the usage of Data Pumps can be found in the chapter Data Pumps.

The incoming tabular data can be mapped to the properties of the defined Target Entities.
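The mapping step described above can be pictured with a small sketch. It is only an illustration of the idea, not Data Context Hub's actual API: the column names, the property names, and the `map_row` helper are all hypothetical.

```python
from typing import Any

# Hypothetical mapping: source column name -> Target Entity property name.
COLUMN_TO_PROPERTY = {
    "emp_no": "employeeId",
    "full_name": "name",
    "dept": "department",
}


def map_row(row: dict[str, Any], mapping: dict[str, str]) -> dict[str, Any]:
    """Rename mapped columns to property names; unmapped columns are dropped."""
    return {prop: row[col] for col, prop in mapping.items() if col in row}


# Tabular rows as a Data Pump might deliver them (invented sample data).
pump_rows = [
    {"emp_no": "E1", "full_name": "Ada", "dept": "R&D", "internal_flag": 1},
    {"emp_no": "E2", "full_name": "Grace", "dept": "Ops", "internal_flag": 0},
]

mapped = [map_row(r, COLUMN_TO_PROPERTY) for r in pump_rows]
```

In this sketch, columns without a mapping (here `internal_flag`) simply do not reach the Target Entity; whether that matches the product's behavior is an assumption.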

Target Entities

In Data Context Hub, a Target Entity is a data object that represents a real-world object or concept, such as a person, place, or thing. An entity has properties, such as a name, a type, or a description, and can also have relationships with other entities, such as one-to-many or many-to-many relationships. Each Target Entity needs one property whose value is unique for every instance. This property is marked as the Business Key.

In this context, Target Entities are used to model the data that is stored in the Data Context Hub Staging Layer and are a key component of knowledge-graph modelling.

An entity is typically represented by a table in the Staging Layer, where each row in the table represents a single instance of the entity, and each column represents an attribute or property of the entity. Those rows will be processed into nodes in the knowledge graph.
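To make the table view and the Business Key requirement concrete, here is a minimal sketch that treats a Target Entity table as a list of rows and reports Business Key values that occur more than once. The `Person` rows, the `personId` property, and the `duplicate_business_keys` helper are hypothetical, and how Data Context Hub itself handles duplicates is not covered here.

```python
from collections import Counter


def duplicate_business_keys(rows, business_key):
    """Return the Business Key values that appear in more than one row."""
    counts = Counter(row[business_key] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical "Person" Target Entity table: one row per instance,
# one column per property; "personId" plays the role of the Business Key.
person_rows = [
    {"personId": "P1", "name": "Ada", "type": "Engineer"},
    {"personId": "P2", "name": "Grace", "type": "Engineer"},
    {"personId": "P2", "name": "Grace H.", "type": "Admiral"},
]

duplicates = duplicate_business_keys(person_rows, "personId")
```

A check like this could be run on source data before loading, since each row is expected to become exactly one node.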

Relationships

In Data Context Hub, a Relationship can be created between two different Target Entities or from one Target Entity to itself. A Relationship defines a From Entity and a To Entity, as well as the Property in each whose values are compared during the data integration process.

The Relationship is evaluated when the data is loaded into the Staging Layer of Data Context Hub. When incoming data arrives, Data Context Hub compares the values according to the Relationship definitions and generates relationship objects in the Staging Layer. These later become the edges connecting nodes in the knowledge graph.
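The evaluation described above can be sketched as a simple value comparison. This is an illustration under stated assumptions, not the product's implementation: it assumes the comparison is plain equality, and the `RelationshipDef` class, the `evaluate` function, and all entity and property names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationshipDef:
    """Hypothetical Relationship definition: From/To Entity plus compared Properties."""
    name: str
    from_entity: str
    from_property: str
    to_entity: str
    to_property: str


def evaluate(rel, staging, business_keys):
    """Pair From and To rows whose compared property values are equal.

    Returns relationship objects as (name, from_business_key, to_business_key).
    """
    # Index the To Entity rows by the value of the compared property.
    index = defaultdict(list)
    for row in staging[rel.to_entity]:
        index[row[rel.to_property]].append(row[business_keys[rel.to_entity]])

    result = []
    for row in staging[rel.from_entity]:
        for to_key in index.get(row[rel.from_property], []):
            result.append((rel.name, row[business_keys[rel.from_entity]], to_key))
    return result


# Invented staging data: two employees, one department.
staging = {
    "Employee": [
        {"employeeId": "E1", "departmentCode": "D1"},
        {"employeeId": "E2", "departmentCode": "D9"},  # no matching department
    ],
    "Department": [{"departmentId": "D1", "title": "R&D"}],
}
business_keys = {"Employee": "employeeId", "Department": "departmentId"}

works_in = RelationshipDef("WORKS_IN", "Employee", "departmentCode",
                           "Department", "departmentId")
relationship_objects = evaluate(works_in, staging, business_keys)
```

Because the From and To Entity may be the same, the same function also covers a Relationship from one Target Entity to itself.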

Knowledge Graph Processing

Once the data has been pulled from the sources and saved in Target Entities, and the relationships have been generated, the knowledge graph can be generated with Graph Processing. In this process, the rows from the Target Entity tables are transformed into nodes and the relationship objects into edges.
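The transformation of rows into nodes and relationship objects into edges can be sketched with plain Python structures. This is a simplified illustration only: the `build_graph` function, the tuple layout of the relationship objects, and the sample data are assumptions, and a real deployment would write to the Graph Database instead of in-memory dictionaries.

```python
def build_graph(staging, business_keys, relationship_objects):
    """Turn Target Entity rows into nodes and relationship objects into edges."""
    # One node per row, identified by (entity name, Business Key value).
    nodes = {}
    for entity, rows in staging.items():
        key_prop = business_keys[entity]
        for row in rows:
            nodes[(entity, row[key_prop])] = dict(row)

    # One edge per relationship object, kept only if both endpoints exist.
    edges = []
    for name, from_entity, from_key, to_entity, to_key in relationship_objects:
        source = (from_entity, from_key)
        target = (to_entity, to_key)
        if source in nodes and target in nodes:
            edges.append({"type": name, "from": source, "to": target})
    return nodes, edges


# Invented staging content and relationship objects.
staging = {
    "Employee": [
        {"employeeId": "E1", "name": "Ada"},
        {"employeeId": "E2", "name": "Grace"},
    ],
    "Department": [{"departmentId": "D1", "title": "R&D"}],
}
business_keys = {"Employee": "employeeId", "Department": "departmentId"}
relationship_objects = [("WORKS_IN", "Employee", "E1", "Department", "D1")]

nodes, edges = build_graph(staging, business_keys, relationship_objects)
```

Keying nodes by entity name and Business Key reflects the requirement that the Business Key is unique within a Target Entity.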

After the Graph Processing is finished, the knowledge graph can be explored in the Data Context Hub Explorer or by querying the Graph Database used in Data Context Hub.