Actionable, Facet-based, Property Graphs
Metadata Track | KGC 2023 • 29m
In general, property graphs are very flexible, since any number of properties can be associated with nodes and edges. To add more structure, nodes and/or edges are often typed (via a label), in which case a labeled node (or edge) of a particular type is expected to have specific properties. This works fine if node types are well defined and remain relatively stable. But what if we want to define relationships between any kind of nodes (existing or future node types)? For instance, in a metadata graph, we may be interested in the data lineage between various node types ("entities"), but in reality it doesn't matter whether the node is a dataset, the input to (or output of) a machine learning model, a physical device or digital twin that provides real-time data, etc. To model data lineage, all nodes need to include a group of properties that we would refer to as a database schema, but the actual type of those nodes is irrelevant. In general, how nodes can be related to other nodes, or how any service can observe or interact with nodes in a graph, depends only on the shared groups of properties, which are often referred to as aspects or facets.
In our presentation, we provide numerous examples of the benefits of facet-based graph models. In particular, we focus on actionable property graphs that can be utilized for self-governing data management and various kinds of optimization via a pattern that is very popular in game programming, namely Entity Component Systems (ECS). Instead of defining nodes of a particular type, nodes are modeled merely as UIDs and sets of facets (aspects, components) that are standardized and can be added dynamically. For a metadata graph, this could include the logical model (via schema and ontology facets), physical aspects (facets for data formats and locations), statistics and usage, and governance (e.g. facets that state details about the inclusion of personally identifiable information).
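To make the node model concrete, here is a minimal Python sketch of a node as a UID plus a dynamic set of facets. It is an illustration under assumptions, not the presenters' implementation: the `Node` class, the facet names (`schema`, `location`), the property names, and the `can_link_lineage` helper are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Node:
    """A node has no type: it is only a UID plus a set of named facets."""
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))
    facets: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def add_facet(self, facet_name: str, **properties: Any) -> "Node":
        # Facets can be attached dynamically at any time.
        self.facets[facet_name] = dict(properties)
        return self

    def has_facets(self, *facet_names: str) -> bool:
        return all(name in self.facets for name in facet_names)


def can_link_lineage(source: Node, target: Node) -> bool:
    # Assumption: a lineage edge only requires a schema facet on both ends;
    # what kind of entity each node represents is irrelevant.
    return source.has_facets("schema") and target.has_facets("schema")


# Hypothetical example nodes.
dataset = (
    Node()
    .add_facet("schema", columns=["id", "email"])
    .add_facet("location", uri="hypothetical://bucket/users.parquet")
)
model_input = Node().add_facet("schema", columns=["id", "email"])
device = Node().add_facet("location", uri="hypothetical://device/42")

print(can_link_lineage(dataset, model_input))  # True
print(can_link_lineage(dataset, device))       # False
```

The check depends only on the facets that are present, so a new kind of entity can take part in lineage as soon as it carries a schema facet.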
In order to make a graph actionable, external processes (so-called systems) operate on arbitrary nodes that happen to include certain facets. One system would operate on nodes that contain a schema facet, ensure that data lineage is maintained, and provide an impact analysis if changes are necessary. Another system would continually monitor access restrictions for nodes that represent datasets and contain a facet that specifies personally identifiable information.
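The two systems described above can be sketched in Python as plain functions that select nodes by facet rather than by type. This is a simplified illustration under assumptions: the graph is a dictionary from UID to facets, lineage is stored as a hypothetical `derived_from` property inside the schema facet, and the `pii` and `access` facet layouts are invented for the example.

```python
from typing import Any, Dict, List

Facets = Dict[str, Dict[str, Any]]
Graph = Dict[str, Facets]  # node UID -> facets


def query(graph: Graph, *required: str) -> List[str]:
    """Return the UIDs of all nodes that carry every required facet."""
    return [uid for uid, facets in graph.items()
            if all(name in facets for name in required)]


def lineage_impact_system(graph: Graph, changed_uid: str) -> List[str]:
    """List schema-bearing nodes that are downstream of a changed node."""
    affected: List[str] = []
    seen = {changed_uid}
    frontier = [changed_uid]
    while frontier:
        current = frontier.pop()
        for uid in query(graph, "schema"):
            upstream = graph[uid]["schema"].get("derived_from", [])
            if current in upstream and uid not in seen:
                seen.add(uid)
                affected.append(uid)
                frontier.append(uid)
    return sorted(affected)


def pii_access_system(graph: Graph) -> List[str]:
    """Flag nodes with a PII facet that are not marked as access-restricted."""
    violations = []
    for uid in query(graph, "pii"):
        access = graph[uid].get("access", {})
        if not access.get("restricted", False):
            violations.append(uid)
    return sorted(violations)


# Hypothetical metadata graph.
graph: Graph = {
    "raw_users": {
        "schema": {"columns": ["id", "email"]},
        "pii": {"fields": ["email"]},
        "access": {"restricted": True},
    },
    "features": {
        "schema": {"columns": ["id", "email_domain"],
                   "derived_from": ["raw_users"]},
        "pii": {"fields": ["email_domain"]},
    },
    "model_output": {
        "schema": {"columns": ["id", "score"], "derived_from": ["features"]},
    },
    "sensor": {"location": {"uri": "hypothetical://device/42"}},
}

print(lineage_impact_system(graph, "raw_users"))  # ['features', 'model_output']
print(pii_access_system(graph))                   # ['features']
```

Neither system inspects what kind of entity a node represents. Adding a `pii` facet to any node, for instance, is enough to bring it under the access monitor.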