Ben De Meester | PROV4ITDaTa: Flexible Knowledge Graph Generation Within Reach
KGC | All Access Subscription
•
16m
Personal Knowledge Graph generation is no longer a cumbersome technical endeavor. PROV4ITDaTa is an MIT open source platform to provide a smooth user experience for generating knowledge graphs from your online web services, such as Google, Flickr, and Imgur, into your personal data space. This brings your personal data back under your control, and as a graph, its true interlinking potential is unleashed. PROV4ITDaTa allows to configure and set up a web application where users can easily pick one or more web services to extract their data from, transform that data into best-practice knowledge graphs, and push those graphs to a personal data space, such as a Solid pod.
All heavy lifting is included in PROV4ITData: management of service authentication (e.g., OAuth 1.0/2.0 sessions), setting up the infrastructure to extract and transform your personal data from popular web services, directly loading those graphs into your personal data space, and generating a simple user interface. Developers only need to focus on providing custom connectors to more specialized web services, and configuring the data processing pipeline to generate knowledge graphs into any needed form or shape.
The data processing pipeline is based on RML.io, meaning that it is extensible to any data source, can integrate multiple data sources on the fly, includes data cleansing functions, supports any RDF graph structure and ontology, and its configuration is fully declarative. The pipelines are thus more maintainable and more transparent than hard-coded solutions such as the Data Transfer Project (DTP). Where existing platforms such as DTP provide one-to-one data transfer, PROV4ITDaTa enables extensible many-to-many data processing pipelines, beyond the web services and (personal) data spaces we currently provide. All these features can easily be included in the PROV4ITDaTa platform, and will be showcased during the presentation.
As a result, developers can more easily support custom web services. As users can generate personal knowledge graphs faster and easier, more knowledge graph applications can be built that rely on real-world data. With a click of a button, users can try out knowledge graph applications using their actual data from music streaming systems, fitness apps, address books, social media, etc.
Continuing this product, we are building a data processing workbench, where these different data processing pipeline configurations can be managed, scheduled, and orchestrated, giving companies more control, and allowing to upscale PROV4ITData more easily.
Introducing PROV4ITDaTa is Ben De Meester, a postdoctoral researcher at IDLab. In this talk he provides insight on the problems with current data that is messy and personal.Data is scattered across several service providers and is difficult to transfer over. The solution he provides is PROV4ITDaTa, an open source flexible tool for transparent and direct transfer of personal data to personal stores. He discusses mainly about PROV4ITDaTa explaining how it could knowledge scientists and allow users to easily transfer their data using PROV4ITDaTa. #knowledgegraphs #knowledgegraphconference #knowledgegraphtools #knowledgegraphandbigdataprocessing
Up Next in KGC | All Access Subscription
-
Joshua Shinavier | Anything To Graph
Show me your schemas, and I will show you a graph! Although graph databases have become very popular in the enterprise, deep expertise in graphs is still in short supply (see "Building an Enterprise Knowledge Graph @Uber: Lessons from Reality" from KGC 2019). Developers often think of graphs as a...
-
Abhishek Mittal | Re-Imagining Regula...
Content Enrichment: Development and deployment of a 5-stage taxonomy. Applying the taxonomy to tag regulations and classify them for improved discovery & work assignment.
Smart Authoring: Leveraging advanced NLP and ML techniques to learn from the past content authoring for identification of ... -
Jonas Almeida | Data Commons In The W...
The increasing reliance on distributed epidemiological data sources for time sensitive analysis defines an emergent computational commons space: API ecosystems supporting epidemiology data commons. This space has been forced to evolve significantly to meet the real-time requirements of COVID-19 i...