A Distributed Vertex-Deduplication Framework for Large Graphs
Systems and Scale Track | KGC 2023 • 26m
Our framework is built on a distributed Hadoop/MapReduce/HBase infrastructure and covers both “low-level” graph database operations and “higher-level” algorithmic aspects such as vertex-vertex similarity, graph clustering, and “robust” vertex ID stamping. We have been using the framework to de-duplicate the Goldman Sachs Knowledge Graph. In this talk, we will report a few experimental results from applying the framework to public datasets.
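The three algorithmic stages named in the abstract (vertex-vertex similarity, clustering, ID stamping) can be illustrated with a small single-machine sketch. This is not the framework's implementation: the abstract does not specify the similarity measure, the clustering method, or what makes ID stamping "robust". The sketch assumes Jaccard similarity over attribute tokens, connected components via union-find, and an ID hashed from the sorted cluster members; `jaccard`, `find`, `deduplicate`, and the `threshold` parameter are hypothetical names.

```python
import hashlib
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def deduplicate(vertices, threshold=0.5):
    """Map each vertex key to a cluster ID.

    vertices: dict of vertex key -> set of attribute tokens
    (a hypothetical layout chosen for this sketch).
    """
    parent = {v: v for v in vertices}
    # Stage 1 (assumed): pairwise similarity. This is quadratic; a
    # distributed system would restrict candidate pairs first.
    for u, v in combinations(sorted(vertices), 2):
        if jaccard(vertices[u], vertices[v]) >= threshold:
            # Stage 2 (assumed): clusters are the connected components
            # of the "similar enough" graph.
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:
                parent[max(ru, rv)] = min(ru, rv)
    clusters = {}
    for v in vertices:
        clusters.setdefault(find(parent, v), []).append(v)
    # Stage 3 (assumed): derive the ID from the sorted member keys so it
    # does not depend on input order.
    stamped = {}
    for members in clusters.values():
        key = "|".join(sorted(members)).encode()
        digest = hashlib.sha1(key).hexdigest()[:12]
        for m in members:
            stamped[m] = digest
    return stamped


vertices = {
    "a": {"goldman", "sachs"},
    "b": {"goldman", "sachs", "group"},
    "c": {"acme"},
}
ids = deduplicate(vertices)
```

Here `a` and `b` share two of three tokens and fall into one cluster, while `c` stays alone. Hashing the sorted member list keeps the ID stable under reordering of the input, but it changes whenever cluster membership changes, so it is only one possible reading of "robust".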