Neurosymbolic Visual Understanding and Reasoning using Deep Learning and KGs
Visual AI has made remarkable progress on basic vision tasks, using deep learning techniques that detect concepts in visual scenes accurately and quickly. However, existing techniques rely on labelled datasets that lack common-sense knowledge about visual concepts and exhibit biased distributions of visual semantic relationships. As a result, these techniques achieve limited visual relationship prediction performance, which in turn limits the expressiveness and accuracy of the semantic representation and of downstream reasoning. We employed deep neural networks to predict visual concepts, including objects and visual relationships, and linked them to generate a symbolic image representation. To alleviate the challenges above, we leveraged the rich and diverse common-sense knowledge in heterogeneous knowledge graphs to systematically refine and enrich the generated image representation. We observed significant improvements in the recall of visual relationship prediction (a 7% increase in Recall@100), in the expressiveness of the representation, and in the performance of downstream visual reasoning tasks, including image captioning (a 15% increase in SPICE score) and image reconstruction. These encouraging results demonstrate the effectiveness of the proposed approach and its impact on downstream visual reasoning.
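
To make the pipeline concrete, below is a minimal sketch of the refinement step described above: predicted (subject, predicate, object) triples from a neural detector are re-scored against common-sense facts. The abstract does not specify the refinement mechanism, so everything here is an assumption for illustration: the names PLAUSIBLE_FACTS, KG_WEIGHT, and refine_triples are hypothetical, the toy fact set stands in for queries against real heterogeneous knowledge graphs, and the linear score-blending scheme is one plausible choice, not the authors' method.

```python
"""Illustrative sketch: KG-based refinement of predicted visual relationships.

Assumed setup (not from the abstract): the detector emits candidate
(subject, predicate, object) triples with confidence scores, and the
common-sense KG is abstracted as a set of plausible triples.
"""

from dataclasses import dataclass


@dataclass
class Triple:
    subject: str
    predicate: str
    obj: str        # 'object' is a Python builtin, so 'obj' is used here
    score: float    # detector confidence in [0, 1]


# Toy stand-in for common-sense assertions drawn from heterogeneous KGs;
# a real system would query the knowledge graphs instead.
PLAUSIBLE_FACTS = {
    ("man", "riding", "horse"),
    ("dog", "chasing", "ball"),
    ("cup", "on", "table"),
}

KG_WEIGHT = 0.3  # hypothetical blending weight between detector and KG evidence


def refine_triples(candidates: list[Triple]) -> list[Triple]:
    """Re-rank candidate relationships by blending detector confidence
    with a binary KG-plausibility signal, then sort best-first."""
    refined = []
    for t in candidates:
        plausible = (t.subject, t.predicate, t.obj) in PLAUSIBLE_FACTS
        new_score = (1 - KG_WEIGHT) * t.score + KG_WEIGHT * (1.0 if plausible else 0.0)
        refined.append(Triple(t.subject, t.predicate, t.obj, new_score))
    return sorted(refined, key=lambda t: t.score, reverse=True)


if __name__ == "__main__":
    detections = [
        Triple("man", "wearing", "horse", 0.55),  # implausible relation, higher raw score
        Triple("man", "riding", "horse", 0.50),   # plausible relation, lower raw score
    ]
    for t in refine_triples(detections):
        print(f"{t.subject} --{t.predicate}--> {t.obj}  ({t.score:.2f})")
```

Running the sketch, the KG signal overturns the detector's ranking: "man riding horse" (0.65 after blending) overtakes the implausible "man wearing horse" (0.39), which is the kind of correction that would improve relationship recall metrics such as Recall@100.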