Recent advances in representation learning and the application of deep neural networks to structured data and Knowledge Graphs (KGs) are enabling multi-modal representations of entities and relations. We can now aspire to access data encoded in knowledge graphs through any of several modalities (image, audio, text, or video), and to train joint representations that cross from one modality to another (e.g. text-to-image, audio-to-text). These capabilities let us build applications that use entity information in entirely new ways and exploit the sensors available in modern multi-modal devices such as glasses, watches, and smart earphones. These contextually aware smart devices give users a richer model for interacting with the world, and consequently need more robust support from a multi-modal knowledge graph to help them contextualize the user's environment. In this presentation I will describe a retrieval architecture to support a multi-modal KG, and show examples of prototypes we have built for multi-modal retrieval.
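The cross-modal retrieval idea mentioned above can be sketched as a nearest-neighbor search in a shared embedding space: a text encoder and an image encoder map their inputs into the same vector space, and retrieval is cosine similarity against an index. The vectors below are toy placeholders standing in for real encoder outputs (CLIP-style dual encoders are one common choice); this is an illustrative assumption, not the specific architecture described in the talk.

```python
import numpy as np

def cosine_top_k(query_vec, index_vecs, k=2):
    """Return indices of the k index vectors most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    m = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity per index entry
    return np.argsort(-sims)[:k]      # highest similarity first

# Toy shared embedding space: pretend a text encoder and an image encoder
# both map into the same 4-dimensional space (placeholder vectors).
image_index = np.array([
    [0.9, 0.1, 0.0, 0.1],   # image of a landmark
    [0.0, 0.8, 0.2, 0.1],   # image of a dog
    [0.1, 0.1, 0.9, 0.0],   # image of a coffee cup
])
text_query = np.array([0.85, 0.15, 0.05, 0.1])  # text describing the landmark

print(cosine_top_k(text_query, image_index, k=1))  # -> [0]
```

In a real system the index would be served by an approximate-nearest-neighbor structure rather than a brute-force matrix product, but the contract is the same: any modality that can be encoded into the shared space can query any other.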