International Journal of Image, Graphics and Signal Processing (IJIGSP)

ISSN: 2074-9074 (Print), ISSN: 2074-9082 (Online)

Published By: MECS Press

IJIGSP Vol.12, No.5, Oct. 2020

Learning Semantic Image Attributes Using Image Recognition and Knowledge Graph Embeddings

pp. 44-52

Ashutosh Kumar Tiwari, Sandeep Varma Nadimpalli

Index Terms

Knowledge graph embeddings, image attributes, semantic information, image recognition, entity embeddings, convolutional neural networks, COCO dataset, GloVe, YOLO


Extracting structured knowledge from text has traditionally been the basis of knowledge base generation. However, other sources of information, such as images, can be incorporated into this process to build richer and more complete knowledge bases. A structured semantic representation of image content, combined with knowledge graph embeddings, can capture the semantic relationships between the entities in an image. Linking known entities in knowledge graphs and learning open-world images using language models have attracted considerable interest over the years. In this paper, we propose a shared learning approach that learns semantic attributes of images by combining a knowledge graph embedding model with the attributes recognized in the images. The proposed model promises to help us understand the semantic relationships between the entities of an image and implicitly links the extracted entities through the knowledge graph embedding model. Despite relying on a custom, user-defined knowledge base with limited data, the proposed model achieves significant accuracy and offers a new alternative to earlier approaches. The approach is a step towards bridging the gap between frameworks that learn from large amounts of data and frameworks that use a limited set of predicates to infer new knowledge.
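The idea of linking recognized image entities through a knowledge graph embedding can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the embedding model, so a TransE-style score (negative distance between head + relation and tail) is assumed here, and the entity labels, relation names, and 2-D vectors are hypothetical, hand-picked values rather than learned embeddings.

```python
import math

# Toy, hand-picked 2-D embeddings (hypothetical values, not learned
# and not taken from the paper).
ENTITY_EMB = {
    "person": (0.0, 0.0),
    "bicycle": (1.0, 0.0),
    "dog": (0.0, 1.0),
}
RELATION_EMB = {
    "rides": (1.0, 0.0),
    "walks": (0.0, 1.0),
}


def transe_score(head, relation, tail):
    """TransE-style plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def rank_relations(head_label, tail_label):
    """Rank all known relations for a pair of detected labels, most plausible first."""
    head = ENTITY_EMB[head_label]
    tail = ENTITY_EMB[tail_label]
    scored = [(name, transe_score(head, vec, tail)) for name, vec in RELATION_EMB.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Labels as an object detector might emit them for one image (assumed output).
detected = ["person", "bicycle"]
best_relation, _ = rank_relations(detected[0], detected[1])[0]
print(detected[0], best_relation, detected[1])
```

In a real pipeline the labels would come from an image recognition model and the vectors from a trained embedding model; the sketch only shows how a scoring function can propose a relation between two detected entities.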

Cite This Paper

Ashutosh Kumar Tiwari, Sandeep Varma Nadimpalli, "Learning Semantic Image Attributes Using Image Recognition and Knowledge Graph Embeddings", International Journal of Image, Graphics and Signal Processing (IJIGSP), Vol.12, No.5, pp. 44-52, 2020. DOI: 10.5815/ijigsp.2020.05.05

