Using Natural Language Inference to Perform Visual Inference: the Case of Quantified Noun Phrases

Document Type

Master Thesis

License

CC-BY-NC-ND

Abstract

Evaluating quantities in visual data remains one of the biggest challenges in Visual Inference. We explore a novel approach to reasoning about quantities in visual contexts that applies the tools of Natural Language Inference to textual descriptions of visual scenes. Given a complete description of a simple geometrical scene, we predict whether a quantified statement about the objects in that scene follows from the description. We test an LSTM-based neural network architecture on this task and examine how well the model generalizes.
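
To make the task format concrete, the following is a minimal sketch of how a premise, a hypothesis, and a label for such a task could be generated. It is an illustration under stated assumptions, not the thesis's actual data pipeline: the `Shape` class, the sentence templates, the set of quantifiers, and the threshold used for "most" are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """One object in a simple geometrical scene (hypothetical representation)."""
    color: str
    kind: str


def describe(scene):
    """Render a complete textual description of the scene (the premise)."""
    return " ".join(f"There is a {s.color} {s.kind}." for s in scene)


def holds(quantifier, kind, color, scene):
    """Check whether '<quantifier> <kind>s are <color>' is true of the scene."""
    restricted = [s for s in scene if s.kind == kind]
    matching = [s for s in restricted if s.color == color]
    if quantifier == "all":
        # Vacuously true when the scene contains no object of this kind.
        return len(matching) == len(restricted)
    if quantifier == "some":
        return len(matching) >= 1
    if quantifier == "no":
        return len(matching) == 0
    if quantifier == "most":
        # Assumed reading: strictly more than half.
        return 2 * len(matching) > len(restricted)
    raise ValueError(f"unsupported quantifier: {quantifier}")


def make_example(scene, quantifier, kind, color):
    """Build one (premise, hypothesis, label) triple in NLI style."""
    premise = describe(scene)
    hypothesis = f"{quantifier.capitalize()} {kind}s are {color}."
    label = "entailment" if holds(quantifier, kind, color, scene) else "non-entailment"
    return premise, hypothesis, label


scene = [Shape("red", "circle"), Shape("red", "circle"), Shape("blue", "square")]
premise, hypothesis, label = make_example(scene, "all", "circle", "red")
print(premise)
print(hypothesis, "->", label)
```

A model for this task would receive only the premise and hypothesis strings and would have to predict the label; the symbolic check in `holds` serves here only to produce the gold label.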
