Application of Ensemble Learning in Visual Question Answering
Publisher
University of M'sila
Abstract
Visual Question Answering (VQA) is a field that combines two areas: computer vision
and natural language processing (NLP). Computer vision is used to process the image or
video, and NLP is used to process the natural-language question. A VQA system
automatically answers a question based on the content of an image or video. VQA is one of
the vision-language tasks that require a high level of both language and image
understanding, which makes it a difficult and complex problem. In this dissertation, we
explore and apply an ensemble of diverse VQA models, combined using weighted-average
techniques, to increase answer accuracy.
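
As an illustration of the weighted-average idea, the following is a minimal sketch and not the dissertation's actual implementation. It assumes that each VQA model outputs a probability distribution over the same fixed list of candidate answers and that one weight per model is already available (for example, chosen from validation accuracy). The abstract does not say which models are used or how the weights are set, so the function names, the candidate answers, the probabilities, and the weights below are all hypothetical.

```python
from typing import List, Sequence


def weighted_average_ensemble(
    model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine per-model answer distributions with a weighted average.

    model_probs[m][a] is the probability model m assigns to answer a.
    Weights are normalised so they sum to 1 before averaging.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    num_answers = len(model_probs[0])
    if any(len(probs) != num_answers for probs in model_probs):
        raise ValueError("all models must score the same answer list")

    combined = [0.0] * num_answers
    for probs, weight in zip(model_probs, weights):
        for i, p in enumerate(probs):
            combined[i] += (weight / total) * p
    return combined


def predict_answer(
    model_probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    answers: Sequence[str],
) -> str:
    """Return the candidate answer with the highest combined score."""
    combined = weighted_average_ensemble(model_probs, weights)
    best = max(range(len(combined)), key=combined.__getitem__)
    return answers[best]


# Hypothetical example: three models scoring three candidate answers.
answers = ["red", "blue", "green"]
model_probs = [
    [0.6, 0.3, 0.1],  # assumed output of model A
    [0.2, 0.7, 0.1],  # assumed output of model B
    [0.3, 0.6, 0.1],  # assumed output of model C
]
weights = [0.5, 0.3, 0.2]  # assumed per-model weights

print(predict_answer(model_probs, weights, answers))  # prints "blue"
```

In this made-up case the most heavily weighted model prefers "red", but the two other models agree on "blue" strongly enough that the combined score favours "blue" (0.48 against 0.42). This is the kind of correction a weighted ensemble is meant to provide.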