Extending the Question Answering Models to Visual Question Answering #1382
Labels: enhancement, help wanted, won't fix
🚀 Feature
Extending the idea of Question Answering to Visual Question Answering
Motivation
I was going through the examples and was interested in using transformers for Visual Question Answering, but I could not find many code resources on the topic, so I thought of contributing my own implementation (written in PyTorch). I believe the implementation is simple enough to be fine-tuned quickly on any dataset.
Pitch
I am not sure how best to pitch this, but I have implemented the model and obtained fair results with it. I want to extend its applicability to any dataset, and since it is a multi-modal model, it would be helpful for the research community as well.
Alternatives
Not sure, since this is a model contribution.
Additional context
Here is the implementation.
What does this implementation contain?
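Since the linked code lives externally, here is a minimal, illustrative sketch of what a two-stream VQA model along these lines might look like in PyTorch: a small CNN encodes the image, a Transformer encoder encodes the tokenized question, the two feature vectors are fused, and a classifier predicts over a fixed answer vocabulary. All names and sizes here (`VQAModel`, `vocab_size`, `num_answers`, `d_model`) are assumptions for illustration, not taken from the linked implementation.

```python
# Illustrative two-stream VQA sketch, NOT the linked implementation.
import torch
import torch.nn as nn

class VQAModel(nn.Module):
    def __init__(self, vocab_size=10000, num_answers=3000, d_model=256):
        super().__init__()
        # Image stream: tiny CNN producing a d_model-dim image feature.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, d_model),
        )
        # Question stream: token embedding + Transformer encoder.
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Fusion by elementwise product, then answer classification.
        self.classifier = nn.Linear(d_model, num_answers)

    def forward(self, image, question_ids):
        img_feat = self.cnn(image)                    # (B, d_model)
        txt = self.encoder(self.embed(question_ids))  # (B, T, d_model)
        txt_feat = txt.mean(dim=1)                    # (B, d_model)
        return self.classifier(img_feat * txt_feat)   # answer logits

# Usage: a batch of 2 RGB images and tokenized questions of length 12.
model = VQAModel()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
print(logits.shape)  # torch.Size([2, 3000])
```

Treating VQA as classification over a fixed answer vocabulary is the common baseline formulation; the fusion step (an elementwise product here) could equally be concatenation or attention-based.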