In the field of computer vision, conventional methods and systems used in image/video analysis and multimodal analysis to determine semantic information from an image, in order to classify that image, are not always optimal and may be computationally intensive.
The listing or discussion of any prior-published document or any background in this specification should not necessarily be taken as an acknowledgement that the document or background is part of the state of the art or is common general knowledge. One or more aspects/examples of the present disclosure may or may not address one or more of the background issues.