As the quality and resolution of cameras integrated into mobile communication devices improve, many users rely on these devices as their primary means of taking pictures and/or videos. With this increasing use of mobile communication devices for media-related activity, image browsing has also become common among their users. During image browsing, a user may view a scene from different angles and may want an image of the scene from a preferred angle. However, it is challenging for the user to obtain that exact image, either because the user is not physically present at the required location or because it is hard for the user to request a precise image capture in a verbal or non-verbal manner. Further, it is difficult to describe the exact camera position without using the global camera pose, and it is difficult for another user capturing the image to follow the requirements so that the image conforms to the requester's specification. Accordingly, service providers and device manufacturers (e.g., wireless, cellular, etc.) face significant technical challenges in providing a service that generates a request for capturing at least one media item based on the camera pose information preferred by at least one user.