Autonomous agents (e.g., vehicles, robots, and drones) and semi-autonomous agents may use an object detection model, such as a trained convolutional neural network (CNN), to identify objects of interest in an image. For example, a CNN may be trained to identify and track objects captured by one or more sensors, such as light detection and ranging (LIDAR) sensors, sonar sensors, red-green-blue (RGB) cameras, RGB-depth (RGB-D) cameras, and the like. The sensors may be coupled to, or in communication with, a device, such as the autonomous agent.
To improve machine learning models, such as the object detection model, human analysts may evaluate the accuracy of the models as used in real-world scenarios. In conventional systems, to determine a model's accuracy, the analysts review a video and the corresponding classification data. The analysts may also review the raw sensor data. In most cases, the video is a high-resolution video. To facilitate the analysis, the agent transmits both the video and the classification data to a remote server.
Due to the limited bandwidth of wireless networks, conventional autonomous agents do not transmit the video and the classification data via a wireless network while the agent is active in an environment. Rather, a conventional agent uploads the video and the classification data to the server when the agent is at a location, such as a garage at a home, with a high-bandwidth connection, such as a fiber-optic Internet connection. As such, there is a delay between the time the classification data and the video are generated and the time they are uploaded to the server. It is desirable to stream the classification data and the video to the server, via a wireless network, while the autonomous agent is operating in a real-world environment.