More and more mobile applications include audio features as part of the functions they provide to end users. For example, an application often deploys multiple locations on a graphical user interface for receiving user inputs in accordance with the rhythm of an audio file being played on a mobile device (e.g., a smartphone or a tablet computer). Based on user interactions (e.g., finger touches) with these locations, the application provides a user-friendly environment. For example, in a game application that plays music concurrently, multiple locations are deployed in each image frame of the game application based on the music's rhythm, so that user inputs are generated from the user interactions with the deployed locations. Very often, a deployed location corresponds to a moving object that moves across the graphical user interface in accordance with the music being played.

More specifically, a particular location, which is associated with a particular frame of the application, is often mapped to a set of data in the audio file from which the music is generated. The set of data identifies the tempo of the frames of the application as well as the locations corresponding to that tempo. For example, the movement speed of a location may be measured and controlled by the cumulative time of rendering the frames of the application. In other words, when the cumulative time of the frames matches the falling time of a corresponding location defined in the data, the location is rendered on the graphical user interface.

However, because the cumulative time of rendering the frames of the application is calculated on a per-frame basis, this approach inevitably accumulates the timing errors associated with the respective frames. As a result, when the accumulated errors grow large enough, there can be a time gap of multiple seconds between the music being played and the corresponding movement of the locations on the graphical user interface.
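The error accumulation described above can be sketched with a minimal simulation. All values here are illustrative assumptions, not taken from the source: a true frame interval of about 16.67 ms (60 frames per second), a per-frame timer that rounds each interval up to 17 ms, and a 2-second gap as the threshold of interest. Each frame contributes a tiny measurement error, and the accumulated frame clock steadily drifts ahead of the audio clock.

```python
# Sketch of per-frame cumulative timing drift (hypothetical values).
TRUE_FRAME_MS = 16.67      # assumed actual frame interval (~60 fps)
MEASURED_FRAME_MS = 17.0   # assumed per-frame measurement rounded by a ms timer

frames = 0
cumulative_ms = 0.0        # the application's accumulated frame clock
while True:
    frames += 1
    cumulative_ms += MEASURED_FRAME_MS          # cumulative time, per-frame basis
    audio_ms = frames * TRUE_FRAME_MS           # where the music actually is
    if cumulative_ms - audio_ms >= 2000.0:      # a two-second gap has opened
        break

print(f"after {frames} frames (~{audio_ms / 1000:.0f} s of music), "
      f"the frame clock is {cumulative_ms - audio_ms:.0f} ms ahead of the audio")
```

With these numbers, each frame adds only about a third of a millisecond of error, yet after roughly a hundred seconds of play the locations would appear a full two seconds out of step with the music, matching the multi-second gap the passage describes.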