Stored data or information is generally structured to be accessed through a particular type of interface. For example, web pages are generally structured using a markup language, such as the hypertext markup language (HTML). These web pages generally include HTML components that specify what content is displayed. The HTML components can include text boxes, buttons, tables, fields thereof, selectable links, and generally any other type of HTML component that can be displayed by an internet browser.
Thus, some web pages utilize interactable components. Although these interactable web pages are typically accessed using a screen-based interface in a client-server arrangement, problems often arise when there is no screen-based interface, such as when only an audio interface is available to interact with those web pages. Many conventional voice systems used to access web pages are unable to interact with the interactable components thereof, and instead are often limited to reading the text of only those web pages already specified by users. Therefore, many conventional voice systems are unable to fully utilize web pages, and in particular web pages that are used to control a process or workflow.
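By way of illustration only, the distinction between the readable text of a web page and its interactable components can be sketched as follows. The sketch assumes Python's standard `html.parser` module; the sample page, the set of tags treated as interactable, and the names `SAMPLE_PAGE`, `INTERACTABLE_TAGS`, `ComponentCollector`, and `find_interactable_components` are hypothetical and are not drawn from any particular system.

```python
from html.parser import HTMLParser

# Hypothetical sample page used only for illustration.
SAMPLE_PAGE = """
<html><body>
  <p>Order status</p>
  <input type="text" name="order_id">
  <button id="submit">Look up</button>
  <a href="/help">Help</a>
  <table><tr><td>Item</td></tr></table>
</body></html>
"""

# Assumption: these tags are treated as interactable for this sketch.
INTERACTABLE_TAGS = {"input", "button", "a", "select", "textarea"}


class ComponentCollector(HTMLParser):
    """Collects interactable components and plain text separately."""

    def __init__(self):
        super().__init__()
        self.components = []  # list of (tag, attributes) pairs
        self.text = []        # non-empty text fragments

    def handle_starttag(self, tag, attrs):
        # Record the tag and its attributes if it is interactable.
        if tag in INTERACTABLE_TAGS:
            self.components.append((tag, dict(attrs)))

    def handle_data(self, data):
        # Record visible text, ignoring whitespace-only fragments.
        stripped = data.strip()
        if stripped:
            self.text.append(stripped)


def find_interactable_components(html):
    """Return (components, text) found in the given HTML string."""
    collector = ComponentCollector()
    collector.feed(html)
    return collector.components, collector.text


components, text = find_interactable_components(SAMPLE_PAGE)
for tag, attrs in components:
    print(tag, attrs)
```

A voice system limited to reading text would surface only the fragments collected in `text`, whereas interacting with the page would additionally require acting on the entries in `components`, such as the text box and the button.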