GISdevelopment.net ---> GITA 2003 ---> Mobile

Voice Extensible Markup Language (VoiceXML) Technology and Geospatial Internet Applications

Peter R. DeCurtins
GE Network Solutions, Inc. 5600 Greenwood Plaza Blvd.
Englewood, CO 80111


Abstract
GIS technologies increasingly use XML to exchange data between otherwise isolated computer systems. XML provides a flexible, platform-independent way of transferring data over standard network protocols, as well as an extensible mechanism that has enabled the development of numerous XML-based data exchange standards to assist communication between different systems. One such derivative, the Voice Extensible Markup Language (VoiceXML), has as its major goal bringing the advantages of web-based development and content delivery to interactive voice response applications. It is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. In addition to making data accessible to diverse clients, including web browsers, Java applications and mobile devices, a VoiceXML-enabled Internet application can simultaneously distribute data to a client platform as ubiquitous as the ordinary phone. The possibilities of leveraging an organization’s geospatial information in this manner are only now beginning to be explored. This paper briefly introduces VoiceXML technology, examines some simple potential VoiceXML-enabled GIS applications, and summarizes the business advantages such applications may provide.

Telephony and the internet

Traditional Telephony
Computer telephony is a generic term for any configuration that combines a general-purpose computer architecture with telecommunications technology. Traditional automated telephone services and call processing systems have historically been built from scratch as separate applications, requiring dedicated hardware, parallel infrastructure, and custom development using specialized skills and languages such as C++ and Java.

An example of a common component found in professional telephony applications is the high-end “voice card”, a type of PC expansion card that generally features specialized microprocessors known as Digital Signal Processors (DSPs) which efficiently process digitized signals such as audio and video (often called “media processing”). Voice cards are usually connected to one another through a high-speed “voice bus” which transfers real-time data between various components, and connected to a telephone system through multiple analog or digital trunk interfaces.

Some common categories of telephony applications are Interactive Voice Response (IVR), Voice Mail, Customer Call Center, and Audiotext Information Systems.

Web Telephony
Some technologies allow telephony architecture to be built on top of existing Internet and Web-enabled infrastructure. With such tools, telephone interfaces can be added to web applications with minimum effort, because the underlying layers remain unchanged. One benefit of doing this is the utility of having a single application simultaneously serve data to both telephone users and computer users. In addition, it may prove advantageous to port some traditional legacy telephony applications to Web sites with telephone interfaces in order to reduce costs and duplication of resources and effort.

The Voice Browser
An internet-based telephony architecture is enabled through a component known as a “voice browser”. Analogous to the common graphical web browser, a voice browser is a computer-hosted application that lets a user receive information from, and interact with, a web application through a telephone. In this metaphor, the telephone serves as the browser’s input/output device, much as the mouse, keyboard, monitor and computer speaker do for a traditional graphical browser. The browser outputs audio prompts, which may be pre-recorded, synthesized, or streamed from a dynamic source, and accepts input from the user through either the phone’s microphone or keypad.

The voice extensible markup language (VoiceXML)

XML
The Extensible Markup Language (XML), which allows data to be stored in a self-describing formatted text file, has rapidly become a popular standard for inter-application data exchange. XML provides a flexible, platform-independent way of transferring data over standard network protocols, as well as an extensible mechanism that has enabled the development of numerous XML-based data exchange standards to assist communication between different systems.

VoiceXML Standard
The major goal of the Voice Extensible Markup Language (VoiceXML) standard is to bring the advantages of web-based development and content delivery to interactive voice response applications. It is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. More information about the standard can be obtained from the W3C’s Voice Browser Working Group and the VoiceXML Forum, both of which are listed in the references below.

As was stated above, voice browsers allow users to interact with telephony applications in much the same way traditional graphical browsers allow users to interact with standard web applications. By extension, VoiceXML is analogous to HTML, the format that developers use to support graphical browsers. VoiceXML is a language used to build web applications for the telephone.


VoiceXML Browser
A typical VoiceXML browser runs on a workstation with voice cards serving as trunk interfaces and media processors. An IP network interface card (NIC) connects the platform to a network which itself may be connected to an intranet or internet.

Voice Forms
Voice forms, or voice dialogs, are the essential building blocks of a VoiceXML application. A form consists of a set of prompts and fields, which serve to collect input from the user. Once a form has been filled, the voice browser submits the values to the web/application server. This is very similar to the common HTML form used in traditional web-based applications.
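The prompt, field and submit structure described here can be sketched by generating a small VoiceXML document from Python. This is only an illustrative sketch: the form id `asset_lookup`, the field name `asset_id`, the prompt wording and the submit URL are all hypothetical, and a real application would be served to the voice browser by a web server rather than printed.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_asset_form(submit_url: str) -> str:
    """Return a one-form VoiceXML document that collects an asset number.

    All names used here (asset_lookup, asset_id) are hypothetical.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "asset_lookup"})

    # A field pairs a prompt with a place to store the caller's answer.
    # type="digits" asks the platform for its built-in digit grammar.
    field = ET.SubElement(form, "field", {"name": "asset_id", "type": "digits"})
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Please say or key in the asset number."

    # Once the field is filled, the block runs and submits the value
    # to the web/application server, much like an HTML form post.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "submit", {"next": submit_url, "namelist": "asset_id"})

    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_asset_form("http://example.com/lookup"))
```

The server that receives the submission would normally answer with another VoiceXML page, just as a web server answers an HTML form post with another HTML page.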

Input/Output
User input is provided through speech and Dual-Tone Multi-Frequency (DTMF) or “touch-tone” key input. In order to be used by an application, speech input will either be digitally recorded to be preserved as raw input, or, more likely, processed using voice recognition technology. In order to define acceptable speech for input to a voice dialog, VoiceXML requires the definition of what is known as a “speech grammar”. Likewise, audio output must be provided in one of three ways: through the playback of pre-recorded digital material, from a live, dynamically generated source through streaming technology, or by generating synthetic audio from textual input, which is often termed “Text-to-Speech” (TTS).
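The role of a grammar can be illustrated with a toy model in Python. This is not how a voice platform performs recognition; it only shows the idea that a grammar limits input to a known set of utterances or key presses and maps each one to a value the application can use. The class `ToyGrammar` and the menu entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyGrammar:
    """Toy stand-in for a grammar: accepted inputs mapped to meanings."""

    phrases: Dict[str, str] = field(default_factory=dict)  # spoken phrase -> value
    keys: Dict[str, str] = field(default_factory=dict)     # DTMF keys -> value

    def match(self, mode: str, raw: str) -> Optional[str]:
        """Return the value for an input, or None if the grammar rejects it."""
        if mode == "voice":
            # Ignore case and extra whitespace in the recognized text.
            normalized = " ".join(raw.lower().split())
            return self.phrases.get(normalized)
        if mode == "dtmf":
            return self.keys.get(raw.strip())
        raise ValueError(f"unknown input mode: {mode!r}")


# Hypothetical two-item menu reachable by voice or by keypad.
MENU = ToyGrammar(
    phrases={"report outage": "outage", "crew status": "status"},
    keys={"1": "outage", "2": "status"},
)

if __name__ == "__main__":
    print(MENU.match("voice", "Report outage"))
    print(MENU.match("dtmf", "2"))
    print(MENU.match("voice", "weather"))
```

In this sketch the spoken phrase and the key press resolve to the same value, so the rest of the dialog does not need to know which input mode the caller chose.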

Hyperlinks and Client-Side Scripting
Two other important features of VoiceXML are its support for moving to a new voice dialog and its ability to specify scripts that the voice browser will process prior to submitting the form results to the web server. Hyperlinks, which are quite similar to their HTML-based counterparts, simply direct the execution of the form to a different voice dialog on the same or a different VoiceXML page. VoiceXML’s implementation of ECMAScript (the standardized form of the language more commonly known as JavaScript) allows the voice dialog to perform conditional logic and decision-making steps for purposes such as controlling program execution or data validation.
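A minimal sketch of both features follows, assuming a hypothetical two-form page. The `cond` attribute holds an ECMAScript expression that validates the input inside the voice browser, and `goto` transfers control to another dialog on the same page. The Python helper `dangling_targets` is an invented convenience that checks whether every local `goto` points at a form that exists; it is not part of any VoiceXML toolkit.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace

# Hypothetical page: form ids, field name and prompts are illustrative only.
DIALOG = f"""<vxml version="2.0" xmlns="{VXML_NS}">
  <form id="lookup">
    <field name="asset_id" type="digits">
      <prompt>Please say or key in the six digit asset number.</prompt>
      <filled>
        <!-- cond is an ECMAScript expression evaluated by the voice browser -->
        <if cond="asset_id.length != 6">
          <prompt>That number should have six digits.</prompt>
          <clear namelist="asset_id"/>
        <else/>
          <goto next="#confirm"/>
        </if>
      </filled>
    </field>
  </form>
  <form id="confirm">
    <block><prompt>Thank you.</prompt></block>
  </form>
</vxml>"""


def dangling_targets(document: str) -> list:
    """Return local '#id' goto targets that name no form in the document."""
    root = ET.fromstring(document)
    form_ids = {f.get("id") for f in root.iter(f"{{{VXML_NS}}}form")}
    targets = [g.get("next") for g in root.iter(f"{{{VXML_NS}}}goto")]
    return [t for t in targets if t and t.startswith("#") and t[1:] not in form_ids]


if __name__ == "__main__":
    print(dangling_targets(DIALOG))
```

Clearing the field when validation fails causes the browser to prompt for it again, so the caller is re-asked without a round trip to the server.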

Geospatial VoiceXML Applications
Currently, mobile phones and VoiceXML technology have been combined to provide users with some location-enabled tools that can perform tasks such as finding the nearest restaurant or other point of interest based on supplied location information. Location-aware phone units and GPS-equipped units will make this type of application even more useful and convenient over time. It is a fairly simple process to provide a VoiceXML front-end to already existing applications to provide voice access to an enterprise’s database or services, and no doubt this is being done at some scale already.
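The server-side half of a “nearest point of interest” service might look like the following sketch. It assumes the caller’s position arrives as latitude and longitude and that points of interest are held in a simple in-memory list; the coordinates, names and the great-circle (haversine) distance are illustrative choices, not a method described in this paper. The returned sentence is plain text that a voice browser could render through TTS.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_poi(lat, lon, pois):
    """Return the (name, lat, lon) tuple closest to the caller."""
    return min(pois, key=lambda p: haversine_km(lat, lon, p[1], p[2]))


def spoken_answer(lat, lon, pois):
    """Build the sentence to be read back to the caller."""
    name, plat, plon = nearest_poi(lat, lon, pois)
    distance = haversine_km(lat, lon, plat, plon)
    return f"The nearest location is {name}, about {distance:.1f} kilometers away."


# Hypothetical points of interest; coordinates are made up for illustration.
POIS = [
    ("Service Center A", 39.61, -104.90),
    ("Service Center B", 40.00, -105.00),
]

if __name__ == "__main__":
    print(spoken_answer(39.60, -104.90, POIS))
```

A production system would more likely delegate the search to a spatial database or GIS server; the linear scan here is only adequate for a short list.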

In the utility industry, issues such as work management, scheduling and dispatch, and resource management often involve both voice-based and field/mobile client delivery mechanisms. Using standard telephone equipment and network-distributed thin-client applications, employees and associates can use VoiceXML applications to access information and report activities. Collecting data from the field through simple voice-activated commands or direct spoken input can improve both productivity and accuracy.
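As a rough sketch of the receiving end of such a field report, the following Python function parses values as a voice browser might submit them in URL-encoded form and applies basic checks before the data reaches a work management system. The field names `crew_id`, `work_order` and `status`, and the list of allowed status values, are hypothetical.

```python
from urllib.parse import parse_qs

# Hypothetical set of status words a crew may report.
ALLOWED_STATUS = {"started", "complete", "delayed"}


def parse_field_report(query: str) -> dict:
    """Validate a URL-encoded field report and return it as a dict.

    Raises ValueError if a field is missing or fails a basic check.
    """
    params = parse_qs(query)
    report = {}
    for name in ("crew_id", "work_order", "status"):
        values = params.get(name)
        if not values:
            raise ValueError(f"missing field: {name}")
        report[name] = values[0].strip().lower()

    if not report["crew_id"].isdigit() or not report["work_order"].isdigit():
        raise ValueError("crew_id and work_order must be numeric")
    if report["status"] not in ALLOWED_STATUS:
        raise ValueError(f"unknown status: {report['status']}")
    return report


if __name__ == "__main__":
    print(parse_field_report("crew_id=17&work_order=4521&status=complete"))
```

Checking values on the server complements any validation done in the dialog itself, since recognition errors can still produce well-formed but wrong input.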

The impact that VoiceXML will have on geospatial internet and intranet applications in the coming years is hard to predict, but it is likely to be significant. The ability of mobile users to access sophisticated applications and enterprise databases through the phone will allow spatial resources to be leveraged on a much larger scale. The cost of implementing automated and interactive voice systems should fall, and the skill sets and resources needed to provide voice-based applications should become more readily available. The capability of internet technology to distribute enterprise information efficiently and effectively will be magnified by VoiceXML’s ability to extend its benefits beyond traditional desktop and mobile clients. There are far more phones in the world than there are computers. The rapid growth of wireless phones, particularly outside of North America, has created the most pervasive potential client platform in history. VoiceXML will allow any ordinary telephone to be used to access Internet services regardless of the caller’s physical location. It will benefit anyone with visual or manual impairments, as well as anyone who needs to access an application while keeping their hands and eyes on other tasks, such as driving or operating a piece of machinery.

VoiceXML brings the advantages of web-based development and content delivery to interactive voice response applications, and is already significantly affecting industries and sectors such as transportation, banking and media. The impact it will have on the geospatial industry will no doubt be felt for many years to come.

References
Edgar, B. The VoiceXML Handbook–Understanding And Building The Phone Enabled Web (New York: CMP Books, 2001)

www.xml.org and www.w3.org/xml - information on XML

www.voicexml.org and www.w3.org/Voice - information on VoiceXML
