Voice Extensible Markup Language (VoiceXML) Technology and Geospatial Internet Applications
Peter R. DeCurtins
GE Network Solutions, Inc. 5600 Greenwood Plaza Blvd.
Englewood, CO 80111
Abstract
GIS technologies increasingly make extensive use of XML to facilitate seamless data
exchange between various and otherwise isolated computer systems. XML provides a
flexible, platform-independent way of transferring data over standard network protocols,
as well as an extensible mechanism that has enabled the development of numerous XML-based
data exchange standards to assist communication between different systems. One
such derivative, the Voice Extensible Markup Language (VoiceXML), aims to bring
the advantages of web-based development and content delivery to
interactive voice response applications. It is designed for creating audio dialogs that
feature synthesized speech, digitized audio, recognition of spoken and DTMF key input,
recording of spoken input, telephony, and mixed-initiative conversations. In addition to
making data accessible to any number of diverse clients, including web browsers, Java
applications and Innovative Technologies devices, a VoiceXML-enabled Internet application can
simultaneously distribute data to client platforms as ubiquitous as the ordinary telephone.
The possibilities of leveraging an organization’s geospatial information in this manner
are only now beginning to be explored. This paper briefly introduces VoiceXML
technology, examines some simple potential VoiceXML-enabled GIS applications, and
summarizes the possible business advantages such applications may provide.
Telephony and the Internet
Traditional Telephony
Computer telephony is a generic term for any system that combines a general-purpose
computer architecture with telecommunications technology. Traditional automated
telephone services and call processing systems have historically been built from scratch
as separate applications, requiring dedicated hardware, parallel infrastructure, and
custom development using specialized skills and languages such as C++ and Java.
An example of a common component found in professional telephony applications is the
high-end “voice card”, a type of PC expansion card that generally features specialized
microprocessors known as Digital Signal Processors (DSPs) which efficiently process
digitized signals such as audio and video (often called “media processing”). Voice cards
are usually connected to one another through a high-speed “voice bus” which transfers
real-time data between various components, and connected to a telephone system through
multiple analog or digital trunk interfaces.
Some common categories of telephony applications are Interactive Voice Response
(IVR), Voice Mail, Customer Call Center, and Audiotext Information Systems.
Web Telephony
Some technologies allow telephony architecture to be built on top of existing Internet and
Web-enabled infrastructure. With such tools, telephone interfaces can be added to web
applications with minimum effort, because the underlying layers remain unchanged. One
benefit of doing this is the utility of having a single application simultaneously serve data
to both telephone users and computer users. In addition, it may prove advantageous to
port some traditional legacy telephony applications to Web sites with telephone interfaces
in order to reduce costs and duplication of resources and effort.
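The idea of one application serving both telephone and computer users can be sketched as
follows. This is a minimal illustration in Python, not a description of any particular
product: the record, its field names, and the functions render_html, render_vxml and
respond are all hypothetical, and a real deployment would sit behind a web server and a
voice browser gateway.

```python
from xml.sax.saxutils import escape

# Hypothetical record; the field names are illustrative only.
OUTAGE = {"area": "Northside", "status": "service restored"}


def render_html(record):
    """Render the record for a graphical web browser."""
    return (
        "<html><body><p>"
        f"{escape(record['area'])}: {escape(record['status'])}"
        "</p></body></html>"
    )


def render_vxml(record):
    """Render the same record as a VoiceXML dialog for a voice browser."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">'
        "<form><block><prompt>"
        f"Status for {escape(record['area'])}: {escape(record['status'])}."
        "</prompt></block></form></vxml>"
    )


def respond(record, client):
    """Choose a presentation layer; the underlying data is shared."""
    if client == "voice":
        return "application/voicexml+xml", render_vxml(record)
    return "text/html", render_html(record)
```

Only the presentation layer differs between the two clients; the data source and the
business logic behind it are written once.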
The Voice Browser
An Internet-based telephony architecture is enabled through a component known as a
“voice browser”. Analogous to the common graphical web browser, a voice browser is a
computer-hosted application that allows a user to receive information from, and interact
with, a web application through a telephone. In this metaphor, the telephone serves as the
browser’s input/output device, much as the mouse, keyboard, monitor and computer speaker
do for a traditional graphical browser. The browser outputs audio prompts, which may be
pre-recorded, synthesized, or streamed from a dynamic source, and accepts input from
the user through either the phone’s microphone or keypad.
The Voice Extensible Markup Language (VoiceXML)
XML
The Extensible Markup Language (XML), which allows data to be stored in a self-describing
formatted text file, has rapidly become a popular standard for inter-application
data exchange. XML provides a flexible, platform-independent way of transferring data
over standard network protocols, as well as an extensible mechanism that has enabled the
development of numerous XML-based data exchange standards to assist communication
between different systems.
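As a small illustration of what "self-describing" means in practice, the sketch below
parses an XML fragment using Python's standard library. The element and attribute names
(site, name, location, lat, lon) are invented for this example and do not come from any
particular GIS exchange standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload; element and attribute names are illustrative.
DOC = """<?xml version="1.0"?>
<site id="42">
  <name>Substation A</name>
  <location lat="39.60" lon="-104.89"/>
</site>"""

root = ET.fromstring(DOC)

# The tags and attributes label each value, so the receiving system
# needs no out-of-band knowledge of column positions or byte offsets.
site = {
    "id": root.get("id"),
    "name": root.findtext("name"),
    "lat": float(root.find("location").get("lat")),
    "lon": float(root.find("location").get("lon")),
}
print(site)
```

Any platform with an XML parser can read the same text file, which is what makes the
format suitable for exchange between otherwise isolated systems.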
VoiceXML Standard
The Voice Extensible Markup Language (VoiceXML) standard aims to
bring the advantages of web-based development and content delivery to interactive voice
response applications. It is designed for creating audio dialogs that feature synthesized
speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken
input, telephony, and mixed-initiative conversations. More information about the
standard can be obtained from the W3C’s Voice Browser Working Group and the
VoiceXML Forum, which are included in the references below.
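To give a feel for what such an audio dialog looks like, the following Python sketch
generates a minimal VoiceXML 2.0 document containing a single form field that accepts
either spoken digits or DTMF key presses. The function name build_zip_dialog, the field
name zip, the prompt wording and the submit URL are assumptions made for illustration,
and support for the built-in "digits" grammar type varies by platform. Recording,
telephony control and mixed-initiative dialogs are not shown.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_zip_dialog(submit_url):
    """Build a one-field VoiceXML form that collects a five-digit code."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "lookup"})

    # The built-in "digits" type lets the caller speak or key in digits.
    field = ET.SubElement(
        form, "field", {"name": "zip", "type": "digits?length=5"}
    )
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Say or key in a five digit zip code."

    # Played when the caller stays silent, then the prompt is repeated.
    noinput = ET.SubElement(field, "noinput")
    noinput.text = "Sorry, I did not hear anything."
    ET.SubElement(noinput, "reprompt")

    # Once the field is filled, send its value to the web application.
    filled = ET.SubElement(field, "filled")
    ET.SubElement(filled, "submit", {"next": submit_url, "namelist": "zip"})

    return ET.tostring(vxml, encoding="unicode")


# Hypothetical endpoint; any web application URL could stand in here.
document = build_zip_dialog("http://example.com/lookup")
print(document)
```

The web application at the submit URL would receive the collected value as an ordinary
HTTP request parameter and could answer with another VoiceXML document.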