Transparent access to distributed Geographic Information Systems
Cy Smith
State of Oregon - Geospatial Enterprise Office
955 Center Street, Room 470
Salem, OR 97301

Mike Walls
PlanGraphics, Inc.
112 East Main Street
Frankfort, KY 40601

Abstract

The State’s data is distributed among a large variety of data repositories, managed by various agencies. Utilization of data from different networks generally requires agencies to download copies of the data. This leads to multiple copies, outdated versions of information, and data stewardship issues. Oregon has tested a network-based tool that will integrate data from several sources, and demonstrated that web-based applications can utilize this tool for data access without knowing data locations or formats.

The Logistical Bottleneck

Recently the State of Oregon contracted with PlanGraphics, Inc., and their technical partners Xmarc and ESRI to successfully complete a proof of concept application, named DIMOND - Digitally Integrated Mining of Oregon Networked Data. The multiple technologies involved worked together to provide a novel way of getting around the logistical bottlenecks of consolidating data from multiple sources in the enterprise.

Utilization of data from different networks or Internet locations generally has required agencies to manually, or via automated FTP, download copies of the data to local network locations to be used by applications. This leads to multiple copies of the same information at different places, possible issues surrounding use of outdated versions of this information by applications, and a number of issues regarding data stewardship, ownership, and management. Common problems encountered when attempting to use data in distributed locations include:
Many agencies and organizations have considered placing their collective enterprise data in a centralized data set in advance of queries, so the data set would be ready to support the agencies’ fast-breaking, critical operational needs. Many agency operational functions require that data and information be shared across agency system boundaries in a rapid, transparent manner. Repositories, operational data stores, and data warehouses are each a different technical approach to meeting this need. The DIMOND proof of concept introduces another.

Traditionally, one of the biggest obstacles to building a centralized repository of data for an enterprise is the logistical effort involved in updating the contents and keeping them internally consistent. Finding a solution to this problem is often feasible for a one-time policy study, but has proven difficult to set up as an ongoing operation. Recent advancements in network-oriented data management technologies have provided some options not previously available. For this project, the technical team was able to implement a proof of concept for a “virtual data warehouse” in which Web-based middleware eliminated the need for centralizing a data repository, along with the associated updating and consistency problems.

The DIMOND Project

The key to the DIMOND project was the elimination of the requirement to physically centralize the data. Instead, a virtual repository was created using middleware tools from Xmarc, Inc. and Oracle 8i Spatial to manage metadata. The Web was used to display answers to a user’s query that is executed on their desktop machine, thus eliminating the need for a repository. The proof of concept that was developed for this project had to meet the following requirements:
Users start a session by pointing their Web browser to the URL of the opening screen. This downloads the core of the Proof of Concept application -- a custom-written Java applet. As the user navigates through the application, it appears to them that they are querying a set of GIS tables and layers in a traditional client-server application. Instead, they are interacting with a middleware servlet that looks up in Oracle tables the accessible metadata for each data set being used. The middleware then initiates a connection to a series of translator servlets located at each of the data provider sites, which pass the query through and package the response. The middleware servlet consolidates all of the responses into a Web page that the user sees. Each of these interactions was coded and managed using standard Java techniques. Figure 1 indicates the overall technical architecture used in the DIMOND project.

Figure 1: System Architecture - DIMOND Proof of Concept

Applets

Xmarc’s Enterprise Spatial Java Map product (ESJMap) was utilized in this project to produce the interface for the virtual data warehouse. The interface was created using applets coded in Java and compiled using Sun’s JDK 1.1.8 product for maximum browser compatibility. The applets can be viewed in a standard browser window with Java enabled, such as Internet Explorer or Netscape. The Xmarc ESJMap Applications Programming Interface (API) consists of hundreds of Java classes, bundled in one JAR file (ESJMap.jar) as a Java standard means of packaging pre-defined class definitions for distribution and reuse. The main components of the API consist of a map window for displaying spatial data, a legend for displaying information about layers in the map window, and a toolbar for manipulating data in the map window, each of which was utilized in the proof of concept project.
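The request flow described in this section, in which the middleware servlet looks up metadata, forwards the query to the translator servlets, and consolidates the responses, can be illustrated with a minimal Java sketch. This is not the DIMOND or Xmarc code. The names `LayerTranslator`, `MiddlewareSketch`, `register`, and `handle` are hypothetical, an in-memory map stands in for the Oracle metadata tables, no network calls are made, and the code targets a current JDK rather than JDK 1.1.8.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a translator servlet at a data provider site. */
interface LayerTranslator {
    /** Returns the provider's packaged response for one layer and one query. */
    String fetch(String layerName, String query);
}

/** Hypothetical middleware: metadata lookup, fan-out to providers, consolidation. */
class MiddlewareSketch {
    // Assumption: a simple map replaces the Oracle metadata tables.
    // Key: layer name. Value: the translator that serves that layer.
    private final Map<String, LayerTranslator> metadata = new LinkedHashMap<>();

    void register(String layerName, LayerTranslator translator) {
        metadata.put(layerName, translator);
    }

    /** Sends the query to each requested layer's translator and collects the replies in order. */
    List<String> handle(List<String> layers, String query) {
        List<String> responses = new ArrayList<>();
        for (String layer : layers) {
            LayerTranslator translator = metadata.get(layer);
            if (translator == null) {
                // No metadata row for this layer: report it rather than fail the whole request.
                responses.add(layer + ": no metadata registered");
                continue;
            }
            responses.add(layer + ": " + translator.fetch(layer, query));
        }
        return responses;
    }
}
```

In this sketch the caller never learns where a layer lives or what format it is stored in; it names the layer and receives a consolidated list, which is the property the proof of concept set out to demonstrate.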
Servlets and Associated Components

The applet uses Xmarc servlets designed to access Oracle spatial metadata and specific data formats. The ESM servlet, for accessing metadata, was installed on the DIMOND server, while the various data translator servlets were installed on the remote server machines from which the data will be accessed.

Enterprise Spatial Manager

Xmarc’s Enterprise Spatial Manager (ESM) GUI allows configuration of Oracle Spatial data as metadata stored in an Oracle schema. The applet accessed this metadata via a FireRender servlet instantiated on the DIMOND server. The metadata tells the applet how to communicate with each of the data translators, and contains information on such connection properties as data layers, display scales, and symbology. ESM can also store detailed metadata about any data managed in an Oracle Spatial database, whether in the same database as ESM or a remote database at a data provider’s site. Once the metadata is read, spatial data are loaded into the applet via a FireStation servlet designed to read the data stored in the Oracle database. Oracle8i is used to store the data used in ESM.

Data Translators

Data that reside across the Internet on remote servers were accessed via a proxy servlet on the DIMOND server that redirected data requests from the applet to the appropriate server location. Metadata about these locations were maintained in an Oracle table attached to the ESM metadata. After the applet requests the data, an Xmarc FX translator returns data to the applet as a binary stream using HTTP 1.0. The data stream is encoded using Xmarc’s HOSE (Heterogeneous Object Stream Encoding) protocol. The FX translators, which were installed on the remote servers, are command-driven programs that convert non-Xmarc spatial data into Xmarc Entity Import Module (EIM) data streams. EIM data is an Xmarc interchange format describing graphic entities with or without nongraphic metadata.
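As an illustration of how a proxy servlet might redirect a request using stored location metadata, the following Java sketch resolves a data set name to a provider address. Everything in it is an assumption made for illustration: the classes `ProviderEndpoint` and `ProxyRouter`, the host and service names, and the URL layout are hypothetical, and an in-memory map stands in for the Oracle table. The actual table schema and the address format used by the Xmarc servlets are not described in this paper.

```java
import java.util.HashMap;
import java.util.Map;

/** One row of a hypothetical provider location table. */
class ProviderEndpoint {
    final String host;
    final int port;
    final String service;

    ProviderEndpoint(String host, int port, String service) {
        this.host = host;
        this.port = port;
        this.service = service;
    }
}

/** Hypothetical proxy-side lookup: data set name to provider address. */
class ProxyRouter {
    // Assumption: a map replaces the Oracle table that holds location metadata.
    private final Map<String, ProviderEndpoint> table = new HashMap<>();

    /** Adds a row, or overwrites the existing row when a data set is relocated. */
    void put(String dataSet, ProviderEndpoint endpoint) {
        table.put(dataSet, endpoint);
    }

    /** Builds the address a request for this data set would be forwarded to. */
    String resolve(String dataSet) {
        ProviderEndpoint e = table.get(dataSet);
        if (e == null) {
            throw new IllegalArgumentException("Unknown data set: " + dataSet);
        }
        // Assumed URL layout; shown only to make the redirection concrete.
        return "http://" + e.host + ":" + e.port + "/" + e.service;
    }
}
```

Because the applet only ever names a data set, overwriting a row with `put` is enough to redirect later requests to a new location without any change on the client side.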
All FX translators share a common command set and a variety of deployment options. An FX translator is a program written in C/C++ to access data management tool APIs. These translators adhere to a standard methodology that rationalizes Xmarc’s vector translators. The result is that an applet can interact with any Xmarc translator in a consistent manner, without knowing any details of the implementation.

The FX translator servlets supplied by Xmarc were installed on the remote servers as multiple-client services to serve the data to the applet. The servlets are one of three types: 1) file-based, where data is centralized in self-contained data files (e.g., AutoCAD, Intergraph, Shape, MapInfo); 2) coverage-based, where data is spread across multiple file hierarchies (e.g., ArcInfo, VPF); or 3) spatial database, where spatial data and metadata are stored and managed within object-relational database management systems (e.g., Oracle, Informix Universal Server). Three translators were used in the proof of concept: FX8i, FXShape, and FXArc. Each of the translators was installed and initialized as a service on the remote server machine. Specific commands are given when initializing the service, such as service name, host name, and port.

Data Usage

The data used by the application need not be housed in a central location or even in the same file format. Instead, each query is processed in real time by the master copy of the data at the custodian agency. For the proof of concept, data usage is highlighted in Table
Benefits of Implementation

Several significant benefits exist in the technical implementation of the Xmarc technologies for the DIMOND proof of concept. The more important ones are discussed here.

Java Security Model

The DIMOND concept primarily relies on real-time, on-demand access to source data. This is in contrast to a replication scheme in which a copy of source data is made on a regular schedule, to be used for queries and analysis. This virtual approach requires a three-tier architecture. Users need only have a standard Web browser on their machines and access to the Internet. The application with which they interact resides on the DIMOND server and is a Java servlet developed using a mix of standard Internet utilities and proprietary Xmarc libraries.

The underlying security model is standard Java. Often referred to as “the sandbox,” this model forces applets into a highly restrictive run-time environment where they can do minimal damage to the machine on which they execute. The combination of default security and the ability to lock a servlet to a particular URL and port means an application can safely be deployed without extensive security testing and review.

The data server side of this three-tier environment is where the innovative nature of the virtual warehouse comes in. The applet may only make requests to its server, but a DIMOND proxy servlet initiates a request to the designated data provider’s server via the Internet, based on what the user has requested. This request is sent to a designated URL and port on the data source system. There, the port has been dedicated to a data translator servlet developed by Xmarc, and this in turn has read-only access to data of a particular type in a specified directory. The translator servlet interprets the request and searches the directory to which it has access for the requested data. If the servlet does not recognize the request or cannot find the requested data, it handles the event as an error.
If it finds the requested data, it packages the resulting data set in the proprietary Xmarc transfer format (EIM) and returns it to the calling DIMOND servlet. This servlet then streams the result to the calling applet.

For example, let us suppose that DIMOND included some data on the PlanGraphics server “ZEUS” (to use a neutral example). Specifically, we are to provide a service area data set “SRVC” in ArcInfo coverage format and a project boundary “PROJECT” in shape file format. Both of these are located in a directory, “C:\DIMOND,” which is an ArcInfo workspace. We would need to install two translator servlets, one for each type of data, in the C:\DIMOND directory. The FXARC translator could see SRVC and any other coverages in the DIMOND workspace but could not see the shape file in the same location. Nor could it see coverages elsewhere on the ZEUS server. To access the shape file, we would need a copy of the FXSHAPE translator.

Parameterized Interaction with Data Sources

The use of a table in Oracle to store parameters necessary to connect to each translator at a data provider system means that the system is expandable and easily administered after translators are installed. The system administrator need only write changes to the parameter table to instantly update all users of the application to events such as relocation of a data set on a service provider’s system.

Minimal Administrative Effort by Data Providers

The use of translator servlets minimizes the administrative effort required of systems administrators at the data provider site. Although the initial configuration of the servlet is not as straightforward as could be desired, once set up, the data provision process requires minimal attention. Required tasks vary with the system configuration, but generally are limited to:
The biggest problems encountered in completing data access for DIMOND were not technical, but institutional. The major problem areas are discussed here.

Technical Readiness

The level of technical readiness of the data-providing agency is a significant issue. Lack of knowledge about and experience with Web services means that the potential data-providing agency (a) will have difficulty making a realistic assessment of what they are being asked to participate in, and (b) will require outside support to acquire and set up the necessary services. It should be noted that the technical requirements mirror typical Web-based implementations.

Security Concerns

Several agencies agreed to cooperate in principle as data providers for DIMOND, but expressed varying degrees of concern about the security of their systems. These are legitimate business concerns, which can be addressed technically and which must be planned for in developing a work plan for any production implementation of this technology. Education of the technical staff and of management at the data-providing agency will help to relieve these concerns. As previously discussed, the systems architecture has substantial security built in through the use of Java-based technology.

Level of Cooperation

It must be noted that the level of cooperation of a data provider agency cannot simply be assumed. The IT staff of the provider may well feel that they are being placed in a position of enhanced risk for no potential reward, since they will have to do extra work to accomplish a task not on their priority work plan and since they will be blamed if anything goes wrong. Thus, the organization may express a willingness to cooperate without ever carrying through with the effort.

Impact on Operations

It is also important to note that there can be some impact on IT operations for the data-providing agency, should the demand for data ever become strong enough.
The Xmarc translator servlets are small footprint applications that do not demand significant system resources individually. However, if numerous concurrent users begin hitting the data provider systems, performance will begin to suffer. Several low-cost steps can be taken should the data provider role become too onerous. These include:
The middleware product from Xmarc eliminated the need for centralized data. Because the middleware consolidated the responses from the various data providers into a user-friendly Web page, no logistical bottlenecks were experienced, and DIMOND was a successful proof of concept. The collaboration among PlanGraphics, Xmarc, ESRI, and the State of Oregon proved that it is technically feasible to build a virtual data warehouse for Web-based interactions.