Data Models for Object-Component Technology
Geographic Data Modeling Background

Dr David J. Maguire
Director of Products, ESRI
dmaguire@esri.com

Steve Grisé
Product Manager, ESRI
sgrise@esri.com

380 New York Street, Redlands, California, USA 92373

A data model is a set of constructs for describing and representing parts of the real world in a computer system. Data models are vitally important to GIS because they control the way that data are stored and have a major impact on the type of analytical operations that can be performed. Early GIS were based on CAD, simple graphical and image data models. In the 1980s and 1990s the hybrid geo-relational model came to dominate GIS. In the last few years major software systems have been developed based on more advanced and standardized geographic object data models that include elements of all earlier models.

Keywords: Data Model; Application
What is a Data Model?

The heart of any GIS is the system-level data model, which is a set of constructs for representing objects and processes in the digital environment of the computer (Figure 2). People interact with operational GIS in order to perform tasks like making maps, querying databases and performing site suitability analyses. Because the types of analyses that can be undertaken are strongly influenced by the way the real world is modeled, decisions about the type of model to be adopted are vital to the success of a GIS project. Geographic reality is infinitely complex, but computers are finite. Therefore, difficult choices have to be made about what is modeled and how. Because different types of people use GIS for different purposes, and the types of phenomena people study have different characteristics, there is no single type of data model that is best for all circumstances.

Figure 2: The role of a data model in GIS.

Levels of Data Model Abstraction

When representing the real world in a computer, it is helpful to think in terms of the four levels of abstraction (levels of generalization or simplification) shown in Figure 3. First, reality is made up of real world phenomena (buildings, streets, wells, lakes, people, etc.), and includes all aspects that may or may not be perceived by individuals, or deemed relevant to a particular application. Second, the conceptual model is a human-oriented, often partially structured, model of selected objects and processes that are thought relevant to a particular problem domain. Third, the logical model is an implementation-oriented representation of reality that is often expressed in the form of diagrams and lists. Fourth, the physical model is the digital abstraction that portrays the actual application in a computer system, and often comprises tables stored as files or databases.
The term 'physical' is misleading here, because these models are not physical objects; they exist only digitally in computers.

In data modeling, users and system developers participate in a process that successively engages with each of these levels. The first phase of modeling begins with a definition of the main types of objects to be represented in the GIS and concludes with a conceptual description of the main types of objects and relationships between them. Once this phase is complete, further work will lead to the creation of diagrams and lists describing the names of objects, their behavior, and the type of interaction between objects. This type of logical data model is very valuable for defining what a GIS will do and the type of domain over which it will extend. Logical models are implementation independent, and can be created in any GIS with appropriate capabilities. The final data modeling phase involves creating a model showing how the objects under study can be digitally implemented in a GIS. Physical models describe the exact files or database tables used to store data, the relationships between object types and the precise operations that can be performed.

Figure 3: Levels of abstraction relevant to GIS data models.

A data model provides system developers and users with a common understanding and reference point. For developers, a data model is the means to represent an application domain in terms that may be translated into a design and implementation of a system. For users, it provides a description of the structure of the system, independent of specific items of data or details of the particular application.

Object-Component Technology

An ‘object-component’ approach has advantages over an ‘object’ approach for AM/FM/GIS because it adds a component-based framework for developers to extend the system. In the basic object model approach only the original AM/FM/GIS vendor has complete customization capabilities.
Perhaps the most complex aspect of basic object systems is the eventual need for cut-and-paste reuse on large implementation projects, complicating the process of system upgrades, which ultimately increases lifecycle costs. With the object-component model, users extend the system with exactly the same technology as the AM/FM/GIS vendor programmer. As a consequence, users have more options and their objects will perform just as well. To the user, there is absolutely no difference between the AM/FM/GIS vendor supplied objects and the custom objects. Object-Oriented Concepts in GIS An object is a self-contained package of information describing the characteristics and capabilities of an entity under study. In a geographic object data model the real world is modeled as a collection of objects and the relationships between the objects. Each entity in the real world to be included in the GIS is an object. A collection of objects of the same type is called a class. In actual fact classes are a more central concept than objects from the implementation point of view. A class can be thought of as a template for objects. When creating an object data model the data model designer specifies classes and the relationships between classes. Only when the data model is used to create a database are objects (instances or examples of classes) actually created. Examples of objects include oil wells, soil bodies, stream catchments and aircraft flight paths. In the case of an oil wells class, each oil well object might include properties defining its state – annual production, owner name, date of construction and type of geometry used for representation at a given scale (perhaps a point on a small scale map and a polygon on a large scale one). The well class could have connectivity relationships with a pipeline class that represents the pipeline used to transfer oil to a refinery. There could also be a relationship defining the fact that each well must be located on a drilling platform. 
Finally, a well object might have methods or behavior defining what it can do. Example behavior might include how to draw itself on a computer screen, how to create and delete itself, and editing rules about how the wells snap to pipelines. There are three key facets of object data models that make them especially good for modeling geographic systems: encapsulation, inheritance, and polymorphism. Encapsulation describes the fact that each object packages together a description of its state and behavior. The state of an object can be thought of as its properties or attributes (e.g. for a transformer object it could be the phase designation, rated kV, and unique transformer number). The behavior is the methods or operations that can be performed on an object (for a transformer object these could be create, delete, draw, query, split and merge). For example, when splitting a conductor into two parts, it is useful to get the GIS to automatically calculate the lengths of the two new parts. Combining the state and behavior of an object together in a single package is a natural way to think of geographic entities, and a useful way to support the re-use of objects. Inheritance is the ability to re-use some or all of the characteristics of one object in another object. For example, in a gas facility system overwriting or adding a few properties or methods to a similar existing type of valve could easily create a new type of gas valve. Inheritance provides an efficient way to create models of geographic systems by reusing objects and also a mechanism to extend models easily. New object classes can be built to reuse parts of one or more existing object classes and add some new unique properties and methods. Polymorphism describes the process whereby each object has its own specific implementation for operations like draw, create and delete. 
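The three facets just described can be illustrated with a minimal Python sketch. It is not taken from any particular GIS product: the class names (`Valve`, `PressureReliefValve`, `Pipe`), their attributes, and the `render` helper are hypothetical and chosen only to mirror the gas facility example in the text.

```python
from dataclasses import dataclass


@dataclass
class Valve:
    """Encapsulation: state (attributes) and behavior (methods) in one package.

    Hypothetical class; attribute names are assumptions for illustration.
    """
    valve_id: str
    diameter_mm: float
    is_open: bool = True

    def draw(self) -> str:
        # Each class supplies its own drawing behavior.
        return f"valve symbol for {self.valve_id}"

    def describe(self) -> str:
        state = "open" if self.is_open else "closed"
        return f"{type(self).__name__} {self.valve_id}: {self.diameter_mm} mm, {state}"


@dataclass
class PressureReliefValve(Valve):
    """Inheritance: reuses Valve, adds one property and overrides one method."""
    relief_pressure_kpa: float = 0.0

    def draw(self) -> str:
        return f"relief-valve symbol for {self.valve_id}"


@dataclass
class Pipe:
    """An unrelated class that also knows how to draw itself."""
    pipe_id: str
    length_m: float

    def draw(self) -> str:
        return f"line symbol for {self.pipe_id}"


def render(objects) -> list:
    # Polymorphism: the caller issues the same request to every object,
    # and each object answers with its own implementation of draw().
    return [obj.draw() for obj in objects]
```

Calling `render([Valve("V1", 50.0), PressureReliefValve("V2", 50.0, True, 350.0), Pipe("P1", 12.5)])` returns one symbol description per object, and the caller never needs to know which class it is dealing with. A new class only has to provide its own `draw` method to take part.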
One example of the benefit of polymorphism is that a geographic database can have a generic object creation component that issues requests to be processed in a specific way by each type of object class factory (e.g. gas pipes, valves and service lines). Object class factory is the term given to the software module used to create new objects of a given class. This mechanism will work for a new object class because the new class is responsible for implementing the object creation method. The central focus of an object data model is the collection of geographic objects and the relationships between the objects. Each geographic object is an integrated package of geometry, properties and methods. As such it is a software representation of the extent and function of the 'geographic individuals'. In the object data model, geometry is treated like any other attribute of the object and not as its primary characteristic. Geographic objects of the same type are grouped together as object classes, with individual objects in the class referred to as instances. In modern GIS software systems each object class is stored in the form of a database table, with each row an object and each property a column. The methods that apply are attached to the object instances when they are created in memory for use in the application. All geographic objects have some type of relationship to other objects in the same object class and, possibly, to objects in other object classes. Some of these relationships are inherent in the class definition (for example, a topologic polygon data set will not have overlapping polygons) while other interclass relationships are user-definable. Three types of relationships are commonly used in geographic object data models: topologic, geographic and general. Topologic relationships are generally built into the class definition. 
For example, modeling real world entities as a network class will cause network topology to be built for the nodes and lines participating in the network.

Geographic relationships between object classes are based on geographic operators (such as overlap, adjacency, inside, and touching) that determine the interaction between objects. In a model of an agricultural system, for example, it might be useful to ensure that all farm buildings are within a farm boundary using a test for geographic containment.

General relationships are useful to define other types of relationship between objects. In a parcel management system, for example, it is advantageous to define a relationship between land parcels (polygons) and ownership data that is stored in an associated DBMS table. Similarly, in an electric distribution system, relating light poles (points) to text strings (called annotation) allows depiction of pole height and material of construction on a map display. This type of information is very valuable for creating work orders (requests for change) that alter the facilities. Establishing relationships between objects in this way is useful because if one object is moved then the other will move as well, or if one is deleted then the other is also deleted. This makes maintaining databases much easier and safer.

In addition to supporting relationships between objects (strictly speaking, between object classes), object data models also allow several types of rules to be defined. Rules are a valuable means of maintaining database integrity during editing tasks because they enforce validation constraints. The most popular types of rules used in object data models are attribute, connectivity, relationship and geographic.

Attribute rules are used to define the possible attribute values that can be entered for any object. Both range and coded value attribute rules are widely employed. A range attribute rule defines the range of valid values that can be entered.
Examples of range rules include: highway traffic speed must be in the range 25-70 miles (40-120 km) per hour; forest compartment average tree height must be in the range 0-50 meters. Coded attribute rules are used for categorical data types. Examples include: land use must be of type commercial, residential, park or other; or pipe material must be of the type steel, copper, lead, or concrete. Connectivity rules are based on the specification of valid combinations of features, based on the geometry, topology and attribute properties. For example, in an electric distribution system a 28.8 kV conductor can only connect to a 14.4 kV conductor via a transformer. Similarly, in a gas distribution system it should not be possible to add pipes with free ends (that is, with no fitting or cap). Geographic rules define what happens to the properties of objects when they are split or merged. In the case of a land parcel that is split following the sale of part of the parcel it is useful to define rules to determine the impact on properties like area, land use code, and owner. In this example, the original parcel area value should be split in proportion to the size of the two new parcels, the land use code should be transferred to both parcels, and the owner name should remain for one parcel, but a new one should be added for the part that was sold off. In the case of a merge of two adjacent water pipes decisions need to be made about what happens to attributes like material, length and corrosion rate. In this example, the two pipe materials should be the same, the lengths should be summed, and the new corrosion rate determined by a weighted average of both pipes. Geographic Data Modeling in Practice A utility information system is only as good as the geographic database on which it is based and a geographic database is only as good as the geographic data model from which it was developed. 
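To make the attribute and split/merge rules described above concrete, here is a minimal Python sketch. All function names and dictionary keys are hypothetical. Weighting the corrosion rate by pipe length is an assumption, since the text only calls for "a weighted average". The sold fraction is passed in directly rather than computed from geometry.

```python
LAND_USE_CODES = {"commercial", "residential", "park", "other"}


def check_range(value, low, high):
    """Range attribute rule: value must lie within [low, high]."""
    return low <= value <= high


def check_coded(value, allowed):
    """Coded value attribute rule: value must be one of the allowed codes."""
    return value in allowed


def split_parcel(parcel, fraction_sold, new_owner):
    """Split rule for a land parcel (sketch).

    Area is divided in proportion to the new parcel sizes, the land use
    code is copied to both parts, and the owner is kept on one part and
    replaced on the part that was sold.
    """
    if not 0.0 < fraction_sold < 1.0:
        raise ValueError("fraction_sold must be between 0 and 1")
    sold_area = parcel["area"] * fraction_sold
    kept = {**parcel, "area": parcel["area"] - sold_area}
    sold = {**parcel, "area": sold_area, "owner": new_owner}
    return kept, sold


def merge_pipes(a, b):
    """Merge rule for two adjacent pipes (sketch).

    Materials must match, lengths are summed, and the corrosion rate is
    a length-weighted average (the weighting is an assumption here).
    Lengths are assumed to be positive.
    """
    if a["material"] != b["material"]:
        raise ValueError("cannot merge pipes of different materials")
    length = a["length"] + b["length"]
    rate = (a["corrosion_rate"] * a["length"]
            + b["corrosion_rate"] * b["length"]) / length
    return {"material": a["material"], "length": length, "corrosion_rate": rate}
```

For example, `check_range(55, 25, 70)` accepts a highway speed of 55 miles per hour, while `check_coded("industrial", LAND_USE_CODES)` rejects a land use code that is not in the list. In a real system such rules would typically be declared in the data model and enforced by the editing environment rather than called by hand.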
Geographic data modeling begins with a clear definition of the project goals and progresses through an understanding of user requirements, a definition of the objects and relationships, formulation of a logical model and then creation of a physical model. These steps are a prelude to database creation and, finally, database use. No step in data modeling is more important than understanding the purpose of the data modeling exercise. This understanding can be gained by collecting user requirements from the main users. Initially, user requirements will be vague and ill defined, but over time they will become clearer. Project goals and user requirements should be precisely specified in a list or narrative. Formulation of a logical model necessitates identification of the objects and relationships to be modeled. Both the attributes and behavior of objects are required for an object model. A useful graphic tool for creating logical data models is a CASE tool and a useful language for specifying models is UML. It is not essential that all objects and relationships be identified at the first attempt, because logical models can be refined over time. Once an implementation-independent logical model has been created, this model can be turned into a system-dependent physical model. A physical model will result in an empty database schema – a collection of database tables and the relationships between them. Sometimes, for performance optimization reasons or because of changing requirements, it is necessary to alter the physical data model. Even at this relatively late stage in the process, flexibility is still important. It is important to realize that there is no such thing as the 'correct' geographic data model. Every problem can be represented with many possible data models. Each data model is designed with a specific purpose in mind and is sub-optimal for other purposes. 
A classic dilemma is whether to define a general-purpose data model that has wide applicability, but that can, potentially, be complex and inefficient, or to focus on a narrower highly optimized model. A small prototype can often help resolve some of these issues.

Geographic data modeling is both an art and a science. It requires a scientific understanding of the key geographic characteristics of real world systems, including the state and behavior of objects, and the relationships between them. Geographic data models are of critical importance because they have a controlling influence over the type of data that can be represented and the operations that can be performed. As we have seen, object models are the best type of data model for representing rich object types and relationships in facility systems, whereas simple feature models are sufficient for very elementary applications such as a map of the body. In a similar vein, raster models are good for field-based data such as soils, vegetation, pollution and population counts.

Why Develop Industry-Oriented Data Models?

At first glance the importance of domain-specific data models for object-component technology may not be apparent. After all, the fundamental benefits of object-oriented technology stem from bringing data and behavior together. But the development of practical, common data models is important. The intent is not that every project will adopt exactly the same data model, but that a common, essential data model can be captured, shared, and maintained by user communities. An example for electric networks is shown in Figure 4.
Figure 4: A portion of a data model template developed for electric distribution networks.

Standards for data models are important to utilities for several reasons.

First, developing data models for conversion and migration is an important first step in any GIS project. The time spent developing data models at the front end of a project directly affects the duration of the project. As conversion progresses, data model changes can result in significant cost and schedule impacts. Therefore, a carefully considered data model can reduce project cost, duration, and risk. Since data costs are typically greater than 70% of direct cash expenditures on GIS projects, the cost savings may be significant.

Second, the complexity of developing a data model design is a barrier to the adoption of GIS technology. In the past many proprietary data models have been developed on specific projects and then sold to other utilities. Inevitably the cost of these models is a barrier: some project teams with budget constraints are left to develop in-house models, diverting valuable resources away from other success-oriented activities. In other cases the purchase cost of a proprietary model is too high, and internal resources are frequently not available. The net result in these situations is that the lack of a data model prevents the adoption of GIS technology.

Third, common data models provide a platform for the development of applications and solutions. The data model combined with an object-component framework permits the development and interoperability of functional capabilities. A common data model can support the proliferation of tools and solutions for utility users. The result is more choice and lower lifecycle cost for enterprise project implementation.

So in many ways we achieve the benefits of object technology through a data model combined with a component-based architecture. The benefits of object-component technology are just beginning to be realized by AM/FM/GIS users.
An important aspect of common data models is that there are internal and external views of the data. For instance, a utility’s internal, or producer, view of the database involves extensive attribution and information that would not be appropriate to share with external users. From an external standpoint, most consumers of spatial data are only concerned with the basic location and simple attributes of utility networks. Key to the successful implementation of common data models is consistency in the external presentation of data to support the development of national and global datasets. Supporting technologies must allow the maintenance of the producer model and the publication of consumer models to multiple internal and external consumers of GIS data.

Process

The process for the development of industry-oriented data models requires involvement from many users and domain experts from a given user community. In general, most national and global GIS standards today provide high-level guidance for implementation but rely on local solutions for the development of datasets. So while these standards are valuable, the key to developing these datasets is practical guidelines for implementation that involve specific technologies and strategies for sharing data. The process involves gathering input from many existing projects and discovering the common or essential model based on experience and typical data requirements. One way to think about the essential model is to imagine a very basic municipal mapping application to track the location and characteristics of utility assets. From this base each utility can add its own specific data requirements and build up object behavior and integration architectures to provide enterprise-level solutions. Key elements of the data modeling process include:
What does this mean for utilities?

The bottom line for utilities is a transformation in how projects are developed. A summary of the key differences for utilities includes:
Looking ahead to future AM/FM/GIS projects, these new approaches will create a new set of technical issues for information technology projects. Each of the following topics will become important for utilities, and project teams need to consider these issues now so that they will be prepared to take advantage of these emerging trends.

Utilities have unique requirements for emergency management. In the future utilities will rely heavily on GIS infrastructure records in these situations. As a result, utilities need highly available networks and computer systems. Because of these requirements, it is unlikely that utilities will be able to rely on a pure Application Service Provider (ASP) model for GIS data and applications. A more likely scenario involves the potential for enterprise data warehouses to be based on internet services, with local failover to operational systems in emergency situations.

Positional Data Accuracy/Rubber-Sheeting

An important issue for many utilities will be that initial data conversion was typically performed using an in-house landbase that had major positional accuracy issues. Many utilities will require conflation or rubber sheeting tools before they will be able to take advantage of externally provided landbase datasets.

Data Currency and Publication/Subscription Timing

Once positional accuracy issues are addressed, another issue is change management of internal and external datasets. For instance, a utility may want to get the latest landbase data from local counties, but when it decides to produce a major run of circuit maps it may want to temporarily hold back external changes until the maps are produced. Similarly, a utility will want to publish groups of changes to internal and external data consumers.

The development of industry-oriented data models provides many internal and external benefits. Ultimately, essential data models will support simplified project implementation and reduced costs.
The evolution of technology and readily available external datasets also present new opportunities and challenges for utility project teams and IT managers.
© GISdevelopment.net. All rights reserved.