GISdevelopment.net ---> GITA 2001 ---> System Architecture

Tuning oracle for infrastructure applications

Sue A. Johnson
Geographic Information Technology, Inc.


Introduction
In this paper, certain theory and methods will be explored for tuning Oracle in GIS infrastructure applications specifically focusing on GIS applications. With the advent of GIS applications and technology becoming a corporate core technology, GIS is increasingly becoming interdependent on databases such as Oracle. There are also the wide selection of server technologies that provide a high availability and scalability for the enterprise-wide technologies and applications such as GIS and Oracle databases.

The current versions of Oracle such as Oracle 8 and 8i combined with the available server technologies provide a highly available and scalable architecture for business and government application solutions. Combining GIS with Oracle databases on servers provides increased performance and throughput. Typically Oracle and other scalable databases will be a key component of GIS technologies utilized by business or government to meet their current GIS application requirements.

The current dependency of GIS on the incorporation of Oracle databases varies by vendor and application. Increased usage of Oracle increases the necessity to ensure that the Oracle database is properly set up and tuned for the particular GIS application requirements. These types of GIS applications including AM/FM, land information systems, e-business and other internet applications, and future usages will require different types of Oracle database environments. Given the variety of GIS vendors and their respective GIS application offerings, there will be a discussion of the rules of thumb for tuning the Oracle database environment. The focus of tuning the Oracle database will specifically include the possible ways to improve the Oracle database environment including storage, memory and other processes, and the SQL that is used to generate data from the Oracle database in the GIS infrastructure application.

Oracle has been scalable and reliable since the release of 7.1 of the data repository. Today Oracle on a variety of servers can process massive amounts of data for a variety of applications. Over the last decade with Oracle 8 and 8i, Oracle Corp. has surpassed all database technology by providing business with the capability to deliver data stores in the Oracle database that can be highly tuned, scalable both in physical and user size, and highly available. Oracle is scalable on Unix and NT platforms with seamless integration with other databases and tools via ODBC (open database connectivity).

GIS as an infrastructure application
In the past, GIS applications were not dependent upon Oracle databases, as the data has historically been retrieved from GIS file systems. GIS applications are becoming a key component of the enterprise-wide information infrastructure that may link many types of data resources. These enterprise information sources may include the following data sources:
  • relational
  • object
  • spatial
  • legacy systems
  • video
Further, this information may be ported to an Internet application for retrieval by worldwide users. With this in mind, it is no longer possible to segregate the business of tuning databases from GIS applications because the Oracle database may be a key component of the GIS application.

When some or part of the application(s) depend upon the Oracle database for some or all of its information, there needs to be a significant amount of attention paid to the tuning of the Oracle database. This will primarily include tuning Oracle's physical and memory related processes and SQL statements.

Each type of application may have different requirements for tuning. Applications implementing long transactions require large amounts of memory and disk space in order to store all stages of the transaction. Short transaction applications are less memory intensive but may require extensive network traffic. In any case, knowledge of the database internal views and tables is imperative in order to effectively tune the Oracle database.

Primary Tuning Areas of the Oracle Database

The primary tuning areas consist of the following:
  • Database Storage and Design
  • Memory processes and Database Initialization Parameters
  • SQL statements
The following illustration depicts the relationships existing in the Oracle database including application users, other types of users, Oracle memory processes, and Oracle database physical storage. The Oracle database and its associated instance consist of the following required structures:
  • Storage structures, physical and logical
  • Processes, memory and physical

Oracle Database Architecture


Database storage and design
Tuning Oracle databases has not traditionally been a focus for GIS implementations as the GIS was more than likely stored in proprietary files or data stores. Of course some vendors have been moving to the idea of relational databases for the storage of GIS. Data can then be retrieved by Enterprise Applications, thus integrating GIS with other corporate business systems.

Often, an application that utilizes Oracle does not specify names of tablespaces for the application tables, so they will have to be created in-house and assigned appropriate sizing to fit the application needs. Choosing names that correspond either to the application name or type such as GIS is a good rule of thumb. Note that if the application utilizes indexing for the application tables, a tablespace will have to be created separately for the indexes to avoid database contention while using the application. Logically, one might utilize tablespace names such as GIS_DATA or _DATA for the application tables and GIS_INDEX or _INDEX for the application indexes.

Within the database creation scripts, it is equally important to modify the definitions of tables, indexes and constraints to appropriate storage, growth, and tablespace parameters. From experience it is assumed that the business or government personnel along with their GIS experts will provide these parameters.

Note that some GIS applications are integrated with Oracle databases in some of the following ways
  • reference, or lookup, database structure
  • spatial data structure
  • or a combination
Certain vendors of GIS applications require usage of a vendor type of data store and hence the primary usage of an Oracle database would be for reference, or lookup, data. The variety of GIS spatial applications runs the gamut between only lookup data to all spatial data residing in the Oracle database. Certain vendors have defined all of the structures for the GIS location and descriptive data within Oracle. This combination will require creative database management in the future if a variety of database structures are operational in an environment possibly consisting of strictly GIS application requiring a datastore only with Oracle lookup data and a potential AM/FM requiring full usage of the Oracle database. The eventual design of the tablespaces and tables will be determined by the GIS application requirements.

It is also important to set up users with the appropriate default and temporary tablespaces. If these are not set at all, the critical SYSTEM tablespace is used, and will eventually cause degraded performance due to contention with the Oracle instance. The general rule of thumb is to have a USERS tablespace with enough room for storing any user database objects such as tables, indexes, packages, procedures, functions, etc. Of course the TEMPORARY or TEMP tablespace could be considered a standard for any space required for the user's query sorts and other temporary data.

The amount of space allocated to a user for the default and temporary tablespaces can be unlimited but will most certainly need to be restricted. Typically this number is limited by the type of application, the size of sort area required by queries and the type of database objects the user expects to create and utilize in the GIS application. Tablespace size may also be affected by GIS applications that require new or modified database objects.

Memory processes and database initialization parameters
It is necessary to consider many parts of the database to fine-tune the Oracle database environment so that it will handle the needs of the database and its respective users. A key consideration is the setup of memory processes. For future versions of Oracle, a viable goal is to have the entire database available in memory to its users thus improving access and throughput. Computers are now available with large memory areas for sorting, parsing user and GIS application SQL.

The Oracle database instance consists of memory processes, system and process memory areas, and physical processes. The main memory process, SGA or System Global Area, must be properly set up and tuned to effectively process SQL statements and data. Tuning the SGA consists primarily of setting its size correctly as a percentage of available memory. If there is more than one instance running on a particular server, this will need to be established with respect to the GIS application needs.

The SGA consist of the following memory processes:
  • Shared Pool -Data Dictionary
    -Library Cache
  • Redo Log Buffer
  • Database Buffer Cache
SGA is created at instance startup and the size is calculated using the initialization parameters specified in the init.ora parameter file. Specifically the SGA size is calculated as follows:

Size of the SGA (in bytes) = db_block_size*db_block_buffers + Shared pool size The 1st component identifies the amount of memory allocated for processing data and the 2nd component, Shared pool size, identifies the amount of memory allocated for both the library cache and data dictionary.

The Dictionary Cache has information in memory about database objects and processes. This is where the database looks for information about Tablespaces, Tables and Columns, and all other database objects. This is how the database checks whether a table column in a query exists.

The library cache contains information about recently executed SQL statements including the query's parse tree and execution plan. In an application using the same queries, which are executed frequently, the query will most probably not be parsed again. If there are a large number of queries that are parsed often while being executed, they are being put on the LRU or least recently used list for swapping out of the SGA. After consideration, the SGA will need an increase in size. Further it is always recommended to pin, or permanently place in memory, any required database packages. With a larger SGA and planning the usage of memory can reap huge performance gains.

For example, a strictly GIS application that only utilizes Oracle for lookup tables, the SGA required for this instance might be 20M - 40M. In a highly intensive environment with a large number of users on a Oracle - GIS application, this range could be 1G - 1,000 Gigabytes. This will depend on the following factors:
  • Number of Users
  • SQL statements processed by the application user
  • Size of Data
  • Long vs. short Oracle transactions
Note that certain types of GIS applications are long-transaction meaning that many cursors will be opened to process GIS for long periods of time. This will require an increase of the possible number of open cursors when the database is started. Typically in most IT applications, the number allowed varies between 50 and 200 depending upon the number of processes and users. In these long-transaction databases requiring many open cursors, this number can rise to 1000 or more.

A large SGA is required for long-transaction GIS type Oracle sessions that may not commit the work and/or release used cursors for many hours. This also requires that the number of open cursors be very large. Some vendors recommend somewhere between 250 and up to 1,000. If a user session opens over 250 in a session, this statistic will certainly influence the number of open cursors that are allowed in memory to allow all required concurrent users to process data.

Monitoring the memory processes is a key part of maintaining the Oracle Database. This can be done with tools developed by a variety of vendors or by a SQL session that monitors these processes by collecting information from a variety of Data Dictionary views that are updated to reflect point in time activity. Some of the important views for checking the SGA performance are the following:
  • V$SYSSTAT maintains information regarding the hit ratio
  • X$BH identifies the number of available, in use, and free data buffers
  • V$SQLAREA reflects information about database queries
  • V$SESSION identifies users and their status
  • V$ROWCACHE identifies the hit and miss ratio of the data dictionary portion of the SGA's shared pool.
  • V$LIBRARYCACHE identifies the hit and miss ratio of the library cache in the SGA's shared pool.
  • V$SGASTAT identifies the amount of free memory currently available in the shared pool for SQL queries.
  • X$KSMSP provides a detailed look at the shared pool including freeable statistics
When paging and swapping of processes in and out of the Oracle memory structure, or SGA (System Global Area), this area will need to be increased to handle the probable increase in SQL area processes for the GIS application.

GIS applications generally require a sort_area_size big enough for users or processes that need to allocate memory to sort Oracle data processed from queries, adhoc or internal to the application. Often with location and map information, there is a requirement for large sort areas due to the amount of data required for display and reporting.

The processes requiring temporary memory for sorting are the following:
  • creating indexes
  • select statements with
  • order by
  • distinct
  • group by
  • union
  • unindexed joins
Note that the memory allocation is on a process basis and is returned back to the respective OS when the sort operation has completed.

Cache data into memory for small tables that require frequent selects using a select with cache. Data will be kept resident in the SGA/shared pool. After the 1st query from a particular table needing to be cached, the table's data will remain cached in memory. It will only be released when it is specifically released or database shutdown and startup was required which flushes memory processes.

The memory processes of the Oracle instance are a key part of tuning the operation of a database. These processes are responsible for the efficiency of SQL execution and retrieval and storage of both data and logical database information. It is necessary to focus attention on the database parameters and stored statistics that are available to enhance the Oracle environment.

SQL Tuning
It has been stated that 90% of all tuning problems of a resident application lie in the tuning of SQL used by the application. The importance of tuning SQL statements could not be understated in any GIS application environment utilizing Oracle databases. SQL query tuning must be individually considered with respect to the GIS application and must take into account the following information:
  • Oracle Optimizer mode
  • Indexes available
  • Version of Oracle
  • Physical and logical location of tables and indexes
SQL statements are the main activity of any user of databases. The importance of their tuning can never be understated. Some GIS applications that store data in Oracle tables still expect rule-based optimization for the Oracle query's execution plan. Although the cost-based optimization was introduced with later versions of Oracle7, this became a standard by Oracle8. Certain GIS application vendors require Oracle 8 or 8i at this date, it can not be assumed that cost-based optimization is supported.

Cost based optimization improves SQL execution by a number of factors. It is necessary to perform analysis of tables and store statistics for SQL execution plans. It is assumed that GIS applications will quickly employ analyzing tables and storing these statistics for increased database throughput and performance.

The usage of indexes must be established with the GIS database design. The version of Oracle as well as the optimization method that is required will determine the required tuning parameters and methods required.

The tools to get the SQL performance and execution information and statistics are the following:
  • Check the execution plan - Explain Plan
    - Tkprof
    - Set Autotrace On
    - Use Recommended Hints
Conclusion
The interdependency of GIS applications and object/relational databases such as Oracle require the technical consideration of tuning the database environment. The ability to provide GIS information in a timely, accurate manner has influened GIS vendors to utilize scalable, reliable database technologies such as Oracle databases.

The emergence of GIS-Oracle interdependent databases will require the on-going examination of these operational environments. Specifically, the Oracle database environments will require appropriate tuning in many areas. Certain key areas have been discussed with the intent of focusing on some key areas of database tuning.
© GISdevelopment.net. All rights reserved.