Abstract
The demand for geocoding and reverse geocoding applications in Malaysia is increasing, driven by new requirements for geospatial solutions and location-based services. To address this demand, Navi and Map Sdn. Bhd. has been developing customized applications that perform fast and accurate geocoding and provide online reverse geocoding services. A Street Map Address Database has been compiled from an existing consolidated GIS, itself built up from accurate and up-to-date GPS surveys, and configured to feed the geocoding application. The geocoder performs standard processes such as address standardization, clean-up, parsing, classification and restructuring. The solution was tested and customized for Malaysian address specifications. It locates the exact address or, failing that, the nearest one, affixes the longitude and latitude values from the geodatabase, and reports on the accuracy achieved for each result. For municipal departments of valuation and property management, the geocoding service is integrated in an application covering the overall process of issuing payment notices to the corresponding property owners. Geocoding can therefore be part of a daily routine and become an integral and efficient component of a property administration and tax management system. Similarly, a GIS application has been developed to retrieve address information for a given point from its X and Y values. A reverse geocoding algorithm combined with a dense and detailed street geodatabase makes this service available on user request for all major towns in Malaysia.
Introduction
Geocoding is a common and standard feature in GIS and LBS applications worldwide. In Malaysia, however, it is a very challenging task. Geocoding services are either unavailable or perform poorly because consistent reference data is lacking. The address rules for street naming, number attribution and positioning make it even harder. A large number of streets have no name, only a number related to the suburb where the street is located. House numbers are arbitrarily distributed in many places. There is frequent redundancy in naming, and spelling and labeling vary. At present, the demand for verified and geocoded address data for all major cities and towns is increasing, but the effort to compile such comprehensive data can hardly be borne by a single agency. Data collection and processing are immensely time-consuming and very costly to implement. Surveys using GPS devices or mobile mapping software on PDAs require large data collection campaigns in which it is very difficult to minimize the cost of the resources deployed in the field. There are many ways to perform geocoding, and Navi and Map has tried to address this issue rapidly and efficiently by using existing geographic data. The solution under investigation is a geocoder with some level of interpolation, combined with simple manual post-processing procedures.
How to proceed?
Building and customizing geocoding services for Malaysia
Geocoding requires three essential components:
- Address information as an input file in text format
- A dictionary of georeferenced data, including attributes with location names and coordinates
- A geocoder or geocoding engine: a software application able to perform database search and matching
[Figure: address records before geocoding and after geocoding]
The overall process starts with the input address record, runs the geocoder modules, produces the results and integrates them in a mapping environment. When records are successfully matched, they can be precisely tagged with the correct location on the map. The process includes various filtering steps and reports the accuracy achieved for each result. Depending on the quality of the geocode and the expected level of accuracy, post-processing is performed to locate the point on the basemap and affix its coordinates accordingly. For example, to get the location at the premise level when the house number could not be matched, a manual correction is required to append the longitude and latitude of that exact location.

Cf Figure 1: Navi and Map geocoding system.
Let us now review the process components and their roles in detail.
1.) The address data listing
Quality input data is one of the keys to success. It is important to design a standard address model for the address data providers to follow. A minimum of information is required so that the address can be matched against the georeferenced data bank: correct and complete names for states, cities, suburbs and streets, as well as the postcode when available. Postcodes are a very reliable source of data, since they match directly to the postcode demarcation areas. Geocodes in smaller postcode demarcation areas are naturally more accurate, as those areas contain fewer houses.
The next input quality requirement concerns the structure of the address file, i.e. how the fields of information are organized, with street suffix, prefix, etc. Recurrent problems are spelling errors and labels written with initials or short names. Another issue is the validity of the address: the address listed should correspond to the location targeted, particularly where there are multiple registrations of property ownership. For all these reasons, the geocoder must include an address pre-processing module to clean and standardize the inputs before running the engine.
2.) The georeferenced data or street map address dictionary
The georeferenced data dictionary is essentially a very large lookup table associating names with coordinates. The quality of this database, in terms of density and accuracy of information as well as indexing structure, is an essential factor in the overall process. Its records are the ones checked against the input data; the more complete and precise it is, the greater the chance of matching the requested address location. The database should cover the widest possible area so that every location within the mapped area can be found. This must include detailed street maps for city suburbs and similar places not always covered by standard maps. Navi and Map has a highly accurate street geodatabase built for navigation systems and other online mapping applications. The same geodatabase serves as the base for this project and will be enhanced to provide the foundations for address geocoding. It was built by capturing data through field surveys using differential GPS instruments at sub-meter accuracy. The database includes two specific object classes, for buildings and for points of interest. All major buildings, government premises, industrial or commercial buildings, and high-rise residential condominiums have been mapped with address attributes. These location data provide a direct link from the recorded name to the address, e.g. Menara TM, Phileo Damansara II, Times Square, etc. To affix the coordinates, building centroids are generated to populate the database. Further improvement can be achieved by combining trade directories, building plan notifications from town councils and other relevant documents. Currency is an issue: to keep track of new buildings and name changes, surveys and data collection campaigns need to be a constant effort.
In residential areas, house numbers are only partly captured, as the first and last house numbers of each housing row, so interpolation is required to locate an individual house on the map. As part of the geocoder, interpolator modules can be built using reference benchmarks such as adjacent landmarks and street names. The application requires more than a data dictionary: the spatial data must be indexed and organized so that records can be looked up efficiently. A hierarchical structure supports filtering and searching by geographic areas and subdivisions, such as districts, postcode demarcations and suburbs.
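The building centroid step mentioned above can be illustrated with a short sketch. It assumes each building footprint is stored as a simple (non-self-intersecting) polygon of (x, y) vertices in a projected coordinate system; the function name polygon_centroid and the sample footprint are hypothetical and are not taken from the Navi and Map system.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon (shoelace formula).

    vertices: list of (x, y) tuples in order around the footprint,
    without repeating the first vertex at the end.
    """
    area2 = 0.0  # twice the signed area of the polygon
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon: zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical rectangular footprint, 4 units by 2 units
footprint = [(0, 0), (4, 0), (4, 2), (0, 2)]
centroid = polygon_centroid(footprint)
```

For an L-shaped or otherwise concave footprint the centroid can fall outside the building, so a production system might prefer a point guaranteed to lie inside the polygon.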
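One possible shape for such a hierarchical structure is sketched below, as an illustration only. It assumes each record is a dictionary carrying district, postcode and suburb keys; these field names, the function names build_index and lookup, and the placeholder values are hypothetical.

```python
def build_index(records):
    """Nest records as district -> postcode -> suburb -> list of records."""
    index = {}
    for rec in records:
        by_suburb = index.setdefault(rec["district"], {}).setdefault(rec["postcode"], {})
        by_suburb.setdefault(rec["suburb"], []).append(rec)
    return index


def lookup(index, district, postcode=None, suburb=None):
    """Narrow the search level by level.

    A level left as None is not filtered, so every record below the
    last supplied level is returned.
    """
    by_postcode = index.get(district, {})
    postcodes = [postcode] if postcode is not None else list(by_postcode)
    hits = []
    for pc in postcodes:
        by_suburb = by_postcode.get(pc, {})
        suburbs = [suburb] if suburb is not None else list(by_suburb)
        for sb in suburbs:
            hits.extend(by_suburb.get(sb, []))
    return hits


# Placeholder records, for illustration only
records = [
    {"district": "D1", "postcode": "P1", "suburb": "S1", "name": "a"},
    {"district": "D1", "postcode": "P1", "suburb": "S2", "name": "b"},
    {"district": "D1", "postcode": "P2", "suburb": "S3", "name": "c"},
    {"district": "D2", "postcode": "P3", "suburb": "S4", "name": "d"},
]
index = build_index(records)
```

In a SQL database the same effect would usually be obtained with a composite index on the corresponding columns rather than nested dictionaries.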
3.) The geocoder
A geocoder is essentially a customized application that can run as a stand-alone program. Geocoding can be performed in a non-GIS environment as a database-driven application. The geocoder is built on the georeferenced data generated in GIS, compiled and structured in a SQL database. Its main function is to find textual matches between the input address and the geodatabase records, comparing the many attributes available. Processing speed is an important factor, especially when the database holds a very large number of records. The user interface is also a quality component, though not an essential one; a good graphical user interface makes the application more user-friendly. The most important factor remains data quality, especially the accuracy and structure of the georeferenced data dictionary. As mentioned earlier, the geocoder needs several modules to process and match address records.

Cf Figure 2: The geocoder modules
The parser module
The parser module pre-processes and cleans the addresses before forwarding the data to the next module. It filters addresses by area name, postal code, city name and suburb name, down to street level. To allow complex searching and matching, labeling and spelling problems must be corrected first. For example, the street prefix may be given as jln for jalan, lrg for lorong, tmn for taman, etc. Similarly, many initials will be present, some of them ambiguous: PJ could stand for Petaling Jaya or Putra Jaya. A first level of standardization must therefore be applied here in order to classify the records by object category.
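The standardization step described above might look like the following minimal sketch. The abbreviation table contains only the examples quoted in the text; the function names standardize and expand_initials are hypothetical, and a real parser would need a much larger dictionary plus spelling correction.

```python
import re

# Abbreviations quoted in the text; a real table would be far longer.
PREFIXES = {"jln": "jalan", "lrg": "lorong", "tmn": "taman"}

# Ambiguous initials keep every expansion so a later step can disambiguate.
INITIALS = {"pj": ["petaling jaya", "putra jaya"]}


def standardize(address):
    """Lower-case, drop commas and trailing dots, expand known street prefixes."""
    tokens = re.split(r"[\s,]+", address.lower())
    cleaned = []
    for token in tokens:
        token = token.strip(".")
        if token:
            cleaned.append(PREFIXES.get(token, token))
    return " ".join(cleaned)


def expand_initials(token):
    """Return every known expansion of an initial, or the token itself."""
    key = token.lower()
    return INITIALS.get(key, [key])
```

Returning a list for ambiguous initials leaves the choice to a later stage, which can use the postcode or suburb to decide between the alternatives.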
The matcher module
The matcher module is the core of the system. Its role is to search the database for the corresponding records. Once all options are identified, the results are returned as potential geocodes for the input. Rather than checking only for exact matches on fields, the matcher can be customized. Searching by open keyword identifies a name independently of the field classification. For example, the text ampang can be identified in jalan ampang, taman ampang, ampang park and ampang jaya. The application generates a list of candidates that match at least one member of the various classes, be it street names, city names, postcodes, suburb names or place names.
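The open keyword search can be sketched as below. This is an illustration under simple assumptions: records are (category, name) pairs held in memory, matching is on whole words, and the category assigned to each sample name is a guess made for the example. The function name find_candidates is hypothetical; the actual matcher runs against a SQL database.

```python
# Sample names taken from the text; the categories are illustrative guesses.
RECORDS = [
    ("street", "jalan ampang"),
    ("suburb", "taman ampang"),
    ("place", "ampang park"),
    ("city", "ampang jaya"),
    ("place", "times square"),
]


def find_candidates(keyword, records):
    """Return every (category, name) whose name contains the keyword as a whole word.

    The category is ignored during matching, so a keyword is found
    whether it belongs to a street, suburb, city or place name.
    """
    key = keyword.lower().strip()
    return [(cat, name) for cat, name in records if key in name.lower().split()]


candidates = find_candidates("Ampang", RECORDS)
```

Here candidates holds the four ampang entries, each tagged with its category, which the next stage can rank or filter by postcode or suburb.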
The interpolator module
Interpolation functions are used to determine geocodes from benchmarks, using specific spatial positioning rules and related algorithms. The geocoder should be able to generate results by interpolation based on street geometry and address ranges. Because house numbers are not consistently and completely recorded as objects in the available GIS database, street segments are created to provide an alternative solution. The system uses the geometric characteristics of each road or street segment defined in the GIS, such as length, orientation and connections to junctions. The objective is to place a point on the map according to its location within the street segment. House numbering by range, with even or odd numbers, is an additional difficulty in placing the point accurately.
When street address interpolation fails to find matching records, coordinates can be assigned from postcode centroids. The location is then less accurate than at street level, but still falls within the region of interest.
At the last stage, each result needs to be associated with a graphical object to be displayed on the map as a marker. This final technical step is a manual process in which technicians use reference map documents to locate the point at the premise level.
The overall system is a client-server architecture with a web-based application, in which the server processes transactions and sends responses to the client. The system can be scaled to maintain efficient processing speed and response times as the number of client requests increases. The user accesses and operates the geocoding application through a browser-based HTML client. This setup meets the requirements of the Municipality project, where the geocoding services needed to be accessible in a non-GIS environment, without investing in desktop GIS systems. The application for the Municipality has been designed with an easy, user-friendly interface for use by clerks with no prior background in mapping tools and technologies. For this particular project, the objective was to link the geocode to the accounting database of the department of valuation and property management. The system is dynamic: the daily routine of producing payment status reports and notices makes use of up-to-date geocoded data. The technician can inquire about a particular address by first identifying and locating it on the map, then saving the result to the accounting database for the next procedures. Geocoding can therefore be part of a chain of procedures and become an integral and efficient component of the property administration and tax management system.
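A minimal sketch of address-range interpolation along a street segment follows. It assumes the segment is a polyline in planar (projected) coordinates with known first and last house numbers, and that houses are evenly spaced between them; the function name interpolate_on_segment is hypothetical. The odd/even side-of-street offset discussed above is deliberately left out.

```python
import math


def interpolate_on_segment(polyline, first_no, last_no, house_no):
    """Place a house number along a polyline by linear interpolation.

    polyline: list of (x, y) vertices from the first_no end to the last_no end.
    Returns the (x, y) point at the matching fraction of the total length.
    """
    if first_no == last_no:
        fraction = 0.5  # single-number range: use the middle of the segment
    else:
        fraction = (house_no - first_no) / (last_no - first_no)
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("house number outside the segment's address range")

    pairs = list(zip(polyline, polyline[1:]))
    lengths = [math.dist(a, b) for a, b in pairs]
    target = fraction * sum(lengths)  # distance to travel from the first vertex

    for (a, b), seg_len in zip(pairs, lengths):
        if seg_len > 0 and target <= seg_len:
            t = target / seg_len
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= seg_len
    return polyline[-1]  # guards against floating-point drift at the far end


# Hypothetical L-shaped street, 20 units long, numbered 1 to 21
street = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
point = interpolate_on_segment(street, 1, 21, 16)
```

House 16 lies three quarters of the way along the 20-unit street, so the point falls halfway up the second leg.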
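This fallback can be expressed as a small cascade that also records the accuracy level reached. The sketch assumes the address is a dictionary with a postcode key, that street_lookup is any callable returning coordinates or None, and that postcode centroids are held in a dictionary; all of these names and the sample values are hypothetical.

```python
def geocode_with_fallback(address, street_lookup, postcode_centroids):
    """Try street-level matching first, then fall back to the postcode centroid.

    Returns (coordinates, level), where level is "street", "postcode"
    or "unmatched" and can feed the accuracy report.
    """
    point = street_lookup(address)
    if point is not None:
        return point, "street"
    centroid = postcode_centroids.get(address.get("postcode"))
    if centroid is not None:
        return centroid, "postcode"
    return None, "unmatched"


# Placeholder data, for illustration only
centroids = {"50000": (1.0, 2.0)}
result = geocode_with_fallback(
    {"street": "unknown street", "postcode": "50000"},
    lambda addr: None,  # simulates a failed street-level match
    centroids,
)
```

Records returned at the postcode level are the natural candidates for the manual correction step.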
Setting up a reverse geocoding system from geocoded database records
Just as geocoding appends latitude and longitude coordinates to a given address record, some geospatial applications need to generate geographical location information from latitude and longitude data. This process is called reverse geocoding. It is particularly useful for mapping and for finding the nearest location of interest. Reverse geocoding should return its results as text giving the address details of the corresponding location.
Besides making use of the geodatabase, building the reverse geocoding engine involves additional development. It requires algorithms that measure the distance between two points on the earth. One example of such a formula is:
Arc(AB) = R x Arccos [cos(lat1) cos(lat2) cos(lon2 - lon1) + sin(lat1) sin(lat2)]
lat1, lat2 : latitudes of points A and B, in degrees
lon1, lon2 : longitudes of points A and B, in degrees
R : Earth radius = 6378 km (radius at the equator)
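The arc formula above can be written as a short function. This sketch takes the longitude term to be the difference in longitude between the two points, converts the inputs from degrees to radians before applying the trigonometric functions, and uses the 6378 km radius quoted in the text. The function name arc_distance_km is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6378.0  # radius quoted in the text


def arc_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    cos_angle = (math.cos(phi1) * math.cos(phi2) * math.cos(dlon)
                 + math.sin(phi1) * math.sin(phi2))
    # Clamp to [-1, 1]: rounding can push the value slightly out of range,
    # which would make acos raise a ValueError.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return EARTH_RADIUS_KM * math.acos(cos_angle)


# A quarter of the equator: from longitude 0 to longitude 90
quarter = arc_distance_km(0.0, 0.0, 0.0, 90.0)
```

The arccosine form loses some precision for points only a few metres apart; the haversine formula is a common alternative in that case.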
For the current project, designed for a real estate agency, the application uses data captured by a mobile GPS device. The Navi and Map reverse geocoding engine returns information on the location plotted on the map, with details of the nearest street or road name and of the geographic demarcation areas in which the point falls, such as the city and suburb. It can also provide supplementary information, such as the distance to major landmarks or to any particular point of interest. The points of interest in the Navi and Map geodatabase were classified by priority and selected according to the user's interests.
For this project, the mapping application developed by Navi and Map needed to be coupled with an SMS gateway to receive requests and send responses back to the user after processing the data transferred as text. The system required a customized Application Programming Interface for integration with the telecommunication devices.
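Finding the nearest landmark or point of interest for a GPS fix can be sketched as a simple scan over candidate features. The example assumes features are (name, (lat, lon)) pairs held in memory and uses a great-circle distance with the 6378 km radius; the names arc_km and nearest_feature and the placeholder landmarks are hypothetical. A production engine would use a spatial index rather than a linear scan.

```python
import math


def arc_km(a, b, radius_km=6378.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(a[0]), math.radians(b[0])
    dlon = math.radians(b[1] - a[1])
    cos_angle = (math.cos(phi1) * math.cos(phi2) * math.cos(dlon)
                 + math.sin(phi1) * math.sin(phi2))
    return radius_km * math.acos(max(-1.0, min(1.0, cos_angle)))


def nearest_feature(point, features):
    """Return (name, distance_km) of the feature closest to point.

    features: non-empty list of (name, (lat, lon)) pairs.
    """
    scored = ((name, arc_km(point, location)) for name, location in features)
    return min(scored, key=lambda item: item[1])


# Placeholder landmarks, for illustration only
landmarks = [
    ("Landmark A", (3.0, 101.0)),
    ("Landmark B", (3.5, 101.5)),
]
name, distance = nearest_feature((3.1, 101.1), landmarks)
```

The same function can be applied to street centre points or to a priority-filtered subset of points of interest before the reply is formatted as text.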
Cf Figure 4: The reverse geocoding simulation display
Conclusion
Geocoding is the immediate and necessary gateway to locating and mapping address data accurately. Obtaining the latitude and longitude of a record relies mainly on the database made available for searching and matching. It is essential to focus on the database content and structure in order to be able to run a geocoder. In Malaysia, ready-made address data banks in a GIS environment are not available, and this lack of adequate references means building a system able to cope with missing records and discrepancies. Whatever the performance of the geocoding engine and interpolation methods, data capture remains the foundation for providing geocoding services. Continuous and centralized efforts are needed to build up standardized georeferenced data. Once the system is in place, it enables a large number of location-based services. The number of applications for a spatially enabled government is extensive. Geocoding can be used for emergency response systems, enforcement command and control systems, etc. Business applications that make use of such a system can also be very rewarding.
For instance, standard reverse geocoding services and commercial applications can include vehicle tracking, insurance provider services, friend finders, nearest point of interest searches and push marketing.