Migration of Arabic Label to Modern GIS Software

Mohamed A. Eleiche and Sanaa Issa
GeoTiba Systems
Email: meleiche@geotiba.com

Abstract

Modern GIS systems are converging towards standardization and interoperability. When GIS systems established in the eighties and nineties are migrated to modern GIS systems, the data models, programs and user interfaces are usually reconstructed based on the new system design, and the geospatial data is migrated from the existing data model to the new one. Traditional GIS systems had no standard libraries for using the Arabic language in map labeling. Special Arabic libraries were implemented to enable them to use Arabic, and a huge amount of data was created with these libraries. When these geospatial data are imported into modern GIS systems from different GIS platforms, the Arabic labels and text are unreadable: they are stored and displayed as random characters. This paper introduces a reverse engineering methodology to correct the Arabic labels and text of geospatial data imported into modern GIS systems.

1. Introduction

GIS (Geographic Information System) technology started in the early sixties in the USA, Europe and other regions. The GIS industry started while the IT (Information Technology) industry suffered from a lack of standards. Early GIS systems were characterized by expensive hardware (mainly mainframes), special operating systems, sophisticated GIS software, specific file formats for data storage and special programming languages. These GIS systems started to penetrate the Middle East in the early eighties, with the same characteristics. Traditional geospatial data were considered closed pools, difficult to access or share.
Despite these characteristics, and because mapping organizations around the world needed to adopt GIS technology, these traditional systems started to spread. They did not fulfill all users' requirements, so their owners started to add the missing capabilities to the commercial GIS software and systems they owned. One important requirement was labeling and text annotation of geospatial data in local languages, which earlier GIS systems did not offer. English and other languages written in the Roman script had good fonts and libraries for writing and plotting on digital maps, while languages such as Arabic had no way to write labels or legends on digital geospatial data. To overcome this obstacle, special commercial Arabic libraries were created to write Arabic labels and text annotation on digital maps. Each library was specific to the traditional GIS that served as its development and usage platform, and was developed mainly to fulfill business requirements, without standardization.

2. Problem Statement

These special Arabic libraries shared common characteristics:

- The library depends on the platform (hardware and software).
- The library is used inside the GIS software only, not by other software on the platform.
- Once exported to another version or software, the Arabic data is lost and becomes unreadable.
- No standards or industry specifications were applied in developing the library.

At that time, these libraries were efficient and fulfilled their role: storing, querying and plotting Arabic labels and text annotation on digital maps.
After several years in the GIS market, these Arabic libraries ended up in one of two conditions:

Condition 1: The producer still maintains the library and has produced a tool to convert its data to modern GIS systems. This enabled the successful conversion of Arabic text annotation to modern GIS platforms such as those based on Microsoft Windows.

Condition 2: The producer stopped maintaining and supporting the library and declared it deprecated and unsupported. This is where the problem lies. In Egypt, for example, dozens of organizations used commercial libraries to create Arabic labels and text on digital maps. Support for these libraries ended and the conversion problem arose: when the data is imported into modern GIS systems, the Arabic labels are not readable at all.

[Figure: Sample of Arabic labels created by a commercial font library on GIS under the DOS operating system in 1993 by AlCahira Company, displayed as English text when imported to modern GIS.]

3. Reverse Engineering Methodology

3.1 Reverse Engineering of Data

Reverse engineering is a known process for acquiring knowledge about an existing product without referring to its producer or its documentation. The well-known example is discovering the inner components of a mechanical device or an integrated circuit in order to understand how it functions. The same concept is widely applied in the IT industry to the three main aspects of a system: data, process and control (Davis 2000). The main objective of data reverse engineering is to use structured techniques to reconstitute the data assets of an existing system (Aiken 1996). Here, the reverse engineering methodology is used to correct the Arabic labels and text in migrated geospatial data by analyzing the Arabic alphabet character by character and determining how each character is displayed after import.
Solving this problem requires:

- The traditional GIS system with the non-standard Arabic library (the traditional GIS software on its hardware and operating system platform, and the Arabic font library used to create the Arabic text annotation).
- The modern GIS system to which the geospatial data will be migrated.
- A development tool (programming language) on the modern GIS system.

Missing data: documentation, description, or any details about the Arabic font library used to create Arabic labels in the traditional GIS system.

Since the traditional GIS system still exists, we can track how the labels are treated when imported into the modern system. The Arabic alphabet consists mainly of 28 letters; because letters take different shapes, the Arabic keyboard has 32 different Arabic letters. Together with numbers and other characters, the font library contains 48 characters in total.

3.2 Data Model

The geospatial model consists of point features. Each point has defined attributes: ID, Type, X, Y, Arabic Letter, Old Code, and Microsoft Code. Each point has a label taken from the field "Arabic Letter", displayed at the position of the point with a fixed offset from the point symbol.

[Image: attribute table of the data model]

3.3 Create geospatial data in traditional GIS system
3.4 Create geospatial data in modern GIS system
The same data model is implemented in both GIS systems, modern and traditional. The 48 features are created in alphabetical order at the same positions (X, Y) in both systems, where X is constant and Y is incremented by a constant value so that the features are displayed as an array on the map. Each feature has its own ID and one label, and this label is a single character from the 48 Arabic characters. The field "Microsoft Code" holds the code of the letter in Microsoft code page 1256 (Arabic Windows). For example, the Alef character code is 208, stored as shown in the table above.

3.5 Import the geospatial data from traditional to modern GIS systems

The geospatial data created in the traditional GIS system is imported into the modern GIS system. The imported Arabic letters are stored and displayed as unreadable characters. Because the features are imported with their IDs, the corresponding Arabic letter is known for each ID, and each Arabic letter has a unique code in Microsoft code page 1256 (Arabic Windows).
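
As a minimal sketch of Sections 3.3 and 3.4, the calibration features could be generated as shown below. The class `CalibrationPoint`, the function `build_calibration_points`, the spacing `dy` and the choice of Python are assumptions made for illustration, not the authors' implementation. The "Microsoft Code" value is taken here from Python's built-in `cp1256` codec.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CalibrationPoint:
    """One point feature of the data model in Section 3.2 (hypothetical class)."""
    id: int
    type: str
    x: float
    y: float
    arabic_letter: str
    old_code: Optional[int] = None        # filled in later (Section 3.7)
    microsoft_code: Optional[int] = None  # code page 1256 value


def build_calibration_points(letters: List[str], x: float = 0.0,
                             y0: float = 0.0, dy: float = 10.0) -> List[CalibrationPoint]:
    """Create one labeled point per character: X constant, Y stepped by dy."""
    points = []
    for index, letter in enumerate(letters):
        code = letter.encode("cp1256")[0]  # single-byte code in code page 1256
        points.append(CalibrationPoint(
            id=index + 1,
            type="calibration",
            x=x,
            y=y0 + index * dy,
            arabic_letter=letter,
            microsoft_code=code,
        ))
    return points


# Three sample letters (Beh, Teh, Theh) written as Unicode escapes.
sample = build_calibration_points(["\u0628", "\u062a", "\u062b"])
```

In practice the list would hold all 48 characters of the font library, and the same IDs and coordinates would be reused when the features are keyed into the traditional system.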
3.6 Update "Microsoft Code" field from modern GIS system

In the imported data inside the modern GIS system, the values of the field "Microsoft Code" are updated from the same data created directly in the modern GIS system.
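
The update in Section 3.6 can be sketched as a join on the feature ID. This is an illustration only: it assumes each dataset is available as a list of dictionaries keyed by the field names of Section 3.2, and the function name `copy_microsoft_codes` is hypothetical.

```python
from typing import Dict, List


def copy_microsoft_codes(imported_rows: List[Dict], modern_rows: List[Dict]) -> None:
    """Copy "Microsoft Code" from the modern dataset to the imported one, matched on ID."""
    code_by_id = {row["ID"]: row["Microsoft Code"] for row in modern_rows}
    for row in imported_rows:
        # Rows without a counterpart in the modern dataset are left unchanged.
        if row["ID"] in code_by_id:
            row["Microsoft Code"] = code_by_id[row["ID"]]


modern = [{"ID": 1, "Microsoft Code": 208}]      # value reported in the text for Alef
imported = [{"ID": 1, "Microsoft Code": None}]   # same feature after import
copy_microsoft_codes(imported, modern)
```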
3.7 Update "Old Code" field from modern GIS system

The code value of each imported character is computed and stored in the "Old Code" field. A small program determines the current code of the imported character and writes it to the field "Old Code"; for the "Alef" character it is 230.
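
The small program mentioned above might look like the following sketch. It assumes that the modern system exposes the imported label as text in a single-byte code page; the default `cp1252` and the function name `old_code` are assumptions, since the text does not say which code page the modern GIS uses.

```python
def old_code(imported_char: str, encoding: str = "cp1252") -> int:
    """Return the single-byte code of a character as it appears after import."""
    raw = imported_char.encode(encoding)
    if len(raw) != 1:
        raise ValueError("expected exactly one single-byte character")
    return raw[0]


# Under Windows-1252, the character with code 230 is U+00E6.
code = old_code("\u00e6")
```

Under this assumption, a label shown as U+00E6 yields 230, the "Old Code" value reported above for Alef.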
At this stage the mapping matrix is complete: when imported into GIS software on Microsoft Windows, the "Alef" character has code 230 instead of 208, and likewise for all remaining Arabic characters.

3.8 Create Conversion Matrix

A conversion matrix is created with one entry per character, holding the code value of the imported character and the code value of the original character.

[Image: conversion matrix]

3.9 Develop the Conversion Program for the Imported Labels

A program is developed that reads the code value of each character ("Old Code") in the imported geospatial data, substitutes the associated "Microsoft Code" from the conversion matrix, and stores the new text annotation.

[Figure: Sample of the conversion array implemented in the programming language of the modern GIS system.]

3.10 Apply Conversion Program

The conversion program is applied to the whole geodatabase imported from the traditional GIS system. Figure (3.1) shows the corrected data in the attributes table and Figure (3.2) shows the corrected text displayed on the map. Figure (4.1) displays the steps of the reverse engineering methodology, and Figure (4.2) displays the flow chart of the conversion program.

[Figure (3.1): Sample of the result of the Arabic data corrected after applying the conversion program, in attributes]

[Figure (3.2): Sample of the result of the Arabic data corrected after applying the conversion program, in map]

[Figure (4.1): Reverse engineering methodology to correct Arabic data]

[Figure (4.2): Flow chart for conversion program]

4. Conclusion

A reverse engineering methodology was introduced to correct, inside modern GIS systems, Arabic labels that were created with special non-standard libraries in traditional GIS systems. The methodology corrects the Arabic labels without any reference to the original library. The runtime of the conversion program (update program) is relatively long, since it searches for each character in the geodatabase, computes its code value, changes it, and commits the change to the database.
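
The character substitution of Sections 3.8 and 3.9 can be sketched as follows. This is a hedged illustration, not the authors' program: it assumes the labels are available as raw bytes, it uses Python's `bytes.maketrans` and `bytes.translate` in place of the per-character search described above, and the names `build_translation_table` and `convert_label` are hypothetical. The matrix holds only the single pair reported in the text.

```python
from typing import Dict


def build_translation_table(conversion_matrix: Dict[int, int]) -> bytes:
    """Build a 256-entry table mapping each "Old Code" to its "Microsoft Code"."""
    old_codes = bytes(conversion_matrix.keys())
    new_codes = bytes(conversion_matrix.values())
    return bytes.maketrans(old_codes, new_codes)


def convert_label(label: bytes, table: bytes) -> bytes:
    """Substitute every mapped byte in one pass; unmapped bytes are kept as they are."""
    return label.translate(table)


matrix = {230: 208}  # Old Code -> Microsoft Code, the pair given for Alef
table = build_translation_table(matrix)
converted = convert_label(bytes([230, 32, 230]), table)  # 32 is an ASCII space
```

Because every byte is looked up exactly once, a code that is both an "Old Code" and a "Microsoft Code" in the matrix is not converted twice, which could happen with sequential find-and-replace. The converted bytes can then be decoded with the `cp1256` codec if a Unicode string is needed.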
Although this methodology was developed initially to solve a geospatial problem with Arabic labels, it can also be used to recover Arabic (or any other language) text data created in old systems (IS or GIS systems) with font libraries that have no documentation.

References

Aiken, P. (1996), Data Reverse Engineering: Slaying the Legacy Dragon, McGraw-Hill.

Davis, K.H. and Aiken, P.H. (2000), "Data Reverse Engineering: A Historical Survey", Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE'00), IEEE.

Smith-Lee, S. (2001), AM/FM/GIS migration: a formula for success, ww.gisdevelopment.net/proceedings/ gita/1997/bepm/19.pdf

Montgomery, B. (2001), GIS migration: protecting your data investment, http://wwwsgi.ursus.maine.edu/gisweb/spatdb/amfm/bios/am94053bio.html
© GISdevelopment.net. All rights reserved.