Migration of Arabic Labels to Modern GIS Software



3. Reverse Engineering Methodology

3.1 Reverse Engineering of Data

Reverse engineering is a well-known process for acquiring knowledge about an existing product without referring to its producer or its documentation. A familiar example is taking apart a mechanical device or an integrated circuit to discover its inner components and understand how it works. The same concept is widely used in the IT industry, where it is applied to the three main aspects of a system: data, process, and control (Davis and Aiken 2000). The main objective of data reverse engineering is to use structured techniques to reconstitute the data assets of an existing system (Aiken 1996).

The reverse engineering methodology will be used to correct the Arabic labels and text in migrated geospatial data by analyzing the Arabic alphabet character by character and determining how each character is displayed. The requirements for solving this problem are:

The traditional GIS system with its non-standard Arabic library (the traditional GIS software running on its hardware and operating system platform, and the Arabic font library used to create the Arabic text annotation).

The modern GIS system into which the geospatial data will be migrated.

A development tool (programming language) on the modern GIS system.

Missing data: no documentation, description, or other details are available about the Arabic font library used to create the Arabic labels in the traditional GIS system.

Because the traditional GIS system is still available, we can track how its labels are treated when they are imported into the modern system. The Arabic alphabet consists mainly of 28 letters; because some letters have variant shapes, the Arabic keyboard has 32 distinct Arabic letters. Including digits and other characters, the font library contains 48 characters in total.

3.2 Data Model

The geospatial model consists of point features, each with the attributes ID, Type, X, Y, Arabic Letter, Old Code, and Microsoft Code. Each point is labeled on the map with the value of its "Arabic Letter" field, placed at the point's position with a fixed offset from the point symbol.

Data model for the spatial feature
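As an illustration, the point feature described above can be written as a small Python record. This is a sketch, not the data model of any particular GIS product: the class name `LetterPoint`, the helper `label_position`, and the offset value `LABEL_OFFSET` are hypothetical, since the text only states that the label offset is fixed.

```python
from dataclasses import dataclass

# Hypothetical fixed label offset in map units; the text gives no value.
LABEL_OFFSET = (5.0, 5.0)


@dataclass
class LetterPoint:
    """One point feature; the fields follow the data model described above."""
    id: int                  # ID
    feature_type: str        # Type ("point")
    x: float                 # X coordinate
    y: float                 # Y coordinate
    arabic_letter: str       # Arabic Letter: the one-character map label
    old_code: int = 0        # Old Code: code of the imported character (filled later)
    microsoft_code: int = 0  # Microsoft Code: codepage 1256 code (filled later)


def label_position(p: LetterPoint) -> tuple[float, float]:
    """Anchor of the label: the point position shifted by the fixed offset."""
    return (p.x + LABEL_OFFSET[0], p.y + LABEL_OFFSET[1])


# The Alef sample feature from the tables (U+0627 is the Arabic letter Alef).
alef = LetterPoint(id=1, feature_type="point", x=1000, y=1000, arabic_letter="\u0627")
print(label_position(alef))  # prints (1005.0, 1005.0)
```

Keeping "Old Code" and "Microsoft Code" at 0 by default matches the sample table for the traditional system, where neither code is known yet.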


3.3 Create geospatial data in traditional GIS system

Sample for Arabic letter “ا” (Alef) in the traditional GIS system
ID | Type  | X    | Y    | Arabic Letter | Old Code | Microsoft Code
1  | point | 1000 | 1000 | ا             | 0        | 0

3.4 Create geospatial data in modern GIS system

Sample for Arabic letter “ا” (Alef) in the modern GIS system
ID | Type  | X    | Y    | Arabic Letter | Old Code | Microsoft Code
1  | point | 1000 | 1000 | ا             | 0        | 208

The same data model will be implemented in both GIS systems, traditional and modern. The 48 features will be created in alphabetical order at the same positions (X, Y) in both systems: X is constant and Y is incremented by a constant value, so that the features are displayed as a vertical array on the map. Each feature has one label, and this label is a single character from the 48 characters of the Arabic font library.

Each of the 48 point features has its own ID. The "Microsoft Code" field holds the code of the letter in Microsoft codepage 1256 (Arabic Windows); for example, the Alef character code is 208, stored as shown in the table above.
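A minimal sketch of how the test features could be generated, assuming a scripting language is available in both systems. The function name `make_letter_grid` and the step `dy = 100` are hypothetical; only X = 1000 and the first Y = 1000 come from the sample tables. The letters are passed in rather than hard-coded, because the full 48-character set of the library is not listed here.

```python
def make_letter_grid(letters: list[str], x: float = 1000.0,
                     y0: float = 1000.0, dy: float = 100.0) -> list[dict]:
    """Create one point feature per letter, in the order given.

    X stays constant and Y grows by the constant step dy, so the labels
    line up in a single column. dy = 100 is a hypothetical value.
    """
    return [
        {
            "ID": i + 1,
            "Type": "point",
            "X": x,
            "Y": y0 + i * dy,
            "Arabic Letter": letter,
            "Old Code": 0,        # filled in after import
            "Microsoft Code": 0,  # codepage 1256 code, set in the modern system
        }
        for i, letter in enumerate(letters)
    ]


# Only the first three letters (Alef, Beh, Teh) are shown here;
# the library itself has 48 characters.
grid = make_letter_grid(["\u0627", "\u0628", "\u062a"])
for feature in grid:
    print(feature["ID"], feature["X"], feature["Y"])
```

Running the same function with the same letter list in both systems gives identical IDs and positions, which is what later allows the imported and native features to be matched.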

3.5 Import the geospatial data from traditional to modern GIS systems

The geospatial data created in the traditional GIS system will be imported into the modern GIS system, where the imported Arabic letters are stored and displayed as unreadable characters. After import, the corresponding Arabic letter is still known for each feature ID. Each Arabic letter in Microsoft codepage 1256 (Arabic Windows) has a unique code.

Sample for Arabic letter “ا” (Alef) imported from the traditional GIS system
ID | Type  | X    | Y    | Arabic Letter | Old Code | Microsoft Code
1  | point | 1000 | 1000 | ?             | 0        | 0

3.6 Update "Microsoft Code" field from modern GIS system

The "Microsoft Code" field of the imported data is then updated with the values from the same features created natively in the modern GIS system.

Sample for Arabic letter “ا” (Alef) imported from the traditional GIS system, after the update
ID | Type  | X    | Y    | Arabic Letter | Old Code | Microsoft Code
1  | point | 1000 | 1000 | ?             | 0        | 208
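Because the imported features and the natively created features share the same IDs, the update amounts to a join on ID. A Python sketch, with the hypothetical helper `copy_microsoft_codes` and features represented as dictionaries keyed by the field names of the data model:

```python
def copy_microsoft_codes(imported: list[dict], modern: list[dict]) -> None:
    """Fill "Microsoft Code" of the imported features from the features
    created natively in the modern system, matching records by ID."""
    code_by_id = {f["ID"]: f["Microsoft Code"] for f in modern}
    for feature in imported:
        # KeyError here means an imported ID has no native counterpart.
        feature["Microsoft Code"] = code_by_id[feature["ID"]]


# Sample values from the tables above (Alef, ID 1).
imported = [{"ID": 1, "Arabic Letter": "?", "Old Code": 0, "Microsoft Code": 0}]
modern = [{"ID": 1, "Arabic Letter": "\u0627", "Old Code": 0, "Microsoft Code": 208}]
copy_microsoft_codes(imported, modern)
print(imported[0]["Microsoft Code"])  # prints 208
```

Raising an error on a missing ID, rather than silently skipping it, makes an incomplete test data set visible before the conversion matrix is built.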

3.7 Update "Old Code" field from modern GIS system

The code value of each imported character is then computed and written to the "Old Code" field: a small program determines the current code of the imported character and stores it in "Old Code". For the Alef character, this value is 230.

Arabic letter “ا” (Alef) imported into the modern GIS system, with the code value of the imported character determined and stored in the "Old Code" field
ID | Type  | X    | Y    | Arabic Letter | Old Code | Microsoft Code
1  | point | 1000 | 1000 | ?             | 230      | 208
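The text does not say how the small program reads the code of an imported character. One possible sketch, assuming the modern system hands the imported label back as a string decoded with a known single-byte codepage, is to re-encode the label with that codepage and take the byte value. The function name `old_code` and the default codepage `cp1256` are assumptions.

```python
def old_code(imported_label: str, codepage: str = "cp1256") -> int:
    """Return the code of a one-character imported label.

    Assumption: the modern system decoded the legacy bytes with `codepage`,
    so encoding the label again recovers the original byte value.
    """
    raw = imported_label.encode(codepage)
    if len(raw) != 1:
        raise ValueError(f"expected one single-byte character, got {raw!r}")
    return raw[0]


# Simulate an imported Alef: the legacy byte 230 decoded as codepage 1256.
label = bytes([230]).decode("cp1256")
print(old_code(label))  # prints 230
```

If the system instead replaced the legacy byte with a literal placeholder such as "?", the original value would be lost and this approach would not apply.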

At this stage the mapping matrix is complete: when the Alef character is imported into the GIS software on Microsoft Windows, it has code 230 instead of 208, and likewise each of the remaining Arabic characters has its own pair of codes.

3.8 Create Conversion Matrix

A conversion matrix is created that pairs, for each character, the code value of the imported character with its original code value.

Conversion Matrix of Arabic library
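Assuming the features carry both code fields after the previous steps, the conversion matrix can be sketched as a dictionary from "Old Code" to "Microsoft Code". The helper name `build_conversion_matrix` is hypothetical, and its consistency check is an addition: it rejects a matrix in which one imported code would map to two different Microsoft codes.

```python
def build_conversion_matrix(features: list[dict]) -> dict[int, int]:
    """Map every imported code ("Old Code") to its "Microsoft Code".

    Raises ValueError if one old code is paired with two different
    Microsoft codes, since the substitution would then be ambiguous.
    """
    matrix: dict[int, int] = {}
    for f in features:
        old, new = f["Old Code"], f["Microsoft Code"]
        existing = matrix.setdefault(old, new)
        if existing != new:
            raise ValueError(f"old code {old} maps to both {existing} and {new}")
    return matrix


# The Alef pair from the tables above: imported code 230, Microsoft code 208.
matrix = build_conversion_matrix([{"ID": 1, "Old Code": 230, "Microsoft Code": 208}])
print(matrix)  # prints {230: 208}
```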


3.9 Develop the Conversion Program for the Imported Labels

A program will be developed that reads the code value of each character ("Old Code") in the imported geospatial data, substitutes it with the associated "Microsoft Code" from the conversion matrix, and stores the new text annotation.


Sample of the conversion array implemented in the programming language of the modern GIS system.
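The substitution step can be sketched in Python as follows. This is not the author's program: it assumes each imported label is available as a sequence of single-byte codes (the same values stored in "Old Code"), and the rule that codes absent from the matrix pass through unchanged is an assumption. The function name `convert_label` is hypothetical.

```python
def convert_label(label: bytes, matrix: dict[int, int]) -> bytes:
    """Replace each imported character code with its Microsoft code.

    Codes not found in the matrix are kept unchanged;
    this pass-through rule is an assumption.
    """
    return bytes(matrix.get(code, code) for code in label)


# An imported label: Alef (230) followed by code 32, which is not in the matrix.
print(list(convert_label(bytes([230, 32]), {230: 208})))  # prints [208, 32]
```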


3.10 Apply Conversion Program

The conversion program will be applied to the whole geodatabase imported from the traditional GIS system. Figure (3.1) shows the corrected data in the attributes table, and Figure (3.2) shows the corrected text displayed on the map. Figure (4.1) displays the steps of the reverse engineering methodology, and Figure (4.2) displays the flow chart of the conversion program.


Sample of the Arabic data corrected by the conversion program, in the attributes table


Sample of the Arabic data corrected by the conversion program, on the map


Reverse engineering methodology to correct Arabic data



Flow chart for conversion program
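Applying the conversion to a whole geodatabase could be sketched as below, with an SQLite table standing in for the geodatabase. The table `features(id, label)`, the storage of labels as BLOBs of single-byte codes, and the function `apply_conversion` are hypothetical. The sketch builds one 256-entry lookup table from the conversion matrix, translates every label with `bytes.translate`, and commits all updates in a single transaction rather than once per change.

```python
import sqlite3


def apply_conversion(conn: sqlite3.Connection, matrix: dict[int, int]) -> int:
    """Convert the labels of every row in a hypothetical features(id, label)
    table, where label holds the imported single-byte codes as a BLOB.
    Returns the number of rows processed."""
    # 256-entry lookup table; codes missing from the matrix map to themselves.
    table = bytes(matrix.get(i, i) for i in range(256))
    rows = conn.execute("SELECT id, label FROM features").fetchall()
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "UPDATE features SET label = ? WHERE id = ?",
            [(bytes(label).translate(table), fid) for fid, label in rows],
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (id INTEGER PRIMARY KEY, label BLOB)")
conn.execute("INSERT INTO features VALUES (1, ?)", (bytes([230]),))
conn.commit()
print(apply_conversion(conn, {230: 208}))  # prints 1
```

A precomputed lookup table means each label is converted in one pass, with no per-character search in the database.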

4. Conclusion

The reverse engineering methodology was introduced to correct, inside modern GIS systems, Arabic labels that were created with special non-standard libraries in traditional GIS systems. This methodology corrects the Arabic labels without any reference to the original library.

The runtime of the conversion program (update program) is relatively long, since it searches for each character in the geodatabase, computes its code value, changes it, and commits the changes to the database. Although this methodology was initially developed to solve a geospatial problem with Arabic labels, it can also be used to correct Arabic (or any other language's) text data created with undocumented font libraries in old systems (IS or GIS systems).

References

Aiken, P. (1996), Data Reverse Engineering: Slaying the Legacy Dragon, McGraw-Hill.

Davis, K. H. and Aiken, P. H. (2000), "Data Reverse Engineering: A Historical Survey", Proceedings of the Seventh Working Conference on Reverse Engineering (WCRE'00), IEEE.

Smith-Lee, S. (2001), AM/FM/GIS migration: a formula for success, ww.gisdevelopment.net/proceedings/gita/1997/bepm/19.pdf

Montgomery, B. (2001), GIS migration: protecting your data investment, http://wwwsgi.ursus.maine.edu/gisweb/spatdb/amfm/bios/am94053bio.html