Your Data has a Life - Revive It
Michael Norris
Director, Asset Information Management, SCL

Inspiration Through Privatisation

Knowledge of a company's physical assets is of critical importance for the commercial and safe operation of a business. This subject has come to the forefront in the UK in a big way in recent years, following the "selling off" of public monopolies to the private sector. British Rail (BR) was "broken up" into a number of different railway companies in an effort to breathe new life, investment and competition into what was seen as deteriorating circumstances for the railways. Almost overnight, and as if by magic, there appeared Train Operating Companies (TOCs) who transported passengers, Freight Operating Companies (FOCs) who transported freight, and the one and only company responsible for the maintenance and management of the railway infrastructure - track, signals and major stations - Railtrack.

At the time of the privatisation the UK railways were already separated into 10 Zones (now 7). Each Zone operated with great autonomy. It possessed its own maintenance crews who were responsible for the upkeep of the track and signals within the geography managed by the Zone. The maintenance crews were to form the other "new" key players of the railways - the Infrastructure Maintenance Contractors (IMCs).

The focus of this paper is what happened next - in terms of assets and asset data. The paper describes the data quality strategy deployed in trying to acquire and maintain good quality data. In particular it describes how a project was kicked off to provide a comprehensive register of the infrastructure assets, and the key findings of that project.

The Project

The Project was approved in April 1999 and provided a budget to do the following: -
Before we look at the asset data quality as part of the Railtrack Project experience, let's consider some of the problems encountered in the handling of data by way of a simple example - data held in the Address Book.

The Data in an Address Book

A typical address book can be a "minefield" of information. There is a very good chance that the data recorded will be useful since the person who is going to use it put it there. They know when they recorded it, the process by which it was acquired, the source of the data and, generally, the reliance they can put on that data. Hence,
Perhaps two partners wish to share information about their circle of friends for the purpose of putting a single, definitive, common list of names and addresses together. Under those circumstances it would have been useful, before they had started to collect data, to have agreed what kind of data should be recorded and how it should be recorded (a rare event). Let's start with the basics - such as a person's "Name". What exactly do we need to record about "Name" - perhaps the "Surname", the "Firstname" and a "Middle Initial"? How do we want to record it? Should it be "Surname" followed by "Firstname" or vice-versa? What other detail is normally recorded about someone? We might recommend an "Address" data item including "Street Number" and "Street Name", and possibly "Town", "County" or "State", "Postcode" or "Zipcode", or "Country". Telephone details might be useful, such as "Home Tel No", "Fax No", "Mobile No", "Business/Office Nos" or "Pager No". And today we might also want to record details such as someone's "Email Address". Some of this information may be considered "basic" and some "extra detail". What is "basic information" and what is "extra detail"? The latter may be defined as "nice to have" but rarely used, if at all. If we are not going to use it, why keep it?
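As a minimal sketch of what "agreeing what to record and how to record it" might look like in practice, the two partners' lists could be merged against one shared record definition. The field choices, the `Contact` class and the `merge_books` helper are all hypothetical, chosen only to illustrate the point; they are not part of the project described in this paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Contact:
    # "Basic" fields that both partners agreed to record (hypothetical choice).
    surname: str
    firstname: str
    # "Extra detail": optional, kept only if it will actually be used.
    email: Optional[str] = None
    home_tel: Optional[str] = None

    def key(self) -> tuple:
        # Agreed recording convention: surname first, trimmed, case-insensitive.
        return (self.surname.strip().lower(), self.firstname.strip().lower())


def merge_books(*books: list) -> list:
    """Combine several address books into one list, one entry per agreed key."""
    merged: dict = {}
    for book in books:
        for contact in book:
            existing = merged.get(contact.key())
            if existing is None:
                merged[contact.key()] = contact
            else:
                # Keep the first-seen record and fill its gaps from the later one.
                merged[contact.key()] = Contact(
                    surname=existing.surname,
                    firstname=existing.firstname,
                    email=existing.email or contact.email,
                    home_tel=existing.home_tel or contact.home_tel,
                )
    return sorted(merged.values(), key=Contact.key)
```

Without the agreed key, " smith, ANN" and "Smith, Ann" would survive as two separate friends in the combined list; with it, they collapse into one record that carries the detail from both books.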
But where do we store all of this information? We can use handheld electronic devices (Personal Digital Assistants or PDAs), paper-based organisers or diaries, PCs (with more software applications to choose from), wall calendars or web-based storage areas. And even as individuals we will use different places to record different but related data. This creates tremendous difficulties when we attempt to pull together an up-to-date list of names and addresses.
A critical objective of our project was to have a database populated with asset data of a known and agreed quality. It was recognised that the required level of quality could not be attained immediately and that a progressive, phased approach would be required. We began by devising a Data Quality Strategy - a framework to enable us to achieve our goals.

Data quality was seen as a historic weakness in many of the precursor database systems formerly used for holding asset data. For the new asset register, data quality was seen as a distinguishing feature that would enable the register to address the needs of the end users more reliably. Confidence in the asset data held within the system was considered vital to its long-term use. An important aspect of the challenge of providing data of the requisite quality was ensuring that the quality was measurable and that the quality level attained was visible to the users. User confidence in the data was therefore an essential prerequisite for successful implementation and continued success of the asset register. The end user community needed to embrace the system enthusiastically if the disciplines necessary for data maintenance were to be adhered to. Users would only have confidence in the register if the quality of the data was clearly demonstrated, to a required level.

The Need For Cultural Change

It was recognised that a major change was needed in the way both Railtrack and its suppliers approached the maintenance of data in database systems. The history of many legacy databases was that they degenerated into disuse through lack of adequate maintenance, leading to poor data and the need to re-collect the data. If data of the required quality was to become the norm, a significant change in attitude and approach to the handling and use of data was required. In developing the strategy for data quality, factors which are specific to the register and which significantly affect the processes for achieving data quality included: -
In general, parties other than the company that owns the assets make the vast majority of changes to the infrastructure. The physical assets can undergo change as part of a planned or unplanned process, and the data about the assets will originate from a number of different sources and be presented in many different ways. Originally the asset data was held both by the owning company (Railtrack) and by the companies contracted to maintain the assets (the Contractors). Although both sides were attempting to have a dialogue about the same assets, there existed some fundamental and stubborn problems that tended to prevent a clear and comprehensive view of the asset information.

Business View vs Engineering View of Assets

The owning company needed a business view of the assets. The contractors desired an engineering view of the assets. An engineering view required that a larger number of attributes be held about each asset. This was largely unnecessary for the owning company, which demanded information that would assist with the management of contracts and investment decisions. The business desired basic information about asset presence ("what it is" and "where it is"), asset performance and the costs incurred to achieve that performance. The engineering view demanded more detail about the widget, such as how far into the ground it went, the clearance between the widget and the tunnel wall, or the last time it was oiled. However, the contractors were less concerned with the widget's performance than with their own.

Because their needs were so different, the data required and the use it would be put to were also quite different. In the past, attempts had been made to build a comprehensive asset database to address the business needs of all - without tangible success. The task was just too great - the elephant was too large to eat.
Different Systems and Data Structures

In addition to the problems of different desired business objectives and benefits, there were the practical considerations of different companies using different asset definitions / labels, different databases (electronic and paper) and different data structures. Some companies were more IT literate, knowledgeable and advanced than others, and company size seemed to have little bearing on the IT readiness of a company - even the largest of companies used some form of paper record to hold asset data. The problem was further exacerbated because, even within the owning company, asset data was held on a number of different systems managed and administered by different disciplines, and the same disciplines did not hold the same information about the same asset types across all Zones.

Different Standards and Processes

The standards laid down within the owning company were widely interpreted within the company - meaning that there were in fact at least 7 different standards. Processes and procedures aimed at applying the standards were equally "loosely" interpreted across the Zonal organisations. That is not to say that any one interpretation was right or wrong - more that it became very difficult to compare and contrast the asset count, condition and performance on a national level.

The situation was far worse when our sights were raised to look across the railway companies. Different definitions were being applied to ostensibly the same asset types. There was no agreed way of uniquely identifying an asset, and therefore there could be no certainty that information was being exchanged about the same asset. Comparing and evaluating source records across companies for the purpose of assessing their quality was the proverbial stuff of nightmares. Rather than try to find a way of bringing many different and disparate systems of questionable quality together in one place, it seemed more logical and sensible to start with a "clean slate".
However, we would not ignore the knowledge gained from the past efforts to implement an asset information system. We realised that to stand half a chance of being successful we needed to know, as a minimum: -
A number of principles were adopted which underlie the developed data quality strategy and included: -
The data acquired by the project and entered into the new asset register went through a number of planned steps to enable it to achieve the desired data quality. These were: -

Step 1 - Data Cleansing and Migration (unknown quality)

Original data ("legacy") located in a number of different systems (electronic and paper records) was identified, cleansed as far as possible within a fixed time-window and migrated into a new asset register. The users were provided with two interim repositories for storing the "cleansed" data prior to it being uploaded into the asset register. The quality level was considered to be unknown and unmeasured.

Step 2 - Data Collection - "Gap-filling and Checking" (to desired AQL)

10% of the asset data in the register was identified for "gap-fill and verification". An overall high-level "data collection" strategy was devised and issued to the Zones, and they each worked within this to check their asset records and supply any missing data. However, they had sufficient autonomy to deploy different "local" methods to acquire and check the asset data. Each Zone was commissioned to devise a plan to bring 10% of asset records up to an agreed quality level - budgets were allocated against the approved plans. Some concentrated on a track-walk exercise and others conducted a desktop checking exercise to complete their records and assess their quality (using the quality indicators "presence" and "correctness"). Some used specialist teams to "collect" the data, some collected using Zone-only personnel and others employed a contractor (IMC).

Collected data was uploaded in batches into the register and identified and traced by batch number. Batches were first checked visually for record consistency, completeness, integrity and traceability. Batch details were held outside of the repository and standard forms were issued to capture and supply such details.
Step 3 - Quality Level Measured - Independent Data Quality Checks

Extracts of data from the register were turned into "track-walk lists" and independent data quality engineers sampled the data "on-site" against the "track-walk lists" for "presence" and "correctness". "Missing Assets" were counted as errors and the error count was conducted at attribute (data item) level. The data quality engineers were able to rate the batch samples quantitatively against the set AQLs. Data Quality Reports and Corrective Actions were produced. The Zones then reviewed these. [At the time of writing the Zones are preparing their response to the reports.]

Step 4 - Process and Procedure Audits

Independent Process Auditors were deployed into each Zone to check on the implementation of the documented business procedures (including the data quality procedures). The auditors were tasked to observe the procedures operating "end-to-end", i.e. to observe batches of data representing changed assets being moved through the business and data quality check procedures, into the asset register and out again by way of reports and notifications.

Implementation / Key Findings / Recommendations

The project feasibility work commenced in the autumn of 1998. By April 1999 enough groundwork was completed to prove that what had to be done could be done. During that time the business requirements were firmed up and the necessary asset data definitions were agreed. Also, a dialogue commenced between Headquarters and the Zones and between the Zones and their suppliers. The organisation and key personnel necessary to deliver the quality data were agreed in anticipation of funding being granted. This enabled the project team to "hit the ground running". The project team had just 8 months to put the system in place, migrate the data and "gap-fill and verify" 10% of all the assets. The programme plan was worked and re-worked to improve the chances of success, but even so things did not always go to plan.
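To make the Step 3 measurement concrete, the attribute-level scoring of a sampled batch might be sketched as follows. This is an illustration under stated assumptions, not the project's actual procedure: the paper does not give the formula, so the sketch assumes the AQL is expressed as a minimum percentage of correct attribute values, and that a missing asset scores every one of its sampled attributes as an error. The function names, asset identifiers and attribute values are all hypothetical.

```python
def attribute_quality_pct(register, observed, attributes):
    """Percentage of sampled attribute values that were confirmed on site.

    register: {asset_id: {attribute: value}} as extracted from the register
    observed: {asset_id: {attribute: value}} as found on the track walk;
              an asset absent from this mapping counts as a "missing asset"
    attributes: attribute names checked for every sampled asset
    """
    checked = 0
    correct = 0
    for asset_id, recorded in register.items():
        found = observed.get(asset_id)  # None means the asset was not found
        for name in attributes:
            checked += 1
            # Assumption: a missing asset makes each of its attributes an error.
            if found is not None and recorded.get(name) == found.get(name):
                correct += 1
    if checked == 0:
        raise ValueError("empty sample: nothing to score")
    return 100.0 * correct / checked


def meets_aql(quality_pct, aql_pct):
    # Assumption: the AQL is the minimum acceptable percentage correct.
    return quality_pct >= aql_pct


# Hypothetical sample of two assets, each checked on two attributes.
register = {
    "SIG-001": {"type": "signal", "location": "12m 40ch"},
    "SIG-002": {"type": "signal", "location": "13m 05ch"},
}
observed = {
    "SIG-001": {"type": "signal", "location": "12m 45ch"},  # location differs
    # SIG-002 was not found on site
}
score = attribute_quality_pct(register, observed, ["type", "location"])
```

Counting at attribute level rather than asset level means that one wrong location does not write off an otherwise correct record, while an asset that is not there at all is penalised on every attribute.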
At the commencement of the project 85 asset types were defined for collection - at a fairly high level. The average maximum number of attributes per asset type was about 20, but we only expected about 12 attributes per asset type in the collection phase. (A number of attributes related to the history of change to the asset - inspections and condition information, for example - and would not be available except through the normal maintenance activity.)

In between funding being awarded and collection commencing, it was agreed to store environmental and hazard type information in the same data repository and retire the current systems. There was a serious risk that making late changes to the scope of collection would cause significant delay (scope creep has a tendency to do this) - just changing the data collection forms to handle the new information would have forced a delay. The new data items would demand different quality checks as they were not strictly physical assets. Also, after collection started, a number of additional "supplementary" attributes were added to the data dictionary.

However, it was agreed that, in order to minimise the risks of delaying the project or compromising the quality of the data, the new data requirements would be handled separately. "Environmental and Hazard" data would be migrated from the legacy systems by the business - technical support only would be provided by the project team. The project would not take any responsibility for the quality of the data content but would ensure that what was in the legacy systems was transferred intact to the new asset register. The new supplementary attributes were not considered mandatory for data collection, but if they existed and could be migrated to the new system they would be - however, they were classified as "AQL = 0" and the project / business would not be judged on the quality of this additional data. This helped to keep the collectors' focus on the original data requirements.
Despite this, more migration effort was required than originally envisaged, adding an additional month to the programme. Resources that should have been focussed on data collection in July were still tied up in completing migration activity. Slippage in the end date for migration acted as an "open invitation" to the business users to find more data to migrate - unheard of sources of data came to light. The project had a difficult decision to make - migrate the data and delay collection, or cut off the migration activity, alienate the business users and pick up any "newly found" data as part of the ongoing maintenance activity. In the end a compromise was necessary. Migration support continued for four additional weeks but collection commenced as planned and additional resources were deployed by the core project team. Data not migrated at the end of "migration" was moved into the register as part of data collection and / or maintenance activities.

What follows is a distillation of the key issues encountered and the lessons learned, by key project activity, in carrying out this significant project.
There really is a tendency to "want it all NOW". If a business is not sure what data is really important to capture and maintain, it will decide to capture everything or capture nothing - neither of these two extremes is helpful. A long drawn-out investigation into the data requirements is also not helpful. We found that it was necessary to: -

Data collection activity was conducted in different ways within the seven Zones, with mixed results. Our findings were: -

Data quality became one of the biggest issues to tackle during the data collection activity. The existing culture did not easily lend itself to "one way of doing things". In addition, there was no easy access to the qualified personnel - the engineers - to check the collected data. The introduction of quantitative methods was also unnerving and unattractive - the whole process of checking data seemed to be too complex, unwieldy and unjustified. In the end the project team supplied a trained, central resource of qualified data quality engineers to visit each Zone in turn to check the quality of the data collected. The engineers were able to provide written reports of their findings to the Zone and the project team, and to trial the data quality processes and procedures. The latter was conducted with great success. Our key findings were:

Did the project meet all of its declared objectives? Well, almost. We did manage to deliver a populated asset register with a mapping interface. We removed a very diverse set of poorly managed data repositories and replaced them with a single system viewable by all and providing a common set of asset records. We piloted the use of electronic means of collecting and checking asset data using hand-held data recorders.
(Cost, availability and the degree of integration with the asset register remain the outstanding issues here.) We successfully migrated the data in the legacy systems that matched the asset dictionary definitions - we actually shifted some 15% more data than was originally projected by the business and spent less than planned. We developed an IT-based means of sharing the asset data with parties outside the company (somewhat outside the original remit), and we managed to collect and verify between 7 and 8% of all assets - a little less than the targeted 10%.

However, we have to remember that this was essentially a business-led initiative and not an IT project. The business ambition was the development and implementation of business processes to enable the acquisition and maintenance of good quality data. The project had to show that this could be done and that the "mind set" of the business could be altered to ensure that it was self-perpetuating. To this end we did provide the necessary business and quality processes to ensure that the asset data quality was attained and maintained. We trialled the data quality checking processes and procedures using our own central resources to produce quantitative results. We also devised and gained agreement - company wide - on the definition of asset types to be used within the business and with the other companies within the railway group.

However, as the project comes to its natural end and the business system and processes move into the operational environment, there are a number of outstanding issues still to be tackled. They are: -
© GISdevelopment.net. All rights reserved.