
How to Stop Negative Maintenance of Legacy Systems to Rescue Historical Data

Inspiration from Data Archiving in China with the Participation of Comprehensive Archives

Published on Sep 05, 2024

Abstract – Rescuing valuable historical data from legacy systems is an important challenge in digital transformation. Long-term passive maintenance of legacy systems threatens the security and availability of historical data and imposes economic and technical burdens. Based on a review of the relevant literature, this paper categorizes existing approaches to legacy systems into a two-dimensional matrix with four quadrants according to the degree of technological and functional change. Building on these approaches, the paper presents a Chinese solution, data archiving, which responds to the dilemma of China’s public sectors and their archive management demands. Data archiving brings regional general archives into the process of solving legacy system problems and takes advantage of the archives' expertise in the long-term preservation of resources. The paper concludes that the practice of data archiving simultaneously brings value to organizations plagued by legacy systems, to general archives, and to the data itself.

Keywords – Legacy System, Data archiving, General archives, Digital Transformation, China.

This paper was submitted for the iPRES2024 conference on March 17, 2024 and reviewed by Susanne van den Eijkel, Elisa Rodenburg and 2 anonymous reviewers. The paper was accepted with reviewer suggestions on May 6, 2024 by co-chairs Heather Moulaison-Sandy (University of Missouri), Jean-Yves Le Meur (CERN) and Julie M. Birkholz (Ghent University & KBR) on behalf of the iPRES2024 Program Committee.

Introduction

Digital transformation has led to fully digital records management in e-government, bringing both benefits in efficiency and challenges in long-term preservation. Current methods to address these challenges include updating, copying, simulating, and continuing to use legacy systems. These approaches maintain the original storage or processing environment for records and data, ensuring the long-term integrity of, and controlled access to, digital objects and metadata. However, they create a fragmented and complicated IT environment for the organization, making it difficult to seamlessly connect historical and current data and hindering technology updates. A significant issue is legacy systems, which are obsolete ICT systems still in use because their data cannot be converted to other formats or their application programs cannot be upgraded [1]. This results in an inability to seamlessly exchange data between multiple systems and to reliably store records for a long time.

The situation is even more complex in China's e-government sector. The public sector faces three major contradictions: uneven regional technological development, which hampers data exchange between regions; a lack of integration between national and local systems, which complicates data exchange; and large-scale domestic technological updates, which create difficulties in connecting historical data.

As a result, the public sector must continue to maintain and use legacy systems to preserve historical data due to factors such as expired contracts with original technology suppliers and the need for domestic technology updates. This passive maintenance of legacy systems introduces economic burdens, data silos, low business efficiency, and data security issues, thus affecting the utilization and long-term preservation of historical data. The key to resolving legacy system issues is to ensure the authenticity, integrity, and credibility of historical data without being restricted by the original technical environment. This requirement aligns with the long-term preservation needs of electronic records. Therefore, the participation of state-endorsed regional general archives will become a new option.

In this article we discuss how general archives can act as a proactive actor to assist the public sector in resolving existing legacy system dilemmas. In the second and third sections, we will describe the background of the problem and review previous work. In Section 4, we will discuss a country-level solution in detail. Then based on this solution, we will conduct related discussions in Section 5.

Legacy System Dilemma in China's Public Sector

China's e-government has evolved over more than 30 years since the 1990s. But factors such as collapses among technology suppliers and intellectual property barriers have led some public sectors to face the dilemma of maintaining outdated business systems today. In recent years, driven by safety concerns, China's public sector has widely adopted domestically produced technologies. However, challenges remain in integrating both current and outdated technologies. Public sectors that still rely on outdated legacy systems face a difficult predicament. On one hand, they must allocate substantial funds to maintain these systems to access their extensive and valuable historical data. On the other hand, they need to invest in developing new business systems to integrate with existing national systems and meet the demands of government integration. This dual funding requirement places a significant financial burden on non-profit public sectors. Additionally, there is uncertainty about when the legacy system will finally be retired and how to properly manage its historical data. The legacy system hangs over them like a sword of Damocles.

In theory, an organization only needs to export the historical data from its legacy business systems and merge it with the data of current systems in a unified database. In reality, technical barriers in legacy business systems make it impossible to export the data or to guarantee its validity. For the public sectors, historical data in legacy systems remains important for current business and cross-departmental data sharing. Therefore, the validity of this historical data must be guaranteed, and the data must be preserved for the long term to meet public sectors' data utilization needs.

Given these needs, it is essential to have an entity that is trustworthy, authoritative, possesses a large amount of data, and has the capability for long-term preservation. From this perspective, regional general archives are well-suited to meet these requirements.

Firstly, from an institutional perspective, China's archiving system mandates that public sectors such as the Human Resources and Social Security Bureau regularly transfer records within the scope of archiving to the general archives. "Archiving" refers to the process of systematically arranging records, once completed and deemed to possess preservation value, and transferring them to the archives. During the archiving process, records progress through three stages: "Business System – Records Office (not discussed in this paper) – General Archive". The new Regulation on the Implementation of the Archives Law of the People's Republic of China clearly states that, where an organization's storage conditions are limited, records can be transferred to the archives for storage in advance. Therefore, public sectors can collaborate with general archives to transfer their electronic records and management authority in advance. It is worth noting that in China, records are considered archives not only when they are appraised as having permanent preservation value, but from the moment they enter the archives. This means that the act of archiving is effectively completed as soon as the records are transferred to the general archives.

Secondly, from a technical perspective, China has been building digital archives since 2000 and has built a number of high-level digital archives and offices. At the system level, the three stages of "Business system – Records office – General archive" correspond to "Business system – EDMS & ERMS – Digital archive system & TDR". At the technical standard level, policies such as the Interim Measures for Electronic Document Archiving Management have been issued, and standards such as the Electronic document archiving management specifications and a series of archive information packages based on OAIS have been implemented. Therefore, general archives have a relatively stable technical foundation.

Thirdly, from a management perspective, the involvement of general archives can address the issue of data silos between public sectors. Since the business systems used by various public sectors are not uniform, the digital archive system of general archives can integrate data from different business systems, whether current or outdated. Furthermore, with the trend toward digitization, data can be archived in real time, making record integration a reality.

Therefore, when faced with the challenge of preserving historical data in legacy systems, public sectors can collaborate with general archives to address this issue effectively.

Literature Review

The awkward position of legacy systems in the process of digital transformation poses a serious threat to digital preservation. Many organizations are able neither to properly maintain these systems nor to completely remove them, which has been described as the state of being "caught between" [2]. Scholars conceptualize this phenomenon as system atrophy [3] or technology renewal [4], and demonstrate the complexity of the process of dealing with legacy systems [5]. Legacy systems introduce a multitude of issues, such as stunted evolution and worsening bugs [3], increased cybersecurity risks for organizations [6], higher maintenance costs [7], failure to meet regulatory requirements [8], and non-integrated IT silos [9]. This section reviews the literature on two critical issues related to passively maintained legacy systems: (a) what are the underlying causes, and (b) what are the existing solutions?

Causes of passive maintenance of legacy systems

The causes of passive maintenance of legacy systems are so varied that scholars suggest seven levels of complexity in transforming legacy systems [10], which can be divided into technical and non-technical perspectives. From the technical perspective, one notable cause is the inherent shortcomings of legacy systems. These shortcomings, such as their monolithic architecture [11] and non-standard structures and scale [12], pose significant challenges to system updates. In addition, there is a high degree of interconnectivity between legacy systems and other digital transformation challenges [13], so abruptly discontinuing legacy systems can result in the new system failing to function and provide services properly [14].

Unlike the technical perspective, the non-technical perspective places extra emphasis on the societal factors that trap organizations in passive maintenance of legacy systems. The alignment between organizations and information technology is considered a coevolution process [15]. The organization's economic considerations, cognition, and behavior can all influence the emergence and resolution of legacy system issues. Regarding economic considerations, legacy systems increase technical debt by creating technical inertia [16] and social inertia [17], which further hinders efforts to improve the current situation [18]. In terms of cognition, the structure and components of legacy systems are influenced by the cognitive assumptions of their builders and the surrounding social context [19], leading to deep personalization and specialization [20]. Consequently, organizations find it increasingly difficult to break free from their reliance on these systems, and users' entrenched habits may further inhibit the discontinuation of legacy systems [21]. As for behavior, complex cycles in the client's decision-making process and the vendor’s communicative actions when purchasing new systems [22], along with new ways of working introduced by change management related to legacy systems [23], also contribute to the passive maintenance of these systems.

Antidote for passive maintenance of legacy systems

Solutions to legacy systems can also be viewed from both technical and non-technical perspectives. In the technical realm, solutions involve different strategies, approaches, and tools. Among the strategies, the radical strategy advocates the complete discontinuation of legacy systems [8], the conservative strategy suggests temporarily intensifying the legitimization of the legacy system [20], and the compromise strategy, which is more widely accepted, views this as a continuous rather than binary process [24].

Fig. 1 conceptualizes approaches to resolving passive maintenance of legacy systems as a two-dimensional matrix with four quadrants, based on the degree of change (high or low) in the functionality and technology dimensions. The degree of change cannot be precisely quantified; only a rough distinction between high and low can be made. For example, in the technology dimension, a system requiring a major update or re-architecture represents a high degree of change, with replacement being the highest. Conversely, a low degree of change may involve the system remaining mostly static with only minor adjustments, with retention being the lowest. Each quadrant represents a fundamental category of approaches. Quadrant I includes approaches for the radical upgrade or replacement of legacy systems, such as replacement [25], re-architecture [26], and revamping [7]. Approaches in Quadrant II would enhance the functionality of legacy systems without significant changes or updates in the technology dimension. However, the review found no approaches matching the characteristics of Quadrant II, because it is challenging to improve system functionality without technical adjustments such as code or architecture modifications. Approaches in Quadrant III provide neither significant enhancements to the original functionality of the system nor major technological upgrades. Examples include retain [27], encapsulation [28], and mirroring [9]. Quadrant IV encompasses approaches based on a conservative strategy, which optimize and upgrade the technology dimension without significant changes to functionality, making the changes relatively manageable. Examples include retire/rationalize [27], re-host [29], refactor [30], re-engineering [31], re-implementing [32], transpilation [25], and emulation [33].

Regarding tools, they are primarily used to extract knowledge [34] and back up documents [35] from legacy systems.

Fig. 1 Categorization of approaches to resolving passive maintenance of legacy systems

In the non-technical realm, scholars have explored various approaches to successfully modernize legacy systems [36], including perspectives from knowledge management [37] and task conflict resolution [38].

Although much can be learned from the literature above, the approaches in each quadrant have drawbacks when scrutinized. For example, approaches in Quadrant I incur high time costs and technical challenges and carry the risk of system downtime. Approaches in Quadrant III are temporary coping strategies that do not address the root of the problem. Approaches in Quadrant IV, while currently the most applicable, are weak in meeting new functional requirements and may accumulate new technical debt.

Based on the existing research reviewed in this section, this paper aims to explore how the long-term preservation of critical data can partially address legacy system issues while also meeting organizational functional needs, such as data utilization for business purposes, with minimal technical burden and cost.

Data Archiving: A Chinese Approach to Long-term Preservation of Historical Data in Legacy Systems

The information system of the Human Resources and Social Security Bureau of a county-level city in Zhejiang Province, China, was self-built by the local government. It faced a dilemma: the contract with the original technical service provider had expired, and the systematically constructed national information system had replaced some of its functions. Consequently, the system fell into a state of technological obsolescence and functional degradation, becoming a legacy system.

To retain the valuable historical data in the original system, the bureau had to continue maintaining it. In cooperation with the archives department, the bureau began to rescue data from the legacy system through data archiving (Fig. 2) to address the current dilemma and prevent the continuous accumulation of technical debt.

Fig. 2 Process of data archiving

Formation of Archiving Submission Information Package

To eliminate data dependency on the system, the first step is to restore its original semantic environment by using existing technical documents to analyze the legacy system and extract its intrinsic models, algorithms, and rules. For structured data, it is necessary to analyze its database table structure. Following this, the data archiving scope and specific database columns that need to be archived should be determined according to existing data management requirements and regulations, thereby forming an archiving submission information package.

An archiving submission information package generally consists of structured data sets, unstructured data sets, business form templates, and archiving configuration information from the legacy system. Structured data sets are multiple XML files converted from the structured data of the legacy system according to China's "Archival relational database transform into XML files (DA/T 57-2014)". Unstructured data sets consist of fundamental information about the primary business service items of the legacy system's original functions, along with their associated supporting materials. Business form templates include the templates themselves, sources of form data, and a data dictionary. The archiving configuration information set records the available configuration information of the legacy system. Data within the package is standardized for universal usage, enabling migration and long-term preservation, and can be directly restored into the form of relational databases. Furthermore, to guarantee the authenticity and integrity of the data in the package and to ensure traceability and accountability, evidential certification of the archiving data is conducted. Evidential certification of archiving data involves operations to solidify, preserve, and verify the data or related information that demonstrates its authenticity and integrity. Technologies such as blockchain, trusted timestamps, and digital digests are applied in this process.
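
The packaging steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the bureau's implementation: the XML element names (`table`, `row`, `column`), the `employment` table, and the manifest layout are hypothetical and do not reproduce the schema defined in DA/T 57-2014. SHA-256 stands in for the unspecified digital digest, and blockchain anchoring and trusted timestamps are omitted.

```python
import hashlib
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn, table, columns):
    """Serialize the selected columns of one table to XML bytes."""
    root = ET.Element("table", {"name": table})
    # Table and column names come from the agreed archiving scope,
    # so they are interpolated directly (never from untrusted input).
    query = "SELECT {} FROM {} ORDER BY rowid".format(", ".join(columns), table)
    for row in conn.execute(query):
        record = ET.SubElement(root, "row")
        for name, value in zip(columns, row):
            cell = ET.SubElement(record, "column", {"name": name})
            cell.text = "" if value is None else str(value)
    return ET.tostring(root, encoding="utf-8")


def build_example_package():
    """Build a tiny package: XML files plus a digest manifest."""
    conn = sqlite3.connect(":memory:")  # stand-in for the legacy database
    conn.execute(
        "CREATE TABLE employment "
        "(person_id TEXT, employer TEXT, start_date TEXT, ui_flag INTEGER)"
    )
    conn.executemany(
        "INSERT INTO employment VALUES (?, ?, ?, ?)",
        [("P001", "Factory A", "2009-03-01", 1),
         ("P002", "School B", "2012-09-01", 0)],
    )
    # Hypothetical archiving scope: only columns judged worth preserving.
    scope = {"employment": ["person_id", "employer", "start_date"]}
    files = {
        table + ".xml": table_to_xml(conn, table, columns)
        for table, columns in scope.items()
    }
    conn.close()
    # One digest per file, recorded at packaging time for later verification.
    manifest = {name: hashlib.sha256(data).hexdigest()
                for name, data in files.items()}
    return files, manifest
```

Calling `build_example_package()` returns the XML files and a manifest of digests. Keeping the column selection in a separate `scope` mapping mirrors the idea that the archiving scope is decided from data management requirements before any export takes place.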

Transfer to General Archives

Under the "pre-custodial paradigm", paper-based archives are considered the primary objects of archives management, and integrating electronic records and data into the domain of archival management has been a long process. The formation of an archiving submission information package therefore amounts to a delayed completion of the archiving task: it finishes the data archiving that should have been carried out while the system was still in regular use but remained undone. It also provides an opportunity to rescue and preserve critical business data within the legacy system.

After the package is formed, the organization affected by the legacy system transfers the package and the corresponding handover list to the regional general archives at the same administrative level. After capturing, counting, and checking the transferred package, archivists sign the handover form and, if there are no errors, officially receive the package into their electronic record management system. When checking the archiving submission information package, the first step is to ensure security by scanning for viruses. Subsequently, the package is unpacked and loaded to confirm that the structured data can be parsed successfully, restored to a database, and used. Following this, each entry is compared against the hash value generated during the evidential certification of the archiving data to confirm its authenticity and integrity.
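
The reception checks can be illustrated with a short sketch. It assumes a hypothetical package layout in which each structured data file is an XML document of `table`, `row`, and `column` elements and a manifest maps file names to SHA-256 digests; neither detail comes from the paper or from DA/T 57-2014. Virus scanning is left out, and digests are compared per file rather than per entry for brevity.

```python
import hashlib
import sqlite3
import xml.etree.ElementTree as ET


def digest_matches(name, payload, manifest):
    """Compare a file's SHA-256 digest with the value in the manifest."""
    expected = manifest.get(name)
    return expected is not None and hashlib.sha256(payload).hexdigest() == expected


def restore_table(conn, payload):
    """Parse one XML file and load its rows into a relational table."""
    root = ET.fromstring(payload)
    table = root.get("name")
    rows = [
        {cell.get("name"): cell.text or "" for cell in row.findall("column")}
        for row in root.findall("row")
    ]
    if not rows:
        return table, 0
    columns = list(rows[0])
    # Names are taken from the package itself; a real system would validate them.
    conn.execute("CREATE TABLE {} ({})".format(
        table, ", ".join(c + " TEXT" for c in columns)))
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(table, ", ".join("?" for _ in columns)),
        [tuple(r.get(c, "") for c in columns) for r in rows],
    )
    return table, len(rows)


def check_package(files, manifest):
    """Return, per file, whether it restores cleanly and matches its digest."""
    conn = sqlite3.connect(":memory:")  # scratch database used only for checking
    report = {}
    for name, payload in files.items():
        try:
            table, count = restore_table(conn, payload)
            restored = "{} rows restored to {}".format(count, table)
        except (ET.ParseError, sqlite3.Error) as exc:
            restored = "restore failed: {}".format(exc)
        report[name] = {
            "restore": restored,
            "digest_ok": digest_matches(name, payload, manifest),
        }
    conn.close()
    return report
```

A file whose content was altered after packaging may still parse and restore, but its `digest_ok` flag will be false, which is why usability and integrity are reported separately.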

The archives then store the package in a long-term repository within the electronic record management system and solidify the process information automatically generated by the system. To facilitate data use and ensure data security, archived data is synchronized to a data utilization repository that is physically isolated from the long-term repository. On one hand, this facilitates data sharing with other public departments that have data access rights, rescuing data from the silos caused by legacy systems and re-establishing connections with other datasets. On the other hand, general archives can provide data certification services, such as proof of employment duration based on the labor data of the Human Resources and Social Security Bureau, leveraging their credibility and a thorough, traceable verification process. To provide data certification services, certification templates must be pre-designed based on user needs. The data source location table is then used to identify the positions of the required columns within the structured data sets. For the translation of specific columns, the data dictionary can be consulted.
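
The certification workflow (template, data source location table, data dictionary) can be sketched as follows. All names here are hypothetical: the template wording, the table and column identifiers such as `xm` and `htlx`, and the dictionary codes are invented for illustration and are not taken from the system described in this paper.

```python
from string import Template

# Hypothetical pre-designed certification template.
CERTIFICATE_TEMPLATE = Template(
    "This is to certify that $name was employed by $employer "
    "from $start_date to $end_date ($contract_type)."
)

# Hypothetical data source location table: template field -> (table, column).
SOURCE_LOCATIONS = {
    "name": ("person", "xm"),
    "employer": ("employment", "dwmc"),
    "start_date": ("employment", "ksrq"),
    "end_date": ("employment", "jsrq"),
    "contract_type": ("employment", "htlx"),
}

# Hypothetical data dictionary: coded column values -> readable labels.
DATA_DICTIONARY = {
    ("employment", "htlx"): {
        "1": "fixed-term contract",
        "2": "open-ended contract",
    },
}


def fill_certificate(record):
    """Populate the template from one person's archived record."""
    values = {}
    for field, (table, column) in SOURCE_LOCATIONS.items():
        raw = record[table][column]
        # Translate coded values when the dictionary covers this column.
        labels = DATA_DICTIONARY.get((table, column), {})
        values[field] = labels.get(raw, raw)
    return CERTIFICATE_TEMPLATE.substitute(values)


# Example record as it might be read from the data utilization repository.
EXAMPLE_RECORD = {
    "person": {"xm": "Zhang San"},
    "employment": {
        "dwmc": "Factory A",
        "ksrq": "2009-03-01",
        "jsrq": "2015-06-30",
        "htlx": "1",
    },
}
```

Keeping the location table and data dictionary as data rather than code means a new certificate type only needs a new template and mapping, which fits the paper's description of templates being designed in advance from user needs.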

Discussion

Introducing general archives into the solutions for rescuing historical data from legacy systems in the public sector has a positive impact on both the public sector itself and the general archives.

For Public Sectors

Public sectors directly benefit from data archiving, as demonstrated by the Human Resources and Social Security Bureau of a county-level city in Zhejiang Province, China. This bureau successfully rescued data resources stranded in legacy systems through effective data archiving. With the national information system systematically constructed to replace certain functions, and general archives taking on the responsibility for the long-term preservation of historical data, organizations are no longer forced to maintain outdated systems. This shift eliminates the need for passive maintenance of legacy systems, directly reducing financial burdens and preventing the continued accumulation of technical debt.

For General Archives

Archiving data from legacy systems has emerged as a new approach for general archives to actively engage in data management and address data challenges. In China, these challenges arise from the high demands on management capabilities posed by data-based archival subjects and the impact of establishing the National Data Bureau.

To support the development of data-related fundamental institutions, China established the National Data Bureau in 2023. However, the division of responsibilities between the National Data Bureau and the National Archives Administration (NAA) remains unclear. The NAA faces disadvantages in terms of technical support and resource ownership, raising concerns about its potential displacement. Through data archiving, the relationship between the Data Bureau and the Archives Administration has shifted from competition to cooperation. Drawing on their extensive experience in the long-term preservation of information resources, general archives are actively involved in rescuing data from legacy systems. In the future, more public sector information system data can be regularly archived in general archives based on this approach, preventing recurring security issues with historical data from legacy systems. Additionally, general archives can strengthen their connections with front-end business systems (i.e., the creators of the data) during the archiving process, thereby enhancing their social influence and service capabilities.

For Historical Data in Legacy Systems

Through data archiving, data can be used as evidence just like physical archives. When users initiate a request for data certification, the digital archives system can quickly provide services by automatically extracting the required fields and populating business form templates stored in the archiving submission information package, thereby activating the evidentiary value of the data.

Additionally, data archiving enables historical data from legacy systems to be used for a wider range of purposes. Historical data archived from legacy systems is not only stored in a long-term preservation repository but also replicated to a data utilization repository for easy access. This reconnects the data with user needs, paving the way for deeper development and utilization of data resources.

Conclusion

Based on the institutional context in China, data archiving introduces general archives to address the dilemma posed by legacy systems. Through a series of operations, including the creation of archiving submission information packages, data certification, data transfer, and reception, this approach ensures that historical data from legacy systems is properly preserved and supports further utilization. Consequently, it establishes a Chinese solution for the rescue and preservation of historical data from legacy systems, balancing both feasibility and validity.

The practice of data archiving in China is still evolving, necessitating further research to address several key questions:

  • What are the best methods for archiving semi-structured and unstructured data?

  • How can various types of archived data be managed collaboratively?

  • How can the general archives enhance their collaboration with the business units that generate the data?

Resolving these questions requires the engagement of diverse stakeholders throughout the data lifecycle, enhanced institutional backing, and the provision of appropriate technical support. These collaborative efforts are crucial to mitigate the challenges posed by legacy systems and to ensure the long-term preservation of data resources.

ACKNOWLEDGMENTS

The authors wish to express their gratitude to Professor Yi Qian (Renmin University of China), Zhaopan Liu (Technical Director of ECOMINFO Technology Co., Ltd), and the leaders and staff of the comprehensive archive in the county-level city mentioned above, who provided help and support in the completion of this paper.
