Abstract – This paper examines the significant properties of Upper Austrian digital authoritative records (‘digitaler Verwaltungsakt’) archived in an OAIS-conforming digital repository based on Records in Contexts (RiC), while especially considering legal implications pertaining to preservation planning.
Keywords – Significant Properties, Records in Context, Preservation Planning, authoritative records (‘digitaler Verwaltungsakt’)
This paper was submitted for the iPRES2024 conference on March 17, 2024 and reviewed by Inge Hofsink, Karin Bredenberg, Jan Hutar and 1 anonymous reviewer. The paper was accepted with reviewer suggestions on May 6, 2024 by co-chairs Heather Moulaison-Sandy (University of Missouri), Jean-Yves Le Meur (CERN) and Julie M. Birkholz (Ghent University & KBR) on behalf of the iPRES2024 Program Committee.
The Austrian federal and state administrations started implementing e-government processes in the early 2000s. One of the most important steps was the introduction of a Records Management System (RMS) [1] that would allow the whole administration to work digitally. In 2004 the federal administration implemented the ELAK (‘elektronischer Akt’ / electronic record), and starting in 2005 the nine states commenced their own implementations. The federal administration uses its own ELAK standard, while the nine states utilize a variation called ‘Länderstandard’ [2]. In the state of Upper Austria, an RMS capable of ELAK was introduced in 2005 and has since been implemented step by step, with 2017 marking the year when almost all subordinate organisational units of the Upper Austrian administration had fully implemented the ELAK. The RMS in use is MoReq2 certified [3], but the ELAK itself has not been formally standardised. With the introduction of digital records in Upper Austria, the ground-breaking decision was made to fully disallow the creation of analogue records unless no other possibility of capture [4] existed.
This digital transformation required a reflection on longstanding traditions of records management principles (or ‘Schriftgutverwaltung’ in the analogue era) on federal and state level and meant adapting what worked well to the digital environment. Records created in this manner, called ‘Akten’, have been based on laws, intraorganizational norms and the written-form principle ‘quod non est in actis, non est in mundo’ [5] for more than two centuries. Analogue records creation has been impacted by the significant technological changes of the 20th century.
As the creation of records is a process determined largely by intraorganizational norms [6] and certain laws [7], the terms ‘Akt’ and especially ‘Verwaltungsakt’ (loosely translated to ‘governmental administrative record’) differ significantly from the English term ‘record’. While ‘record’ is usually interpreted as ‘information created or received and maintained as evidence’ [8] or as ‘documenting activity that was carried out by agents’ [9], ‘Verwaltungsakt’ implies a legal and governmental origin. The term ‘Akt’ describes a certain structuring of documents and information and ‘the orderly physical summarization of all related documentation pertaining to a certain matter or matter of business (Geschäftsfall)’ [10]. Records called ‘Verwaltungsakt’ are always created in accordance with a structured file plan [11] in a governmental organisational unit that has legally determined responsibilities. The probative value [12] of these kinds of records in a legal setting differs from records created by non-governmental persons or institutions, which can at times also be called ‘Akt’.
Records creation is regulated and codified in internal organisational norms, for Upper Austria the AVD (Allgemeine Vorschrift zur Dokumentenverwaltung) [13], which defines the life-cycle of records from creation to archive and describes the required steps in detail. Correspondingly, a single paper (or digital) document is not strictly an ‘Akt’ in this sense. An ‘Akt’ needs unique identifiers in accordance with organisational norms, structuring to more accurately classify business processes or information units, and administrative metadata to describe the highest level of information as well as all subordinate levels. These structural elements of records are called ‘Geschäftsfall’ and ‘Geschäftsstück’ in a governmental setting [13]. The former is not mandatory; it describes a unit of information that belongs intellectually to the larger unit of the ‘Akt’ but is frequently self-contained and, if it occurs, contains 1-n ‘Geschäftsstücke’. The latter is the result (or evidence) of the completion of a process step resulting in one or more documents; it has to occur at least once and has to contain at least one document. Considering these circumstances, a more fitting English equivalent for ‘Verwaltungsakt’ would be ‘authoritative record’ [15], which will henceforth be used in this paper. ISO 30300:2020(E) requires the ‘authoritative record’ to possess authenticity, reliability, integrity and usability [15], which are qualities that the ‘Verwaltungsakt’ is required to possess and which are ascribed to it due to its governmental provenance.
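The cardinality rules described above can be illustrated with a minimal Python sketch. All class and field names are hypothetical and chosen for illustration only; they are not taken from the AVD or from the RMS. The sketch further assumes that a ‘Geschäftsstück’ can be attached directly to the ‘Akt’ when no ‘Geschäftsfall’ is used.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """A single document (hypothetical minimal fields)."""
    identifier: str
    filename: str


@dataclass
class Geschaeftsstueck:
    """Evidence of a completed process step; needs at least one document."""
    identifier: str
    documents: List[Document]


@dataclass
class Geschaeftsfall:
    """Optional grouping; if it occurs, it holds 1-n Geschäftsstücke."""
    identifier: str
    stuecke: List[Geschaeftsstueck]


@dataclass
class Akt:
    """The whole record with its file plan position."""
    identifier: str
    file_plan_position: str
    faelle: List[Geschaeftsfall] = field(default_factory=list)
    # Assumption: Geschäftsstücke may hang directly below the Akt.
    stuecke: List[Geschaeftsstueck] = field(default_factory=list)


def validate(akt: Akt) -> List[str]:
    """Return a list of violated structural rules (empty if none)."""
    problems = []
    all_stuecke = list(akt.stuecke)
    for fall in akt.faelle:
        if not fall.stuecke:
            problems.append(f"Geschäftsfall {fall.identifier} has no Geschäftsstück")
        all_stuecke.extend(fall.stuecke)
    if not all_stuecke:
        problems.append(f"Akt {akt.identifier} has no Geschäftsstück")
    for stueck in all_stuecke:
        if not stueck.documents:
            problems.append(f"Geschäftsstück {stueck.identifier} has no document")
    return problems
```

A record with one ‘Geschäftsstück’ holding one document passes the check, whereas an empty ‘Geschäftsfall’ is reported as a violation.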
Authenticity is ascribed due to the provenance and structure of records as well as visible markers, such as the use of official graphic elements and signatures, whether digital or analogue. To facilitate control over integrity, reliability and usability, the ELAK has been realised as an XML schema called EDIDOC [16]. The ELAK is the structural basis of the currently employed RMS, which at this point in time is the only one in use capable of producing records in the abovementioned sense. The EDIDOC container serves as a vehicle format for transfer of records between systems and administrations.
EDIDOC Version 1.0 [16] came into existence in 2014, after the previous version called EDIAKT-II had been expanded and improved upon. EDIDOC allows for the transfer of records between systems [17] and maintains the integral order and structure of the information, while also making it accessible to the general public if necessary. The EDIDOC container format essentially has a vehicle function and comes into existence only when records leave their creation system. For purposes of uniformity between creation systems, the EDIDOC XML schema consists of a header, process data, metadata, payload and signatures [18]. The header contains the current version of the EDIDOC package and information about sender, receiver and purpose of transmission. The process data section describes all relevant business and editing processes captured within the package, while the metadata section encompasses administrative and technical metadata. The subsequent payload is composed of content in the form of structural metadata and information about creation, editing processes and agents. Finally, the signature section contains information about the signatures employed in the editing process of the record. These signatures provide traceability of administrative actions by recording a timestamp and the administrative agent via a personal identifier, but they do not qualify as digital signatures [19].
To appropriately convey the structural elements of ‘authoritative records’, a system consisting of four layers, which are realised as structured payloads within the XML frame, was devised. The highest and most important layer is Layer 3, which describes the whole ‘authoritative record’ with its substructure and contains administrative and some technical metadata from the creation and editing process [20]. This metadata is partially created automatically within the RMS and partially entered by the editing governmental agent [21]. Layer 2 realises the ‘Geschäftsfall’ with metadata pertaining to this logical unit [22], which is not mandatory in the Upper Austrian implementation. Layer 1 contains the ‘Geschäftsstück’ and its relevant metadata [23]. Layer 0 contains the documents (= Schriftstück) [24], which in turn contain the content information [25]. In the Upper Austrian implementation, the element ‘SpecialData’ [23] occurs only at the end of each EDIDOC XML, which includes certain metadata that are recorded in the RMS and unique to Upper Austrian usage.
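To make the five sections and the four layers more tangible, the following Python sketch walks a simplified, invented XML document. The element and attribute names (`EdidocPackage`, `Record`, `Case`, `Item`, `Document`, `id`) are placeholders and do not reproduce the actual EDIDOC schema; only the nesting idea is taken from the description above.

```python
import xml.etree.ElementTree as ET

# Invented sample; real EDIDOC tags and namespaces differ.
SAMPLE = """
<EdidocPackage version="1.0">
  <Header><Sender>Unit A</Sender><Receiver>State Archive</Receiver></Header>
  <ProcessData/>
  <MetaData/>
  <Payload>
    <Record id="A-1">
      <Case id="F-1">
        <Item id="S-1">
          <Document id="D-1" file="decision.pdf"/>
        </Item>
      </Case>
      <Item id="S-2">
        <Document id="D-2" file="note.txt"/>
      </Item>
    </Record>
  </Payload>
  <Signatures/>
</EdidocPackage>
"""

# Hypothetical mapping of placeholder tags to the layers named in the text.
LAYER_OF = {"Record": 3, "Case": 2, "Item": 1, "Document": 0}


def list_sections(xml_text):
    """Return the top-level section names in document order."""
    root = ET.fromstring(xml_text.strip())
    return [child.tag for child in root]


def list_layers(xml_text):
    """Return (layer, id) pairs for every structural unit in the payload."""
    root = ET.fromstring(xml_text.strip())
    payload = root.find("Payload")
    units = []
    for element in payload.iter():
        layer = LAYER_OF.get(element.tag)
        if layer is not None:
            units.append((layer, element.get("id")))
    return units
```

Calling `list_layers(SAMPLE)` yields the units in document order, so that the optional Layer 2 unit appears only where a ‘Geschäftsfall’ was actually used.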
EDIDOC packages can be extracted out of the RMS when the ‘authoritative records’ have reached a certain point in their life-cycle, namely when they are not needed frequently [26] by their creation organisation. The editing process determines that every ‘authoritative record’ has to be closed when it is clear that no further additions will be necessary. If no changes have been made within a predetermined period after closure, the ‘authoritative record’ is essentially ready to be archived. Governmental structures and organisational units have file plans as their records management principles, which determine closure periods and appraisal.
Archival appraisal is done in advance and on file plan level for each governmental organisational unit. These file plans are realised in the RMS, so that every ‘authoritative record’ belongs to a specific and predetermined group in accordance with business processes and tasks. On file plan level, time spans and deadlines for archiving are implemented, which activate once the digital ‘authoritative record’ has been closed. Metadata concerning parent organisation, file plan position and appraisal are recorded within the RMS and transferred into the EDIDOC package when it is created at the time of extraction. There are two appraisal attributes: M (only metadata is archived) and MC (metadata and content are archived). If the appraisal attribute is M, documents and metadata of Layer 0 objects are not extracted out of the RMS.
Thus, only the technical and administrative metadata of Layers 3, 2 and 1 will be archived and serve as a documented ‘register’ of what once existed to facilitate traceability of administrative actions. RiC-CM specifies that every record resource has to have been instantiated at least once [27], which is true for this case although the original instantiation does not manifest itself after the point of extraction. The metadata and content-xml for records with the M attribute constitute a (partial and) derived instantiation of the same record [27] for archival purposes with the original instantiation, being the one represented in the RMS, no longer (fully) in existence.
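The effect of the two appraisal attributes on extraction can be sketched as follows. This is an illustrative Python fragment under the assumption that each structural unit is represented as a dictionary with a `layer` key; the function name and data layout are hypothetical and do not describe the RMS implementation.

```python
def extract_for_archive(units, appraisal):
    """Select the units that leave the RMS for a given appraisal attribute.

    units: list of dicts such as {"layer": 1, "id": "S-1"} (hypothetical layout)
    appraisal: "M" (metadata only) or "MC" (metadata and content)
    """
    if appraisal not in ("M", "MC"):
        raise ValueError(f"unknown appraisal attribute: {appraisal!r}")
    extracted = []
    for unit in units:
        # With M, Layer 0 documents and their metadata stay behind.
        if appraisal == "M" and unit["layer"] == 0:
            continue
        extracted.append(unit)
    return extracted


RECORD_UNITS = [
    {"layer": 3, "id": "A-1"},
    {"layer": 2, "id": "F-1"},
    {"layer": 1, "id": "S-1"},
    {"layer": 0, "id": "D-1"},
]
```

For `RECORD_UNITS`, the M variant returns the three metadata layers only, while MC returns all four units.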
One authoritative record becomes one EDIDOC package which in turn becomes one SIP [28] and finally one AIP [29]. Within this digital folder lie the actual EDIDOC-container and an XML document, called the ‘Steuerungs-XML’, that contains some of the administrative and technical metadata compiled from elsewhere in the EDIDOC. This document is created at the time of extraction, is not an intrinsic part of the ‘authoritative record’ and only serves to facilitate automated processing of the record during the ingest process [32].
Within the EDIDOC-container on the highest level there is a specific metadata document in XML format, called the ‘content-xml’. This is the most important part of an EDIDOC package as it contains the structural information of the layers, the administrative and technical metadata as well as information from the so-called signature section. If the appraisal was M (metadata), Layer 0 is eliminated in the RMS as the last step of the ingest process, but (empty) digital folders that represent the original structure are still created when a DIP is requested. If the appraisal was MC (metadata and content), the DIP contains digital folders for every Layer 2 and Layer 1 information unit, which hold the actual Layer 0 objects with the content information [26].
During the ingest process the EDIDOC package is essentially dissolved and the extracted data and documents undergo a variety of documented quality assurance steps that determine the integrity and authenticity of the deposition. Predetermined migration steps are then carried out and archival persistent identifiers (PID) are assigned to the deposition. Subsequently, administrative and structural metadata is extracted out of the EDIDOC package and additional technical metadata (including but not limited to file format identification data, mime-type and size of object) is generated.
These three kinds of metadata are recorded in a Matterhorn METS file [31] according to a predetermined mapping and each step of the ingest process is documented in the METS file as PREMIS events [32]. The METS file is then converted into an RDF file [35], which is based on RiC-O [34] for structural and descriptive metadata and PREMIS for technical and administrative metadata, and finally, the AIP is imported into the repository. The locally hosted repository is based on Fedora 6 [35] and operates as a linked data platform [33] (see figure 2).
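The combination of quality assurance, technical metadata generation and event documentation described above can be sketched in Python. The sketch makes several assumptions that are not taken from the actual workflow: SHA-256 is used as the hash algorithm, the mime-type is guessed from the file extension as a stand-in for proper format identification, and events are plain dictionaries that only loosely mimic PREMIS events.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone


def technical_metadata(filename, data):
    """Generate basic technical metadata for one object."""
    mime_type, _ = mimetypes.guess_type(filename)  # extension-based guess only
    return {
        "filename": filename,
        "size": len(data),
        "mime_type": mime_type or "application/octet-stream",
        "sha256": hashlib.sha256(data).hexdigest(),  # assumed algorithm
    }


def make_event(event_type, target, outcome):
    """Record one ingest step with a UTC timestamp."""
    return {
        "type": event_type,
        "target": target,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def ingest(files, expected_hashes):
    """Check fixity and collect metadata; every step leaves an event.

    files: mapping of filename to bytes
    expected_hashes: mapping of filename to the hash delivered with the package
    """
    objects, events = [], []
    for name in sorted(files):
        tech = technical_metadata(name, files[name])
        ok = tech["sha256"] == expected_hashes.get(name)
        events.append(make_event("fixity check", name, "success" if ok else "failure"))
        if ok:
            objects.append(tech)
            events.append(make_event("metadata extraction", name, "success"))
    return objects, events
```

Objects that fail the fixity check are not processed further in this sketch, but the failed check itself remains documented in the event list.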
Once within the repository, the SPARQL-query language [36] can be used to pinpoint information for each RiC-O entity, relation or attribute [37]. This fine granularity of metadata allows for precise targeting of information that is crucial for determining preservation steps. Thus, preservation planning largely depends on this granularity and as migration [38] has been established as the preferred preservation strategy by policy, preservation steps have to be determined in accordance with it. Each object within the repository has a unique PID and due to the network structure of RiC each PID is innately connected with its corresponding metadata. In the Upper Austrian State Archive an authoritative record is understood as a record in the RiC-sense, while multiple records belonging to the same file plan position originating from a single provenance, which is usually one specific governmental organisational unit, constitute a record set and structural elements of the authoritative records are understood as record parts [27]. Finally, rules for preservation planning can be set up that target all record resources or instantiations with a certain attribute or certain relations. For this to be possible and useful, significant properties have to be determined.
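How a preservation rule might target instantiations by attribute and then follow relations back to the affected records can be sketched with a small in-memory set of triples. In the repository this selection would be expressed as a SPARQL query; the Python below is only a stand-in. The `ex:` predicates, the PIDs and the mime-type values are invented for illustration and are not RiC-O terms.

```python
# Invented triples (subject, predicate, object); not actual repository data.
TRIPLES = [
    ("pid:rec1", "rdf:type", "ex:Record"),
    ("pid:rec1", "ex:hasInstantiation", "pid:inst1"),
    ("pid:inst1", "rdf:type", "ex:Instantiation"),
    ("pid:inst1", "ex:mimeType", "application/msword"),
    ("pid:rec2", "rdf:type", "ex:Record"),
    ("pid:rec2", "ex:hasInstantiation", "pid:inst2"),
    ("pid:inst2", "rdf:type", "ex:Instantiation"),
    ("pid:inst2", "ex:mimeType", "application/pdf"),
]


def subjects_with(triples, predicate, obj):
    """All subjects that carry the given attribute value."""
    return sorted({s for s, p, o in triples if p == predicate and o == obj})


def records_of(triples, instantiations):
    """Follow the (hypothetical) relation from instantiations back to records."""
    wanted = set(instantiations)
    return sorted(
        {s for s, p, o in triples if p == "ex:hasInstantiation" and o in wanted}
    )


def plan_migration(triples, source_mime_type):
    """Return the instantiations a rule would target and the records concerned."""
    targets = subjects_with(triples, "ex:mimeType", source_mime_type)
    return {"instantiations": targets, "records": records_of(triples, targets)}
```

Because every PID is connected to its metadata, a single attribute value is enough to find both the objects to be migrated and the records whose documentation has to reflect the migration.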
Considering the strong legal implications, the archiving workflow was designed with utmost care and emphasis on traceability of applied steps. Considerations from a legal and data protection point of view, as to what were significantly important properties of our authoritative records, were made long before the first ingest took place [39]. Thus, developing a policy for significant properties could be built upon the foundations of these considerations. To these, archival considerations and community developed best practice strategies were added, which adapted previously applied principles. Quickly, it became evident that although there were plenty of resources and studies about significant properties of individual types of documents such as text, images, spreadsheets, none quite resembled the Upper Austrian case.
Considering the prevalence of the term and concept ‘Akt’ especially in central Europe and the German-speaking countries, where slight differences in meaning exist but one would nonetheless expect quite similar manifestations, it was surprising to find that only a handful of archivists from German-speaking archives had published articles about significant properties. On the one hand, this can be attributed to the fact that, although digitalisation and e-Government processes are becoming more and more prevalent in the German-speaking countries, widely different levels of implementation exist. On the other hand, it is evident from an archival point of view that time periods for extracting digital records out of their creation systems for archiving vary considerably, which in turn is a factor in how digital repository projects are conceived.
The Nestor Association [40] constituted a working group on digital preservation [41], which published a guide on digital preservation in 2012 [42]. The Nestor guide addresses strategies and provides examples for defining significant properties of certain information types such as text, images, audio, structured information, GIS and software [43]. Schmidt refers to this guide in considering the definition of significant properties for container formats to be a futile effort [44]. Since containers can hold any possible format, Schmidt argues that it would be technically speaking too imprecise for concrete preservation steps but concurs that significant properties cannot only be technical [44]. In chapter 1, it has been established that the Upper Austrian authoritative record is more than the sum of its parts due to context of creation and usage, and exists in two types, namely M (metadata) and MC (metadata and content). Conclusively, significant properties cannot exclusively be determined for Layer 0 objects, especially as Layer 0 objects exist only for authoritative records that have been appraised with the MC attribute.
Additionally, Layer 0 can contain nearly any type of file format, as the RMS disallows very little (e.g. executables) to facilitate usability for a wide variety of cases. Only a limited variety of file formats then makes it into the repository itself, since predetermined migration steps are employed during the ingest process. This simplifies the necessary preservation planning and allows already established steps, which have been published and discussed over the course of the last two decades by the digital preservation community, to be employed. Therefore, significant properties of spreadsheets, text documents and similar types are not part of this discussion. Yet, the already established discussion in the digital preservation community does not include records in the abovementioned sense [45].
Firstly, when authoritative records with the attribute M are ingested, the only tangible digital document that survives is the content-xml containing the most important metadata. But this alone does not constitute the authoritative record in its final AIP form. To accurately preserve the legal and structural elements, the data structure realised in entities, attributes and relations necessarily has to be interpreted as part of the archival manifestation or, in RiC terms, as an instantiation [27] of the digital authoritative record. This implies that the metadata along with any kind of object, such as the content-xml in M records or Layer 0 objects in MC records, fully represent the authoritative record.
Legally, this does not constitute problems, as the information itself is the target of applicable legislation, which is expressed in a technology-neutral manner. For preservation planning and determining preservation steps, the starting point has to be the AIP as the only surviving and complete version of the record. Therefore, only metadata and objects that constitute the AIP can be taken into consideration [46]. With the dissolution of the EDIDOC package during the ingest process, this signifies that especially for metadata the entities, attributes and relations realised in the RiC-based repository have to be carefully observed as they convey the properties deemed to be significant.
Furthermore, since the vehicle format for initial depositions has already changed twice over the last two decades, from EDIAKT to EDIAKT II to the current EDIDOC, while the structure of the records has remained unchanged, defining the significant properties for the Upper Austrian digital authoritative record cannot be dependent on the current vehicle format. While currently there is only one RMS capable of producing records of this kind in Upper Austria, other RMS for specific governmental organisations are under development at the moment. These will also create output in the EDIDOC format, but the final result might differ slightly in terms of tags used and metadata entered.
Considering this, the significant properties have to be specified first and foremost with the intellectual entity of the record itself or, in RiC-terms [27], with the record resource in mind, not the specific instantiation. Conclusively, the separation of container formats into different groups of digital records with similar properties proposed by Nestor [47] is only applicable in a technical sense and, in the Upper Austrian case, only for records with the MC attribute. Yet, the structural metadata inherent and central to our authoritative records is completely disregarded in such an approach.
Secondly, a strict separation between provenance and designated community is not entirely possible [48], as the legal circumstances include both facets. The designated communities in this case are on one hand the administration itself and on the other hand citizens, who are affected or involved. For the latter, the archive has to provide traceability and understandability of administrative actions due to legal requirements, in the form of access to records and their contents. For the former the archive serves as a knowledge base of past decisions and processes. In both cases humans are the ones requesting archived records in whatever form they may be and, in both cases, legal prerequisites exist. Both designated communities expect the presented archival units to be intelligible in a way that information can be gained from perceiving (reading, watching etc.) the records. Therefore, the Upper Austrian State Archive has to orient its preservation policies along these (legal) requirements.
While the AIP has to be targeted for preservation planning itself, pinpointing which properties are significant in the eyes of the two stakeholder groups, especially considering legal aspects, necessarily has to be done from a pre-ingest point of view on the records [49]. This means that for practically defining significant properties, the document containing the most relevant administrative and technical metadata, namely the content-xml, has to be the primary source of information. With the possibility of fine granularity regarding archival metadata in the form of RiC based entities, relations and attributes, the significant properties will be captured as such [50]. This enables a higher degree of traceability of archival measures taken [51].
For the Upper Austrian State Archive, it was central to note that legal considerations were immensely important in coming up with the significant property definitions for this use case. In a way, these are catering to the needs of our previously described designated communities. As these records are born digital, access will be given fully digitally as well. Subsequently, the decision was made to regard the concept of significant properties on two levels. Firstly, considering the predefined structure of our records, it was necessary to define significant properties pertaining to the whole record and to records with both the M and MC appraisal attribute. Secondly, for records with the MC attribute, significant properties for the individual documents on the lowest structural level are defined in concordance with community developed strategies [52]. The latter will not be discussed in this paper.
In an initial phase of analysis, the five categories of significant properties developed in the InSPECT Project were found to be applicable for the Upper Austrian digital authoritative record as they thematically reflect legal and normative requirements. The categories are content, context, rendering, structure and behaviour [53]. Properties printed in bold signify legal requirements. The properties were chosen because they are markers for authenticity, integrity, reliability and correspond with the legal requirements to store and provide access to authoritative records for citizens in a way that each record can be recognised as a genuine authoritative record, which was created in accordance with records creation principles. Overarchingly and although this expresses an adherence to the current technical manifestation of the record, the content-xml has been determined to be uniquely important, as it contains the entirety of administrative, structural and technical metadata of each authoritative record. It will, therefore, have to be preserved in its entirety.
Content Information – MC
Subject of each structural element of the digital authoritative record – M and MC
Timestamps of creation and closure – M and MC
Purpose of the record (self-statement of the digital object: ‘Akt’) – M and MC
Type of structural element (Geschäftsfall/Geschäftsstück/document) for each record part and corresponding hierarchical level – M and MC
As this category primarily concerns authoritative records with the MC attribute and individual types of documents in various formats, for which community developed best practice models exist, no specifics will be discussed here [55].
Purpose of creation (‘for archive’) – M and MC
Provenance (governmental organisational unit and file group reference) – M and MC
Purpose of creation (file plan position) – M and MC
Legal purposes of creation – M and MC
Appraisal (M/MC) – M and MC
Connecting identifier derived from records creation (whole record) – M and MC
Connecting identifier derived from records creation (record parts) – M and MC
Connecting identifier (technical, from RMS) – M and MC
Connectors (relations of entities and attributes in repository) – M and MC
Process data with timestamps, signatures and names – M and MC
Addressee (Geschäftsstück) – M and MC
Sender (Geschäftsstück) – M and MC
Return receipt (existent/not existent) – M and MC
Reference identifiers (to other records) – M and MC
Date of extraction – M and MC
Authenticity/Integrity = hash value of package – M and MC
Authenticity: timestamps of PREMIS events in METS file – M and MC
Digital signature and qualification information (digital documents) – MC
Logical Structure of record and record parts – M and MC
Original number of files – M and MC
Name and version of vehicle format – M and MC
Most behavioural significant properties for individual types of documents can be gathered from community developed best practice models and will not be further discussed here [55]. There is only one overarching requirement in the behaviour category:
Human readability – M and MC
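A selection of the properties listed above can be expressed as a simple checklist that states for which appraisal attribute each property applies. The Python sketch below is illustrative only: the keys are hypothetical short names for a subset of the listed properties, and the AIP metadata is represented as a plain dictionary rather than as RiC-based entities, relations and attributes.

```python
# Subset of the listed properties with their applicability (M, MC or both).
SIGNIFICANT_PROPERTIES = {
    "content_information": {"MC"},
    "subject": {"M", "MC"},
    "timestamps_creation_closure": {"M", "MC"},
    "provenance": {"M", "MC"},
    "file_plan_position": {"M", "MC"},
    "appraisal": {"M", "MC"},
    "package_hash": {"M", "MC"},
    "digital_signature_information": {"MC"},
    "logical_structure": {"M", "MC"},
}


def required_properties(appraisal):
    """Names of all properties that apply to the given appraisal attribute."""
    return [name for name, applies in SIGNIFICANT_PROPERTIES.items() if appraisal in applies]


def missing_properties(aip_metadata, appraisal):
    """Properties that apply to the record but have no value in the metadata."""
    return [name for name in required_properties(appraisal) if not aip_metadata.get(name)]


EXAMPLE_M_RECORD = {
    "subject": "Building permit",
    "timestamps_creation_closure": ("2019-03-01", "2020-06-30"),
    "provenance": "Unit A",
    "file_plan_position": "FP-01",
    "appraisal": "M",
    "package_hash": "abc123",
    "logical_structure": ["A-1", "S-1"],
}
```

For `EXAMPLE_M_RECORD`, nothing is reported as missing under M, whereas the same metadata checked against MC would lack the content information and the digital signature information.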
In the Upper Austrian State Archive, we are confronted with a situation where legal requirements pertaining to authoritative records necessitate thinking outside of the already established concepts for defining significant properties. On the one hand, the digital authoritative records are not simply files but more complicated constructs that are held together by administrative, descriptive, structural and technical metadata. Their archival manifestation differs from their SIP format and appraisal differentiates records into two categories, one containing only metadata, the other containing metadata and content. On the other hand, our stakeholders, being the administration itself and citizens, have legal rights of access and certain expectations regarding the records, namely authenticity, integrity, reliability and usability. Therefore, the defined significant properties follow already established categories, but do so on a slightly more abstract level, namely the level of the intellectual unit or, in RiC-terms, on the level of the record, not the instantiation. As the authoritative records rarely change in structural composition, but occasionally in technical (vehicle) format, the significant properties were chosen in a way that allows for application even if formats and creation systems change. Significant properties of concrete digital documents (such as text files, spreadsheets etc.) contained within our authoritative records with the MC attribute will be oriented along community developed strategies.
I would like to thank Jakob Wührer and Fabian Müller for their thoughtful suggestions and willingness to discuss this topic in great detail.