FULL: 2-hour workshop for 20 in-person participants
File format identification is generally accepted as a key digital preservation activity, as it enables preservation practitioners to understand, manage, and mitigate format-related risks and challenges.
File format identification tools such as DROID and Siegfried use multiple methods to identify file formats, including by extension, by binary signature, and by container signature.
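To make the difference between the first two methods concrete, here is a minimal Python sketch. It assumes a tiny, hypothetical lookup table (these are not PRONOM records, and real signatures also carry offsets, wildcards and priorities). Identification by extension trusts the filename, while identification by binary signature checks the "magic bytes" at the start of the file. Note that a DOCX file's bytes begin with the generic ZIP header, so a binary signature alone can only say "ZIP". This is where container signatures come in.

```python
import os
from typing import Optional

# Hypothetical, simplified tables for illustration only (not PRONOM data).
EXTENSION_MAP = {
    ".pdf": "PDF document",
    ".png": "PNG image",
    ".docx": "Word document (OOXML)",
}

# (magic bytes expected at offset 0, label)
BINARY_SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP-based container"),  # ZIP local file header
]


def identify_by_extension(filename: str) -> Optional[str]:
    """Guess the format from the file extension alone (easily wrong or missing)."""
    ext = os.path.splitext(filename)[1].lower()
    return EXTENSION_MAP.get(ext)


def identify_by_signature(data: bytes) -> Optional[str]:
    """Match the leading bytes of the file against the magic-byte table."""
    for magic, label in BINARY_SIGNATURES:
        if data.startswith(magic):
            return label
    return None


if __name__ == "__main__":
    docx_like = b"PK\x03\x04" + b"\x00" * 26
    print(identify_by_extension("report.docx"))
    print(identify_by_signature(docx_like))
```

Run on a DOCX-like byte string, the extension suggests a Word document, but the binary signature can only confirm a ZIP container; telling DOCX apart from any other ZIP-based format requires looking inside the package.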
Previous iPres workshops (2018, 2022) have tackled binary signature patterns for file format identification. This interactive workshop will build on them by exploring container-based file format identification and container signatures.
The workshop will help participants gain practical skills in the analysis of complex digital files. A brief recap of binary signatures for file format identification will make the workshop accessible to beginners as well as experienced practitioners, and will help all attendees become familiar with its themes.
Participants will learn about the various structures of file formats, with a particular emphasis on multi-part formats, and how container signatures are used to identify many of these format types. Tools such as hex editors and file packaging utilities will be demonstrated and used to explore these file formats in more detail.
Participants will learn to recognize the indicators that a format is a candidate for container signature identification, and will gain an understanding of the structure and syntax of container signatures as used by tools such as DROID and Siegfried.
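The core idea behind a ZIP container signature can be sketched in a few lines of Python. The rule format below is hypothetical and is not DROID's XML syntax; it is only loosely modelled on the idea that a container signature names files expected inside the package and, optionally, bytes expected within those files. The rules for OOXML and OpenDocument rely on well-known internal paths (`[Content_Types].xml`, `word/document.xml`, `xl/workbook.xml`, `mimetype`); `make_zip` is a hypothetical helper for building test packages.

```python
import io
import zipfile
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified rules (NOT DROID's container signature syntax).
# Each rule maps an internal path to a required byte prefix,
# or to None when the file only needs to be present.
CONTAINER_RULES: List[Tuple[str, Dict[str, Optional[bytes]]]] = [
    ("OpenDocument Text", {"mimetype": b"application/vnd.oasis.opendocument.text"}),
    ("Word document (OOXML)", {"[Content_Types].xml": None, "word/document.xml": None}),
    ("Excel workbook (OOXML)", {"[Content_Types].xml": None, "xl/workbook.xml": None}),
]


def identify_container(data: bytes) -> Optional[str]:
    """Return the first rule whose required members (and prefixes) all match."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return None  # not a ZIP package, so no ZIP container rule can apply
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = set(zf.namelist())
        for label, required in CONTAINER_RULES:
            # Short-circuits: a member is only read if it is present.
            if all(
                path in names
                and (prefix is None or zf.read(path).startswith(prefix))
                for path, prefix in required.items()
            ):
                return label
    return None


def make_zip(members: Dict[str, bytes]) -> bytes:
    """Build a small in-memory ZIP package for experiments (hypothetical helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path, content in members.items():
            zf.writestr(path, content)
    return buf.getvalue()


if __name__ == "__main__":
    docx = make_zip({"[Content_Types].xml": b"<Types/>", "word/document.xml": b"<w:document/>"})
    print(identify_container(docx))
```

With these rules, a ZIP holding `[Content_Types].xml` and `word/document.xml` is reported as a Word document, while a ZIP containing only an unrelated file matches nothing. The prefix check is deliberately loose (it would also accept an OpenDocument text template), and OLE2 containers, which DROID also supports, are not covered by this sketch.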
The workshop will cover research techniques for harvesting file format samples and format specifications, which are used to verify findings.
Finally, participants will get hands-on with tools for creating and testing container signatures, and will gain familiarity with the process for submitting their file format research to the PRONOM team.
Experienced instructors will be on hand throughout the session to assist with the workshop activities and answer any questions.
While this is an advanced topic, and familiarity with PRONOM, file format identification tools, XML data, 7-Zip, and hex editors will be helpful, the workshop will be of interest to anybody wanting to learn more about file formats and file format identification in general.
This will be an interactive workshop with icebreaker activities and quizzes to check understanding. We aim to cater to different learning styles, and participants are encouraged to buddy up with one another.
Participants with no prior knowledge of file format identification will benefit from reviewing these materials:
Ross Spencer’s ‘CAFED00Ds and CAFEBABEs’ blog post
iPres 2018 - PRONOM in Practice workshop
BitCurator Forum 2023 - the Bits in the Bytes workshop
Participants are encouraged to bring laptops, ideally with DROID, 7-Zip and HxD (or another hex editor of their choice) pre-installed, as there will be opportunities during the session to use these tools hands-on to explore the themes discussed.