Digital Cultural Heritage Tools - 2024 Roundup
Continuing my roundup of resources from last year. This time: tools, platforms, standards and other usable resources, mostly relating to digital cultural heritage but sometimes more general.
AI
Croissant Format - "The Croissant metadata format simplifies how data is used by ML models. It provides a vocabulary for dataset attributes, streamlining how data is loaded across ML frameworks such as PyTorch, TensorFlow or JAX. In doing so, Croissant enables the interchange of datasets between ML frameworks and beyond, tackling a variety of discoverability, portability, reproducibility, and responsible AI (RAI) challenges." - https://docs.mlcommons.org/croissant/docs/croissant-spec.html
Ai2 OpenScholar - "To help scientists effectively navigate and synthesize scientific literature, we introduce Ai2 OpenScholar—a collaborative effort between the University of Washington and the Allen Institute for AI. OpenScholar is a retrieval-augmented language model (LM) designed to answer user queries by first searching for relevant papers in the literature and then generating responses grounded in those sources." - https://allenai.org/blog/openscholar
chonkie - The no-nonsense RAG chunking library that's lightweight, lightning-fast, and ready to CHONK your texts - https://github.com/chonkie-ai/chonkie
AI4Culture - "AI4Culture is designed to provide comprehensive training and resources for individuals and institutions interested in applying Artificial Intelligence (AI) technologies in the cultural heritage sector. It aims to empower users with the knowledge and skills needed to leverage AI for preserving, managing, and promoting cultural heritage." - https://ai4culture.eu/
AI Risk Repository - "A comprehensive living database of over 700 AI risks categorized by their cause and risk domain." - https://airisk.mit.edu/
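For anyone unsure what "chunking" means in the chonkie entry above: before retrieval, long texts are split into smaller, usually overlapping pieces. Below is a minimal sketch of that idea in plain Python. It is not chonkie's API; the function name `chunk_words` and its parameters are my own assumptions for illustration, and real libraries typically chunk by tokens or sentences rather than whitespace-separated words.

```python
def chunk_words(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    """Split text into word-based chunks that share `overlap` words with the previous chunk."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight"
chunks = chunk_words(sample, chunk_size=4, overlap=1)
print(chunks)
# ['one two three four', 'four five six seven', 'seven eight']
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which is the usual reason for using it.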
Cultural Heritage Collections Visualisation
Collection Space Navigator - https://datalab.allardpierson.nl/meet_the_data/01_Explore%20the%20Collections.html - Allard Pierson, University of Amsterdam
InTaVia - a platform that "[...] supports the practices of data retrieval, creation, curation, analysis, and communication with coherent visualization support for multiple types of entities. We illustrate the added value of this open platform for storytelling with four case studies, focusing on (a) the life of Albrecht Dürer (person biography), (b) the Saliera salt cellar by Benvenuto Cellini (object biography), (c) the artist community of Lake Tuusula (group biography), and (d) the history of the Hofburg building complex in Vienna (place biography)." - https://intavia.eu/
Cultural Heritage Cataloguing
Dédalo - https://dedalo.dev/the_project#dd1100_62 - "Dédalo is a project focused on the field of digital humanities, on the need to analyze Cultural Heritage with digital tools, allowing machines to understand the cultural, social, and historical processes that generate Heritage and Memory." - Project running since 1998
Textual Analysis
Coconut Libtool - https://www.coconut-libtool.com/ - "All-in-one data mining and textual analysis tool for everyone." - A sort of Voyant Tools/Palladio web-based tool for analysis of CSV-type files
CEDAR - https://voices.uchicago.edu/cedar/ - "CEDAR (Critical Editions for Digital Analysis and Research) is a multi-project digital humanities initiative in which innovative computational methods are employed in textual studies."
COLaF - "Through the COLaF project (Corpus et Outils pour les Langues de France, Corpus and Tools for the Languages of France), Inria aims to contribute to the development of free corpora and tools for French and other languages of France[...]" - https://colaf.huma-num.fr/
TextFrame - "The ITF specification is intended to facilitate systematic referencing and reuse of textual resources in repositories in a manner that is both user- and machine-friendly." - https://textframe.io/
Text Recognition
HTR-United - "HTR-United is a catalog that lists highly documented training datasets used for automatic transcription or segmentation models. HTR-United standardizes dataset descriptions using a schema, offers guidelines for organizing data repositories, and provides tools for quality control and continuous documentation" - https://htr-united.github.io/index.html
Data Wrangling
Invisible XML - "Invisible XML (ixml) is a method for treating non-XML documents as if they were XML, enabling authors to write documents and data in a format they prefer while providing XML for processes that are more effective with XML content." - https://invisiblexml.org/
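To make the Invisible XML idea concrete: the structure is already implicit in the plain text, and the tooling exposes it as XML. The sketch below does this by hand in Python for one tiny format (an ISO-style date). It is not ixml itself, which uses a declarative grammar and an ixml processor; the function name `date_to_xml` and the element names are assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for a grammar: a date is year "-" month "-" day.
DATE_PATTERN = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def date_to_xml(text: str) -> str:
    """Expose the implicit structure of a date string as an XML fragment."""
    match = DATE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an ISO-style date: {text!r}")
    root = ET.Element("date")
    for name in ("year", "month", "day"):
        ET.SubElement(root, name).text = match.group(name)
    return ET.tostring(root, encoding="unicode")


xml = date_to_xml("2024-08-28")
print(xml)
# <date><year>2024</year><month>08</month><day>28</day></date>
```

The author keeps writing `2024-08-28`; only the downstream process sees the XML.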
Linked Data
grlc - https://grlc.io/ - "grlc makes all your Linked Data accessible to the Web by automatically converting your SPARQL queries into RESTful APIs. With (almost) no effort!"
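As a rough illustration of what "converting SPARQL queries into RESTful APIs" saves you from doing by hand, here is a sketch that wraps one parameterised query as a plain URL, using the `query` parameter defined by the SPARQL 1.1 Protocol. This is not grlc's interface. The endpoint `https://example.org/sparql`, the function names and the choice of `dcterms:creator` are all assumptions for the example, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical SPARQL endpoint, used only for illustration.
ENDPOINT = "https://example.org/sparql"


def build_query(creator: str, limit: int = 10) -> str:
    """Fill a fixed SPARQL template with a creator name and a result limit."""
    # Escape backslashes and double quotes so the name stays a valid string literal.
    safe = creator.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE { "
        f'?item <http://purl.org/dc/terms/creator> "{safe}" '
        f"}} LIMIT {int(limit)}"
    )


def api_url(creator: str, limit: int = 10) -> str:
    """Return the GET URL that would run the query against ENDPOINT."""
    return ENDPOINT + "?" + urlencode({"query": build_query(creator, limit)})


url = api_url("Albrecht Dürer", limit=5)
print(url)
```

A caller only needs to know the two parameters, not the SPARQL behind them, which is the convenience that an API layer over stored queries provides.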
Visual Analysis
AIKON - https://aikon-platform.github.io/ - "Aikon is a modular platform designed to empower humanities scholars in leveraging artificial intelligence and computer vision methods for analyzing large-scale heritage collections. It offers a user-friendly interface for visualizing, extracting, and analyzing illustrations from historical documents, fostering interdisciplinary collaboration and sustainability across digital humanities projects. Built on proven technologies and interoperable formats, Aikon's adaptable architecture supports all projects involving visual materials."
Publishing
EditionCrafter - https://editioncrafter.org/ - An open source and customizable publishing tool, EditionCrafter allows users to easily publish digital editions as feature-rich and sustainable static sites.
Manuscripts
VisColl received a new grant from NEH to fund VCEditor 2.0 - https://viscoll.org/2024/08/28/vceditor-2-0-has-received-an-neh-digital-humanities-advancement-grant/ - "The grant will support work undertaken by staff in the Schoenberg Institute for Manuscript Studies and the Penn Libraries Digital Library Development team. This funding will support the continued development of VCEditor functionality[...]"
Modelling
ComSES Network - "an international community and cyberinfrastructure to support transparency and reproducibility for computational models & their digital context + educational resources and FAQ's for agent based modeling" - https://www.comses.net/
Databases
GQL: A New ISO Standard for Querying Graph Databases - https://thenewstack.io/gql-a-new-iso-standard-for-querying-graph-databases/
GeoSpatial
OSM Buildings - Free and open source web viewer for 3D buildings - https://osmbuildings.org/documentation/viewer/
Old Maps Online - https://www.oldmapsonline.org
Digitisation
Arkindex - "[...]our platform for managing and processing large collections of digitized documents[...]" - https://teklia.com/blog/arkindex-goes-open-source/
Environmental
Digital Humanities Climate Coalition Toolkit - https://sas-dhrh.github.io/dhcc-toolkit/
Archives
EAD 2002 XML/EAC-CPF to Records in Context (RiC-O) converter - https://github.com/ArchivesNationalesFR/rico-converter
Random
Is my blue your blue? - https://ismy.blue/
EZ-Tree - Procedural tree generator - https://github.com/dgreenheck/ez-tree