
Digital Cultural Heritage Tools - 2024 Roundup

Continuing my roundup of resources from last year. This time: tools, platforms, standards and other usable resources, mostly relating to digital cultural heritage but sometimes more general.

AI

  • Croissant Format - "The Croissant metadata format simplifies how data is used by ML models. It provides a vocabulary for dataset attributes, streamlining how data is loaded across ML frameworks such as PyTorch, TensorFlow or JAX. In doing so, Croissant enables the interchange of datasets between ML frameworks and beyond, tackling a variety of discoverability, portability, reproducibility, and responsible AI (RAI) challenges." - https://docs.mlcommons.org/croissant/docs/croissant-spec.html

  • Ai2 OpenScholar - "To help scientists effectively navigate and synthesize scientific literature, we introduce Ai2 OpenScholar—a collaborative effort between the University of Washington and the Allen Institute for AI. OpenScholar is a retrieval-augmented language model (LM) designed to answer user queries by first searching for relevant papers in the literature and then generating responses grounded in those sources." - https://allenai.org/blog/openscholar

  • chonkie - The no-nonsense RAG chunking library that's lightweight, lightning-fast, and ready to CHONK your texts - https://github.com/chonkie-ai/chonkie

  • AI4Culture - "AI4Culture is designed to provide comprehensive training and resources for individuals and institutions interested in applying Artificial Intelligence (AI) technologies in the cultural heritage sector. It aims to empower users with the knowledge and skills needed to leverage AI for preserving, managing, and promoting cultural heritage." - https://ai4culture.eu/

  • AI Risk Repository - "A comprehensive living database of over 700 AI risks categorized by their cause and risk domain." - https://airisk.mit.edu/

Cultural Heritage Collections Visualisation

  • Collection Space Navigator - https://datalab.allardpierson.nl/meet_the_data/01_Explore%20the%20Collections.html - Allard Pierson, University of Amsterdam

  • InTaVia - a platform that "[...] supports the practices of data retrieval, creation, curation, analysis, and communication with coherent visualization support for multiple types of entities. We illustrate the added value of this open platform for storytelling with four case studies, focusing on (a) the life of Albrecht Dürer (person biography), (b) the Saliera salt cellar by Benvenuto Cellini (object biography), (c) the artist community of Lake Tuusula (group biography), and (d) the history of the Hofburg building complex in Vienna (place biography)." - https://intavia.eu/

Cultural Heritage Cataloguing

  • Dédalo - https://dedalo.dev/the_project#dd1100_62 - "Dédalo is a project focused on the field of digital humanities, on the need to analyze Cultural Heritage with digital tools, allowing machines to understand the cultural, social, and historical processes that generate Heritage and Memory." - Project running since 1998

Textual Analysis

  • Coconut Libtool - https://www.coconut-libtool.com/ - "All-in-one data mining and textual analysis tool for everyone." - A sort of Voyant Tools/Palladio web based tool for analysis of CSV type files

  • CEDAR - https://voices.uchicago.edu/cedar/ - "CEDAR (Critical Editions for Digital Analysis and Research) is a multi-project digital humanities initiative in which innovative computational methods are employed in textual studies."

  • COLaF - "Through the COLaF project (Corpus et Outils pour les Langues de France, Corpus and Tools for the Languages of France), Inria aims to contribute to the development of free corpora and tools for French and other languages of France[...]" - https://colaf.huma-num.fr/

  • TextFrame - "The ITF specification is intended to facilitate systematic referencing and reuse of textual resources in repositories in a manner that is both user- and machine-friendly." - https://textframe.io/

Text Recognition

  • HTR-United - "HTR-United is a catalog that lists highly documented training datasets used for automatic transcription or segmentation models. HTR-United standardizes dataset descriptions using a schema, offers guidelines for organizing data repositories, and provides tools for quality control and continuous documentation" - https://htr-united.github.io/index.html

Data Wrangling

  • Invisible XML - "Invisible XML (ixml) is a method for treating non-XML documents as if they were XML, enabling authors to write documents and data in a format they prefer while providing XML for processes that are more effective with XML content." - https://invisiblexml.org/
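
To give a feel for the idea, here is a minimal sketch in Python. It is not an ixml processor and does not use ixml grammar syntax; it only illustrates the principle of exposing the implicit structure of a plain-text value as XML. The date format, the `date_to_xml` helper and the element names are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mini-format: an ISO-style date such as "2024-12-31".
# A real ixml grammar would describe this structure declaratively;
# here a regular expression stands in for the grammar.
DATE_PATTERN = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")


def date_to_xml(text: str) -> str:
    """Expose the implicit structure of a date string as an XML string."""
    match = DATE_PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised date: {text!r}")
    root = ET.Element("date")
    # One child element per named part of the date.
    for name in ("year", "month", "day"):
        ET.SubElement(root, name).text = match.group(name)
    return ET.tostring(root, encoding="unicode")


print(date_to_xml("2024-12-31"))
# <date><year>2024</year><month>12</month><day>31</day></date>
```

The author keeps writing `2024-12-31`, while downstream tools see structured XML. An actual ixml implementation would generalise this by taking the grammar as input rather than hard-coding it.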

Linked Data

  • grlc - https://grlc.io/ - "grlc makes all your Linked Data accessible to the Web by automatically converting your SPARQL queries into RESTful APIs. With (almost) no effort!"
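
The underlying idea can be sketched in a few lines of Python. This is not grlc's code or its API; it is a rough illustration, under stated assumptions, of how a stored SPARQL query plus one API parameter can be turned into a plain HTTP GET URL using the `query` parameter of the SPARQL protocol. The query, the `build_sparql_request` helper and the `https://example.org/sparql` endpoint are hypothetical.

```python
from string import Template
from urllib.parse import urlencode

# Hypothetical stored query; $creator is filled in from an API parameter.
QUERY_TEMPLATE = Template(
    "SELECT ?work WHERE { ?work <http://purl.org/dc/terms/creator> <$creator> } LIMIT 10"
)


def build_sparql_request(endpoint: str, creator_iri: str) -> str:
    """Return a GET URL that runs the stored query for one creator IRI."""
    # Reject characters that could break out of the <...> IRI delimiters.
    if any(ch in creator_iri for ch in '<>"{}|\\^` '):
        raise ValueError(f"unsafe IRI: {creator_iri!r}")
    query = QUERY_TEMPLATE.substitute(creator=creator_iri)
    # The SPARQL protocol accepts the query text in a "query" URL parameter.
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_request(
    "https://example.org/sparql",  # assumed endpoint, for illustration only
    "http://example.org/person/42",
)
print(url)
```

A caller only needs to supply the creator IRI; the SPARQL itself stays hidden behind the URL, which is the convenience a query-to-API layer offers.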

Visual Analysis

  • AIKON - https://aikon-platform.github.io/ - "Aikon is a modular platform designed to empower humanities scholars in leveraging artificial intelligence and computer vision methods for analyzing large-scale heritage collections. It offers a user-friendly interface for visualizing, extracting, and analyzing illustrations from historical documents, fostering interdisciplinary collaboration and sustainability across digital humanities projects. Built on proven technologies and interoperable formats, Aikon's adaptable architecture supports all projects involving visual materials."

Publishing

  • EditionCrafter - https://editioncrafter.org/ - An open source and customizable publishing tool, EditionCrafter allows users to easily publish digital editions as feature-rich and sustainable static sites.

Manuscripts

Modelling

  • ComSES Network - "an international community and cyberinfrastructure to support transparency and reproducibility for computational models & their digital context + educational resources and FAQ's for agent based modeling" - https://www.comses.net/

Databases

GeoSpatial

Digitisation

Environmental

Archives

Random