Conversational Search for Cultural Heritage
Summary: LLM/RAG chatbots seem a promising new way to aid some types of cultural heritage search, but much more experimentation is needed. Also note the ethical, environmental and economic costs.
```python
response = graph.invoke({"question": "What was the name of Hogarth's dog?"})
print(response["answer"])
```

The name of Hogarth's dog was Trump (SystemNumber: O97313). This pug was prominently featured in Hogarth's self-portrait of 1745. Hogarth's combative personality led him to be depicted as Trump in a 1753 caricature by Paul Sandby.

```python
response = graph.invoke({"question": "What artworks are associated with greyhounds"})
print(response["answer"])
```

Two artworks are associated with greyhounds. The first is a painting by Louis Godefroy Jadin (no system number provided), which features greyhounds in a hunting scene. The second is a terracotta statuette of a hound (inv. no. A.10-1954), although it's not clear if it's specifically a greyhound.

```python
response = graph.invoke({"question": "Which artworks depict St Dominic with a dog?"})
print(response["answer"])
```

The artwork that depicts St. Dominic with a dog is a stone and walnut ivory relief made by Diego Reinoso in 1669. The dog, which resembles a Chinese Foo dog, is sitting at the feet of St. Dominic and is one of his main symbols.
(NB the object system number in the first response points to the wrong object! It links to https://collections.vam.ac.uk/item/O97313/figure-unknown/ when it should be https://collections.vam.ac.uk/item/O77828/hogarths-dog-trump-figure-roubiliac-louis-fran%C3%A7ois/)
The desire to find some alternative option for discovery on cultural heritage collection sites, beyond the search box, seems to have been around since, well, they started. But beyond the addition of faceted search, there haven't really been any other big changes to aid discovery. (Of course, there have been many great one-off projects exploring new options, such as adding data visualisation/generous interfaces, but nothing that has become a standard discovery mode used for most collection sites.)
So obviously, given this is a blog post in 2025, the answer must be AI? Well, maybe. It seemed worth exploring some options, starting with a conversational AI interface, given that chatting with a large language model (LLM) is very fashionable.
Doing this requires getting the information we hold in our collection systems into an LLM, so it can respond with the relevant information from our collections and, ideally, not with any information from other sources, and definitely not with AI hallucinations. There seem to be two main approaches for doing this:
Finetuning - Further training a previously tuned LLM on new documents
Retrieval Augmented Generation (RAG) - Retrieve a selection of relevant documents in response to a query, then have an LLM generate the answer from those documents and the query.
RAG seems to be the simpler option, so I've gone with it for this quick experiment. Finetuning is also worth looking into, although I don't know if the quantity of data would be enough to make it useful.
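The two-stage shape of a RAG pipeline can be sketched in plain Python, with simple keyword overlap standing in for the embedding-based retrieval used in the actual experiment (the documents and scoring here are toy stand-ins, not the real pipeline):

```python
# Illustrative sketch of the two RAG stages: retrieve, then generate.
# Keyword overlap replaces embeddings so this runs stand-alone.
DOCUMENTS = {
    "O77828": "Hogarth's dog Trump, a porcelain portrait figure of a pug.",
    "A.10-1954": "A terracotta statuette of a hound, possibly a greyhound.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Stage 1: rank documents by how many query words they share."""
    terms = set(question.lower().split())
    scored = sorted(
        DOCUMENTS.items(),
        key=lambda item: len(terms & set(item[1].lower().split())),
        reverse=True,
    )
    return [system_number for system_number, _ in scored[:k]]

def answer(question: str) -> str:
    """Stage 2: build the prompt an LLM would receive (LLM call omitted)."""
    context = "\n".join(DOCUMENTS[n] for n in retrieve(question))
    return f"Context:\n{context}\nQuestion: {question}"

print(answer("What was the name of Hogarth's dog?"))
```

A real pipeline replaces the overlap score with vector similarity and passes the assembled prompt to the chat model, but the retrieve-then-generate shape is the same.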
There are a lot of tutorials out there on building RAG-based chatbots; I mostly followed one from LangChain (https://python.langchain.com/docs/tutorials/rag/), with some modifications as I went through it. The finished Jupyter notebook is here [to be added once I've tidied it up!]. In summary, I had to:
Create a CSV file with descriptions of a set of objects. I used a set of V&A museum collection objects that all depict dogs (for, reasons) and took the title and summary description fields, as well as the museum database system number, to identify them.
Create a new prompt (vam/chat-glam - available here for EU usage - https://eu.smith.langchain.com/hub/vam/chat-glam; it might need to be added separately to the USA LangChain Hub, presumably for legal reasons). All the prompt does is try to make it clear that the documents describe artworks and that the answer should give museum system numbers for the objects to help with searching; otherwise it's the standard LangChain prompt (rlm/rag-chat)
Pick the right model for the document retrieval stage. I got this wrong for some time: I tried several of the well-known models (e.g. Llama 3), which brought back mostly irrelevant documents (and took a lot of indexing time doing so). Eventually I realised (thanks to reading https://www.sbert.net/examples/applications/semantic-search/README.html) that, in vector space, comparing a short query with a longer text such as an object description doesn't really work without a model trained for it (asymmetric semantic search). I switched to all-minilm:l6-v2 (https://ollama.com/library/all-minilm:l6-v2), which gave much better results (and faster indexing). I think there is a lot more that can be done on this.
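As a sketch of the first step above, building one text chunk per object from such a CSV might look like this (the column names are my assumptions, not the actual V&A export format):

```python
import csv
import io

# Toy CSV standing in for the dog-objects export; column names are assumed.
CSV_DATA = """systemNumber,title,summaryDescription
O77828,Hogarth's Dog Trump,Porcelain figure of the artist's pug
A.10-1954,Hound,Terracotta statuette of a hound
"""

def load_documents(csv_text: str) -> dict[str, str]:
    """Map each system number to one text chunk ready for embedding."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        row["systemNumber"]: f'{row["title"]}. {row["summaryDescription"]}'
        for row in reader
    }

docs = load_documents(CSV_DATA)
print(docs["O77828"])  # Hogarth's Dog Trump. Porcelain figure of the artist's pug
```

Keeping the system number as the document key means it can be surfaced in answers later, as the prompt asks for.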
Otherwise the setup is:
LLM - Mistral AI (free developer access - this could be replaced with ChatOllama for local LLM chat)
Embeddings - Ollama running all-minilm:l6-v2 (remotely on another local computer to try to spread the CPU/Memory load)
Vector Store - FAISS
Orchestration - LangChain
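Under the hood, the retrieval stage ranks object descriptions by cosine similarity between embedding vectors. A toy illustration with made-up 3-dimensional vectors (a real model such as all-minilm:l6-v2 produces 384-dimensional ones, and FAISS does the ranking at scale):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up document embeddings, keyed by museum system number.
doc_vectors = {
    "O77828": [0.9, 0.1, 0.0],    # "Hogarth's dog Trump..."
    "A.10-1954": [0.1, 0.8, 0.2], # "Terracotta statuette of a hound..."
}
# Made-up embedding of "What was the name of Hogarth's dog?"
query_vector = [0.8, 0.2, 0.1]

ranked = sorted(
    doc_vectors,
    key=lambda n: cosine(query_vector, doc_vectors[n]),
    reverse=True,
)
print(ranked[0])  # O77828
```

The asymmetric search problem mentioned above is that a short query and a long description only land near each other in this space if the embedding model was trained for that pairing.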
Once all that setup was done, it was pretty easy to run and get results from some chat questions (sample at the top of this post, more in the notebook). Given the dataset, I was a bit constrained in the questions (it's really more canine chat than art history chat), but even with this limited and rough implementation, I think it showed how, for some types of query, a conversational interface could be a useful discovery mode. In particular for asking about:
example objects/object existence (maybe useful for initial exploration, when there is no particular museum object in mind)
summary information (but this only works for subsets of the collection)
short factual statements (to pull out statements from within records, but this is edging towards more dangerous generative AI)
Of course, search/browse queries would return similar example objects if the same keywords were searched for, so in this case it's just the interface that differs in how the data is returned; but that might be helpful for some users by giving an alternative way to formulate a query.
For summaries, the chat interface does provide something new ('at a glance' results), but it's constrained by the number of results passed from the retrieval phase to the LLM: the summary is based on those, rather than on knowledge of the whole collection. For future work, it might be possible to route those types of query to some other model that can answer questions about the collection as a whole.
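A very rough sketch of that kind of routing, using assumed trigger phrases (a production router might itself use an LLM classifier rather than string matching):

```python
# Phrases suggesting a whole-collection question; these are my assumptions.
AGGREGATE_TRIGGERS = ("how many", "summarise", "summarize", "overall", "most common")

def route(question: str) -> str:
    """Send whole-collection questions to a different handler than RAG."""
    q = question.lower()
    if any(trigger in q for trigger in AGGREGATE_TRIGGERS):
        return "collection-stats"  # e.g. queries over the full record set
    return "rag"                   # embed, retrieve, and answer as before

print(route("How many artworks depict greyhounds?"))        # collection-stats
print(route("Which artworks depict St Dominic with a dog?")) # rag
```

The point is only that the two query types need different back-ends; retrieval-then-summarise can never see more records than the retriever returns.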
Returning factual statements is something new, but it also raises concerns about how much context the LLM gives in its response, possibly misleading the user (and is the response correct, or hallucinated?). It might be worth trying this with vocabulary-controlled fields in records only, rather than free text, to see if this changes the structure of the response.
More posts to follow with testing on some different models/datasets.
Some ideas for further work:
Creating embeddings for all the collection object records would allow the whole collection to be searched, but this would require larger compute resources
Exploring the types of query that can and can't be answered by this approach (e.g. questions about the collection as a whole) and how those questions could be answered by other approaches
A huge amount of experimentation is possible on selecting different models and tuning of them
Fine-tuning a model on collection records text
Integrating thumbnail images/object previews into the chat response to give quick access to suggested objects
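As a small sketch of that last idea, system numbers mentioned in a chat answer could be turned into collection-page links (usable as object previews), following the URL pattern of the object pages linked earlier in this post. The regex for system numbers is my assumption about their format:

```python
import re

# Matches identifiers like O77828 or A.10-1954 (assumed format).
SYSTEM_NUMBER = re.compile(r"\b([OA]\.?[\d.-]*\d)\b")

def linkify(answer: str) -> list[str]:
    """Extract system numbers from a chat answer and build object-page URLs."""
    return [
        f"https://collections.vam.ac.uk/item/{number}/"
        for number in SYSTEM_NUMBER.findall(answer)
    ]

print(linkify("Hogarth's dog Trump (SystemNumber: O77828)."))
```

A fuller version would resolve each link to its thumbnail image and render both alongside the generated answer.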
Digital Cultural Heritage Tutorials and Terminology - 2024 Roundup
Useful tutorials and new terminology I have come across this year. This is just a record of the first time I saw the tutorial, or the word or phrase used in some new context; the tutorials may be older than this year, and the words may have been around for some time, but all were new to me.
Tutorials
Data Visualisation
Data Visualisation tutorials from the great UCLAB at FH Potsdam - https://github.com/uclab-potsdam/datavis-tutorials
Data Wrangling
Data Science at the Command Line - https://jeroenjanssens.com/dsatcl/ - "Welcome to the website of the second edition of Data Science at the Command Line by Jeroen Janssens, published by O’Reilly Media in October 2021. This website is free to use. "
AI
llama.cpp guide - Running LLMs locally, on any hardware, from scratch - https://blog.steelph0enix.dev/posts/llama-cpp-guide/
History of Embeddings - https://vickiboykis.com/what_are_embeddings/index.html
Text Recognition
Automatic Text Recognition - "Explore our video tutorials on Automatic Text Recognition (ATR) and learn how to efficiently extract full text from heritage material images. Perfectly tailored for researchers, librarians, and archivists, these resources not only enhance your archival research and preservation efforts but also unlock the potential for computational analysis of your sources." - https://harmoniseatr.hypotheses.org/
Terminology
Meso-level
If macro-level is looking at a whole set of texts, and micro-level is looking at a single instance, then if you want something in-between, try meso-level: "What such midsize sets of texts with intricate relationships need, is a meso-level approach: neither corpus nor edition"
From: Lit, C. van and Roorda, D. (2024) ‘Neither Corpus Nor Edition: Building a Pipeline to Make Data Analysis Possible on Medieval Arabic Commentary Traditions’, Journal of Cultural Analytics, 9(3). Available at: https://doi.org/10.22148/001c.116372.
(I think the claim is being made for coining the usage here ?)
Ousiometrics
"We define ‘ousiometrics’ to be the quantitative study of the essential meaningful components of an entity, however represented and perceived. Used in philosophical and theological settings, the word ‘ousia’ comes from Ancient Greek οὐσία and is the etymological root of the word ‘essence’ whose more modern usage is our intended reference. For our purposes here, our measurement of essential meaning will rest on and be constrained by the map presented by language. We place ousiometrics within a larger field of what we will call ‘telegnomics’: The study of remotely-sensed knowledge through low-dimensional representations of meaning and stories."
Dodds, P.S. et al. (2023) ‘Ousiometrics and Telegnomics: The essence of meaning conforms to a two-dimensional powerful-weak and dangerous-safe framework with diverse corpora presenting a safety bias’. arXiv. Available at: https://doi.org/10.48550/arXiv.2110.06847.
(discovered from: Fudolig, M.I. et al. (2023) ‘A decomposition of book structure through ousiometric fluctuations in cumulative word-time’, Humanities and Social Sciences Communications, 10(1), pp. 1–12. Available at: https://doi.org/10.1057/s41599-023-01680-4. )
Thick data
"Assuming digital scholars share annotations in the widest possible sense, one can envision a scenario where a proliferation of user- and computer-generated annotations referring to a single IIIF canvas can be analyzed as ‘thick data’. This term is adapted for researchers engaging big data for ethnographical study, recognizing the idea of ‘thick description’ in the work of anthropologist Clifford Geertz. Thick description refers to ‘an account that interprets, rather than describes’ (Moore, 2018: 56 citing Geertz, 1977). Elaborated by Paul Moore, a ‘thick data’ approach shows that the ways in which data are used is a cultural rather than a technological problem, emphasizing that ‘all technologies are ultimately subject not only to the needs of the user but also to the context in which they are being used’ (2018: 52). "
Westerby, M.J. (2024) ‘Annotating Upstream: Digital Scholars, Art History, and the Interoperable Image’, Open Library of Humanities, 10(2). Available at: https://doi.org/10.16995/olh.17217.
Frictionless Reproducibility
"As a researcher steeped in the theory, practice, and history of machine learning, I was struck by David Donoho’s (2024) articulation of frictionless reproducibility—evaluation through data, code, and competition—as the core force driving progress in data science.[...] Donoho defines frictionless reproducibility by three aspirational pillars. Researchers should make data easily available and shareable. Researchers should provide easily re-executable code that processes this data to desired ends. Researchers should emphasize competitive testing as a means of evaluation"
Recht, B. (2024) ‘The Mechanics of Frictionless Reproducibility’, Harvard Data Science Review, 6(1). Available at: https://doi.org/10.1162/99608f92.f0f013d4.
Structured Extraction
"where an LLM helps turn unstructured text (or image content) into structured data"
Textpocalypse
"It is easy now to imagine a setup wherein machines could prompt other machines to put out text ad infinitum, flooding the internet with synthetic text devoid of human agency or intent: gray goo, but for the written word."
Kirschenbaum, M. (2023) ‘Prepare for the Textpocalypse’, The Atlantic, 8 March. Available at: https://www.theatlantic.com/technology/archive/2023/03/ai-chatgpt-writing-language-models/673318/ (Accessed: 1 January 2025).
(and Wulf, K. (2023) Textpocalypse: A Literary Scholar Eyes the ‘Grey Goo’ of AI, The Scholarly Kitchen. Available at: https://scholarlykitchen.sspnet.org/2023/04/13/textpocalypse-a-literary-scholar-eyes-the-grey-goo-of-ai/ (Accessed: 1 January 2025). )
Alignment Ribbons
"The horizontal alignment ribbon (henceforth simply "alignment ribbon") can be understood (and modeled) variously as a graph, a hypergraph, or a tree, and it also shares features of a table. We find it most useful for modeling and visualization to think of the alignment ribbon as a linear sequence of clusters of clusters, where the outer clusters are alignment points and the inner clusters (except the one for missing witnesses) are groups of witness readings (sequences of tokens) that would share a node in a traditional variant graph"
Birnbaum, D.J. and Dekker, R.H. (2024) ‘Visualizing textual collation’, in. Balisage: The Markup Conference. Available at: https://balisage.net/Proceedings/vol29/print/Birnbaum01/BalisageVol29-Birnbaum01.html (Accessed: 2 January 2025).
Opisthograph
(as I understand it, an early manuscript/scroll with writing on both sides ?)
slop
AI-generated low-quality text.
Too many citations to mention.
Digital Cultural Heritage Papers - 2024 Roundup
A roundup of papers, essays, articles, books, blog posts, reports etc. published this year that I've read (ok, sometimes just scanned), on topics I'm currently interested in, i.e. mostly digital cultural heritage or AI/web/computing. And some random other stuff.
I reserve the right to return in the future to add other papers published this year that I missed, as my interests change. There are also many other papers I should have added here from earlier in the year; I may add more if I have another burst of enthusiasm to trawl through my Zotero.
Infrastructure
Christopher Smith - On funding arts and humanities infrastructure in the UK (with a teaser for 2025 plans!) - https://anatomiesofpower.wordpress.com/2024/12/31/funding-arts-and-humanities/
Waters, D.J. (2023) ‘The emerging digital infrastructure for research in the humanities’, International Journal on Digital Libraries, 24(2), pp. 87–102. Available at: https://doi.org/10.1007/s00799-022-00332-3.
Peter Wells The National Data Library should help people deliver trustworthy data services (Dec 2024) - https://peterkwells.com/2024/12/18/the-national-data-library-should-help-people-deliver-trustworthy-data-services/
Linked Data
Sanderson, R. (2024) ‘Implementing Linked Art in a Multi-Modal Database for Cross-Collection Discovery’, Open Library of Humanities, 10(2). Available at: https://doi.org/10.16995/olh.15407.
Data Wrangling
Beyond HTTP APIs: the case for database dumps in Cultural Heritage - https://literarymachin.es/beyond-api-data-dumps/
Computational Text Analysis
Lit, C. van and Roorda, D. (2024) ‘Neither Corpus Nor Edition: Building a Pipeline to Make Data Analysis Possible on Medieval Arabic Commentary Traditions’, Journal of Cultural Analytics, 9(3). Available at: https://doi.org/10.22148/001c.116372.
(conf paper abstract) "Exploring Zero-Shot Named Entity Recognition in Multilingual Historical Travelogues Using Open-Source Large Language Models" - https://clin34.leidenuniv.nl/abstracts/exploring-zero-shot-named-entity-recognition-in-multilingual-historical-travelogues-using-open-source-large-language-models/
AI
DeepMind - A new golden age of discovery - https://deepmind.google/public-policy/ai-for-science/ - "In this essay, we take a tour of how AI is transforming scientific disciplines from genomics to computer science to weather forecasting. Some scientists are training their own AI models, while others are fine-tuning existing AI models, or using these models’ predictions to accelerate their research"
ODI A data for AI taxonomy - https://theodi.org/news-and-events/blog/a-data-for-ai-taxonomy/ - "[...] we set out to develop a taxonomy of the data involved in developing, using and monitoring foundation AI models and systems. It is a response to the way that the data used to train models is often described as if a static, singular blob, and to demonstrate the many types of data needed to build, use and monitor AI systems safely and effectively."
A Large Language Model walks into an archive... https://cblevins.github.io/posts/llm-primary-sources/
VLM Art Analysis by Microsoft Florence-2 and Alibaba Cloud Qwen2-VL - https://huggingface.co/blog/PandorAI1995/vlm-art-analysis-by-florence-2-b-and-qwen2-vl-2b
OCR Processing and Text in Image Analysis with Florence-2-base and Qwen2-VL-2B - https://huggingface.co/blog/PandorAI1995/ocr-processing-text-in-image-analysis-vlm-models
OpenAI - Introducing SimpleQA - "Factuality is a complicated topic because it is hard to measure—evaluating the factuality of any given arbitrary claim is challenging, and language models can generate long completions that contain dozens of factual claims. In SimpleQA, we will focus on short, fact-seeking queries, which reduces the scope of the benchmark but makes measuring factuality much more tractable." - https://openai.com/index/introducing-simpleqa/
Digital Humanities
The Bloomsbury Handbook to the Digital Humanities - https://www.bloomsbury.com/uk/bloomsbury-handbook-to-the-digital-humanities-9781350452572/#
Digital Editions
On Automating Editions The Affordances of Handwritten Text Recognition Platforms for Scholarly Editing - https://scholarlyediting.org/issues/41/on-automating-editions/
3D Printing
Volpe, Y. et al. (2014) ‘Computer-based methodologies for semi-automatic 3D model generation from paintings’, International Journal of Computer Aided Engineering and Technology, 6(1), p. 88. Available at: https://doi.org/10.1504/IJCAET.2014.058012.
Web Development
"[...] this investigation into JavaScript-first frontend culture and how it broke US public services has been released in four parts." - https://infrequently.org/2024/08/the-landscape/
Conference Proceedings
SWIB24 - Semantic Web in Libraries - https://swib.org/swib24/programme.html
Computational Humanities Research CH 2024 - https://ceur-ws.org/Vol-3834/
Vis4DH 2024 - Didn't happen ?
Journals
New Journals I've come across (or re-discovered):
Interdisciplinary Digital Engagement in Arts & Humanities (IDEAH) - https://ideah.pubpub.org/
Public Humanities - https://www.cambridge.org/core/journals/public-humanities
RIDE - "RIDE is an open access review journal dedicated to digital editions and resources" - https://ride.i-d-e.de/
DH Benelux Journal - "DH Benelux Journal is the official journal of the DH Benelux community, which fosters collaboration between researchers in the digital humanities in Belgium, Luxembourg and the Netherlands. " - https://journal.dhbenelux.org/
Journal of Open Research Software - "The Journal of Open Research Software (JORS) features peer reviewed Software Metapapers describing research software with high reuse potential." - https://openresearchsoftware.metajnl.com/
Transformations - A DARIAH Journal is a multilingual journal created in 2024 by the European research infrastructure DARIAH ERIC. This journal is an ongoing publication with thematic issues in Digital Humanities, humanities, social sciences, and the arts. The journal is particularly interested in the use of digital tools, methods, and resources in a reproducible approach. It welcomes scientific contributions on collections of data, workflows and software analysis. - https://transformations.episciences.org/
Digital Cultural Heritage Tools - 2024 Roundup
Continuing my roundup of resources from last year. This time, tools, platforms, standards and other types of useable resources mostly relating to digital cultural heritage but sometimes more general.
AI
Croissant Format - "The Croissant metadata format simplifies how data is used by ML models. It provides a vocabulary for dataset attributes, streamlining how data is loaded across ML frameworks such as PyTorch, TensorFlow or JAX. In doing so, Croissant enables the interchange of datasets between ML frameworks and beyond, tackling a variety of discoverability, portability, reproducibility, and responsible AI (RAI) challenges." - https://docs.mlcommons.org/croissant/docs/croissant-spec.html
Ai2 OpenScholar - "To help scientists effectively navigate and synthesize scientific literature, we introduce Ai2 OpenScholar—a collaborative effort between the University of Washington and the Allen Institute for AI. OpenScholar is a retrieval-augmented language model (LM) designed to answer user queries by first searching for relevant papers in the literature and then generating responses grounded in those sources. Below are some examples:" - https://allenai.org/blog/openscholar
chonkie - The no-nonsense RAG chunking library that's lightweight, lightning-fast, and ready to CHONK your texts - https://github.com/chonkie-ai/chonkie
AI4Culture - "AI4Culture is designed to provide comprehensive training and resources for individuals and institutions interested in applying Artificial Intelligence (AI) technologies in the cultural heritage sector. It aims to empower users with the knowledge and skills needed to leverage AI for preserving, managing, and promoting cultural heritage." - https://ai4culture.eu/
AI Risk Repository - "A comprehensive living database of over 700 AI risks categorized by their cause and risk domain." - https://airisk.mit.edu/
Cultural Heritage Collections Visualisation
Collection Space Navigator - https://datalab.allardpierson.nl/meet_the_data/01_Explore%20the%20Collections.html - Allard Pierson, University of Amsterdam
InTaVia - a platform that "[...] supports the practices of data retrieval, creation, curation, analysis, and communication with coherent visualization support for multiple types of entities. We illustrate the added value of this open platform for storytelling with four case studies, focusing on (a) the life of Albrecht Dürer (person biography), (b) the Saliera salt cellar by Benvenuto Cellini (object biography), (c) the artist community of Lake Tuusula (group biography), and (d) the history of the Hofburg building complex in Vienna (place biography)." - https://intavia.eu/
Cultural Heritage Cataloguing
Dédalo - https://dedalo.dev/the_project#dd1100_62 - "Dédalo is a project focused on the field of digital humanities, on the need to analyze Cultural Heritage with digital tools, allowing machines to understand the cultural, social, and historical processes that generate Heritage and Memory. " - Project running since 1998
Textual Analysis
Coconut Libtool - https://www.coconut-libtool.com/ - "All-in-one data mining and textual analysis tool for everyone." - A sort of Voyant Tools/Palladio web based tool for analysis of CSV type files
CEDAR - https://voices.uchicago.edu/cedar/ - "CEDAR (Critical Editions for Digital Analysis and Research) is a multi-project digital humanities initiative in which innovative computational methods are employed in textual studies. " -
COLaF - "Through the COLaF project (Corpus et Outils pour les Langues de France, Corpus and Tools for the Languages of France), Inria aims to contribute to the development of free corpora and tools for French and other languages of France[...]" - https://colaf.huma-num.fr/
TextFrame - "The ITF specification is intended to facilitate systematic referencing and reuse of textual resources in repositories in a manner that is both user- and machine-friendly." - https://textframe.io/
Text Recognition
HTR-United - "HTR-United is a catalog that lists highly documented training datasets used for automatic transcription or segmentation models. HTR-United standardizes dataset descriptions using a schema, offers guidelines for organizing data repositories, and provides tools for quality control and continuous documentation" - https://htr-united.github.io/index.html
Data Wrangling
Invisible XML - "Invisible XML (ixml) is a method for treating non-XML documents as if they were XML, enabling authors to write documents and data in a format they prefer while providing XML for processes that are more effective with XML content." - https://invisiblexml.org/
Linked Data
grlc - https://grlc.io/ - "grlc makes all your Linked Data accessible to the Web by automatically converting your SPARQL queries into RESTful APIs. With (almost) no effort!"
Visual Analysis
AIKON - https://aikon-platform.github.io/ - "Aikon is a modular platform designed to empower humanities scholars in leveraging artificial intelligence and computer vision methods for analyzing large-scale heritage collections. It offers a user-friendly interface for visualizing, extracting, and analyzing illustrations from historical documents, fostering interdisciplinary collaboration and sustainability across digital humanities projects. Built on proven technologies and interoperable formats, Aikon's adaptable architecture supports all projects involving visual materials. "
Publishing
Edition Crafter - https://editioncrafter.org/ - An open source and customizable publishing tool, EditionCrafter allows users to easily publish digital editions as feature-rich and sustainable static sites.
Manuscripts
VisColl received a new grant from NEH to fund the VCEditor 2.0 - https://viscoll.org/2024/08/28/vceditor-2-0-has-received-an-neh-digital-humanities-advancement-grant/ - "The grant will support work undertaken by staff in the Schoenberg Institute for Manuscript Studies and the Penn Libraries Digital Library Development team. This funding will support the continued development of VCEditor functionality[...]"
Modelling
ComSES Network - "an international community and cyberinfrastructure to support transparency and reproducibility for computational models & their digital context + educational resources and FAQ's for agent based modeling" - https://www.comses.net/
Databases
GQL: A New ISO Standard for Querying Graph Databases - https://thenewstack.io/gql-a-new-iso-standard-for-querying-graph-databases/
GeoSpatial
OSM Buildings - Free and open source web viewer for 3D buildings - https://osmbuildings.org/documentation/viewer/
Old Maps Online - https://www.oldmapsonline.org
Digitisation
Arkindex -"[...]our platform for managing and processing large collections of digitized documents[...]" - https://teklia.com/blog/arkindex-goes-open-source/
Environmental
Digital Humanities Climate Coalition Toolkit - https://sas-dhrh.github.io/dhcc-toolkit/
Archives
EAD 2002 XML/EAC-CPF to Records in Context (RiC-O) converter - https://github.com/ArchivesNationalesFR/rico-converter
Random
Is my blue your blue? - https://ismy.blue/
EZ-Tree - Procedural tree generator - https://github.com/dgreenheck/ez-tree
Digital Cultural Heritage Projects - 2024 Roundup
An attempt to record new research projects, research infrastructures, websites, etc. I've come across this year, to try to keep track of them. Hopefully of use to others, but also for my own selfish reasons: to free up space in my poor crowded brain (and reduce web browser tabs). It's also because I've found it hard to note them otherwise; it's surprising to me that there is still no good way I know of to record projects in progress (to keep track of them), upcoming conferences (for the call for papers, and then to read the proceedings) and journals of interest (CFPs/published issues). There must be someone working on the Zotero equivalent for these standard research workflows?
I'm hoping for this to be an annual tradition (tradition in the sense that this is the first time I've done it and, who knows, I may do it again next year; although given the 8-year gap between this blog post and the previous one, the odds are not great).
Other posts will follow with new or updated research-related tools, another with articles/reports of particular interest, and then (most importantly?) any new terminology I've come across this year, again mostly to save myself from having to look up for the umpteenth time what 'meso-level' actually means.
Project Outputs
These projects may have been around for a long time but I have possibly just discovered their outputs (i.e. the website with the results of the projects work or some in-progress publication/media).
Mapping Color in History - https://mappingcolor.fas.harvard.edu/ - "Mapping Color in History™ is a searchable database of pigment analysis in Asian paintings"
Language of Bindings - https://www.ligatus.org.uk/lob/ - "The Language of Bindings Thesaurus (LoB) includes terms which can be used to describe historical binding structures."
CATALOG of DISTINCTIVE TYPE (CDT), Restoration England (1660-1700) - https://cdt.library.cmu.edu
Sloane Lab Knowledge Base - https://knowledgebase.sloanelab.org/resource/Start - "The Sloane Lab Knowledge Base is an interactive portal reuniting the collections of Sir Hans Sloane, which consist of the founding collections of the British Museum, the Natural History Museum and the British Library. It is still in development, with more features being built and datasets in our ingestion pipeline. The Sloane Lab Knowledge Base enables the cross-searching and investigation of object, records, people and places in the Sloane Collections. The Sloane Lab Knowledge Base makes also possible the crosslinking of Sloane’s historical manuscript catalogue entries with contemporary database records found in museum collections today. "
imagineRio - "A searchable digital atlas that illustrates the social and urban evolution of Rio de Janeiro, as it existed and as it was imagined. Views of the city created by artists, maps by cartographers, and site plans by architects or urbanists are all located in both time and space. It is a web environment that offers creative new ways for scholars, students, and residents to visualize the past by seeing historical and modern imagery against an interactive map that accurately presents the city since its founding." - https://www.imaginerio.org/en
Zamani Project - "Zamani Project undertakes data collection and analysis, heritage communication, and training and capacity building for experts and the public so that they have access to high-quality spatial heritage data, and can learn from, conserve, and protect heritage." - https://zamaniproject.org/
Data/Culture - "Data/Culture is a sandbox for the re-use of data and tools in the humanities and arts, in ways that develop high-quality research and strong collaborative communities. " - https://www.turing.ac.uk/research/research-projects/dataculture-building-sustainable-communities-around-arts-and-humanities
A Gamification Approach to Literary Network Analysis - Battle of the Plays playing cards (!) - https://sukultur.de/produkt/battle-of-the-plays-a-gamification-approach-to-literary-network-analysis/ (why don't more research projects create games from their outputs!)
Project Announcements
Infrastructure
ECHOES - https://www.echoes-eccch.eu/ - "The Cultural Heritage Cloud (ECCCH) is a shared platform designed to provide heritage professionals and researchers with access to data, scientific resources, training, and advanced digital tools tailored to suit their needs" - Part of the Cultural Heritage Cloud Horizon Infrastructure funding.
Manuscripts
LostMA - ‘LostMa: The Lost Manuscripts of Medieval Europe: Modelling the Transmission of Texts | École Nationale Des Chartes - PSL’. Accessed 30 December 2024. https://www.chartes.psl.eu/en/research/centre-jean-mabillon/research-projects/lostma-lost-manuscripts-medieval-europe-modelling-transmission-texts. - Jan 2024 - Dec 2028 - Ecole nationale des Chartes - France
INSULAR - https://le.ac.uk/news/2024/april/insular-project - "The European Research Council (ERC) has awarded a prestigious Advanced Grant, for €2.5m, to an ambitious new project to study early medieval manuscripts made in Britain and Ireland between AD 600–900." - Institutions: University of Leicester, University of Göttingen, Bodleian Library, University of Cambridge, Trinity College Dublin, BNF, Det Kongelige Bibliotek, Leuven University -
DISCOVER - https://erc-discover.github.io/ - "Our goal is to develop approaches to assist experts in identifying and analyzing patterns. Indeed, while the success of deep learning on visual data is undeniable, applications are often limited to the supervised learning scenario where the algorithm tries to infer a label for a new image based on the annotations made by experts in a reference dataset."
STEMMA - "This project develops and applies a data-driven approach in order to provide the first macro-level view of the circulation of early modern English poetry in manuscript. It focuses on English verse manuscripts written and used between the introduction of printing in England in 1475 and 1700, by which time the rapid changes in both literary taste and publishing norms ushered in by the Restoration had fully transformed literary culture. The project includes manuscripts circulating in England and anywhere else English was spoken and read, including Ireland, the North American colonies, and continental exile communities." - https://stemma.universityofgalway.ie/
AI
Project StoryMachine - https://www.dfg.de/en/news/news-topics/announcements-proposals/2024/ifr-24-110 - Exploring Implications of Recommender Based Spatial Hypertext Systems for Folklore and the Humanities - Hochschule Hof, Ludwig-Maximilians-Universität München, Universität Regensburg, University of London
Developing a public-interest training commons of books - https://www.authorsalliance.org/2024/12/05/developing-a-public-interest-training-commons-of-books/
Project Updates
(i.e. ongoing projects I've discovered this year or re-discovered after some announcement)
Network Analysis
DISSINET - https://dissinet.cz/ - "The “Dissident Networks Project” (DISSINET), hosted at Masaryk University’s Centre for the Digital Research of Religion, is a research initiative exploring dissident and inquisitorial cultures in medieval Europe from the perspective of social network analysis, geographic information science, and computational text analysis"
AI
RePAIR Project - https://www.repairproject.eu/ - "an acronym for Reconstructing the Past: Artificial Intelligence and Robotics meet Cultural Heritage. State-of-the-art technology will, for the first time, be employed in the physical reconstruction of archaeological artefacts, which are mostly fragmentary and difficult to reassemble."