Digital Cultural Heritage Tutorials and Terminology - 2024 Roundup
Useful tutorials and new terminology that I have come across this year. This is just a record of the first time I have seen the tutorial, or seen the word or phrase used in some new context. The tutorials may be older than this year and the words may have been around for some time, but they are all new to me.
Tutorials
Data Visualisation
Data Visualisation tutorials from the great UCLAB at FH Potsdam - https://github.com/uclab-potsdam/datavis-tutorials
Data Wrangling
Data Science at the Command Line - https://jeroenjanssens.com/dsatcl/ - "Welcome to the website of the second edition of Data Science at the Command Line by Jeroen Janssens, published by O’Reilly Media in October 2021. This website is free to use."
AI
llama.cpp guide - Running LLMs locally, on any hardware, from scratch - https://blog.steelph0enix.dev/posts/llama-cpp-guide/
History of Embeddings - https://vickiboykis.com/what_are_embeddings/index.html
Text Recognition
Automatic Text Recognition - "Explore our video tutorials on Automatic Text Recognition (ATR) and learn how to efficiently extract full text from heritage material images. Perfectly tailored for researchers, librarians, and archivists, these resources not only enhance your archival research and preservation efforts but also unlock the potential for computational analysis of your sources." - https://harmoniseatr.hypotheses.org/
Terminology
Meso-level
If macro-level is looking at a whole set of texts, and micro-level is looking at a single instance, then if you want something in-between, try meso-level: "What such midsize sets of texts with intricate relationships need, is a meso-level approach: neither corpus nor edition"
From: Lit, C. van and Roorda, D. (2024) ‘Neither Corpus Nor Edition: Building a Pipeline to Make Data Analysis Possible on Medieval Arabic Commentary Traditions’, Journal of Cultural Analytics, 9(3). Available at: https://doi.org/10.22148/001c.116372.
(I think the authors are claiming to have coined this usage here?)
Ousiometrics
"We define ‘ousiometrics’ to be the quantitative study of the essential meaningful components of an entity, however represented and perceived. Used in philosophical and theological settings, the word ‘ousia’ comes from Ancient Greek οὐσία and is the etymological root of the word ‘essence’ whose more modern usage is our intended reference. For our purposes here, our measurement of essential meaning will rest on and by constrained by the map presented by language. We place ousiometrics within a larger field of what we will call ‘telegnomics’: The study of remotely-sensed knowledge through low-dimensional representations of meaning and stories."
Dodds, P.S. et al. (2023) ‘Ousiometrics and Telegnomics: The essence of meaning conforms to a two-dimensional powerful-weak and dangerous-safe framework with diverse corpora presenting a safety bias’. arXiv. Available at: https://doi.org/10.48550/arXiv.2110.06847.
(discovered from: Fudolig, M.I. et al. (2023) ‘A decomposition of book structure through ousiometric fluctuations in cumulative word-time’, Humanities and Social Sciences Communications, 10(1), pp. 1–12. Available at: https://doi.org/10.1057/s41599-023-01680-4.)
Thick data
"Assuming digital scholars share annotations in the widest possible sense, one can envision a scenario where a proliferation of user- and computer-generated annotations referring to a single IIIF canvas can be analyzed as ‘thick data’. This term is adapted for researchers engaging big data for ethnographical study, recognizing the idea of ‘thick description’ in the work of anthropologist Clifford Geertz. Thick description refers to ‘an account that interprets, rather than describes’ (Moore, 2018: 56 citing Geertz, 1977). Elaborated by Paul Moore, a ‘thick data’ approach shows that the ways in which data are used is a cultural rather than a technological problem, emphasizing that ‘all technologies are ultimately subject not only to the needs of the user but also to the context in which they are being used’ (2018: 52). "
Westerby, M.J. (2024) ‘Annotating Upstream: Digital Scholars, Art History, and the Interoperable Image’, Open Library of Humanities, 10(2). Available at: https://doi.org/10.16995/olh.17217.
Frictionless Reproducibility
"As a researcher steeped in the theory, practice, and history of machine learning, I was struck by David Donoho’s (2024) articulation of frictionless reproducibility—evaluation through data, code, and competition—as the core force driving progress in data science.[...] Donoho defines frictionless reproducibility by three aspirational pillars. Researchers should make data easily available and shareable. Researchers should provide easily re-executable code that processes this data to desired ends. Researchers should emphasize competitive testing as a means of evaluation"
Recht, B. (2024) ‘The Mechanics of Frictionless Reproducibility’, Harvard Data Science Review, 6(1). Available at: https://doi.org/10.1162/99608f92.f0f013d4.
Structured Extraction
"where an LLM helps turn unstructured text (or image content) into structured data"
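To make the idea concrete for myself, here is a minimal sketch of my own, not taken from any source quoted here. It assumes the model is available as a plain function that takes a prompt string and returns a string; `extract_record`, `fake_llm`, the field names and the sample text are all hypothetical, and `fake_llm` is a canned stand-in rather than a real model call.

```python
import json


def extract_record(text, fields, llm):
    """Ask an LLM for the given fields and return them as a dict.

    `llm` is assumed to be any callable mapping a prompt string to a reply string.
    """
    prompt = (
        "Extract the following fields from the text and reply with JSON only.\n"
        f"Fields: {', '.join(fields)}\n"
        f"Text: {text}"
    )
    reply = llm(prompt)
    data = json.loads(reply)  # raises ValueError if the reply is not valid JSON
    # Keep only the requested fields; anything the model omitted becomes None.
    return {field: data.get(field) for field in fields}


def fake_llm(prompt):
    """Hypothetical stand-in for a real model call; returns a canned reply."""
    return json.dumps({"title": "Hamlet", "creator": "William Shakespeare", "note": "unrequested"})


record = extract_record(
    "Hamlet, a tragedy by William Shakespeare.",
    ["title", "creator", "date"],
    fake_llm,
)
print(record)
```

In practice the reply would need validating, since a real model may return malformed JSON or invented values.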
Textpocalypse
"It is easy now to imagine a setup wherein machines could prompt other machines to put out text ad infinitum, flooding the internet with synthetic text devoid of human agency or intent: gray goo, but for the written word."
Kirschenbaum, M. (2023) ‘Prepare for the Textpocalypse’, The Atlantic, 8 March. Available at: https://www.theatlantic.com/technology/archive/2023/03/ai-chatgpt-writing-language-models/673318/ (Accessed: 1 January 2025).
(and Wulf, K. (2023) Textpocalypse: A Literary Scholar Eyes the ‘Grey Goo’ of AI, The Scholarly Kitchen. Available at: https://scholarlykitchen.sspnet.org/2023/04/13/textpocalypse-a-literary-scholar-eyes-the-grey-goo-of-ai/ (Accessed: 1 January 2025).)
Alignment Ribbons
"The horizontal alignment ribbon (henceforth simply "alignment ribbon") can be understood (and modeled) variously as a graph, a hypergraph, or a tree, and it also shares features of a table. We find it most useful for modeling and visualization to think of the alignment ribbon as a linear sequence of clusters of clusters, where the outer clusters are alignment points and the inner clusters (except the one for missing witnesses) are groups of witness readings (sequences of tokens) that would share a node in a traditional variant graph"
Birnbaum, D.J. and Dekker, R.H. (2024) ‘Visualizing textual collation’, in Balisage: The Markup Conference. Available at: https://balisage.net/Proceedings/vol29/print/Birnbaum01/BalisageVol29-Birnbaum01.html (Accessed: 2 January 2025).
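As a rough illustration of my own (not code from the paper), the "sequence of clusters of clusters" description could be modelled as a list of alignment points, each holding groups of witnesses that share a reading plus a separate list of missing witnesses. The function name `build_ribbon`, the dictionary keys and the toy witnesses A, B and C are all assumptions made for this sketch.

```python
from collections import defaultdict


def build_ribbon(aligned, witnesses):
    """Turn aligned readings into a list of alignment points.

    `aligned` is a list of dicts mapping a witness siglum to its tuple of tokens
    at that point; a witness absent from the dict is missing at that point.
    """
    ribbon = []
    for point in aligned:
        groups = defaultdict(list)
        for siglum in witnesses:
            if siglum in point:
                # Witnesses with identical token sequences fall into the same group.
                groups[tuple(point[siglum])].append(siglum)
        missing = [s for s in witnesses if s not in point]
        ribbon.append({"groups": dict(groups), "missing": missing})
    return ribbon


witnesses = ["A", "B", "C"]
aligned = [
    {"A": ("the", "cat"), "B": ("the", "cat"), "C": ("the", "cat")},
    {"A": ("sat",), "B": ("sits",)},  # C has no reading here
]
ribbon = build_ribbon(aligned, witnesses)
for alignment_point in ribbon:
    print(alignment_point)
```

Here the first alignment point has a single group containing all three witnesses, while the second has two groups and lists C as missing.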
Opisthograph
(As I understand it, an early manuscript or scroll with writing on both sides?)
slop
Low-quality, AI-generated text.
Too many citations to mention.