Conversational Search for Cultural Heritage

Summary: LLM/RAG chatbots seem to be a promising new way to aid some types of cultural heritage search, but much more experimentation is needed. The ethical, environmental and economic costs also need considering.

response = graph.invoke({"question": "What was the name of Hogarth's dog?"})
print(response["answer"])

The name of Hogarth's dog was Trump (SystemNumber: O97313). This pug was prominently featured in Hogarth's self-portrait of 1745. Hogarth's combative personality led him to be depicted as Trump in a 1753 caricature by Paul Sandby.

response = graph.invoke({"question": "What artworks are associated with greyhounds"})
print(response["answer"])

Two artworks are associated with greyhounds. The first is a painting by Louis Godefroy Jadin (no system number provided), which features greyhounds in a hunting scene. The second is a terracotta statuette of a hound (inv. no. A.10-1954), although it's not clear if it's specifically a greyhound.

response = graph.invoke({"question": "Which artworks depict St Dominic with a dog?"})
print(response["answer"])

The artwork that depicts St. Dominic with a dog is a stone and walnut ivory relief made by Diego Reinoso in 1669. The dog, which resembles a Chinese Foo dog, is sitting at the feet of St. Dominic and is one of his main symbols.

(NB the object system number in the first response is the wrong object! It gives https://collections.vam.ac.uk/item/O97313/figure-unknown/ when it should be https://collections.vam.ac.uk/item/O77828/hogarths-dog-trump-figure-roubiliac-louis-fran%C3%A7ois/)

The desire to find some alternative option for discovery in cultural heritage collection sites beyond the search box seems to have been around since, well, they started. But beyond the addition of faceted search, there haven't really been any other big changes to aid discovery (of course there have been many great one-off projects exploring new options, such as data visualisation/generous interfaces, but nothing that has become a standard discovery mode used for most collection sites).

So obviously, given this is a blog post in 2025, the answer must be AI? Well, maybe. It seemed worth exploring some options, starting with a conversational AI interface, given that chatting with a large language model (LLM) is very fashionable.

Doing this requires getting the information we hold in our collection systems into an LLM, so it can respond with the relevant information from our collections and, ideally, not with any information from other sources, and definitely not with AI hallucinations. There seem to be two main approaches for doing this:

  • Fine-tuning - continuing the training of a pre-trained LLM on the new documents

  • Retrieval Augmented Generation (RAG) - retrieving the documents most relevant to a query and passing them, along with the query, to an LLM that generates the answer returned to the user.

RAG seems to be the simpler option so I've gone for that for this quick experiment, but fine-tuning is also worth looking into, although I don't know if the quantity of data would be enough to make it useful.

There are a lot of tutorials out there on building RAG-based chatbots; I've followed one from LangChain (https://python.langchain.com/docs/tutorials/rag/) for the most part, but with some modifications as I went through it. The finished Jupyter notebook is here [to be added once I've tidied it up!]. In summary I had to:

  • Create a CSV file with descriptions of a set of objects. I used a set of V&A museum collection objects that all depict dogs (for reasons) and took the title and summary description fields, as well as the museum database system number to identify them.

  • Create a new prompt (vam/chat-glam - available here for EU usage - https://eu.smith.langchain.com/hub/vam/chat-glam; it might need to be added separately to the USA LangChain Hub, presumably for legal reasons). All the prompt does is try to make it clear that the documents describe artworks and that the answer should give museum system numbers for the objects to help with searching; otherwise it's the standard LangChain prompt (rlm/rag-chat).

  • Pick the right model for the retrieval stage. I got this very wrong for some time and was trying lots of the well-known models (Llama 3), which brought back mostly irrelevant documents (and took a lot of indexing time in doing so). Eventually I realised (thanks to reading https://www.sbert.net/examples/applications/semantic-search/README.html) that in vector space, comparing a short query prompt with a longer text (an object description) doesn't really make sense (asymmetric semantic search) unless you use the right model, so I switched to all-minilm:l6-v2 (https://ollama.com/library/all-minilm:l6-v2) and that worked much better for results (and was faster for indexing). I think there is a lot more that can be done on this. A rough sketch of the indexing step follows this list.
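For illustration, here's a minimal sketch of that indexing step: loading a CSV of object records and embedding them into a FAISS index with all-minilm:l6-v2 via Ollama. The file name, column name and Ollama address are assumptions for the example, not the exact ones from my notebook.

# Minimal indexing sketch: CSV of object records -> FAISS index.
# File name, column name and Ollama URL below are illustrative assumptions.
from langchain_community.document_loaders import CSVLoader
from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings

# One document per CSV row (title, summary description, system number)
loader = CSVLoader(file_path="va_dog_objects.csv", source_column="systemNumber")
docs = loader.load()

# all-minilm:l6-v2 suits asymmetric semantic search (short query vs longer description)
embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2",
    base_url="http://other-machine:11434",  # Ollama running on another local computer
)

# Build and save the vector index over the object descriptions
vector_store = FAISS.from_documents(docs, embeddings)
vector_store.save_local("dog_objects_index")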

Otherwise the setup is as follows (a rough sketch wiring these together comes after the list):

  • LLM - Mistral AI (free developer access - this could be replaced with ChatOllama for local LLM chat)

  • Embeddings - Ollama running all-minilm:l6-v2 (remotely on another local computer to try to spread the CPU/Memory load)

  • Vector Store - FAISS

  • Orchestration - LangChain
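Wired together, that gives a graph very close to the retrieve/generate one in the LangChain tutorial. Below is a rough sketch (not the finished notebook), reusing the index from the earlier sketch; the exact model name (mistral-large-latest) and index path are assumptions.

# Minimal sketch of the RAG graph, following the LangChain tutorial's
# retrieve/generate pattern. Model name and index path are assumptions.
from typing import List, TypedDict

from langchain import hub
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_mistralai import ChatMistralAI
from langchain_ollama import OllamaEmbeddings
from langgraph.graph import START, StateGraph

llm = ChatMistralAI(model="mistral-large-latest")  # could be swapped for ChatOllama
prompt = hub.pull("vam/chat-glam")

# Reload the FAISS index built in the indexing sketch above
embeddings = OllamaEmbeddings(model="all-minilm:l6-v2")
vector_store = FAISS.load_local(
    "dog_objects_index", embeddings, allow_dangerous_deserialization=True
)

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

def retrieve(state: State):
    # Fetch the object descriptions most similar to the question
    return {"context": vector_store.similarity_search(state["question"])}

def generate(state: State):
    # Stuff the retrieved descriptions and the question into the prompt, then call the LLM
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    return {"answer": llm.invoke(messages).content}

graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

Once compiled, graph.invoke({"question": ...}) returns a dict containing the answer, as in the examples at the top of the post.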

Once all that setup was done it was pretty easy to run and get results from some chat questions (sample at the top of the post and more in the notebook). Given the dataset I was a bit constrained in the questions (it's really more canine chat than art history chat), but even with this limited and rough implementation I think it showed how, for some types of query, a conversational interface could be a useful discovery mode. In particular for asking about:

  • example objects/object existence (maybe useful for initial exploration, when there is no particular museum object in mind)

  • summary information (but this only works for subsets of the collection)

  • short factual statements (to pull out statements from within records, but this is edging towards more dangerous generative AI)

Of course, search/browse-based queries would return similar example objects if the same keywords were searched for, so in this case it's just the difference in the interface as to how this data is returned, but that might be helpful for some users in giving an alternative way to formulate a query.

For summaries, the chat interface does provide something new ('at a glance' results), but it's constrained by the number of results passed on from the retrieval phase to the LLM; the summary will be based on those, rather than on knowledge of the whole collection. For future work, it might be possible to route these types of query to some other model to allow questions about the collection as a whole.
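As a purely speculative sketch of that routing idea (not something I've built), the question could first be classified and whole-collection questions sent to a different handler; the labels and the answer_from_collection_stats() function below are hypothetical placeholders.

# Speculative routing sketch: classify the question, then dispatch.
# The labels and answer_from_collection_stats() are hypothetical placeholders.
from langchain_mistralai import ChatMistralAI

router_llm = ChatMistralAI(model="mistral-large-latest")

def classify_question(question: str) -> str:
    # Crude LLM-based classification; a small dedicated classifier could replace this
    label = router_llm.invoke(
        "Answer with one word, 'lookup' or 'aggregate'. Does this question ask about "
        f"specific objects (lookup) or the collection as a whole (aggregate)?\n{question}"
    ).content.strip().lower()
    return "aggregate" if "aggregate" in label else "lookup"

def answer_from_collection_stats(question: str) -> str:
    # Placeholder for a handler working over pre-computed, collection-wide data
    raise NotImplementedError

def answer_question(question: str) -> str:
    if classify_question(question) == "lookup":
        return graph.invoke({"question": question})["answer"]  # RAG graph from above
    return answer_from_collection_stats(question)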

Returning factual statements is something new, but it also raises concerns about how much context the LLM gives in its response, possibly misleading the user (and whether the response is correct or a hallucination). It might be worth trying this with vocabulary-controlled fields only, rather than free-text, to see if this changes the structure of the response.

More posts to follow with testing on some different models/datasets.

Some ideas for further work:

  • Creating embeddings of all the collection object records would allow the whole collection to be searched, but this would require larger compute resources

  • Exploring the types of query that can and can't be answered by this approach (e.g. questions about the collection as a whole) and how these questions can be answered by other approaches

  • A huge amount of experimentation is possible on selecting different models and tuning of them

  • Fine-tuning a model on collection records text

  • Integrating thumbnail images/object previews into the chat response to give quick access to suggested objects