Mistral AI Illustrations Detection Tests

Mistral OCR test on National Art Library collections from C11-20th

Following on from the announcement of Mistral's OCR model, I thought I would (unfairly!) test it against examples of manuscripts and books from each century from the 11th to the 20th, to see how well it can recognise illustrations amongst text (or vice versa). All examples are taken from the National Art Library at the V&A.

First, some setup (all taken from Mistral's examples: https://docs.mistral.ai/capabilities/document/).

import base64
import requests
import os
from dotenv import load_dotenv
from mistralai import Mistral
from mistralai.models import OCRResponse
from IPython.display import Markdown, display

load_dotenv()  # take environment variables/config for API key

api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)

def replace_images_in_markdown(markdown_str: str, images_dict: dict) -> str:
    """Swap each image placeholder in the markdown for its base64 data."""
    for img_name, base64_str in images_dict.items():
        markdown_str = markdown_str.replace(f"![{img_name}]({img_name})", f"![{img_name}]({base64_str})")
    return markdown_str

def get_combined_markdown(ocr_response: OCRResponse) -> str:
    """Join the markdown of every page, with extracted images inlined."""
    markdowns: list[str] = []
    for page in ocr_response.pages:
        # Map each image id on this page to its base64 payload
        image_data = {}
        for img in page.images:
            image_data[img.id] = img.image_base64
        markdowns.append(replace_images_in_markdown(page.markdown, image_data))

    return "\n\n".join(markdowns)

11th Century

Part of a leaf from a Lectionary

ca.1075-1100

https://collections.vam.ac.uk/item/O1754458/part-of-a-leaf-from-manuscript/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2010EB1329/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

miguem mit ratur. quax do quifgrp dif poft mortem gebenne. concr-mata o muentr. quidum u uerer. fructum operaf facere recufunt. Ingemmat dif dnf bonarum malarumqo arborum monem cum adbuc fubtungtr dicenf. r. ex fructbuf coy cognof ceafcof. (a) u enum fructu{ inquibamide arboref abonuf trnurtur. pauluf apti oftendtr dicenf. seta aut funt operacarnuf. que funt. for nonef. unmundrace. inpudicrue. luxu idolorum feruruf. Yueneficue. inmuet contertionef. enultrionef. ux. tixe. onfionef. fecte. muidue. homicidur. retatef. commef fationef. er fuf fimilu dia quaguate regnum di nonconfequent. ulubuf per bucremum ppfium dicetur. dicauf bomo quconfidur in bomune. erpo carnem brachum fuum. eradrio recedtr cuuf. Frit enim quafi murice indeferto. on uadebir cum uenertr boni. fedbabi tr inflectare infiltudine intera fal. quuf erint. bals. Xruen boni arbor dacf. uctuf pfece. ad- quamfidur indrio qun unufelt df. beneficuf. er demonef tre- munt. ercontremefcunt. (a) uof dnif per ppfium rephut dicenf. Populuf bielabuf mehonort corautè coy longe ême. Talef tempore uudictr. quia fine fructu boni operif clamabunt dirc.dric apernobuf. uidure mereburtur. amendico uobif nefio uof. (a) uergo uulintrare unreguit ceton. nonfolum fare defiderer quid ucltr df. (a) ue fed eram unplere. quod uber di. (a) ue ficut dnif ut inguglio. Betuqui audtunt uerbum di er cuftodunt illud. Et tim di fepuluf. Beztertaf fifecertafca. 1552 DON. X. PIPENTEOSIEN. img-0.jpeg

ARABOLF SAEo monuf filu daud regifift. Adfaenda fapicrtam erdafea plinam. adtratlegrenda uerba prudertue erfu fapicendum eruditionem doctrine. ufftrum. eruducui. eracq crtem ut decur piruuluf aftetarido lefecmabuf unellectuf. Audienf fars

Evaluation

The handwriting recognition looks plausible but is wrong. Impressively, though, the model does partially detect the initial within the text. (Compare with the later illuminated manuscript tests, where this separation doesn't happen; I'm not sure why the model picks up on the initial in this one and not in the others.)

12th Century

Initial from Gratian's Decretum

1160-1165

https://collections.vam.ac.uk/item/O125653/initial-from-gratians-decretum-manuscript-cutting-unknown/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2009CP6384/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpeg

Evaluation

The whole page/cutting is recognised as an illustration, but the text is not recognised (perhaps Mistral doesn't recognise Latin?) and the initial is not recognised as a separate illustration.
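To make an evaluation like this less of an eyeball judgement, each page of a response could be summarised by counting the detected images and the text left once the image placeholders are removed. This is a rough sketch: `summarise_page` and `IMAGE_TAG` are names I have made up, it relies only on the `markdown` and `images` attributes already used in the setup code, and the example runs on a stand-in object rather than a real API response.

```python
import re
from types import SimpleNamespace

# Matches markdown image placeholders such as ![img-0.jpeg](img-0.jpeg)
IMAGE_TAG = re.compile(r"!\[[^\]]*\]\([^)]*\)")


def summarise_page(page) -> dict:
    """Count detected images and the characters of text outside image tags."""
    text_only = IMAGE_TAG.sub("", page.markdown).strip()
    return {
        "images": len(page.images),
        "text_chars": len(text_only),
        # True when the model returned at least one image and no text at all
        "image_only": len(page.images) > 0 and len(text_only) == 0,
    }


# Stand-in for a page where the whole cutting came back as a single image
fake_page = SimpleNamespace(
    markdown="![img-0.jpeg](img-0.jpeg)",
    images=[SimpleNamespace(id="img-0.jpeg")],
)
print(summarise_page(fake_page))
```

On a real response, the same function could be applied to each entry of `img_response.pages`.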

13th Century

Glazier-Rylands Bible Manuscript Cutting

ca.1260-1270

https://collections.vam.ac.uk/item/O130464/glazier-rylands-bible-manuscript-cutting-unknown/

Two columns of text with initial and marginal bar

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2006BC3631/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpeg

Evaluation

The page is recognised as a single illustration, but no text is recognised and the initial (and marginal bar) are not recognised as separate illustrations.

14th Century

Manuscript Cutting

https://collections.vam.ac.uk/item/O1262522/manuscript-cutting/

A page from an illuminated manuscript with five initials, text and a decorated column

2013GJ1058

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2013GJ1058/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpeg

Evaluation

The whole page is recognised as an image but otherwise no text is recognised and the individual images of the initials and the decorative column are not split out.

15th Century

Melusina (1481)

A woodcut with an illustration at the top and text underneath with an initial.

https://collections.vam.ac.uk/item/O115385/melusina-woodcut-pr%C3%BCss-johann/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2006AA4119/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpeg

Evaluation

The whole page is recognised as an image, with none of the text at the bottom recognised, possibly due to the font? The initial is not recognised as an illustration of its own.

16th Century

Book

1554

An illustration with text underneath

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2011FD8917/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpeg que de crinin alla Aubin Pumme, e delle, su uide, pili si poi armino siluide, uide gambistimo yiddice. su, in pio, yomo siluodile, uo si pumino di petto sfumino, olle gelo, e fellissimo, nicbo, anche lui, si poi fuo il rumio dapplèo o si miltese d'oro.

Evaluation

The text is rather off, but the illustration is well separated out. Given that the text and the illustration on this page are rather easy to split, let's try a much harder page from the same book...

Book

https://collections.vam.ac.uk/item/O1025805/book-filippo-orso/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2011FD8898/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpeg

Evaluation

This time the whole page is taken as an illustration, which is reasonable enough, but oddly none of the surrounding text (not even the caption at the bottom of the page) is recognised as text.

17th Century

1673-74 - Les plaisirs de l'Isle enchantée

Illustration at the top of the page and within the printed text at the bottom of the page

https://collections.vam.ac.uk/item/O1637296/les-plaisirs-de-lisle-enchantee-book/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2021NA1039/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpeg

LES PLAISIRS
D E L'ISLE
ENCHANTÉE.
COURSE DE BAGUE:
COLLATION ORNÉE DE MACHINES; COMEDIE, MESLÉE DE DANSE
ET DEMUSIQUE;
BALLET DU PALAIS DALCINE;
FEU D'ARTIFICE:
ET AUTRES FESTES GALANTES
ET MAGNIFIQUES,
FAITES PAR LE ROY A VERSAILLES,
LE VIL MAY M. DC. LXIV.
ET CONTINUÉES PLUSIEURS AUTRES JOURS.

img-1.jpeg

E ROY, voulant donner aux Reines \& à toute fa Cour le plaifir de quelques Fettes peu communes, dans un lieu orné de tous les agrémens qui peuvent faire admirer une Maifon de Campagne, choifit Verfailles, à quatre lieuës de Paris. C'eft un Chafteau qu'on peut nommer un Palais enchanté, tant les ajuttemens de A ij

Evaluation

A very successful result, with a perfect separation of text and illustrations; the text all seems correct.

18th Century

1708-1710 - Aesop's Fables

A printed copy of Aesop's Fables with hand-drawn illustrations by the owner, similar to illuminated manuscripts

https://collections.vam.ac.uk/item/O1719295/the-tenterden-aesop-book-lestrange-roger-sir/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2015HJ8363/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

80 Esop's Fables.

And this Caution holds good in all the Bufinefs of a Sober Man's Life ; as Marriage, Studies, Pleafures, Society, Commerce, and'the like ; Tis in fome fort, with Friends (Pardon the Coarfenefs of the Illuftration) as it is with Dogs in Comples. They fhould be of the Same Size, and Humour, and that which pleafes the One fhould Pleafe the Other; But if they Draw Several Ways, and if One be too Strong for T other, they'l be really to Hang themfelves upon every Gate or Stile they come at. This is the Moral of the Friendthip betwixt a Thrufo and a Sinalow, that can never Live together.

FAB. 66. A fouler and a 19 igron.

S a Country Fellow was making a Shoot at a Pigeon, he trod upon a Snake that bit him by the Leg. The Surprize startled him, and away flew the Bird.

The Morat.

We are to Diftinguilh betwixt the Benefes of Good Will, and thefe of Providence: For the Latter are immediately from Heaven, where no Human Intention Intervenes.

REFLEXION.

THE Mifchief that we Meditate to Others, falls commonly upon our Own Heads, and Ends in a Judgment, as well as a Difappointment. Take it Another Way, and it may ferve to Mind us how Happily People are Diverted many Times from the Execution of a Malicious Defigu, by the Grace and Goodnefs of a Preventing Providence. A Piftol's not taking Fire may fave the Life of a Good Man; and the Innocent Pigeon had Dy'd, if the Spiteful Snake had not broken the Fowler's Aim: That is to fay ; Good may be drawn out of Evil, and a Body's Life may be Sav'd without having any Obligation to his Preferver.

FA B. 67. A Crumpeter Taken Prifoner. img-0.jpeg

Evaluation

The text is almost perfectly recognised (Crumpeter rather than Trumpeter is an amusing error though). The illustration at the bottom of the page is well recognised, although the smaller illustrations in the margin have been ignored.
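A claim like "almost perfectly recognised" could be checked against a hand-made transcription using the standard library's `difflib`. The sketch below is only illustrative: the reference string is my own modernised transcription of one heading (an assumption, not taken from the collection record), and `character_similarity` and `differing_words` are hypothetical helper names.

```python
from difflib import SequenceMatcher


def character_similarity(reference: str, ocr_text: str) -> float:
    """Return a 0..1 similarity ratio between two strings (1.0 means identical)."""
    return SequenceMatcher(None, reference, ocr_text).ratio()


def differing_words(reference: str, ocr_text: str) -> list[tuple[str, str]]:
    """List (reference, ocr) word runs where the two texts disagree."""
    ref_words, ocr_words = reference.split(), ocr_text.split()
    matcher = SequenceMatcher(None, ref_words, ocr_words)
    return [
        (" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


reference = "A Trumpeter Taken Prisoner."  # assumed hand transcription
ocr_text = "A Crumpeter Taken Prifoner."   # as returned in the output above

print(round(character_similarity(reference, ocr_text), 3))
print(differing_words(reference, ocr_text))
```

Note that a modernised reference counts every long s read as "f" as an error, so the reference should follow whatever transcription convention the comparison is meant to test.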

Nouvelles Cartes de la République Française

1793

French text and card designs in a grid with captions (below and on the side) with some text as well within each card design.

https://collections.vam.ac.uk/item/O126731/nouvelles-cartes-de-la-r%C3%A9publique-print-jean-d%C3%A9mosth%C3%A8ne-dugourc/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2013GU7329/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

NOUVELLES CARTES DE LA RÉPUBLIQUE FRANÇAISE. PLUS DE ROIS, DE DAMES, DE VALETS; LE GÉNIE, LA LIBERTÉ, L'ÉGALITÉ LES REMPIACENT, LA LOI $A B U B B S T A V-D E S S U S D^{\prime} B U X$.

Si les vrais seuls de la philosophie et de l'humanité ont concrqués avec plaisir, parmi les types de l'Égalité, le Etern-Calore et le Migne; ils alimentent non-motif le voir LA LOI, SEULE SOTTENAIRE N'UN FRUfLE STERE, coribement L'Âs de sa suprême puissance; dont les faissances sont l'Europe, et lui donnent son acris. On doit donc dire; Quatorze ou Lue, les Grets, les Libérés ou l'Égalite; en lieu de Quatorze d'Âs, de Bois, de Dames ou de Valois; et Diexayildes, Seialdnes, Quinte, Quartieme ou Tierce ou GèPre, à la Libérée ou à l'Égalite; en lieu de les nommer ou Roi, à la Deme ou en Voir : LA LOI donne seule la désomination de Mahone. Ann Jean où les Voiets de Truffle ou de Coup ont une valeur particulières, comme un Hecency ou à la Miracle, il faut substituer L'ÉGALITE DE DEVORA ou celle de DANTE. img-0.jpeg

Evaluation

The French text at the top has been perfectly recognised. The illustration is recognised as one whole image, which is understandable, although it would be even better if there were some way to instruct the model to break the image up into the component squares of the grid. The caption text in the illustration is ignored.
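I don't know of a way to ask the model itself to split the grid, but the returned image could be cut up afterwards. The sketch below only computes the crop boxes for an evenly spaced grid; `grid_boxes` is a hypothetical helper, the image size and the row and column counts in the example are placeholders rather than measurements of this print, and real card borders are unlikely to fall exactly on an even grid.

```python
def grid_boxes(width: int, height: int, rows: int, cols: int) -> list[tuple[int, int, int, int]]:
    """Return (left, upper, right, lower) pixel boxes for an even rows x cols grid."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps neighbouring boxes flush, with no gaps or overlaps
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes


# Placeholder dimensions and grid shape, for illustration only
for box in grid_boxes(width=400, height=300, rows=3, cols=4):
    print(box)
```

Each tuple is in the order that Pillow's `Image.crop` expects, so the boxes could be passed to it directly once the base64 image has been decoded.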

19th Century

1883 - Histoire des quatre fils Aymon : très nobles et très vaillans chevaliers

A double-page spread with text in between and on top of illustrations and patterns

https://collections.vam.ac.uk/item/O1744376/histoire-des-quatre-fils-aymon-book-gillot-charles/

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2006AM9812/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

img-0.jpegimg-1.jpeg

Evaluation

It is perhaps a little unfair to give it a double-page spread. It does at least recognise the two different pages, but apart from that no text is recognised.
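A fairer test might be to send each page of the spread separately. The image URLs used here follow the IIIF Image API pattern (`region/size/rotation/quality.format`), where the region can be given as `pct:x,y,w,h`. The sketch below assumes the V&A image server honours percentage regions and that the gutter sits exactly in the middle of the image; `iiif_region_url` and `spread_page_urls` are hypothetical names.

```python
IIIF_BASE = "https://framemark.vam.ac.uk/collections"


def iiif_region_url(image_id: str, x_pct: float, y_pct: float, w_pct: float, h_pct: float) -> str:
    """Build an IIIF URL for a region given as percentages of the full image."""
    region = f"pct:{x_pct:g},{y_pct:g},{w_pct:g},{h_pct:g}"
    return f"{IIIF_BASE}/{image_id}/{region}/full/0/default.jpg"


def spread_page_urls(image_id: str) -> tuple[str, str]:
    """Return URLs for the left and right halves of a double-page spread."""
    left = iiif_region_url(image_id, 0, 0, 50, 100)
    right = iiif_region_url(image_id, 50, 0, 50, 100)
    return left, right


left_url, right_url = spread_page_urls("2006AM9812")
print(left_url)
print(right_url)
```

Each URL could then be passed as the `image_url` of its own OCR request.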

20th Century

Cautionary Tales for Children by Hilaire Belloc

https://collections.vam.ac.uk/item/O1370845/cautionary-tales-for-children--book-belloc-hilaire/

Illustrations in-between printed text

img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2016JL3856/full/full/0/default.jpg"
    }
)

display(Markdown(get_combined_markdown(img_response)))

Jim, Who ran away from his Nurse, and was eaten by a Lion.

There was a Boy whose name was Jim; His Friends were very good to him. They gave him Tea, and Cakes, and Jam, And slices of delicious Ham, And Chocolate with pink inside, And little Tricycles to ride, And

Evaluation

The text is perfectly recognised, but the illustration of the lion has been ignored for some reason.