Mistral AI Illustrations Detection Tests
Mistral OCR test on National Art Library collections from C11-20th
Following on from the announcement about Mistral's OCR model, I thought I would (unfairly!) test it against some examples of manuscripts and books from each century from the 11th to the 20th, to see how well it can recognise illustrations amongst text (or vice versa). All examples are taken from the National Art Library at the V&A.
First some setup (all taken from Mistral's examples: https://docs.mistral.ai/capabilities/document/)
import base64
import requests
import os
from dotenv import load_dotenv
from mistralai import Mistral
from mistralai.models import OCRResponse
from IPython.display import Markdown, display

load_dotenv()  # take environment variables/config for API key
api_key = os.environ["MISTRAL_API_KEY"]
client = Mistral(api_key=api_key)


def replace_images_in_markdown(markdown_str: str, images_dict: dict) -> str:
    # Swap each image placeholder in the markdown for its inline base64 data
    for img_name, base64_str in images_dict.items():
        markdown_str = markdown_str.replace(
            f"![{img_name}]({img_name})", f"![{img_name}]({base64_str})"
        )
    return markdown_str


def get_combined_markdown(ocr_response: OCRResponse) -> str:
    # Join the markdown of every page, with detected images embedded inline
    markdowns: list[str] = []
    for page in ocr_response.pages:
        image_data = {}
        for img in page.images:
            image_data[img.id] = img.image_base64
        markdowns.append(replace_images_in_markdown(page.markdown, image_data))
    return "\n\n".join(markdowns)
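Every test below makes the same OCR call and changes only the image identifier in the V&A image URL. A minimal sketch of a helper that would cut the repetition, assuming the `full/full/0/default.jpg` URL pattern used throughout this notebook; `iiif_url` and `ocr_request` are hypothetical names of my own, not part of the Mistral SDK:

```python
# Base URL taken from the image links used in this notebook
IIIF_BASE = "https://framemark.vam.ac.uk/collections"


def iiif_url(image_id: str) -> str:
    """Build the full-size JPEG URL for a V&A image identifier (hypothetical helper)."""
    return f"{IIIF_BASE}/{image_id}/full/full/0/default.jpg"


def ocr_request(image_id: str) -> dict:
    """Return the keyword arguments this notebook passes to client.ocr.process."""
    return {
        "model": "mistral-ocr-latest",
        "include_image_base64": True,
        "document": {"type": "image_url", "image_url": iiif_url(image_id)},
    }


# Usage (needs the `client` created in the setup and a valid API key):
# img_response = client.ocr.process(**ocr_request("2010EB1329"))
```

Keeping the request arguments in one place also makes it easier to rerun the whole set of tests against a different model name later.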
11th Century
Part of a leaf from a Lectionary
ca.1075-1100
https://collections.vam.ac.uk/item/O1754458/part-of-a-leaf-from-manuscript/
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2010EB1329/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
miguem mit ratur. quax do quifgrp
dif poft mortem gebenne. concr-mata
o muentr. quidum u uerer. fructum
operaf facere recufunt. Ingemmat
dif dnf bonarum malarumqo arborum
monem cum adbuc fubtungtr dicenf.
r. ex fructbuf coy cognof ceafcof. (a) u
enum fructu{ inquibamide arboref abonuf
trnurtur. pauluf apti oftendtr dicenf.
seta aut funt operacarnuf. que funt. for
nonef. unmundrace. inpudicrue. luxu
idolorum feruruf. Yueneficue. inmuet
contertionef. enultrionef. ux. tixe.
onfionef. fecte. muidue. homicidur.
retatef. commef fationef. er fuf fimilu
dia quaguate regnum di nonconfequent.
ulubuf per bucremum ppfium dicetur.
dicauf bomo quconfidur in bomune. erpo
carnem brachum fuum. eradrio recedtr
cuuf. Frit enim quafi murice indeferto.
on uadebir cum uenertr boni. fedbabi
tr inflectare infiltudine intera fal.
quuf erint. bals. Xruen boni arbor
dacf. uctuf pfece. ad-
quamfidur indrio
qun unufelt df. beneficuf. er demonef tre-
munt. ercontremefcunt. (a) uof dnif per
ppfium rephut dicenf. Populuf bielabuf
mehonort corautè coy longe ême. Talef
tempore uudictr. quia fine fructu boni
operif clamabunt dirc.dric apernobuf.
uidure mereburtur. amendico uobif nefio
uof. (a) uergo uulintrare unreguit ceton.
nonfolum fare defiderer quid ucltr df. (a) ue
fed eram unplere. quod uber di. (a) ue
ficut dnif ut inguglio. Betuqui audtunt
uerbum di er cuftodunt illud. Et tim di
fepuluf. Beztertaf fifecertafca.
1552
DON. X. PIPENTEOSIEN.
ARABOLF SAEo monuf filu daud regifift. Adfaenda fapicrtam erdafea plinam. adtratlegrenda uerba prudertue erfu fapicendum eruditionem doctrine. ufftrum. eruducui. eracq crtem ut decur piruuluf aftetarido lefecmabuf unellectuf. Audienf fars
Evaluation
The handwriting recognition is plausibly wrong, but impressively it does partially detect the initial within the text (compare with the later illuminated manuscript tests, where this separation doesn't happen; I'm not sure why the model picks up on this one and not the others).
12th Century
Initial from Gratian's Decretum
1160-1165
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2009CP6384/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
Evaluation
The whole page/cutting is recognised as an illustration, but the text is not recognised (perhaps Mistral doesn't recognise Latin?) and the initial is not recognised as a separate illustration.
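One way to make "the whole page is recognised as an illustration" checkable is to compare each detected image's bounding box with the page size. A rough sketch, assuming each page in the response exposes `dimensions.width` and `dimensions.height`, and each image exposes `top_left_x`, `top_left_y`, `bottom_right_x` and `bottom_right_y` with non-empty values; `image_coverage` is a hypothetical helper of my own:

```python
def image_coverage(page) -> list[tuple[str, float]]:
    """Return (image id, fraction of the page area covered) for each image on a page.

    Assumes the page and image objects carry the attributes named in the
    lead-in; any object with those attributes will work.
    """
    page_area = page.dimensions.width * page.dimensions.height
    results = []
    for img in page.images:
        width = max(0, img.bottom_right_x - img.top_left_x)
        height = max(0, img.bottom_right_y - img.top_left_y)
        results.append((img.id, (width * height) / page_area))
    return results


# Usage (needs a response from client.ocr.process):
# for page in img_response.pages:
#     print(image_coverage(page))
```

A single image with coverage close to 1.0 would correspond to the "whole page taken as one illustration" outcome, while several smaller fractions would suggest the initials or marginal decoration had been split out.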
13th Century
Glazier-Rylands Bible Manuscript Cutting
ca.1260-1270
https://collections.vam.ac.uk/item/O130464/glazier-rylands-bible-manuscript-cutting-unknown/
Two columns of text with initial and marginal bar
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2006BC3631/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
Evaluation
The page is recognised as a single illustration, but no text is recognised and the initial (and marginal bar) are not recognised as separate illustrations.
14th Century
Manuscript Cutting
https://collections.vam.ac.uk/item/O1262522/manuscript-cutting/
A page from an illuminated manuscript with five initials, text and a decorated column
2013GJ1058
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2013GJ1058/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
Evaluation
The whole page is recognised as an image but otherwise no text is recognised and the individual images of the initials and the decorative column are not split out.
15th Century
Melusina (1481)
A woodcut with an illustration at the top and text underneath with an initial.
https://collections.vam.ac.uk/item/O115385/melusina-woodcut-pr%C3%BCss-johann/
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2006AA4119/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
Evaluation
The whole page is recognised as an image, with none of the text at the bottom recognised, possibly due to the font? The initial is not recognised as an illustration of its own.
16th Century
Book
1554
An illustration with text underneath
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2011FD8917/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
que de crinin alla Aubin Pumme, e delle, su uide, pili si poi armino siluide, uide
gambistimo yiddice. su, in pio, yomo siluodile, uo si pumino di petto sfumino, olle
gelo, e fellissimo, nicbo, anche lui, si poi fuo il rumio dapplèo o si miltese d'oro.
Evaluation
The text is rather off, but the illustration is well separated out. Given that the text and the illustration on this page are rather easy to split, let's try a much harder page from the same book...
Book
https://collections.vam.ac.uk/item/O1025805/book-filippo-orso/
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2011FD8898/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
Evaluation
This time the whole page is taken as an illustration, which is reasonable enough, but oddly none of the surrounding text (even the caption at the bottom of the page) is recognised as text.
17th Century
1673-74 - Les plaisirs de l'Isle enchantée
Illustration at the top of the page and within the printed text at the bottom of the page
https://collections.vam.ac.uk/item/O1637296/les-plaisirs-de-lisle-enchantee-book/
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2021NA1039/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
LES PLAISIRS
D E L'ISLE
ENCHANTÉE.
COURSE DE BAGUE:
COLLATION ORNÉE DE MACHINES; COMEDIE, MESLÉE DE DANSE
ET DEMUSIQUE;
BALLET DU PALAIS DALCINE;
FEU D'ARTIFICE:
ET AUTRES FESTES GALANTES
ET MAGNIFIQUES,
FAITES PAR LE ROY A VERSAILLES,
LE VIL MAY M. DC. LXIV.
ET CONTINUÉES PLUSIEURS AUTRES JOURS.
E ROY, voulant donner aux Reines \& à toute fa Cour le plaifir de quelques Fettes peu communes, dans un lieu orné de tous les agrémens qui peuvent faire admirer une Maifon de Campagne, choifit Verfailles, à quatre lieuës de Paris. C'eft un Chafteau qu'on peut nommer un Palais enchanté, tant les ajuttemens de A ij
Evaluation
A very successful result, with perfect separation of text and illustrations; the text all seems correct.
18th Century
1708-1710 - Aesop's Fables
A printed copy of Aesop's Fables with hand-drawn illustrations by the owner, similar to illuminated manuscripts
https://collections.vam.ac.uk/item/O1719295/the-tenterden-aesop-book-lestrange-roger-sir/
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2015HJ8363/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
80 Esop's Fables.
And this Caution holds good in all the Bufinefs of a Sober Man's Life ; as Marriage, Studies, Pleafures, Society, Commerce, and'the like ; Tis in fome fort, with Friends (Pardon the Coarfenefs of the Illuftration) as it is with Dogs in Comples. They fhould be of the Same Size, and Humour, and that which pleafes the One fhould Pleafe the Other; But if they Draw Several Ways, and if One be too Strong for T other, they'l be really to Hang themfelves upon every Gate or Stile they come at. This is the Moral of the Friendthip betwixt a Thrufo and a Sinalow, that can never Live together.
FAB. 66. A fouler and a 19 igron.
S a Country Fellow was making a Shoot at a Pigeon, he trod upon a Snake that bit him by the Leg. The Surprize startled him, and away flew the Bird.
The Morat.
We are to Diftinguilh betwixt the Benefes of Good Will, and thefe of Providence: For the Latter are immediately from Heaven, where no Human Intention Intervenes.
REFLEXION.
THE Mifchief that we Meditate to Others, falls commonly upon our Own Heads, and Ends in a Judgment, as well as a Difappointment. Take it Another Way, and it may ferve to Mind us how Happily People are Diverted many Times from the Execution of a Malicious Defigu, by the Grace and Goodnefs of a Preventing Providence. A Piftol's not taking Fire may fave the Life of a Good Man; and the Innocent Pigeon had Dy'd, if the Spiteful Snake had not broken the Fowler's Aim: That is to fay ; Good may be drawn out of Evil, and a Body's Life may be Sav'd without having any Obligation to his Preferver.
FA B. 67. A Crumpeter Taken Prifoner.
Evaluation
The text is almost perfectly recognised (Crumpeter rather than Trumpeter is an amusing error though). The illustration at the bottom of the page is well recognised, although the smaller illustrations in the margin have been ignored.
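"Almost perfectly recognised" could be put on a numeric footing with a character error rate (CER): the edit distance between the OCR output and a hand-checked transcription, divided by the length of the transcription. A minimal sketch using a plain Levenshtein implementation, shown here only on the Crumpeter/Trumpeter pair mentioned above; a full comparison would need a reference transcription of the page, which this notebook does not have:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ocr_text: str, reference: str) -> float:
    """Edit distance divided by the length of the reference transcription."""
    if not reference:
        raise ValueError("reference must not be empty")
    return levenshtein(ocr_text, reference) / len(reference)


print(levenshtein("Crumpeter", "Trumpeter"))  # 1: a single substituted character
```

For the earlier printed pages, the long s being read as "f" would count against the OCR under this measure unless the reference transcription kept the original letterforms, so the choice of reference matters.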
Nouvelles Cartes de la République Française
1793
French text and card designs in a grid with captions (below and on the side) with some text as well within each card design.
https://collections.vam.ac.uk/item/O126731/nouvelles-cartes-de-la-r%C3%A9publique-print-jean-d%C3%A9mosth%C3%A8ne-dugourc/
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2013GU7329/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
NOUVELLES CARTES DE LA RÉPUBLIQUE FRANÇAISE. PLUS DE ROIS, DE DAMES, DE VALETS; LE GÉNIE, LA LIBERTÉ, L'ÉGALITÉ LES REMPIACENT, LA LOI $A B U B B S T A V-D E S S U S D^{\prime} B U X$.
Si les vrais seuls de la philosophie et de l'humanité ont concrqués avec plaisir, parmi les types de l'Égalité, le Etern-Calore et le Migne; ils alimentent non-motif le voir LA LOI, SEULE SOTTENAIRE N'UN FRUfLE STERE, coribement L'Âs de sa suprême puissance; dont les faissances sont l'Europe, et lui donnent son acris. On doit donc dire; Quatorze ou Lue, les Grets, les Libérés ou l'Égalite; en lieu de Quatorze d'Âs, de Bois, de Dames ou de Valois; et Diexayildes, Seialdnes, Quinte, Quartieme ou Tierce ou GèPre, à la Libérée ou à l'Égalite; en lieu de les nommer ou Roi, à la Deme ou en Voir : LA LOI donne seule la désomination de Mahone.
Ann Jean où les Voiets de Truffle ou de Coup ont une valeur particulières, comme un Hecency ou à la Miracle, il faut substituer L'ÉGALITE DE DEVORA ou celle de DANTE.
Evaluation
The French text at the top has been perfectly recognised. The illustration is recognised as one whole image, which is understandable, although it would be even better if there were some way to instruct the model to break the image up into the component squares of the grid. The caption text in the illustration is ignored.
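One possible workaround would be to split the detected image's bounding box into a regular grid locally and crop each cell. A minimal sketch, assuming the card designs sit in an evenly spaced grid with no gutters (an idealisation) and that the row and column counts are supplied by hand; `grid_boxes` is a hypothetical helper of my own:

```python
def grid_boxes(left: int, top: int, right: int, bottom: int,
               rows: int, cols: int) -> list[tuple[int, int, int, int]]:
    """Split a bounding box into rows x cols cells.

    Cells are returned row by row as (left, top, right, bottom) tuples
    in the same pixel coordinates as the input box.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    width, height = right - left, bottom - top
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((
                left + (c * width) // cols,
                top + (r * height) // rows,
                left + ((c + 1) * width) // cols,
                top + ((r + 1) * height) // rows,
            ))
    return boxes
```

Each tuple is in the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the cells could be cropped from the downloaded page image and inspected, or sent for OCR, one at a time.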
19th Century
1883 - Histoire des quatre fils Aymon : très nobles et très vaillans chevaliers
A double-page spread with text in between and on top of illustrations and patterns
https://collections.vam.ac.uk/item/O1744376/histoire-des-quatre-fils-aymon-book-gillot-charles/
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2006AM9812/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
Evaluation
It is perhaps a little unfair to give it a double-page spread. It does at least recognise the two different pages, but apart from that no text is recognised.
20th Century
Cautionary Tales for Children by Hilaire Belloc
https://collections.vam.ac.uk/item/O1370845/cautionary-tales-for-children--book-belloc-hilaire/
Illustrations in-between printed text
img_response = client.ocr.process(
    model="mistral-ocr-latest",
    include_image_base64=True,
    document={
        "type": "image_url",
        "image_url": "https://framemark.vam.ac.uk/collections/2016JL3856/full/full/0/default.jpg",
    },
)
display(Markdown(get_combined_markdown(img_response)))
Jim, Who ran away from his Nurse, and was eaten by a Lion.
There was a Boy whose name was Jim; His Friends were very good to him. They gave him Tea, and Cakes, and Jam, And slices of delicious Ham, And Chocolate with pink inside, And little Tricycles to ride, And
Evaluation
The text is perfectly recognised, but the illustration of the lion has been ignored for some reason.