Scripto and Transkribus?

Hello All, and Happy New Year!

I’ve been looking into Scripto for transcription of both letters and manuscripts on a site I am planning, and at the same time I have been tinkering with Transkribus, which is an AI-driven platform for handwritten text recognition.

Of course, it would be wonderful if these could be linked up in some way – so that crowdsourced transcription could also power the training of an algorithm for recognising specific handwritings… That would mean that the time invested by crowdsourced volunteers could pay exponentially bigger dividends for them as researchers, and in general for Omeka S sites with handwritten document collections we wish to share online.

Is it possible? Could it be done using the background Wikimedia sites that Scripto requires as some kind of backbone?

Just a thought! :wink:

In fact, we’re at work on something along those lines this year. It’ll be a long road, but we also think it’s a good idea!

Great News! It’s a fantastic development – thanks! I know it’s hard to estimate this kind of interoperability project, but do you have a sense of when the Omeka S team might be trialing/Beta-ing such functionality?

I wish I had a way to estimate that, but alas, things are still kind of foggy on that front. We’ll let you know when we get closer, though.

Hello @CBRfan,

This is not exactly what you had in mind, but I work on an Omeka/Scripto platform: https://transcrire.huma-num.fr/
There is a project for which part of the manual transcription work was used as training data for HTR. A few months ago, at the request of the researcher, I uploaded the HTR output back into Scripto so that they could correct the errors from the automated transcription. Unfortunately, Scripto does not work with segmentation and image coordinates, so only the plain text was uploaded (it would be interesting to see this done using IIIF annotation, for example).

To do so, I used the MediaWiki extension Data Transfer, which allows me to create/modify MediaWiki pages from a single CSV file. With OpenRefine I was able to combine all the .txt files into one single file and map the txt filenames to the corresponding MediaWiki page titles (“scripto project number”:“item identifier”:“media identifier”). The transcriptions then appear automatically in Scripto.
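
For anyone who would rather script the merge step than do it in OpenRefine, here is a minimal sketch in Python of the same idea. It is not the exact workflow described above, just an illustration under a few assumptions: the `.txt` files are named `<item identifier>_<media identifier>.txt` (a hypothetical convention, adapt it to your own HTR export), and the CSV uses the `Title` and `Free Text` column headers that the Data Transfer CSV import expects as far as I know. Please check both against your own setup before importing anything.

```python
import csv
from pathlib import Path


def page_title(txt_path, project_id):
    """Build the MediaWiki page title "project:item:media" for one file.

    Assumes a hypothetical filename convention "<item id>_<media id>.txt".
    """
    item_id, media_id = Path(txt_path).stem.split("_", 1)
    return f"{project_id}:{item_id}:{media_id}"


def build_import_csv(txt_dir, project_id, out_csv):
    """Merge every .txt file in txt_dir into a single CSV, one row per page.

    The "Title" / "Free Text" headers are an assumption about what the
    Data Transfer import page expects; verify against your installed version.
    Returns the number of transcription rows written.
    """
    rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Title", "Free Text"])
        for txt_path in sorted(Path(txt_dir).glob("*.txt")):
            text = txt_path.read_text(encoding="utf-8")
            writer.writerow([page_title(txt_path, project_id), text])
            rows += 1
    return rows
```

Calling `build_import_csv("htr_output", 7, "import.csv")` (folder name and project number are placeholders) would write one row per HTR text file, ready to be checked by hand before uploading.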

I hope this can be of help for you or anyone else.
And I’m curious to see how Scripto evolves this year, keep me posted too!

Thank you! It is very generous of you to share these tips and tests. I too have researchers who are transcribing letters/manuscripts onto their desktops and I can see the huge value of their work in terms of HTR training for the medium-sized corpus that we are dealing with. I will share this with the colleague who is building our Scripto instance, and he may contact you. https://transcrire.huma-num.fr/ is a fantastic site - it was wonderful to discover its contents as a bonus to your message! All best wishes

Re: Transkribus

A bit of news that might interest the Omeka team if they’re working on Scripto and Transkribus: the company Teklia has made its Arkindex tool (for processing HTR) open source: https://teklia.com/blog/arkindex-goes-open-source/

Hello Alyx, thank you so much for informing us all about this interesting development. I did not know about Teklia, and am happy to have been introduced to their work. It is good that they are making their Arkindex tool open source. Are you thinking of working with it on transcrire.huma-num.fr? All best wishes!