Scripto and Transkribus?

CBRfan · January 8, 2024, 10:40am

Hello All, and Happy New Year!

I’ve been looking into Scripto for transcription of both letters and manuscripts on a site I am planning, and at the same time I have been tinkering with Transkribus, which is an AI-driven platform for handwritten text recognition. Transkribus | AI powered Handwritten Text Recognition

Of course, it would be wonderful if these could be linked up in some way – so that crowdsourced transcription could also power the training of an algorithm for recognising specific handwriting… That would mean that the time invested by crowdsourced volunteers could pay much bigger dividends for them as researchers, and in general for Omeka S sites with handwritten document collections we wish to share online.

Is it possible? Could it be done using the background Wikimedia sites that Scripto requires as some kind of backbone?

Just a thought!

SharonLeon · January 8, 2024, 3:25pm

In fact, we’re at work on something along those lines this year. It’ll be a long road, but we also think it’s a good idea!

CBRfan · January 8, 2024, 3:48pm

Great News! It’s a fantastic development – thanks! I know it’s hard to estimate this kind of interoperability project, but do you have a sense of when the Omeka S team might be trialing/Beta-ing such functionality?

SharonLeon · January 8, 2024, 4:07pm

I wish I had a way to estimate that, but alas, things are still kind of foggy on that front. We’ll let you know when we get closer, though.

Alyx_TJ · March 11, 2024, 12:30pm

Hello @CBRfan,

This is not exactly what you had in mind, but I work on an Omeka/Scripto platform: https://transcrire.huma-num.fr/
There is a project for which part of the manual transcription work was used as training data for HTR. A few months ago, at the request of the researcher, I uploaded the HTR back into Scripto so that they could correct the errors from the automated transcription. Unfortunately, Scripto does not work with segmentation and image coordinates so it’s only the txt that was uploaded (it would be interesting to see this using IIIF annotation for example).

To do so, I used the MediaWiki extension Data Transfer, which lets me create or modify MediaWiki pages from a single CSV file. With OpenRefine I combined all the .txt files into a single file and mapped the .txt filenames to the corresponding MediaWiki page titles (“scripto project number”:“item identifier”:“media identifier”). The transcriptions then appear in Scripto automatically.
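For anyone who would rather script this step than use OpenRefine, here is a rough Python sketch of the same idea: gather the .txt files into one CSV whose first column is the page title. It makes several assumptions that are not from my actual setup: the files are named `<item>_<media>.txt`, the helper names `page_title` and `build_csv` are made up for illustration, and the `Title` / `Free Text` column headers are my understanding of what Data Transfer's CSV import expects, so please check them against the extension's documentation before relying on this.

```python
import csv
import io
from pathlib import Path


def page_title(project_id: int, txt_path: Path) -> str:
    """Build a 'project:item:media' page title.

    Assumes (hypothetically) that files are named '<item>_<media>.txt'.
    """
    item_id, media_id = txt_path.stem.split("_")
    return f"{project_id}:{item_id}:{media_id}"


def build_csv(project_id: int, txt_dir: Path) -> str:
    """Return CSV text with one row per .txt file found in txt_dir."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # Assumed header names for the Data Transfer import; verify before use.
    writer.writerow(["Title", "Free Text"])
    for txt_path in sorted(txt_dir.glob("*.txt")):
        text = txt_path.read_text(encoding="utf-8").strip()
        writer.writerow([page_title(project_id, txt_path), text])
    return out.getvalue()
```

You would then write the result to disk, for example `Path("import.csv").write_text(build_csv(1, Path("htr_output")), encoding="utf-8")`, where the project number `1` and the folder name are placeholders. As with the manual route, only the plain text is carried over, not segmentation or image coordinates.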

I hope this can be of help for you or anyone else.
And I’m curious to see how Scripto evolves this year, keep me posted too!

CBRfan · March 17, 2024, 11:11am

Thank you! It is very generous of you to share these tips and tests. I too have researchers who are transcribing letters/manuscripts onto their desktops and I can see the huge value of their work in terms of HTR training for the medium-sized corpus that we are dealing with. I will share this with the colleague who is building our Scripto instance, and he may contact you. https://transcrire.huma-num.fr/ is a fantastic site - it was wonderful to discover its contents as a bonus to your message! All best wishes

Alyx_TJ · May 3, 2024, 1:24pm

Re: Transkribus

Here is a bit of news that might interest the Omeka team if they’re working on Scripto and Transkribus. The company Teklia has made its Arkindex tool (for processing HTR) open source: https://teklia.com/blog/arkindex-goes-open-source/

CBRfan · May 4, 2024, 10:29am

Hello Alyx, thank you so much for informing us all about this interesting development. I did not know about Teklia, and am happy to have been introduced to their work. Good that they are making their Arkindex tool open source. Are you thinking of working with it on transcrire.huma-num.fr? All best wishes!

Alyx_TJ · October 22, 2024, 9:04am

Hi again @CBRfan. I am sorry I left you on read all these months ago.
For now, there is no project to develop new HTR tools for Transcrire, although it might be interesting to ease the transfer of transcriptions from one software to another. And we’ll see what Digital Scholar has in store for us with Scripto.

If you’re interested in more Swiss-army-knife tools, TACTEO (developed at the Grenoble Alps University) might do the trick as it has some HTR functionality, but we’re far from Omeka.

CBRfan · October 23, 2024, 1:08pm

Thanks @Alyx_TJ! It is nice to hear from you again – no worries about the time lag in responding, we are all busy. TACTEO looks interesting and I will talk to my developer friend about it. Right now I am trying to find ways to harmonise / crosswalk between DC Terms (Omeka S) and BIBO terms (Zotero) so that when I import reference literature into my Omeka S site, the search functionality will work across both sets of terms. Will be posting about that elsewhere and will ‘@’ you in case you have some brilliant library advice! CBRfan

system · October 18, 2025, 1:09pm

This topic was automatically closed 360 days after the last reply. New replies are no longer allowed.