Tagging specific words or phrases?

I’m working on a new documentary editing project, and we are preparing to publish our transcripts and images online. We’re considering Omeka, but are concerned about its lack of ability to tag within the document itself. Some of our documents are lengthy letters, journals spanning hundreds of pages, or autobiographies of a few hundred pages. There are perhaps 10,000 documents altogether.

I’ve been told that in Omeka we wouldn’t be able to tag words or phrases within the document, but would only be able to provide metadata for the document as a whole. Is this accurate (in both Omeka Classic and Omeka S)? Or is there a way to “tag” or link to specific words or phrases within the document itself? Or would it be possible to write a plugin that could enable that? I’m not a tech person and am just getting my feet wet with these programs, so I would be most grateful for any insights. Thank you!

Tagging can have a number of meanings, so some of the answer depends on your approach.

It also sounds like you might want to think about how you’re presenting these very long documents. Are you expecting people to page through the whole thing? Are you trying to indicate concepts on a specific page or in a certain chapter? If your documents are OCR’d or machine readable, you might just use a publication system that allows search. But if not, you’ll want to stop and think about how you want/expect users to navigate these hundreds of pages.
How would you expect ‘tags’ to be linked to a specific point in a document, in any sort of system (not just Omeka)?
Since you mention transcriptions, it might be worth thinking about search; see more info for Classic and S.

You might find it helpful to browse the Omeka Classic Showcase and Directory, and the Omeka S Directory, to get a sense of how people use Omeka to present information.

More specifically, take a look at: the Colored Conventions Project’s Omeka site (and their recent blog posts about their process); Papers of the War Department, which is built on Omeka S and includes document-level keywords for persons, places, and ideas; the Jane Addams Digital Edition; and the Civil War Governors of Kentucky.

Tagging can have a number of meanings, so some of the answer depends on your approach.

Thanks so much for your reply! You have a lot of great thoughts. I’ll try to respond to each. The forum only allows two links for new users per post, so I’ll have to reply in multiple posts. :slight_smile:

Good point about “tagging.” Perhaps “linking” is a better term? I don’t know the exact technical terms, but by “tagging” I meant creating a hyperlink that links a word or phrase in the transcript of a document to a bio/glossary entry about that person (or simply to a list of other documents about that person/place/event). An example is found at the link below. In the transcription of this document, clicking on “Chas W. Quiggins” takes you to his bio page. I’d be glad to learn a more specific or better term for the function I called “tagging,” since you make a good point that the term has multiple meanings.

It also sounds like you might want to think about how you’re presenting these very long documents. Are you expecting people to page through the whole thing? Are you trying to indicate concepts on a specific page or in a certain chapter? If your documents are OCR’d or machine readable, you might just use a publication system that allows search. But if not, you’ll want to stop and think about how you want/expect users to navigate these hundreds of pages.

Excellent points. Thank you. These are exactly some of the questions that we’re working through at the moment. Since ours are mostly handwritten documents from the 19th century, we will provide transcripts for each. We are still undecided on how to present our documents, in part because we first need to determine whether we can tag within a document transcription.

If we could tag within each document, we could provide a search function that searches the transcripts, annotations, metadata, etc. across the website and returns results that send the user directly to the page in the journal or autobiography that contains their search term. (For example, a search for “Edward Partridge” at the Joseph Smith Papers website returns a list of results for Edward Partridge across the site, linked to the exact spots where he appears in various documents, including specific pages, such as this one, of a 200-page journal.) The ability to link to the exact spot within a document could eliminate a second search step: otherwise users are linked only to the document itself (which might be 20 or more pages even if we break longer documents into sections) and then have to do a second word search through the pages of the transcription once they get there. (Does that make sense?) From what I understand, though, tagging or linking within a document may not be possible in Omeka? And most Omeka sites I’ve seen deal with shorter documents, so I don’t know whether this has been a concern for other sites.
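As a rough illustration of the page-level search described above, here is a minimal Python sketch, assuming each document’s transcription is already split into a list of page texts. The function names and sample documents are hypothetical, and this is not how the Joseph Smith Papers or Omeka actually implement search. The idea is to index every word against a (document, page) pair, so a hit can point straight to a page rather than to the whole document.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercased words; deliberately simple (no stemming, no spelling variants)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_page_index(documents):
    """Map each word to the set of (doc_id, page_number) pairs containing it.

    `documents` is assumed to look like {doc_id: [page_1_text, page_2_text, ...]}.
    """
    index = defaultdict(set)
    for doc_id, pages in documents.items():
        for page_number, text in enumerate(pages, start=1):
            for word in tokenize(text):
                index[word].add((doc_id, page_number))
    return index


def search_phrase_pages(index, documents, phrase):
    """Return sorted (doc_id, page_number) pairs whose page contains the phrase."""
    words = tokenize(phrase)
    if not words:
        return []
    # Pages containing every word of the phrase, in any order...
    candidates = set.intersection(*(index.get(w, set()) for w in words))
    # ...then keep only pages where the words appear together as a phrase.
    needle = " " + " ".join(words) + " "
    hits = [
        (doc_id, page)
        for doc_id, page in candidates
        if needle in " " + " ".join(tokenize(documents[doc_id][page - 1])) + " "
    ]
    return sorted(hits)


# Hypothetical sample data, for illustration only.
documents = {
    "journal-1838": ["Met with Edward Partridge today.", "Rain all day.",
                     "Partridge and Edward came by."],
    "letter-12": ["Edward Partridge sends regards."],
}
index = build_page_index(documents)
print(search_phrase_pages(index, documents, "Edward Partridge"))
# -> [('journal-1838', 1), ('letter-12', 1)]
```

Page 3 of the sample journal contains both words but not the phrase, so the second filtering step drops it; each returned pair could then be turned into a link to that page.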

How would you expect ‘tags’ to be linked to a specific point in a document, in any sort of system (not just Omeka)?

I don’t yet understand how all of the back-end formatting works, but on the front end I would expect them to be links within the document transcription itself, probably in a different color to make them stand out. I’m interested in something similar to how one can “tag” specific words in XML, as the Civil War Governors of Kentucky project referenced above does (I believe they use XML for the links within their texts), or how the Papers of Abraham Lincoln Digital Library links names (see here for a randomly chosen example), though I don’t know how the Lincoln Papers site is structured on the back end. Did that answer what you were asking?

Since you mention transcriptions, it might be worth thinking about search; see more info for Classic and S.

Thank you. We are definitely interested in making sure that our transcriptions are searchable and will be learning more about the various options.

You might find it helpful to browse the Omeka Classic Showcase and Directory, and the Omeka S Directory, to get a sense of how people use Omeka to present information.

Thank you. Yes, in the past few weeks I’ve looked at a few dozen websites, and that has been helpful! Many of them do seem to focus on much shorter documents, such as letters or physical artifacts, which is partly why we’ve been trying to think through how to present lengthy documents (and why tagging within a document could be a great resource).

More specifically, take a look at: the Colored Conventions Project’s Omeka site (and their recent blog posts about their process); Papers of the War Department, which is built on Omeka S and includes document-level keywords for persons, places, and ideas; the Jane Addams Digital Edition; and the Civil War Governors of Kentucky.

Thanks for the suggestions! I had not seen the Colored Conventions Project’s Omeka site and blog posts. I just spent some time with that site and looked again at the Papers of the War Department. I have spent several hours browsing the other sites over the past few weeks and have been impressed with them. Omeka seems to be a fabulous product. Thank you again so much for your help!


Thank you for the example of linking/tagging. The Civil War Governors of Kentucky project has a detailed report on using Omeka, and you might reach out to them about how it all fits together, but I suspect the tagging you’re seeing there is produced via the Dropbox TEI plugin that someone on their team appears to have built.

I’m not sure what platform the Joseph Smith Papers is running on, so I don’t know how their search works. It does seem like each page is being treated as an object or record: if I click on a link from the search page you pointed to, I get only that specific page. Note that if you flip to page 1, it has a different URL.

Which gets back to the question of how you think people are going to be engaging with these very long documents. If you’ve got something with 200 pages, that’s going to be a very long transcription to scroll through on a single webpage. Many projects, old and new, break out individual pages as records, with the ability to move forward and backward in the document coded into the page navigation somewhere.
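A minimal sketch of the page-as-record approach, assuming a document arrives as a list of page transcriptions. The `PageRecord` fields and the `/documents/<id>/page/<n>` URL pattern are hypothetical, not an Omeka convention; the point is only that each page gets its own address plus previous/next links, much like the separate page URLs noted above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PageRecord:
    doc_id: str
    number: int               # 1-based page number within the document
    text: str
    prev_url: Optional[str]   # None on the first page
    next_url: Optional[str]   # None on the last page


def page_url(doc_id: str, number: int) -> str:
    # Hypothetical URL pattern; a real site would use whatever its routing produces.
    return f"/documents/{doc_id}/page/{number}"


def split_into_pages(doc_id: str, pages: List[str]) -> List[PageRecord]:
    """Turn a list of page transcriptions into records linked to their neighbors."""
    total = len(pages)
    return [
        PageRecord(
            doc_id=doc_id,
            number=i,
            text=text,
            prev_url=page_url(doc_id, i - 1) if i > 1 else None,
            next_url=page_url(doc_id, i + 1) if i < total else None,
        )
        for i, text in enumerate(pages, start=1)
    ]


records = split_into_pages("journal-1838", ["First page", "Second page", "Third page"])
print(records[1].prev_url, records[1].next_url)
# -> /documents/journal-1838/page/1 /documents/journal-1838/page/3
```

Because every page has its own URL, a search hit (or a tagged name) can link directly to that page instead of to the start of a 200-page transcription.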

I also want to point out that just because something doesn’t currently exist for Omeka doesn’t necessarily mean it can’t be built, as the Civil War Governors of Kentucky shows.

Thank you for your help! :slight_smile:

Hi @Phebe38,

The Text Encoding Initiative (TEI) standard seems to be what is mainly used to structure and tag transcriptions of documents in the way you seem to be interested in. It is a very thorough standard, but unfortunately the tooling for viewing the tagged documents is lacking.
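To make the TEI point a bit more concrete: in TEI, a name in a transcription can be wrapped in `<persName>` or `<placeName>` with a `ref` attribute pointing at an entry elsewhere, and a display layer then turns those into links. The sketch below uses only Python’s standard library to do that conversion for a single paragraph. The sample text, the `#quiggins-chas` style identifiers, and the `/entities/...` URL pattern are made up for illustration; real TEI display tools (and any Omeka plugin) would handle far more markup than this.

```python
import html
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"
# TEI elements that should become links when they carry a @ref attribute.
LINKED_TAGS = {TEI_NS + "persName", TEI_NS + "placeName"}


def render(elem, url_for):
    """Return HTML for elem's content, turning tagged names into <a> links.

    Other child elements are flattened to their text; a real renderer would
    map many more TEI elements (deletions, additions, notes, page breaks).
    """
    parts = [html.escape(elem.text or "", quote=False)]
    for child in elem:
        inner = render(child, url_for)
        ref = child.get("ref")
        if child.tag in LINKED_TAGS and ref:
            href = html.escape(url_for(ref))
            parts.append(f'<a class="entity" href="{href}">{inner}</a>')
        else:
            parts.append(inner)
        # In ElementTree, text that follows a child element lives in its .tail.
        parts.append(html.escape(child.tail or "", quote=False))
    return "".join(parts)


def entity_url(ref):
    # Hypothetical URL scheme: "#quiggins-chas" -> "/entities/quiggins-chas".
    return "/entities/" + ref.lstrip("#")


sample = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">Wrote to '
    '<persName ref="#quiggins-chas">Chas W. Quiggins</persName> at '
    '<placeName ref="#springfield">Springfield</placeName> today.</p>'
)
print(render(ET.fromstring(sample), entity_url))
```

The same `ref` values can also feed a search index, which is what lets a name search land on the spot where the name is tagged.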

Our library had a small project where letters from our special collections were transcribed to TEI. As part of the project, I investigated options available at the time for displaying those transcriptions including hyperlinking and searching.

Omeka and Omeka-S could host the documents, but a basic installation makes hyperlinks difficult to manage: every document is given a unique identifier on upload, but the hyperlinks have to be written into the document before that identifier is known. It appears that the Civil War Governors of Kentucky project addressed this by using the Clean Url plugin, so that URLs use a specific identifier that is determined in advance.
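A different way around the same chicken-and-egg problem, sketched here under assumptions (this is not what the Kentucky project did): write a stable key of your own into the transcription (e.g. `ref="#quiggins-chas"`), record the same key in each target item’s metadata, and resolve keys to real record URLs only after import. The input format (a list of dicts with `id` and `identifier`) and the `/items/show/<id>` URL pattern are assumptions for illustration; check what your own installation actually exports and serves.

```python
def build_lookup(items):
    """Map each item's stable identifier to the URL of its published record.

    `items` is assumed to be a list of dicts such as
    {"id": 4211, "identifier": "quiggins-chas"}, exported after import.
    The "/items/show/<id>" pattern is an assumption; adjust to your site.
    """
    lookup = {}
    for item in items:
        key = item["identifier"]
        if key in lookup:
            raise ValueError(f"duplicate identifier: {key}")
        lookup[key] = f"/items/show/{item['id']}"
    return lookup


def resolve(ref, lookup):
    """Turn a ref like "#quiggins-chas" into a URL, or None if it is unknown."""
    return lookup.get(ref.lstrip("#"))


def dangling_refs(refs, lookup):
    """List refs written into transcriptions that point at no known item."""
    return sorted({ref for ref in refs if resolve(ref, lookup) is None})


# Hypothetical export data, for illustration only.
lookup = build_lookup([
    {"id": 4211, "identifier": "quiggins-chas"},
    {"id": 98, "identifier": "springfield"},
])
print(resolve("#quiggins-chas", lookup))  # -> /items/show/4211
print(dangling_refs(["#springfield", "#partridge-edward"], lookup))  # -> ['#partridge-edward']
```

A side benefit of resolving links late is that broken references can be listed and fixed before anything is published.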

We also wanted a search for a specific name or date to take users to the exact point in each document where that name or date is referenced. It appears that the Kentucky project does not have a similar feature, instead providing a generic full-text search and a search of the metadata applied to the Item record.
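A minimal sketch of that kind of name-level search, assuming TEI transcriptions that mark page breaks with `<pb n="..."/>` and names with `<persName ref="...">`. It walks the document in order, remembers the most recent page break, and reports the page for each mention of a given `ref`. The sample document and identifiers are hypothetical, and a production system would query a prebuilt index rather than re-parse every file per search.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def find_mentions(root, ref):
    """Return (page, text) for each <persName> whose @ref equals `ref`.

    `page` is the @n of the most recent <pb/> seen before the name, or None
    if no page break has been seen yet. root.iter() walks elements in
    document order, so "most recent" means "earlier in the file".
    """
    page = None
    hits = []
    for elem in root.iter():
        if elem.tag == TEI_NS + "pb":
            page = elem.get("n")
        elif elem.tag == TEI_NS + "persName" and elem.get("ref") == ref:
            hits.append((page, "".join(elem.itertext()).strip()))
    return hits


# Hypothetical three-page journal excerpt.
journal = ET.fromstring(
    '<body xmlns="http://www.tei-c.org/ns/1.0">'
    '<pb n="1"/><p>Met <persName ref="#partridge-edward">Bishop Partridge</persName>.</p>'
    '<pb n="2"/><p>Weather fair.</p>'
    '<pb n="3"/><p>Wrote to <persName ref="#partridge-edward">E. Partridge</persName>.</p>'
    "</body>"
)
print(find_mentions(journal, "#partridge-edward"))
# -> [('1', 'Bishop Partridge'), ('3', 'E. Partridge')]
```

Note that this finds the person even when the spelling differs ("Bishop Partridge", "E. Partridge"), which is the main advantage of tag-based search over plain full-text search.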

We ultimately went with a custom interface built around the BaseX XML Database instead of Omeka, though I can’t say that was particularly easy. I would certainly reevaluate if better tools were added to Omeka or Omeka-S, or provided by some other system.