Batch import of HTML media

We want to import a database of items (persons), each with an HTML media attached (a long biography). We use the FileSideload module for the import, and I uploaded the HTML files to the server, into the FileSideload directory.

We have an ODS table that contains the metadata and, in a specific column, the name of the HTML file (without extension) for each row. For this column, we set “filesideload” as the media source.

The import completes, but the media shows a link to the HTML file instead of displaying the HTML content directly (which is the default behavior when you manually add an HTML media to an item with the HTML ingester).

How can we get a result similar to the HTML ingester, but for a batch import?

When I add an HTML media to an item in Omeka S, it shows up in the right-hand drawer when I view the item, and to see the content of the HTML media, I have to click on that media link.

  1. Is this the same behavior you are seeing with manually added HTML media?
  2. What behavior are you seeing with sideload + CSV import?

Thanks for the answer.

  1. No. When I manually add some HTML text, I can preview the text (with markup) directly. When I visit the item, I have to click on the media title like you do, but after that I can read the text directly.

Here is how I proceed: create an item, add an HTML media, and use the minimal HTML editor to add a title and paste some text. Once that is done, I can read the HTML text when visiting admin/media/your_media_id.

Result:

  2. With sideload and CSVImport, I get a link that redirects me to the HTML file, but this link is something like files/original/62beb501e321e0f5e9e28fdb94634d79833ebd8a.html and the page is displayed outside the Omeka S CSS template.

Here is a screenshot of the same admin/media/your_media_id page, but for an HTML file imported via sideload.

I would like to be able to batch import HTML media, but with a display like the one in the first screenshot.

Is the issue here just the size of the HTML you’re importing, and is that why you’re keeping it in separate files?

The CSV importer has an “HTML” option that works on cell content and lets you batch upload HTML media, but the file-based options just upload files. There’s no existing option to take a file and use it as the content of an HTML media.

In fact, we have plenty of HTML files for historical reasons: the previous website was a wiki, each page was exported, and we recreated the metadata in an ODS file for the new Omeka S site. The HTML files are quite big (a few A4 pages each) and, for this reason, not easy to merge into the ODS file containing the metadata.

What is your advice? Should we merge the HTML files into the ODS spreadsheet? Is there some way to automate this (there are 160 HTML files)? I can do a direct import into the MySQL database if needed.

Merging into the sheet is the best way to make it work with what the importer can do.
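Since merging 160 files by hand is tedious, the merge can be scripted. Below is a minimal sketch, not an official tool, assuming: the ODS sheet has been exported to a UTF-8 CSV; the column that currently feeds “filesideload” is named `html_file` (a hypothetical name, adjust it); each file sits in a local directory as `<name>.html`; and the new column `html_content` (also hypothetical) is the one you would map to the CSV importer’s “HTML” option. Because wiki exports are usually full HTML documents, the sketch keeps only the `<body>` content when a body tag is present; that stripping is an assumption about what you want, so drop it if not.

```python
import csv
import re
import sys
from pathlib import Path

# Hypothetical column holding the HTML file name (without extension),
# i.e. the column currently mapped to the "filesideload" source.
FILE_COLUMN = "html_file"
# Hypothetical new column holding the HTML itself; map it to the
# CSV importer's "HTML" option instead of the sideload column.
HTML_COLUMN = "html_content"

# Simple (non-validating) pattern for the inner <body> markup.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)


def extract_body(html: str) -> str:
    """Return the inner <body> markup if present, else the whole text."""
    match = BODY_RE.search(html)
    return match.group(1).strip() if match else html.strip()


def merge_html(rows, html_dir: Path):
    """Yield each CSV row with the matching HTML file's content added.

    A missing file raises FileNotFoundError so problems surface early.
    """
    for row in rows:
        name = (row.get(FILE_COLUMN) or "").strip()
        if not name:
            row[HTML_COLUMN] = ""
        else:
            path = html_dir / f"{name}.html"
            row[HTML_COLUMN] = extract_body(path.read_text(encoding="utf-8"))
        yield row


def main(in_csv: str, html_dir: str, out_csv: str) -> None:
    with open(in_csv, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + [HTML_COLUMN]
        with open(out_csv, "w", newline="", encoding="utf-8") as dst:
            # DictWriter quotes fields containing newlines, commas or
            # quotes, so multi-line HTML stays inside a single cell.
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(merge_html(reader, Path(html_dir)))


if __name__ == "__main__":
    main(*sys.argv[1:4])
```

Usage would be something like `python merge_html.py metadata.csv ./html metadata_with_html.csv`, then importing the output with the HTML option mapped to `html_content`. It is probably simplest to import the generated CSV directly rather than reopening it in a spreadsheet application, since very long multi-line cells can be awkward to handle there.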

Direct import into the database is an option but I’m not sure if it will be more efficient…

A new media ingester that takes a URL to a file and produces an HTML media at the end is also an option, but it would require a fair amount of work.