I’m uploading a number of PDFs to our archives, and before uploading them, I put a lot of work into remediating them for accessibility so that they are fully searchable and tagged for screen reader compatibility.
However, in spot-checking a few of the files I’ve uploaded recently, some of the PDFs don’t retain the accessibility tagging when downloaded from Omeka.
I’ve only just discovered this, so I don’t have much more information yet. At first I thought it was happening only on larger (70-some page) files, but I’ve since found the same problem in some smaller (2-3 page) files.
There is a small chance that this is a “user error” issue and that I accidentally uploaded the non-remediated originals instead of the remediated versions. However, I’d be more convinced of that if I’d only found a couple of instances; it seems to be happening often enough that I suspect other factors, as I don’t think my processes are that clumsy.
So, I figured I’d ask here whether Omeka does any processing on PDFs beyond renaming the file and extracting the first page to generate a preview image, and if so, whether that processing might be stripping the tag info from the file.
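One way I could narrow this down is to compare checksums of the downloaded file against my local copies. Here is a rough sketch of that check in Python, using only the standard library. The function names and the file names in the usage comment are hypothetical placeholders, and this only tests whether the files are byte-identical; it does not inspect the PDF tag structure itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_download(downloaded, remediated, original):
    """Report which local file, if any, the downloaded copy is byte-identical to."""
    downloaded_hash = sha256_of(downloaded)
    if downloaded_hash == sha256_of(remediated):
        return "matches remediated"
    if downloaded_hash == sha256_of(original):
        return "matches original"
    return "matches neither"


# Example call (hypothetical file names, not real paths from my archive):
# classify_download("from_omeka.pdf", "report_tagged.pdf", "report_untagged.pdf")
```

If the result is "matches original", that would point to an upload mix-up on my end. "matches remediated" would mean the stored file is unchanged and the tags are being lost somewhere else, and "matches neither" would suggest the file was altered at some point after I saved it.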
(I could add links if requested, but they wouldn’t really show much more than that some PDFs are properly tagged and some aren’t, so I haven’t done that yet.)