File URL read error for Book Reader objects

sfcinematheque · October 24, 2023, 3:53am

I am using CSV Import to import objects that are hosted on the Internet Archive, which has worked with no issues in the past. However, when I use a file URL that IA displays in its book reader viewer, I run into errors. The files do import into our Omeka instance, but when we click into the object to view it, we get a 404 error. What appears to be happening is that the full file URL (https://ia800509.us.archive.org/BookReader/BookReaderImages.php?zip=/22/items/casfc_000024/casfc_000024_access_jp2.zip&file=casfc_000024_access_jp2/casfc_000024_access_0001.jp2&id=casfc_000024&scale=2&rotate=0) is being shortened or read incorrectly. The file on IA is .jp2, but the error I receive reads: ‘/files/original/70d159118f37b32e7413e64415e00c6b.php is not a valid URL.’ Seems like it’s pulling the wrong file extension? Running the IA file URLs through a URL shortener fixed the issue, but we don’t have a plan for a URL shortener and can’t scale that workflow to hundreds of files. Any ideas why this might be happening?

jflatnes · October 25, 2023, 7:52pm

Do the thumbnails show? I suspect the import actually worked fine and the only problem is the filename. The file that’s returned by that URL is a JPEG, not a JP2 (it’s probably just a derived version used to make the image visible on the IA website), so what you have is a JPEG saved with the wrong extension.
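One way to confirm what a downloaded file really is, regardless of its extension, is to look at its first few bytes. The following is a minimal sketch in Python, not anything Omeka ships; the function name `sniff_image_type` is made up for illustration. It relies only on the standard file signatures: JPEG data starts with FF D8 FF, and a JP2 file starts with a fixed 12-byte signature box.

```python
# Standard file signatures ("magic bytes") for the two formats in question.
JPEG_MAGIC = b"\xff\xd8\xff"
JP2_MAGIC = b"\x00\x00\x00\x0c\x6a\x50\x20\x20\x0d\x0a\x87\x0a"


def sniff_image_type(data: bytes) -> str:
    """Guess the image type from the leading bytes (hypothetical helper)."""
    if data.startswith(JPEG_MAGIC):
        return "jpeg"
    if data.startswith(JP2_MAGIC):
        return "jp2"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of a file on disk to classify it."""
    with open(path, "rb") as fh:
        return sniff_image_type(fh.read(12))
```

Calling `sniff_file` on one of the imported files in files/original should report `jpeg` if the suspicion above is right.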

Here the problem is that we can’t properly detect what extension should be used for the file: we look at the URL and see that the path ends in .php, so that’s what we pick, and of course that’s wrong in this case. You can’t normally store a .php file at all with default settings, so you presumably added it to the allowed extensions or disabled file upload validation?
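To illustrate why the extension comes out as .php, here is a rough Python sketch of path-based extension detection. It is only an approximation of the behavior described above, not Omeka's actual (PHP) code, and `extension_from_url` is a hypothetical name. The point is that the .jp2 in the URL lives in the query string, which a path-based check never looks at.

```python
import posixpath
from urllib.parse import urlparse


def extension_from_url(url: str) -> str:
    """Return the extension of the URL's path component, ignoring the query."""
    path = urlparse(url).path  # e.g. "/BookReader/BookReaderImages.php"
    return posixpath.splitext(path)[1].lstrip(".").lower()


ia_url = (
    "https://ia800509.us.archive.org/BookReader/BookReaderImages.php"
    "?zip=/22/items/casfc_000024/casfc_000024_access_jp2.zip"
    "&file=casfc_000024_access_jp2/casfc_000024_access_0001.jp2"
    "&id=casfc_000024&scale=2&rotate=0"
)
print(extension_from_url(ia_url))  # php
```

This would also explain why a URL shortener appeared to help: a shortened URL has no .php at the end of its path to be picked up.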

We don’t have any way for you to override the extension that gets detected from the URL, so I can only think of one way forward: you could do the import as is, then rename the .php files in the files/original folder to .jpg. The catch is that you’d also need to apply the same renaming in the database, in the filename column of the omeka_files table. Obviously that’s a somewhat involved solution.
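For what it’s worth, the renaming could be scripted along these lines. This is an untested-against-Omeka sketch in Python, with assumptions: the function names are made up, the table is assumed to be called omeka_files as above (your prefix may differ), and it only touches .php files whose contents start with the JPEG signature. It prints the SQL rather than running it, so you can review the statements and back up both the folder and the database first.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # standard JPEG file signature


def plan_renames(original_dir, table="omeka_files"):
    """List (old_path, new_path, sql) for .php files that are really JPEGs.

    The table name is an assumption; adjust it to match your install.
    Stored filenames are hex hashes, so no SQL escaping is attempted here.
    """
    plans = []
    for old in sorted(Path(original_dir).glob("*.php")):
        with old.open("rb") as fh:
            if fh.read(3) != JPEG_MAGIC:
                continue  # leave anything that is not a JPEG alone
        new = old.with_suffix(".jpg")
        sql = (
            f"UPDATE {table} SET filename = '{new.name}' "
            f"WHERE filename = '{old.name}';"
        )
        plans.append((old, new, sql))
    return plans


def apply_renames(plans):
    """Rename the files on disk; the SQL still has to be run separately."""
    for old, new, _sql in plans:
        old.rename(new)
```

A cautious way to use it would be to call `plan_renames("files/original")`, print each SQL statement for review, and only then call `apply_renames` and run the statements against the database.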

sfcinematheque · November 2, 2023, 7:42pm

This is a little too involved for our current timeline (and my skillset!), so I’m exploring alternate options. Appreciate your help – thank you!