Hi all - looking for some assistance/advice here. I am trying to import a large number of images (just over 5000 child objects associated with 2092 parents) via the CSV Import module. My instance of Omeka is on a Reclaim server, and the images are on a server in my library. I’m pointing to them using a URL in the CSV file. Thus far, I’ve only been able to append images to 749 of the 2092 parents, with multiple imports, as each import was failing at around 98 objects (or fewer). Reclaim has tried a few different solutions - they increased the PHP upload limit and max post sizes; advised me to disable the Log module; increased the number of database connections; adjusted the max allowed packets, etc. but none of these have really fixed the issue of the aborted ingests. Any suggestions at this point? I could try File Sideload, but I really don’t want to upload all of those JPEG files to the Reclaim server, as I’ll need to migrate my repo to our server at some point. It just makes sense to keep those files on our server and not have them duplicated elsewhere. This is so frustrating, and I’d really like some input, as Reclaim can’t seem to do anything else to resolve this. They referred me to an Omeka dev. Sigh.
What’s happening to these jobs, are they reporting an error? Is there anything in the job log?
You’d previously done a large import of images, right? You didn’t have this problem then?
I’ve previously done large imports using IIIF manifest files, but not with a CSV file. I wasn’t even getting logs before (I am now), but the past few errors have shown a disconnect from MySQL and issues with max_allowed_packet or wait_timeout. This was Reclaim’s response: “As these are both server wide changes and this is a Shared Hosting account, we are only able to adjust them so much without causing issues.”
They advised me to try another import, and at this point, it’s ingesting even fewer images than before.
These aren’t the kinds of errors we’ve seen people mention much for imports… they might be somewhat specific to your hosting situation.
When you do an import, there’s a setting under the Advanced tab for “number of rows to process by batch”… you could try reducing this number and seeing if that helps at all; it will make the importer work in smaller “chunks.” But I’m not certain it will help in your situation.
A few things I think about when I have these issues:
How large are your JPG files?
How many lines are in the CSV file? 5000? Or 1000? Whenever this happens, I reduce the number of rows per import to figure out what is happening.
Can you do a test where you copy your images onto the Reclaim Hosting server to see if you get the same issue? If not, then like jflatnes stated, it could be a hosting issue.
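For anyone who wants to try the "smaller imports" approach without cutting the spreadsheet up by hand, here is a minimal sketch in Python. It assumes the CSV has a single header row and that each data row can be imported on its own. The function name `split_csv` and the default of 50 rows per file are made up for illustration and are not part of Omeka or the CSV Import module.

```python
import csv
from pathlib import Path


def split_csv(source, out_dir, rows_per_file=50):
    """Write `source` out as numbered chunk files that each repeat the header row."""
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # utf-8-sig drops the byte-order mark that "CSV UTF-8" exports often carry.
    with source.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        rows = list(reader)

    written = []
    for start in range(0, len(rows), rows_per_file):
        number = start // rows_per_file + 1
        chunk_path = out_dir / f"{source.stem}_{number:03d}.csv"
        with chunk_path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows[start:start + rows_per_file])
        written.append(chunk_path)
    return written
```

Calling `split_csv("import.csv", "chunks", rows_per_file=50)` would write `chunks/import_001.csv`, `chunks/import_002.csv`, and so on. Importing the chunks one at a time makes it easier to see which rows a failed job stopped on.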
I have already mentioned this above. My CSV file is 2092 lines, and I have tried smaller imports with a smaller CSV file (around 100 lines). Each time I tried that, it imported fewer and fewer images. Reclaim did as much as they could to assist, but acknowledged that they could not change more settings without affecting other installations on the shared server. I may indeed have to change hosts at this point. I’m looking into a host recommended by someone in this forum.
I’ve also uploaded 10 images to the Reclaim server to test using File Sideload, but that’s not an option right now for a few reasons.
How about a URL upload from Reclaim Hosting?
I cannot keep all 5000+ images on the Reclaim server because Reclaim is a temporary host. They are currently hosted on one of our own servers here at my workplace. Right now I am using a URL for import (pointing to the images on our server via a URL in the CSV file).
You should look into those error messages, the “exception” ones: they’re saying that a file wasn’t uploaded due to some validation problem, say with the extension or mime-type of the file you’re trying to upload. That wouldn’t be a problem specific to doing an import.
Pasting the full text of one or more of them could be revealing.
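While waiting on the full error text, the extension and media-type angle can be checked roughly on the CSV itself. The sketch below is only a local sanity check in Python, not a copy of Omeka's own validation. The allow-list, the helper names `check_url` and `find_problem_rows`, and the column name in the usage note are all assumptions, and it expects one URL per cell.

```python
import csv
import mimetypes
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical allow-list; replace it with the values from your own installation.
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "gif", "tif", "tiff"}


def check_url(url, allowed=ALLOWED_EXTENSIONS):
    """Return a short problem description for `url`, or None if nothing looks wrong."""
    path = urlparse(url.strip()).path
    suffix = PurePosixPath(path).suffix.lstrip(".").lower()
    if not suffix:
        return "no file extension"
    if suffix not in allowed:
        return f"extension '{suffix}' is not in the allowed list"
    # Guess the media type from the file name only; the server may report another.
    guessed, _ = mimetypes.guess_type(path)
    if guessed is None or not guessed.startswith("image/"):
        return f"guessed media type is {guessed}"
    return None


def find_problem_rows(csv_path, url_column):
    """List (line number, problem) pairs for rows whose URL looks suspicious."""
    problems = []
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        # Line numbers start at 2 because line 1 is the header; this assumes
        # no field contains an embedded line break.
        for line_number, row in enumerate(csv.DictReader(handle), start=2):
            reason = check_url(row.get(url_column) or "")
            if reason:
                problems.append((line_number, reason))
    return problems
```

Something like `find_problem_rows("import.csv", "file_url")` would list the spreadsheet lines worth a second look, with `file_url` standing in for whatever your URL column is actually called.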
It’s a CSV UTF-8 file.
I see what the issue is - trying again. Found the stack trace.
This is my process as well - uploading my images to a temp folder on Reclaim, entering the URL in a spreadsheet with all my other data, saving as a CSV UTF-8 file, and then uploading. Sometimes, it “skips” items. Sometimes I have the filename in the URL wrong. But sometimes there seems to be absolutely no reason for the skip.
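One way to rule out mistyped filenames before an import is to ask the image server about every URL first. This is a rough Python sketch using only the standard library. It assumes the server answers HEAD requests (not all do), and the names `probe` and `unreachable` are invented for this example.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10, opener=urllib.request.urlopen):
    """Send a HEAD request; return (status, content type) or (None, error text)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status, response.headers.get("Content-Type")
    except urllib.error.HTTPError as error:
        # The server answered, but with an error status such as 404.
        return error.code, None
    except OSError as error:
        # DNS failures, refused connections, and timeouts all end up here.
        return None, str(error)


def unreachable(urls, **probe_options):
    """Return (url, status, detail) for every URL that did not answer with 200."""
    failures = []
    for url in urls:
        status, detail = probe(url, **probe_options)
        if status != 200:
            failures.append((url, status, detail))
    return failures
```

Feeding it the URL column from the spreadsheet gives a list of files the importer would not be able to fetch either. A URL that passes here but is still skipped during import points away from typos and toward the server or the job itself.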