Archiving via Warc

Wallygee · December 8, 2022, 10:42pm

Would be interested in hearing from anyone who has successfully archived an Omeka or Omeka-S site into a WARC file. I’m trying with wget, but the results leave much to be desired: static pages are captured, but the item-level content isn’t included. Perhaps it’s not possible, or maybe I’m just using the wrong command-line arguments. In any case, if someone’s having success I’d love to talk with you.

jflatnes · December 9, 2022, 7:18pm

I haven’t done it personally but I would think with wget’s “mirror” mode you should be getting item pages, as long as the item browse page is linked somewhere on the site that wget can crawl to (usually this is in the main navigation): once you crawl one browse page you should be getting the items linked off that page and then the rest of the browse pages via the pagination.

Wallygee · December 9, 2022, 7:49pm

You’re right… "--mirror" is what I needed. I am getting the right result with this command:

wget "URL for the site" --mirror --warc-file="nameforWarcFile"

The Browse link was part of the site so that wasn’t my issue. My problem was that I was using --recursive and that wasn’t working very well…
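For anyone finding this later, here is a rough sketch of that command wrapped in a small shell function. It is not a tested recipe for any particular Omeka site. The function name archive_site, the WGET override variable, and the example URL are made up for illustration. The extra flags (--page-requisites, --wait=1, --warc-cdx) are standard wget options that were not part of the command above. One possible reason plain --recursive fell short: wget's default recursion depth is 5, whereas --mirror turns on recursion with infinite depth, so deeply paginated browse pages may only be reached with the latter.

```shell
#!/bin/sh
# Sketch: mirror a site into a WARC file with wget.
# archive_site and the WGET variable are hypothetical helpers for this example;
# WGET lets you substitute another command (e.g. echo) for a dry run.
archive_site() {
    # $1 = start URL of the site
    # $2 = base name for the WARC output (wget writes NAME.warc.gz by default)
    "${WGET:-wget}" "$1" \
        --mirror \
        --page-requisites \
        --wait=1 \
        --warc-file="$2" \
        --warc-cdx
}

# Example call (placeholder URL, not from this thread):
# archive_site "https://example.org/omeka/" "omeka-archive"
```

Running it with WGET=echo prints the arguments instead of starting a crawl, which is a cheap way to check the quoting before pointing it at a real site.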
