I have never, before July, thought about APIs or Python or used Omeka. I need to import a bunch of data into Omeka.net to update/correct data on a large scale. I am told the APIs are the same and I can post here.
I have now made a dozen or so scripts in Python (with guidance from the Omeka Team) and updated certain collections with line-by-line field entries, even for hundreds of items (I cheated and used FileMaker Pro to have a calculation make the scripts), but when there is a great deal of formatted text, HTML, or special characters in a field, it takes a lot of manual editing.
It has been implied that I could “easily” create a script that would instead loop through a CSV file and make that unnecessary.
Since I have NEVER worked in these matters (Python, API, etc.) before, it would be extremely useful to see an example of a Python script that would do a loop through a CSV item by item and replace the data in Omeka.
Reading the data out of a CSV file in Python so it can be sent to the API for an update isn’t necessarily “easy” if you aren’t familiar with Python, but it is a small amount of code:
import csv

update_data = {}
with open('updates.csv', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        id, element_id, text = int(row['id']), int(row['element_id']), row['text']
        if id not in update_data:
            update_data[id] = {'element_texts': []}
        update_data[id]['element_texts'].append({
            'element': {'id': element_id},
            'text': text,
            'html': False,
        })
This code assumes you have a CSV file named updates.csv in the same folder as your script, with columns headed id, element_id, and text, where id is the ID of the item to update, element_id is the ID of the element to update for that item, and text is the text to set for that element and item.
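To make the expected layout concrete, here is a small sketch that runs the same grouping logic on CSV text held in a string instead of a file. The item IDs (12, 15) and element IDs (50, 41) are made up for illustration, and group_rows is just a name chosen for this example. It also shows that a field containing commas, quotation marks, or HTML needs no manual editing as long as it is quoted in the CSV, which spreadsheet programs normally do on export.

```python
import csv
import io

# Hypothetical sample data: the item and element IDs below are invented.
# Fields with commas or quotes are wrapped in double quotes, and a literal
# double quote inside a field is written as two double quotes.
SAMPLE_CSV = '''id,element_id,text
12,50,"A title, with a comma"
12,41,"<p>Some ""quoted"" text</p>"
15,50,Plain title
'''


def group_rows(csv_file):
    """Group CSV rows into one update payload per item ID."""
    update_data = {}
    for row in csv.DictReader(csv_file):
        item_id = int(row['id'])
        # Create the payload for this item the first time its ID appears.
        payload = update_data.setdefault(item_id, {'element_texts': []})
        payload['element_texts'].append({
            'element': {'id': int(row['element_id'])},
            'text': row['text'],
            'html': False,
        })
    return update_data


update_data = group_rows(io.StringIO(SAMPLE_CSV, newline=''))
print(sorted(update_data))                          # [12, 15]
print(update_data[12]['element_texts'][1]['text'])  # <p>Some "quoted" text</p>
```

Item 12 appears on two rows, so its payload ends up with two element texts, and both are sent together in a single request for that item.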
After this code runs, the update_data variable will hold the data to send to the API, keyed by each item’s ID, and you can loop over it and make a request for each item. The code to make a request should be the same as in the script you already have that makes a single request, just inside a loop:
for id, data in update_data.items():
    # the code to make one update would go here, using the id and data variables
    pass  # placeholder so the loop is valid Python until the request code is added
Thank you! I’ll be experimenting with this over the coming weekend.
Thank you!! I have just successfully updated 2 items using this script with a little tweak or two. I wish the help desk folks would have suggested the forums 2 months ago; would have saved me a LOT of hours. I am still very surprised that this level of technical effort is needed for managing a collection in Omeka, but I am very grateful that you shared this expertise.
In case someone else like me comes along looking for it, the end result combined code (I am slow, took a second to realize this needed to be in one script together, lol) was as follows:
import csv
import json
import urllib.request

# API url, place your own inside the quotation marks
api_url = 'https://myURL/api'
# API key, place your own inside the quotation marks
key = 'myAPIkey'

# this reads updates.csv and groups the rows into one update per item
update_data = {}
with open('updates.csv', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        id, element_id, text = int(row['id']), int(row['element_id']), row['text']
        if id not in update_data:
            update_data[id] = {'element_texts': []}
        update_data[id]['element_texts'].append({
            'element': {'id': element_id},
            'text': text,
            'html': False,
        })

# this loops over the data given above and sends one API request for each item
for id, data in update_data.items():
    url = '{}/items/{}?key={}'.format(api_url, id, key)
    request = urllib.request.Request(
        url=url,
        data=json.dumps(data).encode('utf-8'),
        method='PUT',
        headers={'Content-Type': 'application/json'},
    )
    urllib.request.urlopen(request)
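One thing to be aware of with the loop above: if any single request fails (a mistyped item ID or a bad key, say), urlopen raises an exception and the script stops at that item, leaving the rest of the items unprocessed. The following is only a sketch of one way to keep going and collect the failures. The function names build_request and update_items and the send parameter are invented for this example; send exists only so the loop can be tried out without contacting a real site.

```python
import json
import urllib.error
import urllib.request


def build_request(api_url, key, item_id, data):
    """Build the same PUT request as the script above, without sending it."""
    url = '{}/items/{}?key={}'.format(api_url, item_id, key)
    return urllib.request.Request(
        url=url,
        data=json.dumps(data).encode('utf-8'),
        method='PUT',
        headers={'Content-Type': 'application/json'},
    )


def update_items(update_data, api_url, key, send=urllib.request.urlopen):
    """Send one request per item; return {item_id: error message} for failures.

    `send` is a hypothetical hook for this sketch: by default it is
    urllib.request.urlopen, but a stand-in can be passed for a dry run.
    """
    failed = {}
    for item_id, data in update_data.items():
        request = build_request(api_url, key, item_id, data)
        try:
            with send(request) as response:
                response.read()
        except urllib.error.URLError as error:
            # HTTPError (e.g. 403, 404) is a subclass of URLError,
            # so both HTTP errors and connection problems land here.
            failed[item_id] = str(error)
    return failed


# Example use, with update_data, api_url and key defined as in the script above:
# failed = update_items(update_data, api_url, key)
# for item_id, message in failed.items():
#     print('Item', item_id, 'was not updated:', message)
```

Items that failed can then be corrected in the CSV and re-run on their own, rather than repeating the whole batch.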