Python script to import from CSV and loop

Timoshi · October 26, 2024, 8:48pm

I have never, before July, thought about APIs or Python or used Omeka. I need to import a bunch of data into Omeka.net to update/correct data on a large scale. I am told the APIs are the same and I can post here.

I have now made a dozen or so scripts in Python (with guidance from the Omeka Team) and updated certain collections with line-by-line field entries, even for hundreds of items (I cheated and used FileMaker Pro to have a calculation make the scripts), but when there is a great deal of formatted text, HTML, or special characters in a field, it takes a lot of manual editing.

It has been implied that I could “easily” create a script that would instead loop through a CSV file and make that unnecessary.

Since I have NEVER worked in these matters (Python, API, etc.) before, it would be extremely useful to see an example of a Python script that would do a loop through a CSV item by item and replace the data in Omeka.

jflatnes · October 28, 2024, 3:11pm

Reading the data for an API update out of a CSV file in Python isn’t necessarily “easy” if you aren’t familiar with Python, but it is a small amount of code:

import csv

update_data = {}
with open('updates.csv', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        id, element_id, text = int(row['id']), int(row['element_id']), row['text']
        if id not in update_data:
            update_data[id] = {'element_texts': []}
        update_data[id]['element_texts'].append({
            'element': {'id': element_id},
            'text': text,
            'html': False,
        })

This code assumes you have a CSV file named updates.csv in the same folder as your script, with columns with headings called id, element_id, and text, where id is the ID of the item to update, element_id is the ID of the element to update for that item, and text is the text to set for that element and item.

After this code runs, the update_data variable will hold the data to send to the API, keyed by each item’s ID, and you can loop over it and make a request for each item. The code to make a request should be the same as in the script you already have that makes a single request, just inside a loop:

for id, data in update_data.items():
    # the code to make one update would go here, using the id and data variables
    pass  # placeholder so the loop is valid Python until the request code is added
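
To make the expected CSV layout concrete, here is a small sketch that runs the same grouping logic on an in-memory sample instead of a real updates.csv. The item IDs, element IDs, and text values are made up for illustration, and the group_updates function name is my own, not something from Omeka. It also shows that the csv module takes care of commas and quotes inside a field, as long as the field is quoted in the file.

```python
import csv
import io

# Hypothetical sample standing in for updates.csv; all IDs and text are made up.
SAMPLE_CSV = '''id,element_id,text
12,50,New title for item 12
12,41,"A description, with a comma and ""quotes"""
15,50,New title for item 15
'''


def group_updates(csv_file):
    """Group CSV rows into one payload per item ID."""
    update_data = {}
    for row in csv.DictReader(csv_file):
        item_id = int(row['id'])
        # create the entry for this item the first time its ID appears
        payload = update_data.setdefault(item_id, {'element_texts': []})
        payload['element_texts'].append({
            'element': {'id': int(row['element_id'])},
            'text': row['text'],
            'html': False,
        })
    return update_data


# io.StringIO lets the sample text be read like an open file
update_data = group_updates(io.StringIO(SAMPLE_CSV, newline=''))
for item_id, data in update_data.items():
    print(item_id, len(data['element_texts']), 'element text(s)')
```

Item 12 appears on two rows, so its two element texts end up in a single payload and would go out in a single request.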

Timoshi · October 30, 2024, 9:53pm

Thank you! I’ll be experimenting with this over the coming weekend.

Timoshi · November 2, 2024, 8:35pm

Thank you!! I have just successfully updated 2 items using this script with a little tweak or two. I wish the help desk folks would have suggested the forums 2 months ago; would have saved me a LOT of hours. I am still very surprised that this level of technical effort is needed for managing a collection in Omeka, but I am very grateful that you shared this expertise.

Timoshi · November 2, 2024, 8:40pm

In case someone else like me comes along looking for it, the end result combined code (I am slow, took a second to realize this needed to be in one script together, lol) was as follows:

import csv

update_data = {}
with open('updates.csv', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        id, element_id, text = int(row['id']), int(row['element_id']), row['text']
        if id not in update_data:
            update_data[id] = {'element_texts': []}
        update_data[id]['element_texts'].append({
            'element': {'id': element_id},
            'text': text,
            'html': False,
        })


import json
import urllib.request

# API url, place your own inside the quotation marks
api_url = 'https://myURL/api'

# API key, place your own inside the quotation marks
key = 'myAPIkey'

# this loops over the data given above and sends one API request for each item
for id, data in update_data.items():
    url = '{}/items/{}?key={}'.format(api_url, id, key)
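    request = urllib.request.Request(
        url=url,
        data=json.dumps(data).encode('utf-8'),
        method='PUT',
        headers={'Content-Type': 'application/json'},
    )
    # the with block closes the connection once the response has arrived
    with urllib.request.urlopen(request) as response:
        print('Updated item', id, 'status', response.status)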
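
For anyone running this against hundreds of items, a possible variation is sketched below. It is not from the Omeka documentation. It builds the same PUT request, but adds a dry-run switch and keeps going when one item fails instead of stopping at the first error. The build_request and send_updates names and the dry_run parameter are my own inventions for illustration, and the URL and key are the same placeholders as above.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def build_request(api_url, item_id, key, data):
    """Build (but do not send) the PUT request for one item."""
    query = urllib.parse.urlencode({'key': key})
    url = '{}/items/{}?{}'.format(api_url.rstrip('/'), item_id, query)
    return urllib.request.Request(
        url=url,
        data=json.dumps(data).encode('utf-8'),
        method='PUT',
        headers={'Content-Type': 'application/json'},
    )


def send_updates(update_data, api_url, key, dry_run=True):
    """Send one request per item and return the IDs that failed."""
    failed = []
    for item_id, data in update_data.items():
        request = build_request(api_url, item_id, key, data)
        if dry_run:
            # nothing is sent; this only shows what would happen
            print('Would send', request.get_method(), 'for item', item_id)
            continue
        try:
            with urllib.request.urlopen(request) as response:
                print('Item', item_id, 'updated, status', response.status)
        except urllib.error.HTTPError as error:
            # the server answered, but with an error status
            print('Item', item_id, 'failed:', error.code, error.reason)
            failed.append(item_id)
        except urllib.error.URLError as error:
            # the server could not be reached at all
            print('Item', item_id, 'failed:', error.reason)
            failed.append(item_id)
    return failed


# Made-up payload in the same shape as update_data above
example = {7: {'element_texts': [
    {'element': {'id': 50}, 'text': 'Demo', 'html': False},
]}}
failed = send_updates(example, 'https://myURL/api', 'myAPIkey', dry_run=True)
```

With dry_run=True nothing leaves your machine, so you can check the item IDs first and then switch to dry_run=False with your real URL and key. The returned list tells you which items to retry.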