UTF-8 not recognised in text file attached to an item

I uploaded some plain text files in UTF-8 format to attach to Omeka items. When I view the site in my browser (either the front end or the back end) the text files open OK, but characters like curly quotes are mangled. What could I be doing wrong?

I haven’t checked any server settings – not sure I’d know how – but this is a standard shared Unix server with Reclaim Hosting.

I located one of the offending files in files/original, downloaded it, and on my local machine it shows up correctly as UTF-8. So it isn’t the uploading process that’s causing the trouble.

And the header for the published site says: <meta charset="utf-8">.

Is this when viewing the text files directly? Omeka doesn’t really do much, if anything, with the way files display when you view them directly.

Can you share a link to one? That will make it easier to check which headers the server is sending.

Many thanks, John. I’ve confirmed that Omeka uploads and stores the text file without breaking anything. For example, this parent item has this associated text file. If I use a browser to download the file, all is good.

But if I click on the file name in Omeka, so as to view the file in my browser, the associated HTML lacks the <meta charset="utf-8"> tag and so the non-ASCII characters don’t display properly.

In this case I’m not clear who’s responsible for creating the HTML wrapper. You’d probably say it’s not Omeka! But do you have any advice? I’d find it useful to be able to attach plain text files to Omeka items, but then it’s inevitable that people will view them in a browser, and it’s no good if they don’t display properly.

<html><head><meta name="color-scheme" content="light dark"></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">Test file –Â&nbsp;please ignore!

This is a plain Unicode (UTF-8) file with “curly quotes” (and ‘here’) as well as a few accented characters: é ß ö.

It was created in BBEdit 14 on macOS 14.1.2. Line endings are set to “Unix” (LF).

The HTML you’re seeing there is just what the browser uses to display a text file: it’s not something you or Omeka control.

The problem here is the header coming from the server for this text file: it’s Content-Type: text/plain. That is a valid header, but it includes no information on the file’s character set, so the browser has to guess which encoding to use.
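
To see why the guess matters, here is a minimal Python sketch of what a wrong guess does to UTF-8 bytes. It is only an illustration, not something Omeka or the browser runs. It assumes the browser falls back to windows-1252 (Python's "cp1252"), which is a common legacy default, though the actual guess depends on the browser and its locale.

```python
# Sample text from the test file in this thread.
original = "“curly quotes” and é ß ö"

# The bytes the server sends: the file is stored as UTF-8.
raw = original.encode("utf-8")

# Assumption: with no charset declared, the browser falls back to
# windows-1252. Bytes that are undefined there become U+FFFD.
guessed = raw.decode("cp1252", errors="replace")

# With charset=utf-8 declared, the same bytes decode correctly.
declared = raw.decode("utf-8")

# A UTF-8 non-breaking space (bytes C2 A0) misread as windows-1252
# turns into "Â" followed by a non-breaking space.
nbsp_misread = "\u00a0".encode("utf-8").decode("cp1252")
```

Printing `guessed` shows two-character sequences such as "Ã©" where "é" should be, while `declared` matches the original text. The `nbsp_misread` case is the same pattern as the stray "Â" before the non-breaking space in the wrapper HTML quoted above.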

Try this: you can tell Apache to include a charset in the Content-Type for text files. Add this line to your .htaccess:

AddCharset utf-8 .txt
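
If you want to confirm the change took effect, one option is to look at the Content-Type header the server now sends for a .txt file. The sketch below uses only the Python standard library. The function names `charset_of` and `served_charset` are made up for this example, and the URL in the comment is a placeholder rather than a real file.

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_of(content_type):
    """Return the charset declared in a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def served_charset(url):
    """Send a HEAD request and report the charset the server declares."""
    with urlopen(Request(url, method="HEAD")) as response:
        header = response.headers.get("Content-Type")
    return charset_of(header) if header else None


# Hypothetical usage (placeholder URL, substitute one of your own files):
# served_charset("https://example.org/files/original/example.txt")
```

Before the .htaccess change the header was plain text/plain, for which `charset_of` returns None. With the AddCharset line in place the server should send something like text/plain; charset=utf-8, for which it returns "utf-8".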

That worked perfectly. Sorry that wasn’t an Omeka problem, and many thanks for your help.