Encountering multiple errors with CSV Import plugin

I’m trying to do a batch import of item metadata using the CSV Import plugin and running into multiple issues. First off, I want to say that I have double-checked my file encoding multiple times and it is UTF-8. When I first tried to import my CSV file, I got the following error:

2018-08-22T14:47:15-04:00 DEBUG (7): [CsvImport][#2] Queued import.
2018-08-22T14:47:16-04:00 DEBUG (7): [CsvImport][#2] Started import.
2018-08-22T14:47:16-04:00 DEBUG (7): [CsvImport][#2] Running item import loop. Memory usage: 8012384
2018-08-22T14:47:16-04:00 ERR (3): [CsvImport][#2] Zend_Db_Statement_Mysqli_Exception: Mysqli statement execute error : Incorrect string value: ‘\x85"’ for column ‘text’ at row 1 in /var/www/html/omeka-2.6/application/libraries/Zend/Db/Statement/Mysqli.php:214
Stack trace:
#0 /var/www/html/omeka-2.6/application/libraries/Zend/Db/Statement.php(303): Zend_Db_Statement_Mysqli->_execute(Array)
#1 /var/www/html/omeka-2.6/application/libraries/Zend/Db/Adapter/Abstract.php(480): Zend_Db_Statement->execute(Array)
#2 /var/www/html/omeka-2.6/application/libraries/Omeka/Db.php(79): Zend_Db_Adapter_Abstract->query(‘INSERT INTO `om…’, Array)
#3 /var/www/html/omeka-2.6/application/libraries/Omeka/Db.php(255): Omeka_Db->__call(‘query’, Array)
#4 /var/www/html/omeka-2.6/application/libraries/Omeka/Record/AbstractRecord.php(541): Omeka_Db->insert(‘omeka_element_t…’, Array)
#5 /var/www/html/omeka-2.6/application/models/Mixin/ElementText.php(670): Omeka_Record_AbstractRecord->save()
#6 /var/www/html/omeka-2.6/application/models/Mixin/ElementText.php(93): Mixin_ElementText->saveElementTexts()
#7 /var/www/html/omeka-2.6/application/libraries/Omeka/Record/AbstractRecord.php(254): Mixin_ElementText->afterSave(Array)
#8 /var/www/html/omeka-2.6/application/libraries/Omeka/Record/AbstractRecord.php(283): Omeka_Record_AbstractRecord->delegateToMixins(‘afterSave’, Array, true)
#9 /var/www/html/omeka-2.6/application/libraries/Omeka/Record/AbstractRecord.php(548): Omeka_Record_AbstractRecord->runCallbacks(‘afterSave’, Array)
#10 /var/www/html/omeka-2.6/application/libraries/Omeka/Record/Builder/AbstractBuilder.php(67): Omeka_Record_AbstractRecord->save()
#11 /var/www/html/omeka-2.6/application/libraries/globals.php(539): Omeka_Record_Builder_AbstractBuilder->build()
#12 /var/www/html/omeka-2.6/plugins/CsvImport/models/CsvImport/Import.php(707): insert_item(Array, Array)
#13 /var/www/html/omeka-2.6/plugins/CsvImport/models/CsvImport/Import.php(588): CsvImport_Import->_addItemFromRow(Array)
#14 /var/www/html/omeka-2.6/plugins/CsvImport/models/CsvImport/Import.php(336): CsvImport_Import->_importLoop(0)
#15 /var/www/html/omeka-2.6/plugins/CsvImport/models/CsvImport/ImportTask.php(39): CsvImport_Import->start()
#16 /var/www/html/omeka-2.6/application/libraries/Omeka/Job/Process/Wrapper.php(29): CsvImport_ImportTask->perform()
#17 /var/www/html/omeka-2.6/application/scripts/background.php(61): Omeka_Job_Process_Wrapper->run(Array)
#18 {main}

I made sure to put any fields that contained a comma in double quotation marks, and I’m unsure, but it looks like that might have been creating the issue here? I consulted with our Digital Initiatives Librarian who thought that perhaps the string issue was related to the PHP and suggested I try adding a backslash before each quotation mark after checking the following info on this site: https://secure.php.net/manual/en/language.types.string.php

If the string is enclosed in double-quotes ("), PHP will interpret the following escape sequences for special characters:
Escaped characters
Sequence Meaning
\n linefeed (LF or 0x0A (10) in ASCII)
\r carriage return (CR or 0x0D (13) in ASCII)
\t horizontal tab (HT or 0x09 (9) in ASCII)
\v vertical tab (VT or 0x0B (11) in ASCII) (since PHP 5.2.5)
\e escape (ESC or 0x1B (27) in ASCII) (since PHP 5.4.4)
\f form feed (FF or 0x0C (12) in ASCII) (since PHP 5.2.5)
\ backslash
$ dollar sign
" double-quote

Once I tried importing again, I got the following error message:

Clearly the backslash solution caused a formatting issue, but we are stumped as to how to get this import to work.

Can you share the file itself? Or a representative subset that still causes the problem?

That “Incorrect string value” error you mention first is pretty typical of a text encoding issue. I know you said you made sure the file is UTF-8, but there could be some other similar issue at work.

The forum won’t let me attach a CSV file so I’ve uploaded it to Google Drive so I can share a link: https://drive.google.com/file/d/1lQQKCbBEtBzE7d6JbJd9Cp77-csn5wnv/view?usp=sharing

Let me know if you need any more info!

I forgot to add that in initial imports the first item imported and the second item appears to have triggered the error.

This does seem to be a text encoding issue after all, the specific culprit being the ellipsis in the title of the second item. The file appears to be encoded as Windows-1252, the standard Windows “Western” encoding, not UTF-8.

If you’re using Excel to create the output, it tends to want to output in that encoding. LibreOffice Calc gives you an explicit choice of what encoding to use when exporting to CSV and tends to work better.

Thanks for taking a look at this. Should have figured Windows was causing the issue!

You can also pretty easily convert the file you’ve already got: open it up in Notepad and choose “Save As,” and there should be an option for encoding near the bottom: if you change it from “ANSI” to “UTF-8” and save the file, it should be converted to UTF-8.

1 Like

You can also specify the encoding in Excel… Save As, change file type to csv, then click on the Tools button (next to Save) > Web Options > select Unicode (UTF-8).

1 Like

This was exactly what I did with this file but it still didn’t work.

Huh. I wasn’t aware of the “Web Options” route, but maybe that’s because it doesn’t work? Or perhaps it only works for certain file export types, and CSV isn’t one of them?

LibreOffice would be my general recommendation for producing UTF-8 CSV files. The “convert with Notepad” approach I mentioned works fine as long as all the characters involved are part of Windows-1252, but once you’re dealing with anything that isn’t, you’ll run into problems.

I believe Google Sheets also has a painless export to UTF-8 CSV option.

Yes, anything exported from GoogleSheets as csv should work

A post was split to a new topic: Characters disappearing from ends of values with S CSVImport module

This topic was automatically closed after 250 days. New replies are no longer allowed.