Unicode in a Non-Unicode Database

I ran into an odd issue when moving file attachments from the database to file storage.  When users attached files, the filenames sometimes contained Unicode characters, and the process PeopleSoft employs for addAttachments and/or putAttachments allows those characters to pass into the database.  Because the database is non-Unicode, however, those Unicode characters need to be scrubbed out of the filenames before the copyAttachment function can reference the files and work.

Oracle allows you to use the function unistr to put a Unicode reference into a string.  To work through the affected filenames, I found references for the Unicode characters, which allowed me to list out the original file names; in a unix output file these showed up with references like <80><99>.  I then generated a file that removed the <80><99> reference and added in a || unistr('\characternumber') || reference:

update ps_pv_att_db_srv set attachsysfilename = 'USER12014-01-01-12.12.12.461The_Fish_Restaurant.pdf' where attachsysfilename = 'USER12014-01-01-12.12.12.461The_F' || unistr('\2019') || 'ish_Restaurant.pdf';
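A rough sketch of how a file of these statements could be generated, written here in Python rather than whatever was actually used.  It assumes the original filenames are already available as properly decoded strings; the function names unistr_expression and build_update are made up for illustration, and the table and column names are simply the ones from the statement above.  For what it's worth, <80><99> looks like the tail end of the UTF-8 bytes E2 80 99, which encode U+2019 (the right single quotation mark), hence the \2019.

```python
def unistr_expression(name):
    # Build an Oracle string expression equal to `name`, writing every
    # non-ASCII character as unistr('\XXXX') and the rest as quoted literals.
    parts, buf = [], []
    for ch in name:
        if ord(ch) < 128:
            buf.append("''" if ch == "'" else ch)  # double embedded quotes
            continue
        if buf:
            parts.append("'" + "".join(buf) + "'")
            buf = []
        # UNISTR expects UTF-16 code units, so emit 4 hex digits per unit.
        units = ch.encode("utf-16-be")
        escaped = "".join(
            "\\%04X" % int.from_bytes(units[i:i + 2], "big")
            for i in range(0, len(units), 2)
        )
        parts.append("unistr('" + escaped + "')")
    if buf:
        parts.append("'" + "".join(buf) + "'")
    return " || ".join(parts) if parts else "''"


def build_update(name, table="ps_pv_att_db_srv", column="attachsysfilename"):
    # Hypothetical helper: returns one UPDATE statement, or None when the
    # filename is already plain ASCII and needs no fix.
    clean = "".join(ch for ch in name if ord(ch) < 128)
    if clean == name:
        return None
    clean_literal = "'" + clean.replace("'", "''") + "'"
    return "update %s set %s = %s where %s = %s;" % (
        table, column, clean_literal, column, unistr_expression(name)
    )


if __name__ == "__main__":
    original = "USER12014-01-01-12.12.12.461The_F\u2019ish_Restaurant.pdf"
    print(build_update(original))
```

Run against a list of filenames, this produces one statement per affected file in the same shape as the one above.  Whether the where clause actually matches the stored value depends on the database character set, so test it on a copy first.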

It isn’t pretty, but it works.  I had a whole series of characters in the database that just should not have been there, including commas, quotes, semicolons, question marks, and many oddball characters such as trademark, copyright, and registered symbols, French accented characters, etc.  Once you have the file attachment record fixed, make sure that you also update the file attachment reference record to the same name.
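To compute the cleaned-up name for that whole series of characters in one pass, something along these lines might do.  This is only a sketch in Python: scrub_filename and the UNWANTED set are names invented here, and turning accented letters into their plain base letters (rather than dropping them outright) is my own assumption about what a sensible scrubbed name looks like.

```python
import unicodedata

# Punctuation listed above as unwanted in attachment filenames.
UNWANTED = set(",;?'\"")


def scrub_filename(name):
    # NFD splits an accented letter into its base letter plus a combining
    # mark, so the base letter survives the ASCII filter below.
    decomposed = unicodedata.normalize("NFD", name)
    # Drop everything outside ASCII: combining marks, curly quotes,
    # trademark, copyright, and registered symbols, and so on.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Finally strip the plain-ASCII punctuation that should not be there.
    return "".join(ch for ch in ascii_only if ch not in UNWANTED)


if __name__ == "__main__":
    print(scrub_filename("Caf\u00e9_Menu\u2019s,_v2?.pdf"))  # Cafe_Menus_v2.pdf
```

Whatever rule you settle on, the same scrubbed name has to go on both the attachment record and the reference record, so it is worth computing it once and reusing it.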

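I didn’t try it out in SQL Server, as I didn’t have any references to it, but there is an NCHAR function that should play much the same role as Oracle’s UNISTR: it takes an integer code point, such as NCHAR(8217), and returns the Unicode character.  SQL Server’s UNICODE function goes the other way and returns the code point of a character.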