Unicode in a Non-Unicode Database

Ran into an odd issue when moving file attachments from the database to file storage.  When users attached files sometimes those filenames contained unicode characters and the process PeopleSoft employees to addAttachments and/or putAttachments allows those characters to pass into the database, however because the database is non-unicode in order to reference those filenames the unicode character references need to be scrubbed out of the filenames in order for the copyAttachment function to work.

Oracle Allows you to use the function unistr to put a unicode reference into a string.  In order to work through them I found references for unicode characters, this allowed list out the original file name which had references like <80><99> from a unix output file, so I generated a file that removed the <80><99> reference and added in a || unistr(‘\characternumber’) || reference:

update ps_pv_att_db_srv set attachsysfilename = ‘USER12014-01-01-12.12.12.461The_Fish_Restaurant.pdf’  where attachsysfilename = ‘USER12014-01-01.12.12.12.461The_F’ || unistr(‘\2019’) || ‘ish_Restaurant.pdf’;

It isn’t pretty but it works.  I had a whole series of characters in the database that just should not have been there which included commas, quotas, semi-colons, question marks, many odd ball characters like trademark, copyright, registered, french accent characters etc…. Once I had the file attachment record fixed, make sure that you also update the file attachment reference record as well to the same name.

I didn’t try it out in SQL Server as I didn’t have any references to it but there is a UNICODE function that should behave very closely to Oracle’s UNISTR.

PeopleSoft – Move File Attachments (Database to SFTP)

This is the scenario, client needs to move from database stored file attachments to an actual file share, this allows them to remove 1/4 of their entire database storage needs out to a storage area.  Clearly this is a good thing. So, here is the quick run down on what we did:

1. Copy all the existing URL references that contained record:// for backup purposes.

2. Setup a SSH account with a SSH key pair for security

3. Install the SSH key pair into the digital certificates

4. Change all the existing URLs that point to the database storage to the new SFTP storage locations.

5. Run the Orphan cleanup process for File Attachments

6. Copy the file attachments from the OLDURL to the NEWURL

7. Setup the default file attachment server to use the new URL attachment server.

8. Generate a list of all the attachments moved into the file storage and compare that against what was actually in the attachment record

9. Purged the attachment record of all the migrated files.

10.  Updated attachment reference records when necessary if they referenced the old URL in any way.

This is where we ran into an interesting problem with unicode characters.  The process was relatively slow, and we found that to move approximately 200,000 files took about 40 hours.  So to get into production what we did was made a copy of production and run the process in a test environment.  Then we moved all the files that we copied to the production file share and than removed all the entries from the database attachment record that were already moved and then ran the tiny subset of new attachments out to the storage so we could minimize our downtime.