Today's post was written by Neal Sullivan, Senior Database Developer at Sonoma Partners.
As we work with clients to migrate data into Microsoft Dynamics CRM we often run into challenges with the incoming data.
Our migration process typically consists of moving the source data into a staging SQL Server database prior to the actual migration to CRM. Among other reasons, this gives us a place to do data cleansing prior to the CRM migration.
We run into many common issues such as field length differences and data type mismatches that are often found during the data mapping process with the customer. One less common issue we encounter in testing a migration is that some characters in the source data are not supported in CRM when importing data via the API. There are certain non-printable characters that are supported such as carriage-return and line-feed however others like record separator [char 30] or vertical tab [char 11] often are not accepted when migrating data to CRM.
We've developed a common SQL framework we use to allow us to do some data analysis and clean-up of these invalid characters in our staging tables prior to doing our push of the data to CRM. In most cases we run the data migration without any cleansing and capture any failed rows into an error table where we keep the source system record id and the CRM API error message. From there we can determine if any entities had errors around invalid characters. Here is an example of what that error would look like. In our example we are using the KingswaySoft CRM Adapter for SQL Server Integration Services.
Once we know what entities and fields have invalid characters we can start to build our cleanup routine from our base framework.
So which characters are going to cause us a problem?
In our research we found that the Microsoft Dynamics CRM API does not like asci characters below character number 32 (which is the space " " character). So we start with a list of 1 – 31 to represent potential bad characters. We also know that horizontal tab, carriage return, and line feed (CHAR(9), CHAR(13), and CHAR(10) respectively) are valid in CRM and should not be in this list of bad characters.
For the sake of examples, here is a sample script to spin up 10 ‘note’ records with potentially bad data in the NoteText field.
To create the list of bad characters, we used a Common Table Expression (CTE). The below script gives a numbered list containing the asci character values of known-bad characters called, ‘BadCharacters’.
From there it's a matter of writing a query that will join the known bad characters CTE with your stage table and have it review each character in the field that was reported as having bad characters in your error logging of your data migration. Here the SQL Cross Apply clause comes in handy to make this a simple process. In this example we are migrating notes into the CRM notes entity. I know from my error logging shown above that the notetext field in my stage table has some bad characters that CRM did not like. So I cross apply my BadCharacters table with my notes staging table and have it inspect the notetext field for bad characters (using the above CTE definition).
Here are the results of the above query on my data set. I can see exactly what records, what the raw value is currently in that field, what the bad character was reported and where in the string it exists.
After I do my analysis and confirm that it’s acceptable to replace these characters an update script is run against my stage table. Here is my final script that I can include in my data migration process to swap out any bad characters with a blank string in my stage table prior to sending these records to CRM.
It should be noted that the CTE spins up 256 possible rows to cover every possible ascii character. In this case we know that we only want to do the cross apply on a subset of these potential values. But the BadCharacters CTE could be amended to include/exclude any ascii characters.