When we use Excel (Excel source in case of excel 2003 & OLEDB Source in case of excel 2007) as our data source in SSIS, we come across the fact that SSIS decides over the data type of a column of excel sheet based upon the contained metadata. Data source scans the metadata of the excel sheet and determine the data type of columns.
But in this process, there are certain other parameters or settings which also play an important role in this and once decided; all other values with other data type in the same column are imported as NULL.
Example: If we have string and numbers values in the same column then either it will take it as string or number while configuring the excel data source. But again if it considers it as number then string values would be imported as NULL and in case no. of string values are more (in first 8 cells) then it would take string data type which would allow both the values to be imported.
And interesting point to be noted here is, even if we forcefully select ‘DT_WSTR’ (string) data type in the first case, then also it would behave the same.
There are the following settings which cause this unwanted output & these are also the ways to configure them to have the correct results:
- Setting “IMEX=1” in the connection of the Excel sheet. This can be set in the connection string in case of Excel Source (Excel 2003) and can be set using ‘Extended Properties’ in case of OLE DB Source (Excel 2007). This setting tells that excel sheet is having mixed data types and thus import it as ‘Text’ values.
Then we can forcefully select ‘DT_NTEXT’ data type in the data source.
- Registry setting “TypeGuessRows=8”: This setting/value tells the connection manager to consider the first 8 rows of a column and decide over its data type based upon the maximum no. of values type in those 8 values. So if we have first 5 rows/cells of a column of type string but rest of the 100 values are numbers then it would import all the numbers as NULL since data type in this would be String (DT_WSTR). This will be applied for all the packages i.e. on server level.
So if we use a higher value for TypeGuessRows (say 20) then it would take it as Integer.
But again here also it would import rest of the string values as NULL.
- Using ‘SELECT * FROM [SheetName$]’ query instead of direct table/sheet name. This will allow all type of values to get imported as ‘DT_WSTR’ which is quite acceptable because anyhow it’s getting all the values of a column. This is really a good option to select.
Note: In all the cases where in we are getting string data (along with number data) be means of some or the other setting, we have to have String column in the destination.