Yesterday evening I got into a fight with the SSIS XML Source Data Flow Source. It actually was the first time I used this Data Flow Source. I had no expectations other than, point to your XML-file and get the data. Wrong... I did point to an XML-file, generate a schema and... no columns! I ended up with the error:
Validation error. Data Flow Task: XML Source [000]: A component does not have any inputs or outputs.
In cases like this, Google should be your friend... well I read a lot about SSIS in general, some even about the XML Source but nothing to provide me with answers or even help me out. It was after reading a post by Oren Eini that I decided I was on the wrong trouble shooting track and a good night sleep would be the best next step.
This morning I started with clear vision and an open mind. No answers through Google, nothing useful on Connect, so I tried if I could reproduce my problem with an other document. The document I created was of a very simple structure;
<people>
<person>
<firstname>Jan</firstname>
<lastname>Aerts</lastname>
</person>
<person>
<firstname>Anne</firstname>
<lastname>Mulders</lastname>
</person>
</people>
It worked! But now I had to find out why the document above worked, and the other one didn't. Again I read about SSIS in general and a little something about the XML Source. In particular (SSIS in general), I stumbled upon a post by Jamie Thomson, sounding familiar and one about the XML Source I wish I came across earlier: Using XML Source by Matt Masson. I could already agree with his opening comment, especially the various degrees of success. While reading Matt's article I had this feeling my XML document might actually be to simple... it occurred to me that the XML Source was not just going to read XML, it was trying to represent the XML as one or more tables.
A very simple representation of my original document is;
<person id="1"> <firstname>Jan</firstname> <lastname>Aerts</lastname> </person>
| <row column1="value"> <column2>value</column2> <column3>value</column3> </row> |
The simplest representations Matt used, are:
<rootgoo> <goo> <subgoo>value</subgoo> <moregoo>1</moregoo> </goo> <goo> <subgoo>value</subgoo> <moregoo>2</moregoo> </goo> </rootgoo> | <table> <row> <column1>value</column1> <column2>value</column2> </row> <row> <column1>value</column1> <column2>value</column2> </row> </table> |
AND
<root> <row CustomerID="1" TerritoryID="1" AccountNumber="AW00000001" /> <row CustomerID="2" TerritoryID="1" AccountNumber="AW00000002" /> </root> | <table> <row column1="value" column2="value" column3="value" /> <row column1="value" column2="value" column3="value" /> </table> |
So my document could never be translated to a table... to get back to Oren's post: If only SSIS had told me so with a clear error or even a dialog in the XML Source, that would have saved me a couple of hours!
Or better, since XML Source tries to get data from the XML, it could do a best effort as wrapping something that looks a single row into a table (and to take it one step simpler, represent a single value as a table with just one row and one column.). If you'd like to see some improvement here too, take a moment to vote on FeedbackID 361057.
On a version note, it happens with SQL Server 2005 (SP2) and SQL Server 2008 (PreRelease).