XML File Extractor
Extracts structured data from an XML file and transforms it into rows for ETL processing. Supports bulk processing and integration with pipeline configurations.
Configuration
| Name | Type | Description |
|---|---|---|
| ``filename`` | Attribute | Path to the XML file to be processed |
| ``bulkSize`` | Attribute | Number of rows extracted at once (default: 200) |
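To illustrate what "rows extracted at once" means, here is a minimal Python sketch of batching. It is not the extractor's actual implementation; the helper name `in_batches` and its `bulk_size` parameter are hypothetical and only mirror the ``bulkSize`` attribute.

```python
from itertools import islice


def in_batches(rows, bulk_size=200):
    """Yield lists of at most bulk_size rows from any iterable (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, bulk_size))
        if not batch:
            return  # input exhausted
        yield batch


# 450 rows with a bulk size of 200 produce two full batches and one partial batch.
batch_sizes = [len(batch) for batch in in_batches(range(450), bulk_size=200)]
```

With the four-row example file used on this page and the default of 200, such a scheme would emit a single batch containing all four rows.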
Example
Input data
Example XML file:
<root>
  <row>
    <id>1</id>
    <name>Product A</name>
    <price>10.99</price>
  </row>
  <row>
    <id>2</id>
    <name>Product B</name>
    <price>15.49</price>
  </row>
  <row>
    <id>3</id>
    <name>Product C</name>
    <price>8.99</price>
  </row>
  <row>
    <id>4</id>
    <name>Product D</name>
    <price>12.75</price>
  </row>
</root>
Pipeline configuration
<xml-file-extractor filename="data/import/products.xml" bulkSize="200" />
Output data
| id | name | price |
|---|---|---|
| 1 | Product A | 10.99 |
| 2 | Product B | 15.49 |
| 3 | Product C | 8.99 |
| 4 | Product D | 12.75 |
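The mapping from input elements to output rows can be sketched in Python with the standard library. This is an illustration only: the function name `extract_rows` is hypothetical, and the sketch assumes each `<row>` child element becomes one column and that values stay as strings (the source does not say whether the extractor casts types).

```python
import xml.etree.ElementTree as ET

# A shortened copy of the example input (two of the four rows).
XML_DATA = """<root>
  <row><id>1</id><name>Product A</name><price>10.99</price></row>
  <row><id>2</id><name>Product B</name><price>15.49</price></row>
</root>"""


def extract_rows(xml_text):
    """Turn each <row> element into a dict keyed by child tag name."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in row} for row in root.findall("row")]


rows = extract_rows(XML_DATA)
```

Each dict in `rows` corresponds to one line of the output table, with the child tag names `id`, `name`, and `price` as column headers.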
Error Handling
| Error | Description |
|---|---|
| File Not Found | Throws an exception if the XML file does not exist. |
| Invalid XML Format | Throws an exception when the XML structure is invalid. |
| Bulk Processing Issue | Ensures correct batch sizes based on ``bulkSize``. |
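The first two error cases can be sketched in Python as follows. The source only says that an exception is thrown, so the specific exception types used here (`FileNotFoundError`, `ValueError`) and the function name `load_xml` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def load_xml(filename):
    """Parse an XML file, failing early with a descriptive error (hypothetical helper)."""
    path = Path(filename)
    if not path.is_file():
        # "File Not Found" case: fail before attempting to parse.
        raise FileNotFoundError(f"XML file not found: {filename}")
    try:
        return ET.parse(str(path)).getroot()
    except ET.ParseError as exc:
        # "Invalid XML Format" case: wrap the parser error and keep the file name.
        raise ValueError(f"Invalid XML in {filename}: {exc}") from exc
```

Checking for the file before parsing keeps the two failure modes distinct, so a pipeline can report a missing input differently from a malformed one.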
Performance Considerations
| Consideration | Description |
|---|---|
| Use an appropriate ``bulkSize`` | Small sizes improve memory efficiency but increase processing time. |
| Ensure XML is correctly formatted | Prevents parsing issues. |
| Use optimized XML parsing | |
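The source does not say which parser the extractor uses. One common way to keep memory use low on large files is incremental parsing, sketched here in Python with `xml.etree.ElementTree.iterparse`. The function name `stream_rows` and the `row_tag` parameter are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def stream_rows(source, row_tag="row"):
    """Yield one dict per row element without building all rows in memory at once."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == row_tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # release the row's children once it has been consumed


sample = io.BytesIO(
    b"<root>"
    b"<row><id>1</id><name>Product A</name></row>"
    b"<row><id>2</id><name>Product B</name></row>"
    b"</root>"
)
streamed = list(stream_rows(sample))
```

Because rows are produced one at a time, a consumer can group them into batches of any size without first loading the whole document.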
Key Features
| Feature | Description |
|---|---|
| Supports Bulk Extraction | Processes data in chunks. |
| Handles XML-to-JSON Transformation | Enables easy data extraction. |
| Optimized for Large XML Files | Uses batch processing for efficiency. |
| Pipeline Integration | Compatible with ETL pipeline configurations. |