XML File Extractor

Extracts structured data from an XML file and transforms it into rows for ETL processing. Supports bulk processing and integration with pipeline configurations.

Configuration

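========  =========  ===============================================
Name      Type       Description
========  =========  ===============================================
filename  Attribute  Path to the XML file to be processed
bulkSize  Attribute  Number of rows extracted at once (default: 200)
========  =========  ===============================================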

Example

Input data

Example XML file:

<root>
    <row>
        <id>1</id>
        <name>Product A</name>
        <price>10.99</price>
    </row>
    <row>
        <id>2</id>
        <name>Product B</name>
        <price>15.49</price>
    </row>
    <row>
        <id>3</id>
        <name>Product C</name>
        <price>8.99</price>
    </row>
    <row>
        <id>4</id>
        <name>Product D</name>
        <price>12.75</price>
    </row>
</root>

Pipeline configuration

<xml-file-extractor filename="data/import/products.xml" bulkSize="200" />
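As an illustration of how the two attributes could be read, the following is a minimal Python sketch, not the extractor's own code. The function name `read_extractor_config` is hypothetical, and treating `filename` as required and `bulkSize` as a positive integer are assumptions; only the default of 200 comes from the Configuration section.

```python
import xml.etree.ElementTree as ET

DEFAULT_BULK_SIZE = 200  # default stated in the Configuration section


def read_extractor_config(element_xml: str) -> dict:
    """Parse an <xml-file-extractor> element into a small settings dict.

    Hypothetical helper for illustration only.
    """
    element = ET.fromstring(element_xml)
    if element.tag != "xml-file-extractor":
        raise ValueError(f"unexpected element: {element.tag}")

    filename = element.get("filename")
    if not filename:
        # Assumption: the extractor cannot run without a file path.
        raise ValueError("the filename attribute is required")

    # Fall back to the documented default when bulkSize is omitted.
    bulk_size = int(element.get("bulkSize", DEFAULT_BULK_SIZE))
    if bulk_size < 1:
        # Assumption: a batch must contain at least one row.
        raise ValueError("bulkSize must be a positive integer")

    return {"filename": filename, "bulkSize": bulk_size}


config = read_extractor_config(
    '<xml-file-extractor filename="data/import/products.xml" bulkSize="200" />'
)
```

Omitting `bulkSize` in this sketch yields the documented default of 200.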

Output data

==  =========  =====
id  name       price
==  =========  =====
1   Product A  10.99
2   Product B  15.49
3   Product C  8.99
4   Product D  12.75
==  =========  =====
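The mapping from the example input to the output rows can be sketched roughly as follows. This is a Python illustration using the standard library's `xml.etree.ElementTree`, not the extractor's actual implementation. The function name `extract_batches` is hypothetical, and two points are assumptions: every direct child of the root element is treated as one row, and values are kept as strings.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List

SAMPLE_XML = """<root>
    <row><id>1</id><name>Product A</name><price>10.99</price></row>
    <row><id>2</id><name>Product B</name><price>15.49</price></row>
    <row><id>3</id><name>Product C</name><price>8.99</price></row>
    <row><id>4</id><name>Product D</name><price>12.75</price></row>
</root>"""


def extract_batches(
    xml_text: str, bulk_size: int = 200
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most bulk_size rows (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    batch: List[Dict[str, str]] = []
    for row in root:  # assumption: each direct child of the root is one row
        # Child element names become column names; text becomes the value.
        batch.append({child.tag: (child.text or "") for child in row})
        if len(batch) == bulk_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# A deliberately small bulk size makes the batching visible.
batches = list(extract_batches(SAMPLE_XML, bulk_size=3))
```

With a bulk size of 3, the four sample rows are split into one batch of three rows and one batch with the remaining row.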

Error Handling

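=====================  ======================================================
Error                  Description
=====================  ======================================================
File Not Found         Throws an exception if the XML file does not exist.
Invalid XML Format     Throws an exception when the XML structure is invalid.
Bulk Processing Issue  Ensures correct batch sizes based on ``bulkSize``.
=====================  ======================================================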
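The first two failure modes can be sketched as follows. This is a Python illustration only: `ExtractionError` and `load_xml` are hypothetical names, since the extractor's real exception classes are not documented here.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class ExtractionError(Exception):
    """Hypothetical error type used only in this sketch."""


def load_xml(filename: str) -> ET.Element:
    """Return the root element, or raise ExtractionError on failure."""
    if not os.path.isfile(filename):
        raise ExtractionError(f"XML file not found: {filename}")
    try:
        return ET.parse(filename).getroot()
    except ET.ParseError as exc:
        # Re-raise parser errors under one error type for the pipeline.
        raise ExtractionError(f"invalid XML in {filename}: {exc}") from exc


def describe_failure(filename: str) -> str:
    """Return 'ok' or the error message produced for this file."""
    try:
        load_xml(filename)
        return "ok"
    except ExtractionError as exc:
        return str(exc)


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.xml")  # never created
    broken = os.path.join(tmp, "broken.xml")
    with open(broken, "w", encoding="utf-8") as handle:
        handle.write("<root><row></root>")  # mismatched tags
    missing_message = describe_failure(missing)
    broken_message = describe_failure(broken)
```

Checking for the file before parsing keeps the two error cases distinguishable in the messages a pipeline operator sees.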

Performance Considerations

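=================================  ============================================================================
Consideration                      Description
=================================  ============================================================================
Use an appropriate ``bulkSize``    Smaller values use less memory per batch but increase total processing time.
Ensure XML is correctly formatted  Prevents parsing issues.
Use optimized XML parsing          ``simplexml_load_file()`` parses the whole file in one call.
=================================  ============================================================================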
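The ``bulkSize`` trade-off comes down to simple arithmetic: a smaller value means fewer rows held per batch but more batches to process. The Python sketch below works this out. The helper names are hypothetical, and it assumes the last batch simply holds whatever rows remain.

```python
import math
from typing import List


def batch_count(total_rows: int, bulk_size: int = 200) -> int:
    """Number of batches needed to cover total_rows."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be a positive integer")
    return math.ceil(total_rows / bulk_size)


def batch_sizes(total_rows: int, bulk_size: int = 200) -> List[int]:
    """Size of each batch, assuming the last one holds the remainder."""
    if bulk_size < 1:
        raise ValueError("bulk_size must be a positive integer")
    full, remainder = divmod(total_rows, bulk_size)
    return [bulk_size] * full + ([remainder] if remainder else [])


# 1000 rows: 5 batches at the default size, 20 batches at bulk_size=50.
default_batches = batch_count(1000)
small_batches = batch_count(1000, bulk_size=50)
```

For 450 rows and the default size, `batch_sizes(450)` gives two full batches of 200 and a final batch of 50.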

Key Features

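==================================  ============================================
Feature                             Description
==================================  ============================================
Supports Bulk Extraction            Processes data in chunks.
Handles XML-to-JSON Transformation  Enables easy data extraction.
Optimized for Large XML Files       Uses batch processing for efficiency.
Pipeline Integration                Compatible with ETL pipeline configurations.
==================================  ============================================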
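The XML-to-JSON feature is not described in detail here. One plausible reading is that each row element becomes a JSON object keyed by its child element names. The Python sketch below shows that reading only; the function name `rows_to_json` is hypothetical and the extractor's actual behaviour may differ.

```python
import json
import xml.etree.ElementTree as ET


def rows_to_json(xml_text: str) -> str:
    """Serialize each child of the root as one JSON object (illustrative)."""
    root = ET.fromstring(xml_text)
    # Assumption: values stay strings; no type conversion is attempted.
    rows = [{child.tag: (child.text or "") for child in row} for row in root]
    return json.dumps(rows)


json_text = rows_to_json(
    "<root><row><id>1</id><name>Product A</name><price>10.99</price></row></root>"
)
```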