CSV File Extractor

Extracts structured rows from a CSV file. The separator, enclosure, and escape characters are configurable, and rows are read in bulks of a configurable size.

Configuration

| Name | Type | Description |
| --- | --- | --- |
| filename | Attribute | Path to the CSV file to be processed |
| bulkSize | Attribute | Number of rows extracted at once (default: `1000`) |
| separator | Attribute | Character separating values in the CSV (default: `,`) |
| enclosure | Attribute | Character wrapping values containing the separator (default: `"`) |
| escape | Attribute | Character escaping special characters inside enclosed values (default: `\\`) |
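
To illustrate how the three parsing attributes interact, here is a minimal sketch using Python's standard `csv` module. It is not the extractor's implementation: the function `parse_csv` and the sample text are hypothetical, and the mapping of `separator`, `enclosure`, and `escape` onto `delimiter`, `quotechar`, and `escapechar` is an assumption made for illustration.

```python
import csv
import io


def parse_csv(text, separator=",", enclosure='"', escape="\\"):
    """Parse CSV text into a list of rows (hypothetical helper, not the extractor's API)."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=separator,   # plays the role of the `separator` attribute
        quotechar=enclosure,   # plays the role of the `enclosure` attribute
        escapechar=escape,     # plays the role of the `escape` attribute
    )
    return list(reader)


# The first value contains the separator, so it is wrapped in the enclosure.
# The second value contains the enclosure character itself, so it is escaped.
sample = 'name,description\n"Camera, digital","5\\" screen"\n'
rows = parse_csv(sample)

# A semicolon-separated file only needs a different separator.
semicolon_rows = parse_csv("a;b\n1;2\n", separator=";")
```

In this sketch the enclosed comma stays inside a single value, and the escaped enclosure character ends up in the value without terminating it.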

Example

Input data

| category_key | category_product_order | abstract_sku |
| --- | --- | --- |
| digital-cameras | 16 | 001 |
| digital-cameras | 22 | 002 |
| digital-cameras | 34 | 003 |

Pipeline configuration

<csv-file-extractor filename="data/import/products.csv" bulkSize="1000" separator="," enclosure='"' escape="\\" />

Output data

| category_key | category_product_order | abstract_sku |
| --- | --- | --- |
| digital-cameras | 16 | 001 |
| digital-cameras | 22 | 002 |
| digital-cameras | 34 | 003 |
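
The example above can be approximated with a short Python sketch that maps each row to the header names and yields the rows in bulks. The function `extract_in_bulks` is hypothetical and only stands in for the behavior described here; a `bulk_size` of 2 is used so that the three sample rows span more than one bulk.

```python
import csv
import io
from itertools import islice


def extract_in_bulks(lines, bulk_size=1000):
    """Yield lists of header-keyed rows, at most bulk_size rows per list."""
    reader = csv.DictReader(lines)  # the first line is treated as the header row
    while True:
        bulk = list(islice(reader, bulk_size))
        if not bulk:
            return
        yield bulk


sample = io.StringIO(
    "category_key,category_product_order,abstract_sku\n"
    "digital-cameras,16,001\n"
    "digital-cameras,22,002\n"
    "digital-cameras,34,003\n"
)

bulks = list(extract_in_bulks(sample, bulk_size=2))
```

In this sketch every value stays a string, so an SKU such as `001` keeps its leading zeros.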

Error Handling

| Error | Description |
| --- | --- |
| File Not Found | Throws an exception if the file does not exist. |
| Invalid CSV Format | Fails if the file lacks a header row. |
| Incorrect Encoding | Parsing issues may arise with non-UTF-8 files. |
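
The three error cases can be checked up front. The following Python sketch shows one way to do that. The names `CsvExtractionError` and `read_header` are hypothetical, the exception types are assumptions rather than those thrown by the extractor, and the header check only detects an empty or blank first row.

```python
import csv
import os
import tempfile


class CsvExtractionError(Exception):
    """Hypothetical error type for invalid CSV input."""


def read_header(path):
    """Return the header row of a CSV file, raising on the three error cases."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"CSV file not found: {path}")
    try:
        with open(path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), None)
    except UnicodeDecodeError as error:
        raise CsvExtractionError(f"{path} is not valid UTF-8") from error
    # An empty file or a blank first line means there is no usable header.
    if not header or all(not cell.strip() for cell in header):
        raise CsvExtractionError(f"{path} has no header row")
    return header


with tempfile.TemporaryDirectory() as directory:
    good_path = os.path.join(directory, "products.csv")
    with open(good_path, "w", encoding="utf-8", newline="") as handle:
        handle.write("category_key,abstract_sku\ndigital-cameras,001\n")
    header = read_header(good_path)

    # Latin-1 bytes that are not valid UTF-8 trigger the encoding error.
    bad_path = os.path.join(directory, "latin1.csv")
    with open(bad_path, "wb") as handle:
        handle.write(b"caf\xe9,x\n")
    try:
        read_header(bad_path)
        encoding_error = None
    except CsvExtractionError as error:
        encoding_error = error
```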

Performance Considerations

| Consideration | Description |
| --- | --- |
| Use an appropriate `bulkSize` | Smaller values lower peak memory use but increase the number of bulks and the overall processing time. |
| Ensure CSV encoding is UTF-8 | Avoids character misinterpretation. |
| Validate CSV before extraction | Catches structural problems early and helps ensure data integrity. |
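
As one possible form of validation before extraction, the Python sketch below reports the line numbers of rows whose field count differs from the header. The function `find_malformed_rows` is hypothetical and covers only this single check. It is an illustration, not part of the extractor.

```python
import csv
import io


def find_malformed_rows(lines, separator=","):
    """Return the line numbers of rows whose field count differs from the header."""
    reader = csv.reader(lines, delimiter=separator)
    header = next(reader, None)
    if header is None:
        return []  # an empty input has no rows to check
    expected = len(header)
    # reader.line_num is the number of source lines read so far,
    # so it points at the line on which the current row ended.
    return [reader.line_num for row in reader if len(row) != expected]


sample = io.StringIO("a,b,c\n1,2,3\n4,5\n6,7,8\n")
malformed = find_malformed_rows(sample)
```

Here the third line has only two fields, so it is the only line number reported. Blank lines would also be reported, because they parse as rows with no fields.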

Key Features

| Feature | Description |
| --- | --- |
| Supports Custom Separators & Enclosures | Configurable for different CSV formats. |
| Handles Bulk Extraction | Optimized for large datasets. |
| Efficient Memory Usage | Processes rows in chunks. |
| Ensures Data Consistency | Maps headers correctly to data. |
| Error Handling & Logging | Detects file errors and invalid formats. |