# JSON Line File Extractor

Extracts structured data from a JSON Lines (`.jsonl`) file and transforms it into rows for ETL processing. Supports bulk processing and integration with pipeline configurations.
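
The extraction behavior can be pictured with a minimal Python sketch: read the file line by line, parse each line as JSON, and yield rows in batches of `bulkSize`. The function name, exception type, and blank-line handling below are illustrative assumptions, not the component's actual API.

```python
import json
from typing import Any, Dict, Iterator, List


def extract_json_lines(filename: str, bulk_size: int = 200) -> Iterator[List[Dict[str, Any]]]:
    """Yield rows from a .jsonl file in batches of at most bulk_size.

    Hypothetical sketch of the extractor's behavior, not its real API.
    """
    batch: List[Dict[str, Any]] = []
    try:
        handle = open(filename, "r", encoding="utf-8")
    except OSError:
        # Mirrors the documented "Can't load file '<filename>'" error.
        raise RuntimeError(f"Can't load file '{filename}'")
    with handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            batch.append(json.loads(line))  # raises on invalid JSON
            if len(batch) >= bulk_size:
                yield batch  # emit a full bulk
                batch = []
    if batch:
        yield batch  # flush the final partial bulk
```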

## Configuration

| Name | Type | Description |
|------|------|-------------|
| `filename` | Attribute | Path to the JSON Lines file to be processed |
| `bulkSize` | Attribute | Number of rows extracted at once (default: 200) |

## Example

### Input data

Example JSON Lines file:

{"id": 1, "name": "Product A", "price": 10.99}
{"id": 2, "name": "Product B", "price": 15.49}
{"id": 3, "name": "Product B", "price": 8.99}
{"id": 4, "name": "Product B", "price": 12.75}

### Pipeline configuration

```xml
<json-line-file-extractor filename="data/import/common/products.jsonl" bulkSize="200" />
```

### Output data

| id | name | price |
|----|-----------|-------|
| 1 | Product A | 10.99 |
| 2 | Product B | 15.49 |
| 3 | Product C | 8.99 |
| 4 | Product D | 12.75 |

## Error Handling

| Error | Description |
|-------|-------------|
| File Not Found | Throws an exception: `Can't load file '<filename>'` |
| Invalid JSON Format | Throws an exception when a line contains invalid JSON. |
| Bulk Processing Issue | Ensures correct batch sizes based on `bulkSize`. |
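
Reusing the `extract_json_lines` sketch from above, a caller might handle these failure modes as follows. The exception types and the `load_into_pipeline` step are illustrative assumptions:

```python
import json


def load_into_pipeline(rows):
    # Hypothetical downstream ETL step; stands in for the real pipeline.
    print(f"loaded {len(rows)} rows")


try:
    for batch in extract_json_lines("data/import/common/products.jsonl"):
        load_into_pipeline(batch)
except RuntimeError as exc:
    print(exc)  # e.g. "Can't load file '...'"
except json.JSONDecodeError as exc:
    print(f"Invalid JSON line: {exc}")
```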

## Performance Considerations

| Consideration | Description |
|---------------|-------------|
| Use an appropriate `bulkSize` | Small sizes improve memory efficiency but increase processing time. |
| Ensure JSON Lines format | Each line must be a valid JSON object. |
| Stream Processing | Efficiently processes large `.jsonl` files without loading everything into memory. |
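
To make the streaming behavior concrete, the snippet below (again reusing the hypothetical `extract_json_lines` sketch) walks a synthetic 10,000-line file while holding at most one batch of 200 rows in memory at a time. The file path and sizes are made up for illustration:

```python
import json

# Write a synthetic .jsonl file (illustrative only).
with open("/tmp/large.jsonl", "w", encoding="utf-8") as out:
    for i in range(10_000):
        out.write(json.dumps({"id": i, "price": 1.0}) + "\n")

# Only one batch is materialized at any point in the iteration.
batches = sum(1 for _ in extract_json_lines("/tmp/large.jsonl", bulk_size=200))
print(batches)  # 50 batches of 200 rows each
```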

## Key Features

| Feature | Description |
|---------|-------------|
| Supports Bulk Extraction | Processes data in chunks of `bulkSize` rows. |
| Handles JSON Validation | Prevents errors caused by invalid input. |
| Optimized for Large JSON Files | Uses line-by-line processing to reduce memory usage. |
| Pipeline Integration | Compatible with pipeline configurations and console commands. |