# JSON Line File Extractor

Extracts structured data from a JSON Lines (`.jsonl`) file and transforms it into rows for ETL processing. Supports bulk processing and integration with pipeline configurations.
## Configuration

| Name | Type | Description |
|---|---|---|
| `filename` | Attribute | Path to the JSON Lines file to be processed |
| `bulkSize` | Attribute | Number of rows extracted at once (default: 200) |
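Conceptually, these two attributes drive a chunked, line-by-line reader. A minimal Python sketch of that behavior (the function name `extract_rows` and its signature are illustrative assumptions, not the component's actual API):

```python
import json
from typing import Iterator

def extract_rows(filename: str, bulk_size: int = 200) -> Iterator[list[dict]]:
    """Yield batches of at most `bulk_size` parsed rows from a JSON Lines file."""
    batch = []
    with open(filename, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            batch.append(json.loads(line))  # one JSON object per line
            if len(batch) == bulk_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final partial batch
```

Because the file is read line by line and only one batch is held in memory at a time, memory usage stays proportional to `bulkSize` rather than to the file size.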
## Example

### Input data

Example JSON Lines file:
```json
{"id": 1, "name": "Product A", "price": 10.99}
{"id": 2, "name": "Product B", "price": 15.49}
{"id": 3, "name": "Product C", "price": 8.99}
{"id": 4, "name": "Product D", "price": 12.75}
```
### Pipeline configuration

```xml
<json-line-file-extractor filename="data/import/common/products.jsonl" bulkSize="200" />
```
### Output data

| id | name | price |
|---|---|---|
| 1 | Product A | 10.99 |
| 2 | Product B | 15.49 |
| 3 | Product C | 8.99 |
| 4 | Product D | 12.75 |
## Error Handling

| Error | Description |
|---|---|
| File Not Found | Throws an exception: “Can’t load file ‘&lt;filename&gt;’” |
| Invalid JSON Format | Throws an exception when a line contains invalid JSON. |
| Bulk Processing Issue | Ensures correct batch sizes based on the configured `bulkSize`. |
## Performance Considerations

| Consideration | Description |
|---|---|
| Use an appropriate `bulkSize` | Smaller values improve memory efficiency but increase processing time. |
| Ensure JSON Lines format | Each line must be a valid JSON object. |
| Stream Processing | Efficiently processes large files line by line instead of loading them whole. |
## Key Features

| Feature | Description |
|---|---|
| Supports Bulk Extraction | Processes data in chunks. |
| Handles JSON Validation | Prevents errors due to invalid input. |
| Optimized for Large JSON Files | Uses line-by-line processing to reduce memory usage. |
| Pipeline Integration | Compatible with pipeline configurations and console commands. |