Creating a Pipeline

Last updated 2 years ago

An enrichment pipeline has one or more steps associated with one or more sources.

To create an enrichment pipeline:

Drag and drop one or more steps that need to be completed.

Use the filters tab to limit what a pipeline processes.

File Extensions - Enter the file extensions to filter on, then use the check box to choose whether they form an allow list or a deny list.
- Do not include the leading period (e.g. docx and not .docx).
File Size - Limit processing to files within a size range.
Modified Date Ranges - Limit processing to items modified within the selected date range.
Entities - Only include items with selected entities.
Querystring - An Elasticsearch query string to limit what is processed.
Ready Status - Item statuses to process.
- For most scenarios 'false' will be fine.
File Actions - File actions to process (see Manage user guide).
Regenerate Thumbnails - Check to generate thumbnails for new and updated content.
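
As an illustration of how the File Extensions filter behaves, the sketch below applies an allow or deny list to file names. It is not the product's implementation: the function names and the case-insensitive matching are assumptions made for the example.

```python
from pathlib import PurePosixPath


def normalize_extension(ext: str) -> str:
    """Lower-case an extension and drop any leading period (assumed behaviour)."""
    return ext.strip().lower().lstrip(".")


def passes_extension_filter(filename: str, extensions: list[str], allow_list: bool = True) -> bool:
    """Return True if the file should be processed.

    allow_list=True  -> only listed extensions are processed.
    allow_list=False -> listed extensions are skipped (deny list).
    """
    wanted = {normalize_extension(e) for e in extensions}
    actual = normalize_extension(PurePosixPath(filename).suffix)
    listed = actual in wanted
    return listed if allow_list else not listed


# Allow list: only docx and pdf files are processed.
print(passes_extension_filter("report.docx", ["docx", "pdf"]))
# Deny list: docx files are skipped.
print(passes_extension_filter("report.docx", ["docx"], allow_list=False))
```

Entering extensions without the period, as the filter expects, avoids relying on any normalization like the one sketched here.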

Select the agent where you want the pipeline to run.

Set the schedule for the pipeline. Either select a CRON schedule, a timetable, run manually, or disable the pipeline.
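
To show how a CRON schedule maps to run times, here is a minimal sketch of a matcher for the common five-field format (minute, hour, day of month, month, day of week). The field layout is an assumption; the product may accept a different CRON dialect, so check the schedule editor before relying on it. For simplicity the sketch requires all five fields to match, which differs from some cron implementations when both day fields are restricted.

```python
from datetime import datetime


def _field_matches(field: str, value: int, low: int, high: int) -> bool:
    """Check one cron field. Supports '*', numbers, lists, ranges and steps."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step_n = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            # 'n/step' means "from n to the maximum, every step".
            end = high if step else start
        if start <= value <= end and (value - start) % step_n == 0:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if 'when' falls on a minute selected by the expression."""
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0, cron: Sunday=0
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(dom, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        and _field_matches(dow, cron_dow, 0, 6)
    )


# "0 2 * * *" selects 02:00 every day.
print(cron_matches("0 2 * * *", datetime(2024, 1, 15, 2, 0)))
```

An expression such as `*/15 * * * 1-5` would, under the same assumptions, select every 15 minutes from Monday to Friday.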

Success Status - The status to set when an item is successfully enriched.
Error Status - The status to set when an item fails enrichment.
The defaults should be fine for the following:
- Initial Buffer Capacity - Controls the amount of work to queue at the start of the pipeline.
- Update Maximum Degree of Parallelism - Maximum number of messages that can be processed by the pipeline concurrently.
- Update Bounded Capacity - Maximum number of messages that can be queued for updating Elasticsearch.
- Scroll Windows - How long to keep the scroll windows in memory.
- Scroll Size - Number of items to retrieve per request.
- Queue Rate - Number of items to queue per minute.
  - 0 lets the pipeline calculate the rate.
- Rest Window - How long to wait before checking for new items to process.
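
To give a feel for what a bounded capacity and a maximum degree of parallelism control, the sketch below runs a fixed number of workers against a size-limited queue. It is a generic illustration, not the pipeline's own code: `run_pipeline`, `max_parallelism` and `bounded_capacity` are hypothetical names chosen to mirror the settings above.

```python
import asyncio


async def run_pipeline(items, max_parallelism=4, bounded_capacity=10):
    """Process items with a fixed worker count and a size-limited queue."""
    queue = asyncio.Queue(maxsize=bounded_capacity)
    processed = []
    peak = 0  # largest queue length observed while producing

    async def worker():
        while True:
            item = await queue.get()
            if item is None:  # sentinel: no more work for this worker
                return
            await asyncio.sleep(0)  # stand-in for real enrichment work
            processed.append(item)

    workers = [asyncio.create_task(worker()) for _ in range(max_parallelism)]
    for item in items:
        # put() waits while the queue is full, so the producer cannot
        # run further ahead of the workers than bounded_capacity allows.
        await queue.put(item)
        peak = max(peak, queue.qsize())
    for _ in workers:
        await queue.put(None)
    await asyncio.gather(*workers)
    return processed, peak


processed, peak = asyncio.run(run_pipeline(range(50), max_parallelism=3, bounded_capacity=5))
print(len(processed), peak)
```

A smaller capacity limits memory use at the cost of the producer waiting more often; more workers raise throughput only while the work itself can run concurrently.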
If you are getting errors (perhaps you are using a step with an unreliable API), then consider using the following:
- Enable Circuit Breaker - Stop the pipeline if errors occur.
- File Exceptions Trigger Circuit Breaker - Stop the pipeline if file and indexing exceptions occur.
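
The circuit breaker idea can be sketched as follows. This is a generic version of the pattern under stated assumptions, not the product's behaviour: the error threshold (`max_errors`) and the class and function names are hypothetical, and the real setting may stop the pipeline under different conditions.

```python
class CircuitBreaker:
    """Counts errors and reports when the pipeline should stop."""

    def __init__(self, max_errors: int = 3):
        self.max_errors = max_errors  # hypothetical threshold
        self.errors = 0

    @property
    def is_open(self) -> bool:
        return self.errors >= self.max_errors

    def record_error(self) -> None:
        self.errors += 1


def run_step(items, step, breaker: CircuitBreaker):
    """Apply 'step' to each item until the breaker opens."""
    done = []
    for item in items:
        if breaker.is_open:
            break  # stop the pipeline instead of hammering a failing API
        try:
            done.append(step(item))
        except Exception:
            breaker.record_error()
    return done


def flaky_step(n: int) -> int:
    """Stand-in for a step backed by an unreliable API: fails on odd input."""
    if n % 2:
        raise RuntimeError("upstream API failed")
    return n


breaker = CircuitBreaker(max_errors=2)
print(run_step(range(10), flaky_step, breaker))
```

Stopping early leaves unprocessed items for a later run rather than marking each of them with the error status.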
