Creating a Pipeline
Last updated
An enrichment pipeline has one or more steps associated with one or more sources.
To create an enrichment pipeline:
Select Enrichment from New Configurations.
Provide a Configuration ID.
Enter a Configuration Description.
Select one or more sources from Indices.
Drag and drop the steps you want the pipeline to perform.
Use the Filters tab to limit what the pipeline processes.
File Extensions - Enter the file extensions to filter on, then use the check box to choose whether they form an allow list or a deny list.
Do not include the period prefix (for example, enter docx, not .docx).
File Size - Only process files within the chosen size range.
Modified Date Ranges - Only process items modified within the selected date range.
Entities - Only include items with the selected entities.
Querystring - An Elasticsearch query string to limit what is processed.
Ready Status - The item statuses to process.
For most scenarios, 'false' is fine.
File Actions - The file actions to process (see the Manage user guide).
Regenerate Thumbnails - Check to generate thumbnails for new and updated content.
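The filters above narrow the set of items the pipeline reads from its source indices. As a rough illustration, assuming the filters end up applied as an Elasticsearch query, the sketch below builds a bool query from the File Extensions, File Size, Modified Date Ranges and Querystring settings. The function `build_filter_query` and the field names `extension`, `size` and `modified` are hypothetical; the product's real query and index mapping may differ, and Entities, Ready Status and File Actions are left out for brevity.

```python
from __future__ import annotations

from datetime import datetime


def build_filter_query(
    extensions: list[str],
    extensions_are_allow_list: bool = True,
    min_size: int | None = None,
    max_size: int | None = None,
    modified_from: datetime | None = None,
    modified_to: datetime | None = None,
    query_string: str | None = None,
) -> dict:
    """Translate pipeline filter settings into an Elasticsearch bool query.

    Hypothetical sketch: "extension", "size" and "modified" are placeholder
    field names; substitute the fields your index mapping actually uses.
    """
    filters: list[dict] = []
    must_not: list[dict] = []

    # Extensions are entered without the period (docx, not .docx);
    # strip a stray leading period defensively and compare in lower case.
    normalized = [ext.lower().lstrip(".") for ext in extensions]
    if normalized:
        clause = {"terms": {"extension": normalized}}
        # The allow/deny check box decides whether the list includes or excludes.
        if extensions_are_allow_list:
            filters.append(clause)
        else:
            must_not.append(clause)

    # File size range (bytes are assumed here).
    size_range = {}
    if min_size is not None:
        size_range["gte"] = min_size
    if max_size is not None:
        size_range["lte"] = max_size
    if size_range:
        filters.append({"range": {"size": size_range}})

    # Modified date range, sent as ISO 8601 strings.
    date_range = {}
    if modified_from is not None:
        date_range["gte"] = modified_from.isoformat()
    if modified_to is not None:
        date_range["lte"] = modified_to.isoformat()
    if date_range:
        filters.append({"range": {"modified": date_range}})

    # Free-form Elasticsearch query string, if one was entered.
    if query_string:
        filters.append({"query_string": {"query": query_string}})

    return {"query": {"bool": {"filter": filters, "must_not": must_not}}}
```

In this sketch, clearing the allow-list check box simply moves the extensions clause from `filter` to `must_not`, which is what turns the same list into an exclusion.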
Select the agent on which the pipeline should run.
Set the schedule for the pipeline: select a CRON schedule or a timetable, run it manually, or disable the pipeline.
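A CRON schedule decides which minutes the pipeline starts on. The sketch below checks whether an expression fires at a given time, assuming the standard five-field Unix syntax (minute, hour, day of month, month, day of week) with numeric values only. That is an assumption: the schedule editor may use another dialect, such as a Quartz-style expression with a seconds field, so treat the hypothetical `cron_matches` helper as a way to reason about expressions, not as the product's parser.

```python
from __future__ import annotations

from datetime import datetime

# Allowed values for minute, hour, day of month, month and day of week.
# Day of week accepts 0-7, where both 0 and 7 mean Sunday.
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one field such as '*', '*/15', '1-5', '5/10' or '0,30' into its values."""
    allowed: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(base)
            # '5/10' means every 10th value from 5; a bare '5' means just 5.
            end = high if step_text else start
        allowed.update(range(start, end + 1, step))
    return allowed


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if a five-field cron expression fires at the minute of `moment`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minutes, hours, days, months, weekdays = (
        expand_field(field, low, high)
        for field, (low, high) in zip(fields, FIELD_RANGES)
    )
    weekdays = {day % 7 for day in weekdays}  # fold 7 (Sunday) onto 0

    # Python's weekday() counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    day_of_month_ok = moment.day in days
    day_of_week_ok = cron_weekday in weekdays

    # Classic cron rule: if both day fields are restricted, either one may match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        day_ok = day_of_month_ok or day_of_week_ok
    else:
        day_ok = day_of_month_ok and day_of_week_ok

    return (
        moment.minute in minutes
        and moment.hour in hours
        and moment.month in months
        and day_ok
    )
```

Under these assumptions, `0 2 * * *` runs at 02:00 every day, and `*/15 9-17 * * 1-5` runs every 15 minutes between 09:00 and 17:45 on weekdays.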
Success Status - The status to set when an item is successfully enriched.
Error Status - The status to set when an item fails enrichment.
The defaults should be fine for the following:
Initial Buffer Capacity - Controls the amount of work to queue at the start of the pipeline.
Update Maximum Degree of Parallelism - Maximum number of messages that can be processed by the pipeline concurrently.
Update Bounded Capacity - Maximum number of messages that can be queued for updating Elasticsearch.
Scroll Windows - How long to keep each scroll window in memory.
Scroll Size - Number of items to retrieve per request.
Queue Rate - Number of items to queue per minute.
Set this to 0 to let the pipeline calculate the rate.
Rest Window - How long to wait before checking for new items to process.
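To see how several of these settings interact, the sketch below runs a toy pipeline with `asyncio`: items are read in pages of `scroll_size` (loosely Scroll Size), passed through a queue capped at `bounded_capacity` (loosely Update Bounded Capacity), and processed by at most `max_parallelism` workers (loosely Update Maximum Degree of Parallelism). Each item then receives the Success Status or Error Status. The in-memory source, the status values and the `enrich` callback are hypothetical stand-ins; the real pipeline reads from Elasticsearch and its internals may differ.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical values for the Success Status and Error Status settings.
SUCCESS_STATUS = "Enriched"
ERROR_STATUS = "EnrichmentFailed"


async def run_pipeline(
    items: list[dict],
    enrich: Callable[[dict], Awaitable[None]],
    scroll_size: int = 100,
    bounded_capacity: int = 50,
    max_parallelism: int = 4,
) -> None:
    """Process every item once, recording a success or error status on each."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=bounded_capacity)

    async def producer() -> None:
        # Read the source in pages of scroll_size items, as a scroll request would.
        for start in range(0, len(items), scroll_size):
            for item in items[start:start + scroll_size]:
                await queue.put(item)  # waits while the queue is full
        for _ in range(max_parallelism):
            await queue.put(None)  # one stop signal per worker

    async def worker() -> None:
        while True:
            item = await queue.get()
            if item is None:
                return
            try:
                await enrich(item)
                item["status"] = SUCCESS_STATUS
            except Exception as error:
                item["status"] = ERROR_STATUS
                item["error"] = str(error)

    # At most max_parallelism items are being enriched at any moment.
    await asyncio.gather(producer(), *(worker() for _ in range(max_parallelism)))
```

Raising `max_parallelism` only helps while the enrichment step (or the API behind it) can keep up, and lowering `bounded_capacity` limits how much work is held in memory at once, which is one reason the defaults are usually left alone.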
If you are seeing errors (for example, from a step that calls an unreliable API), consider enabling the following:
Enable Circuit Breaker - Stop the pipeline if errors occur.
File Exceptions Trigger Circuit Breaker - Stop the pipeline if file and indexing exceptions occur.
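A circuit breaker stops a run once failures pile up, rather than marking every remaining item as failed against an API that is down. The sketch below trips after a number of consecutive failures and, mirroring File Exceptions Trigger Circuit Breaker, only counts file errors when that option is on. The threshold, the consecutive-failure rule and the `FileProcessingError` type are assumptions made for illustration; the product's actual trip condition is not described in this guide.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable


class FileProcessingError(Exception):
    """Hypothetical stand-in for file and indexing exceptions."""


class CircuitBreaker:
    """Trip after `threshold` consecutive counted failures; a success resets the count."""

    def __init__(self, threshold: int = 3, file_exceptions_trip: bool = False) -> None:
        self.threshold = threshold
        self.file_exceptions_trip = file_exceptions_trip
        self.consecutive_failures = 0
        self.open = False

    def record_success(self) -> None:
        self.consecutive_failures = 0

    def record_failure(self, error: Exception) -> None:
        # Assumption: with the file-exceptions option off, file errors are ignored.
        if isinstance(error, FileProcessingError) and not self.file_exceptions_trip:
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            self.open = True


def process_all(items: Iterable, enrich: Callable, breaker: CircuitBreaker) -> list:
    """Enrich items in order and return those attempted before the breaker tripped."""
    attempted = []
    for item in items:
        if breaker.open:
            break  # stop the pipeline instead of hammering a failing API
        try:
            enrich(item)
            breaker.record_success()
        except Exception as error:
            breaker.record_failure(error)
        attempted.append(item)
    return attempted
```

Stopping early leaves the untouched items available for a later run, whereas without the breaker every remaining item would be processed and likely receive the Error Status.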