Creating a Pipeline

An enrichment pipeline has one or more steps associated with one or more sources.

To create an enrichment pipeline:

  • Select Enrichment from New Configurations.

General

  • Provide a Configuration ID.

  • Enter a Configuration Description.

  • Select one or more sources from Indices.

Steps

Drag and drop one or more steps for the pipeline to run.

Filters

Use the Filters tab to limit what a pipeline processes.

  • File Extensions - Enter all of the file extensions you want to filter on, then use the check box to choose whether they form an allow list or a deny list.

    • Do not include the period prefix (for example, docx and not .docx).

  • File Size - Choose to limit the file sizes to a range.

  • Modified Date Ranges - Select a Modified Date Range to limit processing to items modified within that range.

  • Entities - Only include items with selected entities.

  • Querystring - An Elasticsearch query string to limit what is processed.

  • Ready Status - Item statuses to process.

    • For most scenarios, 'false' will be fine.

  • File Actions - File actions to process (see Manage user guide).

  • Regenerate Thumbnails - Check to generate thumbnails for new and updated content.
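
The File Extensions filter above can be pictured with a short sketch. This is not the product's implementation: the function names are hypothetical, and the case-insensitive matching and tolerance of a stray leading period are assumptions made for illustration only.

```python
from pathlib import PurePath


def normalise_extension(ext: str) -> str:
    """Strip whitespace and any leading period, and lower-case (assumed behaviour)."""
    return ext.strip().lstrip(".").lower()


def passes_extension_filter(filename: str, extensions: list[str], deny: bool = False) -> bool:
    """Return True if the file should be processed.

    extensions: values as entered in the filter, e.g. ["docx", "pdf"].
    deny: False treats the list as an allow list, True as a deny list.
    """
    ext = normalise_extension(PurePath(filename).suffix)
    listed = ext in {normalise_extension(e) for e in extensions}
    return not listed if deny else listed
```

With an allow list of docx and pdf, report.docx is processed and notes.txt is skipped. Switching the same list to a deny list reverses both outcomes.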

Agents

Select the agent where you want the pipeline to run.

Schedule

Set the schedule for the pipeline. Select a CRON schedule or a timetable, run the pipeline manually, or disable it.
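
To illustrate what a CRON schedule expresses, the sketch below checks whether a given time matches a five-field expression (minute, hour, day of month, month, day of week). It assumes the common five-field syntax; the product may expect a different variant (for example, one with a seconds field), so check the format accepted by the schedule editor. The function names are hypothetical.

```python
from datetime import datetime


def _field_matches(field: str, value: int, low: int, high: int) -> bool:
    """Check one cron field. Supports *, single values, a-b ranges, lists and /step."""
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = int(rng)
            end = high if step_text else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if `when` falls on the schedule described by `expression`.

    Simplification: all five fields must match. Traditional cron instead
    treats day of month and day of week as alternatives when both are restricted.
    """
    minute, hour, day, month, weekday = expression.split()
    day_of_week = when.isoweekday() % 7  # cron convention: 0 = Sunday
    return (
        _field_matches(minute, when.minute, 0, 59)
        and _field_matches(hour, when.hour, 0, 23)
        and _field_matches(day, when.day, 1, 31)
        and _field_matches(month, when.month, 1, 12)
        and _field_matches(weekday, day_of_week, 0, 6)
    )
```

Under these assumptions, "0 2 * * *" describes a run at 02:00 every day, and "*/15 * * * 1-5" describes a run every 15 minutes from Monday to Friday.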

Advanced

  • Success Status - The status to set when an item is successfully enriched.

  • Error Status - The status to set when an item fails enrichment.

  • The defaults should be fine for the following:

    • Initial Buffer Capacity - Controls the amount of work to queue at the start of the pipeline.

    • Update Maximum Degree of Parallelism - Maximum number of messages that can be processed by the pipeline concurrently.

    • Update Bounded Capacity - Maximum number of messages that can be queued for updating Elasticsearch.

    • Scroll Windows - How long to keep each scroll window in memory.

    • Scroll Size - Number of items to retrieve per request.

    • Queue Rate - Number of items to queue per minute.

      • Set to 0 to let the pipeline calculate the rate.

    • Rest Window - How long to wait before checking for new items to process.

  • If you are getting errors (perhaps you are using a step with an unreliable API), then consider using the following:

    • Enable Circuit Breaker - Stop the pipeline if errors occur.

    • File Exceptions Trigger Circuit Breaker - Stop the pipeline if file and indexing exceptions occur.
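
The circuit breaker options above follow a general pattern that can be sketched as below. This is a conceptual illustration only, not the product's implementation: the class, the function and the max_failures threshold are hypothetical, and the product may stop on different conditions.

```python
class CircuitBreaker:
    """Opens (stops further work) after a number of consecutive failures."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures  # hypothetical threshold
        self.failures = 0

    @property
    def is_open(self) -> bool:
        return self.failures >= self.max_failures

    def record_success(self) -> None:
        self.failures = 0  # a success resets the consecutive-failure count

    def record_failure(self) -> None:
        self.failures += 1


def run_pipeline(items, enrich, breaker: CircuitBreaker) -> list:
    """Enrich items in order, stopping as soon as the breaker opens."""
    processed = []
    for item in items:
        if breaker.is_open:
            break  # stop the run instead of repeatedly hitting a failing step
        try:
            processed.append(enrich(item))
            breaker.record_success()
        except Exception:
            breaker.record_failure()
    return processed
```

The point of stopping early is to avoid marking a large number of items with the Error Status when the underlying cause is a temporary outage, such as an unreliable API used by a step.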
