Data File Cataloguer

Connect a Data File Cataloguer to Aiimi Insight Engine and make the most of your data. Once you have selected a Source System type, more settings will expand so you can customise it.

General Settings

Enter the sources to crawl for data files.
Check the number of Sample Rows.
- This is the number of sample rows that will be extracted from each data file.
Enter the Download Folder path.
- This is where the downloaded files will be stored while being catalogued.
- This location must be accessible to the machine running the crawl.
- If left blank or NULL, the system temp folder will be used.
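
The Download Folder rules above can be pictured with a short Python sketch. This is not Insight Engine code: the function name `resolve_download_folder` is hypothetical, and treating the literal text "NULL" the same as a blank value is an assumption based on the wording above.

```python
import os
import tempfile
from typing import Optional


def resolve_download_folder(configured: Optional[str]) -> str:
    """Hypothetical helper: decide where files are staged while being catalogued."""
    value = (configured or "").strip()
    # Assumption: a blank value or the literal text "NULL" means "not set".
    if value == "" or value.upper() == "NULL":
        return tempfile.gettempdir()
    # The machine running the crawl must be able to reach and write to the folder.
    if not os.path.isdir(value) or not os.access(value, os.W_OK):
        raise ValueError(f"Download folder is not accessible: {value}")
    return value
```

Failing early on an unreachable folder is a design choice in this sketch only; the product may report the problem differently.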

Additional Settings

Attempt direct access
- Check this if the files are on a file share to improve performance.
- If direct access fails or cannot be used, the file will be downloaded locally.
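
As an illustration of the fallback described above, the following Python sketch tries to read a file in place and only downloads it when that fails. The function `path_for_cataloguing` and the `download` callback are hypothetical names, not part of the product; the callback stands in for whatever the source connector does to fetch a local copy.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def path_for_cataloguing(
    share_path: str,
    download: Callable[[], Path],
    attempt_direct_access: bool,
) -> Path:
    """Hypothetical helper: pick the path the cataloguer should read from."""
    if attempt_direct_access:
        try:
            # Opening the file proves it is reachable and readable in place.
            with open(share_path, "rb"):
                return Path(share_path)
        except OSError:
            # Direct access failed; fall through to the download path.
            pass
    return download()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        shared = os.path.join(folder, "orders.csv")
        Path(shared).write_text("id,total\n1,9.99\n")
        staged = Path(folder) / "staged-orders.csv"
        print(path_for_cataloguing(shared, lambda: staged, True))   # the shared path
        print(path_for_cataloguing(shared, lambda: staged, False))  # the staged path
```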
Limit deltas to new or deleted files only
- If checked, only new or deleted files will be processed. Modified files will not be updated.
- This improves performance, but catalogued content can become out of date because changes to existing files are not picked up.
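
To make the trade-off concrete, here is a minimal Python sketch of a delta between two crawls. It assumes each crawl can be summarised as a mapping of file path to last-modified timestamp; that representation and the name `select_delta` are illustrative assumptions, not how Insight Engine tracks files internally.

```python
from typing import Dict, List


def select_delta(
    previous: Dict[str, float],
    current: Dict[str, float],
    new_or_deleted_only: bool,
) -> List[str]:
    """Hypothetical helper: list the paths a delta crawl would process."""
    new = current.keys() - previous.keys()
    deleted = previous.keys() - current.keys()
    to_process = set(new) | set(deleted)
    if not new_or_deleted_only:
        # Comparing every surviving file is the extra work the setting avoids.
        modified = {
            path
            for path in current.keys() & previous.keys()
            if current[path] != previous[path]
        }
        to_process |= modified
    return sorted(to_process)
```

With the setting on, a file whose timestamp changed between crawls is simply absent from the result, which is why its catalogued content is not refreshed.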
Multipart parquet support
- If checked, groups of parquet files in a folder are processed as one file with multiple parts.
- Each part must be named "part-(#)-tid-(guid)-(name).parquet" for this to work.
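
The grouping can be sketched in Python as follows. The regular expression is an assumption about how the naming pattern above is matched: it takes the part number to be numeric and accepts anything between "tid-" and ".parquet". The cataloguer's real matching rules may be stricter, and `group_multipart` is a hypothetical name.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Assumed reading of the pattern: numeric part number, then "-tid-", then the rest.
PART_PATTERN = re.compile(r"^part-(\d+)-tid-.+\.parquet$")


def group_multipart(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: group matching part files by their parent folder."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in paths:
        path = PurePosixPath(raw)
        if PART_PATTERN.match(path.name):
            groups[str(path.parent)].append(path.name)
    # Each folder becomes one logical file; its parts are listed in name order.
    return {folder: sorted(names) for folder, names in groups.items()}
```

Files that do not match the pattern, such as a lone `data.parquet`, are left out of the groups and would be treated as ordinary single files.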
Approximate multipart parquet row count
- If checked, the row count is estimated by counting the rows in the first part of a multipart parquet and multiplying that by the number of parts.
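
A small Python sketch shows what the approximation does and where it can drift from the true total. The function names and the row counts used in the usage note are made up for illustration.

```python
from typing import Sequence


def approximate_row_count(part_row_counts: Sequence[int]) -> int:
    """Estimate the total from the first part only (hypothetical helper)."""
    if not part_row_counts:
        return 0
    # Only the first part is counted; the rest are assumed to be the same size.
    return part_row_counts[0] * len(part_row_counts)


def exact_row_count(part_row_counts: Sequence[int]) -> int:
    """Count every part, which is slower because each part must be read."""
    return sum(part_row_counts)
```

For three parts holding 1000, 1000 and 400 rows, `approximate_row_count` gives 3000 while `exact_row_count` gives 2400, so the estimate is most reliable when all parts are of similar size.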

