Data File Cataloguer

Connect your Data File Cataloguer to Aiimi Insight Engine to make the most of your data.

Sources: Enter the sources that should be crawled for data files.
Sample Rows: Enter the number of sample rows to extract from each data file.
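As a rough illustration of what the Sample Rows setting controls, the sketch below pulls a fixed number of rows from CSV text. It assumes CSV input and that the sample is the first N rows after a header row; the documentation does not say how rows are chosen, and `sample_rows` is a hypothetical name, not part of the product.

```python
import csv
import io
from itertools import islice
from typing import List


def sample_rows(text: str, n: int) -> List[List[str]]:
    """Return up to n data rows from CSV text.

    Hypothetical helper: it assumes the sample is the first n rows after a
    header row, which this documentation does not confirm.
    """
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row, if there is one
    return list(islice(reader, n))


data = "id,name\n1,alpha\n2,beta\n3,gamma\n"
print(sample_rows(data, 2))  # [['1', 'alpha'], ['2', 'beta']]
```

A file with fewer rows than the setting simply yields every row it has.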
Download folder: Enter the path of the folder that stores downloaded files while they are being catalogued.
- This location must be accessible by the machine running the crawl.
- If left blank or NULL, the system temp folder is used.
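The fallback rule for the download folder can be sketched as follows. This is a minimal illustration, not the product's code: `resolve_download_folder` is a hypothetical name, and treating a whitespace-only value or the literal text "NULL" as blank is an assumption.

```python
import tempfile
from typing import Optional


def resolve_download_folder(configured: Optional[str]) -> str:
    """Pick the folder used for downloaded files.

    A blank or NULL setting falls back to the system temp folder.
    Treating whitespace-only text or the string "NULL" as blank is an assumption.
    """
    if configured is None or configured.strip() == "" or configured.strip().upper() == "NULL":
        return tempfile.gettempdir()
    return configured


print(resolve_download_folder(None) == tempfile.gettempdir())  # True
print(resolve_download_folder("/mnt/catalogue/downloads"))  # /mnt/catalogue/downloads
```

Whatever path is returned still has to be reachable from the machine running the crawl.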

Attempt direct access: If checked, files on a file share are read where they are stored instead of being downloaded first, which improves performance.
- If direct access fails or cannot be used, the file is downloaded locally.
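The direct-access behaviour amounts to "try to read in place, otherwise download". The sketch below shows that fallback under stated assumptions: success is judged by whether the file opens, and `resolve_file` and the `download` callback are hypothetical stand-ins for however the cataloguer actually fetches files.

```python
import os
import tempfile
from typing import Callable, Tuple


def resolve_file(
    share_path: str,
    download: Callable[[str], str],
    direct_access: bool = True,
) -> Tuple[str, bool]:
    """Return (path_to_read, was_downloaded) for a file on a file share.

    If direct access is enabled and the file opens in place, it is read where
    it is. Otherwise the hypothetical `download` callback fetches a local copy.
    """
    if direct_access:
        try:
            with open(share_path, "rb"):
                return share_path, False
        except OSError:
            pass  # direct access failed, so fall back to downloading
    return download(share_path), True


# Stand-in for the real download step: it only computes a local path.
def fake_download(path: str) -> str:
    return "/downloads/" + os.path.basename(path)


with tempfile.NamedTemporaryFile(suffix=".csv", delete=False) as tmp:
    tmp.write(b"id\n1\n")
direct_result = resolve_file(tmp.name, fake_download)  # file opens, so it is read in place
os.remove(tmp.name)

print(direct_result[1])  # False
print(resolve_file("/no/such/share/sales.csv", fake_download))  # ('/downloads/sales.csv', True)
```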
Limit deltas to new or deleted files only: If checked, only new or deleted files are processed; modified files are not updated.
- This improves performance, but catalogued content can become inaccurate because changes to existing files are not picked up.
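To make the trade-off concrete, the sketch below compares two crawl snapshots and classifies files as new, deleted, or modified. It assumes each snapshot maps a file path to a last-modified time; that model and the name `classify_changes` are illustrative assumptions, not how the product tracks deltas.

```python
from typing import Dict, List, Tuple


def classify_changes(
    previous: Dict[str, float],
    current: Dict[str, float],
    new_or_deleted_only: bool,
) -> List[Tuple[str, str]]:
    """Compare snapshots mapping file path -> last-modified time.

    Returns (path, change) pairs to process. With new_or_deleted_only set,
    modified files are skipped, so their catalogued content goes stale.
    """
    changes = []
    for path in sorted(current.keys() - previous.keys()):
        changes.append((path, "new"))
    for path in sorted(previous.keys() - current.keys()):
        changes.append((path, "deleted"))
    if not new_or_deleted_only:
        for path in sorted(current.keys() & previous.keys()):
            if current[path] != previous[path]:
                changes.append((path, "modified"))
    return changes


before = {"a.csv": 1.0, "b.csv": 1.0}
after = {"a.csv": 2.0, "c.csv": 1.0}
print(classify_changes(before, after, True))   # [('c.csv', 'new'), ('b.csv', 'deleted')]
print(classify_changes(before, after, False))  # [('c.csv', 'new'), ('b.csv', 'deleted'), ('a.csv', 'modified')]
```

With the option checked, the change to `a.csv` is never processed, which is where the performance gain and the loss of accuracy both come from.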
Multipart parquet support: If checked, a group of Parquet files in a folder is processed as one file with multiple parts.
- For this to work, each part must be named "part-(#)-tid-(guid)-(name).parquet".
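A folder listing can be checked against the part naming pattern with a regular expression. This sketch assumes "(#)" is a decimal part number and "(guid)" is a standard 36-character hyphenated GUID; the exact rules the product applies are not documented here, and `split_multipart` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Assumes (#) is a decimal number and (guid) is a standard 8-4-4-4-12 hex GUID.
PART_PATTERN = re.compile(
    r"^part-(?P<num>\d+)-tid-"
    r"(?P<guid>[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})"
    r"-(?P<name>.+)\.parquet$"
)


def split_multipart(filenames: List[str]) -> Tuple[List[str], List[str]]:
    """Split a folder listing into (parts ordered by number, other files)."""
    parts, others = [], []
    for fn in filenames:
        m = PART_PATTERN.match(fn)
        if m:
            parts.append((int(m.group("num")), fn))
        else:
            others.append(fn)
    parts.sort()
    return [fn for _, fn in parts], others


files = [
    "part-00001-tid-0f8fad5b-d9cb-469f-a165-70867728950e-sales.parquet",
    "part-00000-tid-0f8fad5b-d9cb-469f-a165-70867728950e-sales.parquet",
    "summary.parquet",
]
parts, others = split_multipart(files)
print(others)  # ['summary.parquet']
```

Files that do not match the pattern, such as `summary.parquet` here, would not be treated as parts.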
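Approximate multipart parquet row count: If checked, the rows in the first part of a multipart Parquet file are counted and multiplied by the number of parts.
- The result is an estimate; it is exact only when every part contains the same number of rows.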
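The approximation is simple arithmetic: estimated rows = rows in the first part × number of parts. The sketch below contrasts it with an exact sum over all parts, using made-up row counts for illustration only.

```python
from typing import List


def approximate_row_count(first_part_rows: int, part_count: int) -> int:
    """Estimate total rows from the first part only."""
    return first_part_rows * part_count


def exact_row_count(rows_per_part: List[int]) -> int:
    """Count every part, which requires reading each part's row count."""
    return sum(rows_per_part)


rows_per_part = [1000, 1000, 400]  # illustrative values, not real data
print(approximate_row_count(rows_per_part[0], len(rows_per_part)))  # 3000
print(exact_row_count(rows_per_part))  # 2400
```

When the final part is smaller than the first, as here, the estimate overstates the total; the saving is that only one part has to be counted.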