Data File Cataloguer
Connect your Data File Cataloguer to Aiimi Insight Engine to make the most of the data.
Connecting Data File Cataloguer with Aiimi Insight Engine
Source System: Select Data File Cataloguer from the dropdown.
Initial Configuration Steps
Sources: Enter the sources that should be crawled for data files.
Sample Rows: Enter the number of sample rows to extract from each data file.
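To illustrate what a sample of N rows means, the sketch below takes the first N data rows from a CSV file. It assumes the file is CSV and that the header row does not count toward the sample. The function name read_sample is hypothetical, and this is not how the cataloguer itself reads files.

```python
import csv
import io
from itertools import islice

def read_sample(text_stream, sample_rows):
    """Return the header and at most `sample_rows` data rows from a CSV stream.

    Hypothetical helper: assumes CSV input and that the header row is not
    counted as one of the sample rows.
    """
    reader = csv.reader(text_stream)
    header = next(reader, [])               # first row is treated as the header
    rows = list(islice(reader, sample_rows))  # stop after `sample_rows` rows
    return header, rows

data = io.StringIO("id,name\n1,alpha\n2,beta\n3,gamma\n")
header, rows = read_sample(data, 2)
print(header)  # ['id', 'name']
print(rows)    # [['1', 'alpha'], ['2', 'beta']]
```

Reading only the first rows keeps the cost of sampling independent of file size.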
Download folder: Enter the path of the folder that will store downloaded files while they are being catalogued.
This location must be accessible by the machine running the crawl.
If left blank or NULL, the system temp folder is used.
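The fallback rule for the download folder can be sketched as follows. This is a minimal illustration, assuming that None stands in for NULL and that a whitespace-only value counts as blank; resolve_download_folder is a hypothetical name, and the share path is only an example.

```python
import tempfile

def resolve_download_folder(configured):
    """Return the folder used to hold downloaded files.

    Hypothetical helper: None stands in for NULL, and an empty or
    whitespace-only value counts as blank.
    """
    if configured is None or not configured.strip():
        return tempfile.gettempdir()  # fall back to the system temp folder
    return configured.strip()

print(resolve_download_folder(None))                   # system temp folder
print(resolve_download_folder(r"\\server\catalogue"))  # \\server\catalogue
```

Whatever folder is chosen, it still has to be reachable from the machine running the crawl.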
Additional Settings
Attempt direct access: If checked, files on a file share are read where they are stored instead of being downloaded first, which improves performance.
If direct access fails or cannot be used, the file is downloaded locally.
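The try-direct-then-download behaviour can be sketched like this. It is an assumption-laden illustration, not the product's code: resolve_readable_path and fake_download are hypothetical, and "direct access" is modelled simply as being able to open the file at its original path.

```python
import tempfile

def resolve_readable_path(path, download, attempt_direct_access=True):
    """Return a path the cataloguer can read.

    Hypothetical helper: `download` is a caller-supplied function that copies
    the file locally and returns the local path.
    """
    if attempt_direct_access:
        try:
            # Direct access: check that the file can be opened where it lives.
            with open(path, "rb"):
                return path
        except OSError:
            pass  # direct access failed; fall back to downloading
    return download(path)

def fake_download(path):
    # Stand-in for a real download step.
    return "downloaded:" + path

with tempfile.NamedTemporaryFile(delete=False) as f:
    existing = f.name

print(resolve_readable_path(existing, fake_download) == existing)  # True
print(resolve_readable_path("/no/such/file.csv", fake_download))   # downloaded:/no/such/file.csv
```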
Limit deltas to new or deleted files only: If checked, only new or deleted files are processed; changes to modified files are not picked up.
This improves performance, but catalogued content can become out of date when existing files change.
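The effect of this setting can be sketched by comparing two crawls. The sketch assumes that each crawl is recorded as a mapping of file path to last-modified time and that a changed timestamp marks a file as modified; compute_deltas is a hypothetical name.

```python
def compute_deltas(previous, current, new_or_deleted_only):
    """Compare two crawls, each a dict of file path -> last-modified time.

    Returns sorted lists (new, deleted, modified). When new_or_deleted_only
    is True, modified files are not reported, as with the "Limit deltas" option.
    """
    new = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    if new_or_deleted_only:
        modified = []  # changes to existing files are ignored
    else:
        modified = sorted(
            path for path in current.keys() & previous.keys()
            if current[path] != previous[path]
        )
    return new, deleted, modified

previous = {"a.csv": 100, "b.csv": 100, "c.csv": 100}
current = {"a.csv": 100, "b.csv": 250, "d.csv": 300}
print(compute_deltas(previous, current, False))  # (['d.csv'], ['c.csv'], ['b.csv'])
print(compute_deltas(previous, current, True))   # (['d.csv'], ['c.csv'], [])
```

In the second call, b.csv has changed but is skipped, which is the accuracy trade-off described above.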
Multipart parquet support: If checked, groups of parquet files in a folder are processed as a single file with multiple parts.
For this to work, each part must be named "part-(#)-tid-(guid)-(name).parquet".
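One way to recognise and group such parts is a regular expression over file names. This sketch assumes that "#" is a decimal part number, that the segment after "tid-" is a standard 8-4-4-4-12 hexadecimal GUID, and that parts are grouped by the folder they sit in; PART_NAME and group_multipart are hypothetical names.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: "#" is a decimal part number and the segment after "tid-"
# is a standard 8-4-4-4-12 hexadecimal GUID.
PART_NAME = re.compile(
    r"^part-(?P<part>\d+)-tid-"
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
    r"-(?P<name>.+)\.parquet$"
)

def group_multipart(paths):
    """Split paths into multipart groups (keyed by folder) and single files."""
    groups = defaultdict(list)
    singles = []
    for path in paths:
        p = PurePosixPath(path)
        match = PART_NAME.match(p.name)
        if match:
            groups[str(p.parent)].append((int(match["part"]), path))
        else:
            singles.append(path)  # not a recognised part; catalogue on its own
    # Order each group's parts by part number.
    ordered = {folder: [path for _, path in sorted(parts)] for folder, parts in groups.items()}
    return ordered, singles

paths = [
    "sales/part-00001-tid-0b1c2d3e-4f50-6172-8394-a5b6c7d8e9f0-c000.parquet",
    "sales/part-00000-tid-0b1c2d3e-4f50-6172-8394-a5b6c7d8e9f0-c000.parquet",
    "sales/summary.parquet",
]
groups, singles = group_multipart(paths)
print(singles)  # ['sales/summary.parquet']
```

Files whose names do not follow the pattern fall through to the singles list and would be catalogued individually.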
Approximate multipart parquet row count: If checked, the row count of the first part of a multipart parquet file is multiplied by the number of parts.
The result is an estimate; it is exact only when every part contains the same number of rows.
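The arithmetic behind the approximation, and how it can differ from an exact count, is shown below with hypothetical per-part row counts. In practice, a part's row count could be read from its parquet footer, for example with pyarrow's pyarrow.parquet.ParquetFile(path).metadata.num_rows; that is an assumption about tooling, not a description of how the cataloguer counts rows.

```python
def approximate_row_count(first_part_rows, part_count):
    """Estimate total rows as (rows in first part) x (number of parts)."""
    return first_part_rows * part_count

def exact_row_count(part_row_counts):
    """Sum the row count of every part."""
    return sum(part_row_counts)

part_rows = [1000, 1000, 1000, 250]  # hypothetical per-part row counts
print(approximate_row_count(part_rows[0], len(part_rows)))  # 4000
print(exact_row_count(part_rows))                           # 3250
```

Here the smaller final part makes the estimate 750 rows too high; the trade-off is that only one part has to be inspected.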