Data File Cataloguer

Connect your Data File Cataloguer to Aiimi Insight Engine to make the most of the data.

Connecting Data File Cataloguer with Aiimi Insight Engine

  1. Source System: Select Data File Cataloguer from the dropdown.

Initial Configuration Steps

  1. Sources: Enter the sources that should be crawled for data files.

  2. Sample Rows: Enter the number of rows to extract as a sample.

    • This is the number of sample rows that will be extracted from each data file.

  3. Download folder: Enter the folder path where downloaded files are stored while they are being catalogued.

    • This location must be accessible by the machine running the crawl.

    • If left blank or NULL, the system temp folder will be used.
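
The download folder fallback described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name resolve_download_folder is hypothetical, and it assumes "NULL" means an unset value (None in Python).

```python
import tempfile
from typing import Optional


def resolve_download_folder(configured: Optional[str]) -> str:
    """Return the configured folder, or the system temp folder when it is blank or unset.

    Hypothetical helper for illustration only; the cataloguer's real logic may differ.
    """
    # Treat None (NULL) and empty or whitespace-only strings as "not configured".
    if configured is None or not configured.strip():
        return tempfile.gettempdir()
    return configured
```

Whichever folder is used, it still has to be reachable from the machine running the crawl.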

Additional Settings

  1. Attempt direct access: If checked, the cataloguer attempts to read files on a file share directly, which improves performance.

    • If direct access fails or cannot be used, the file will be downloaded locally.

  2. Limit deltas to new or deleted file only: If checked, only new or deleted files will be processed. Modified files will not be updated.

    • This improves performance, but catalogued content can become out of date because changes to modified files are not picked up.

  3. Multipart parquet support: If checked, groups of parquet files in a folder are processed as one file with multiple parts.

    • Each part must be named "part-(#)-tid-(guide)-(name).parquet" for this to work.

  4. Approximate multipart parquet row count:

    • If checked, the row count is estimated by counting the rows in the first part of a multipart parquet and multiplying that count by the number of parts.
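
The multipart parquet settings above can be sketched as follows. This is a rough illustration under stated assumptions, not the cataloguer's actual code: PART_PATTERN is a hypothetical regular expression for the documented naming convention, group_parts and approximate_row_count are made-up names, and count_rows stands in for whatever routine reads the row count of a single parquet file.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical pattern for names like "part-(#)-tid-(...)-(name).parquet".
# The product's real matching rules may be stricter or looser than this.
PART_PATTERN = re.compile(r"^part-(?P<index>\d+)-tid-(?P<rest>.+)\.parquet$")


def group_parts(paths: List[str]) -> Dict[str, List[str]]:
    """Group matching part files by folder, so each folder is one multipart file."""
    groups = defaultdict(list)
    for path in paths:
        folder, _, name = path.rpartition("/")
        match = PART_PATTERN.match(name)
        if match:
            # Keep the numeric part index so parts can be ordered correctly.
            groups[folder].append((int(match.group("index")), path))
    return {folder: [p for _, p in sorted(parts)] for folder, parts in groups.items()}


def approximate_row_count(parts: List[str], count_rows: Callable[[str], int]) -> int:
    """Estimate total rows as (rows in the first part) x (number of parts)."""
    if not parts:
        return 0
    # Only the first part is read; the remaining parts are assumed to be similar in size.
    return count_rows(parts[0]) * len(parts)
```

The estimate is exact only when every part holds the same number of rows. For example, if the first of four parts has 1,000 rows, the reported count is 4,000 regardless of how many rows the other three parts contain.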