Data File Cataloguer

Connect a Data File Cataloguer to Aiimi Insight Engine and make the most of your data. Once you have selected a Source System type, further settings expand so that you can customise it.

General Settings

  1. Enter the sources to crawl for data files.

  2. Check the number of Sample Rows to extract.

    • This is the number of sample rows that will be extracted from each data file.

  3. Enter the Download Folder path.

    • This is where the downloaded files will be stored while being catalogued.

    • This location must be accessible by the machine running the crawl.

    • If left blank or NULL, the system temp folder will be used.
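
The fallback rule for the Download Folder can be sketched as follows. This is a minimal illustration in Python, not the product's implementation: the names `resolve_download_folder` and `is_usable` are hypothetical, and treating the literal text "NULL" (in any letter case) the same as a blank value is an assumption based on the wording above.

```python
import os
import tempfile
from typing import Optional


def resolve_download_folder(configured: Optional[str]) -> str:
    """Return the folder to download into (hypothetical helper).

    Assumption: None, a blank string, or the literal text "NULL" all mean "not set".
    """
    value = (configured or "").strip()
    if value == "" or value.upper() == "NULL":
        # Not set: fall back to the system temp folder.
        return tempfile.gettempdir()
    return value


def is_usable(folder: str) -> bool:
    """Check that the machine running this code can write to the folder."""
    return os.path.isdir(folder) and os.access(folder, os.W_OK)
```

Running a check like `is_usable` on the crawl machine itself is one way to confirm that the configured location is reachable before a crawl starts.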

Additional Settings

  1. Attempt direct access

    • Check this option if the files are on a file share. Accessing them directly improves performance.

    • If direct access fails or cannot be used, the file will be downloaded locally.

  2. Limit deltas to new or deleted files only

    • If checked, only new or deleted files will be processed. Modified files will not be updated.

    • This improves performance, but catalogued content can become out of date because changes to modified files are not picked up.

  3. Multipart parquet support

    • If checked, groups of parquet files in a folder are processed as one file with multiple parts.

    • Each part must be named "part-(#)-tid-(guide)-(name).parquet" for this to work.

  4. Approximate multipart parquet row count

    • If checked, the row count is estimated by counting the rows in the first part of a multipart parquet and multiplying that by the number of parts.
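
The two multipart parquet options can be illustrated with a short sketch. It is written in Python under stated assumptions and is not the product's implementation: the regular expression is only one reading of the naming pattern above (the exact format of each placeholder is assumed), grouping parts by their parent folder is assumed, and `PART_RE`, `group_parts` and `approximate_row_count` are hypothetical names.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed reading of the naming pattern: a numeric part index, a "tid" segment
# that may itself contain hyphens, and a final hyphen-free name.
PART_RE = re.compile(r"^part-(\d+)-tid-(.+)-([^-]+)\.parquet$")


def group_parts(paths):
    """Group matching part files by parent folder (assumed grouping key)."""
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        match = PART_RE.match(path.name)
        if match:
            groups[str(path.parent)].append((int(match.group(1)), p))
    # Sort each group by part number so that "the first part" is well defined.
    return {folder: [p for _, p in sorted(parts)] for folder, parts in groups.items()}


def approximate_row_count(first_part_rows: int, part_count: int) -> int:
    """Estimate total rows as rows in the first part times the number of parts."""
    return first_part_rows * part_count


# Illustrative file names only; files that do not match the pattern are skipped.
files = [
    "/lake/sales/part-1-tid-1a2b-3c4d-sales.parquet",
    "/lake/sales/part-0-tid-1a2b-3c4d-sales.parquet",
    "/lake/sales/_SUCCESS",
]
groups = group_parts(files)
estimate = approximate_row_count(
    first_part_rows=1000,  # assumed row count of the first part
    part_count=len(groups["/lake/sales"]),
)
```

The estimate is exact only when every part holds the same number of rows. Uneven parts make it drift, which is the trade-off for not reading every file.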
