The classify step uses Aiimi’s pre-trained clustering and classification models to classify documents.
To use this step, you will need a ‘model set’ that has been trained and built for your documents. Creating a model set is outside the scope of this document; it is covered in the documentation for the classification framework.
A model set must consist of the following files (with the exact names). Create a folder for the model set in the ‘models’ subfolder, which is in the root of the Python REST Service, and place the files inside it. Here we have an ‘aiimi’ model set that has been put in the models subfolder.
By default, the classification will be stored in a metadata field called ‘classification’, so this field will need to be created using Control Hub. Be sure to create it as a metadata field, not an entity, and set its type to keyword. Depending on the version of Aiimi Insight Engine that you are running, this metadata field may already exist.
To invoke the classify step, create a REST enrichment step in your pipeline and configure it to call the ‘classify’ endpoint. You will also need to pass a parameter that selects the model set specific to your documents. To do this, add ‘model_set=name’ to your configuration, for example:
classify?model_set=aiimi
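As an illustration of how the endpoint string above is put together, here is a minimal Python sketch. The helper name build_classify_endpoint is hypothetical and is not part of the Python REST Service; only the ‘classify’ endpoint and the ‘model_set’ parameter come from this document.

```python
from urllib.parse import urlencode


def build_classify_endpoint(model_set=None):
    """Return the endpoint string for the REST enrichment step.

    Hypothetical helper for illustration only. When no model set is given,
    the bare endpoint is returned and the service is expected to fall back
    to its configured default model set.
    """
    if not model_set:
        return "classify"
    # urlencode escapes any characters that are not safe in a query string.
    return "classify?" + urlencode({"model_set": model_set})


print(build_classify_endpoint("aiimi"))  # classify?model_set=aiimi
```

Using urlencode rather than plain string concatenation keeps the value valid if a model set name ever contains spaces or other reserved characters.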
Configuration for the step can be found in config/endpoints/classify.json. The settings are:
metadata_field_name – make sure the metadata field exists and is of type keyword
default_model_set – this will be used if you do not specify a model_set in your REST step configuration
number_of_models_to_cache – classification models are not thread-safe, so we build a set of models to use for inbound requests. Setting this to a larger number will mean you potentially get more work done faster, but you will also use more memory. There is little point setting this to a value higher than that of the REST step concurrency setting. Some empirical testing will help arrive at the ideal setting (Aiimi can advise on this).
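The three settings above, and the reason a cache of models helps, can be sketched as follows. This is an assumption-laden illustration, not the service's actual implementation: the layout of the JSON, the DummyModel class and the ModelPool class are all hypothetical, and only the three setting names come from this document. The idea is that each model copy is handed to one request at a time, so a model that is not thread-safe is never shared between threads.

```python
import json
import queue
import threading

# Hypothetical contents of config/endpoints/classify.json. The key names are
# taken from the documentation; the flat layout and values are assumptions.
CONFIG = json.loads("""
{
  "metadata_field_name": "classification",
  "default_model_set": "aiimi",
  "number_of_models_to_cache": 2
}
""")


class DummyModel:
    """Stand-in for a real classifier that is not thread-safe."""

    def __init__(self, model_set):
        self.model_set = model_set

    def predict(self, text):
        return "label-for-" + self.model_set


class ModelPool:
    """Holds a fixed number of model copies; each is used by one thread at a time."""

    def __init__(self, model_set, size):
        self._models = queue.Queue()
        for _ in range(size):
            self._models.put(DummyModel(model_set))

    def classify(self, text):
        model = self._models.get()  # blocks while every copy is in use
        try:
            return model.predict(text)
        finally:
            self._models.put(model)  # always hand the copy back to the pool


pool = ModelPool(CONFIG["default_model_set"], CONFIG["number_of_models_to_cache"])

# Four concurrent requests share two cached models.
results = []
threads = [
    threading.Thread(target=lambda: results.append(pool.classify("some document")))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch, requests beyond the pool size simply wait for a free model, which is why a pool larger than the REST step concurrency would leave copies idle while still consuming memory.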
The endpoint for this enrichment step is ‘classify’.