Hugging Face Fine-tune Models
The following action steps use the Pimcore Fine-Tuning Service to fine-tune AI models based on data managed within Pimcore.
Supported tasks are image classification and text classification. They rely on data filtered by the Filter Data Objects or Filter Assets action steps and consist of two steps: preparing data and starting training.
It is recommended to use the Cleanup Tmp Files action step at the end to clean up temporary files.
The training itself is then executed on a Pimcore Fine-Tuning Service instance and monitored by the start training action step. The Pimcore Fine-Tuning Service can be hosted on-premises or in a Hugging Face Space; for details, see the Readme. GPU instances are recommended for training. The GPU size required depends on the size of your training data and on the model being fine-tuned.
Once a training job is finished, the fine-tuned model is uploaded to the Hugging Face Hub and can be used in other Copilot action steps to classify newly added data.
Image Classification
Image classification is based on tags assigned to the image assets.
Preparing Data
Data preparation includes the following steps:
- Reads filtered assets from the job run context.
- Extracts the classification from asset tags based on the defined parentTagPath setting. Uses the first leaf tag it finds; assets with no corresponding classification tag are skipped.
- Calculates thumbnails of assets based on configuration (default is 300px width JPG). It is beneficial to use the same thumbnail definition when using the fine-tuned model for classification tasks.
- Packs all thumbnails with a classification folder structure into a zip file named huggingface-training-export/JOBRUN_ID.zip in the temp folder.
- Adds the zip file to the clean-up list for later removal by the Cleanup Tmp Files action step.
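The tag extraction and packing steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by the action step: the function names leaf_classification and pack_thumbnails, the '/'-separated tag path format, treating the last path segment as the leaf, and passing thumbnails as in-memory bytes are all assumptions made for the example.

```python
import io
import zipfile


def leaf_classification(tag_paths, parent_tag_path):
    """Return the first leaf tag name found under parent_tag_path, or None.

    Assumption: tags are '/'-separated paths such as '/Categories/Shoes',
    and the leaf is the last path segment.
    """
    prefix = parent_tag_path.rstrip("/") + "/"
    for path in tag_paths:
        if path.startswith(prefix):
            leaf = path[len(prefix):].strip("/").split("/")[-1]
            if leaf:
                return leaf
    return None


def pack_thumbnails(assets, parent_tag_path, target):
    """Write thumbnails into a zip archive, one folder per classification.

    assets: iterable of (filename, tag_paths, thumbnail_bytes) tuples.
    target: file path or file-like object for the archive.
    Returns the number of assets written; untagged assets are skipped.
    """
    written = 0
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, tag_paths, thumbnail in assets:
            label = leaf_classification(tag_paths, parent_tag_path)
            if label is None:
                continue  # no classification tag under the parent: skip
            archive.writestr(f"{label}/{filename}", thumbnail)
            written += 1
    return written


# Usage with placeholder data: only the first asset has a matching tag.
buffer = io.BytesIO()
count = pack_thumbnails(
    [
        ("a.jpg", ["/Categories/Shoes"], b"placeholder"),
        ("b.jpg", ["/Other/Misc"], b"placeholder"),
    ],
    "/Categories",
    buffer,
)
```

The resulting archive contains one folder per classification, so the folder name of each thumbnail serves as its label.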
Starting Training
Starting training includes the following steps:
- Gets the training file from the job run context.
- Uploads the training file and starts training at the configured Pimcore Fine-Tuning Service instance.
- Waits for training to finish.
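The wait step is essentially a polling loop. The sketch below shows only that general pattern. The get_status callable and the status strings 'running', 'finished', and 'failed' are hypothetical placeholders, since the Fine-Tuning Service API is not described here.

```python
import time


def wait_for_training(get_status, poll_interval=30.0, timeout=3600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until training finishes, fails, or times out.

    get_status is a hypothetical callable returning one of the assumed
    strings 'running', 'finished', or 'failed'.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == "finished":
            return
        if status == "failed":
            raise RuntimeError("training failed")
        if clock() >= deadline:
            raise TimeoutError("training did not finish in time")
        sleep(poll_interval)
```

Passing sleep and clock as parameters keeps the loop easy to exercise without real waiting.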
Settings for the training include:
- project_name: Project name, also used as the name of the resulting model
- base_url: URL of the Pimcore Fine-Tuning Service
- access_token: Access token for the Pimcore Fine-Tuning Service. Needs to be the same token as the AUTHENTICATION_TOKEN defined in the Pimcore Fine-Tuning Service instance.
- source_model: Model to be used as a base for fine-tuning
- epochs: Number of epochs for training
- learning_rate: Learning rate for training
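As an illustration, the settings above could be collected and sanity-checked like this. All values are placeholders, and the validation rules (non-empty project name, http(s) URL, positive epochs and learning rate) are assumptions for this sketch rather than checks known to be performed by the action step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSettings:
    project_name: str
    base_url: str
    access_token: str
    source_model: str
    epochs: int
    learning_rate: float

    def __post_init__(self):
        # Assumed sanity checks, not taken from the service itself.
        if not self.project_name:
            raise ValueError("project_name must not be empty")
        if not self.base_url.startswith(("http://", "https://")):
            raise ValueError("base_url must be an http(s) URL")
        if self.epochs < 1:
            raise ValueError("epochs must be at least 1")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


settings = TrainingSettings(
    project_name="product-image-classifier",     # placeholder name
    base_url="https://fine-tuning.example.com",  # placeholder URL
    access_token="<AUTHENTICATION_TOKEN>",       # same token as on the service
    source_model="google/vit-base-patch16-224",  # example base model
    epochs=3,
    learning_rate=5e-5,
)
```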
Text Classification
Text classification is based on data fields assigned to the data objects. The text value on which classification is based can be defined via a Twig template.
Preparing Data
Data preparation includes the following steps:
- Reads filtered data objects from the job run context.
- Extracts the classification from each data object using the target_field setting.
- Generates the value based on the value_template setting.
- Packs all rows into a csv file named huggingface-training-export/JOBRUN_ID.csv in the temp folder.
- Adds the csv file to the clean-up list for later removal by the Cleanup Tmp Files action step.
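The row generation above can be sketched in Python as follows. This is an illustration under stated assumptions: data objects are represented as plain dicts, the Twig value_template is replaced by a Python callable, the column names 'text' and 'target' are invented for the example, and skipping objects without a classification value is assumed by analogy with the image case.

```python
import csv
import io


def write_training_csv(objects, target_field, render_value, out):
    """Write one CSV row per object: rendered text plus classification.

    objects: iterable of dicts standing in for data objects.
    render_value: callable standing in for the Twig value_template.
    Column names 'text' and 'target' are assumptions for this sketch.
    Returns the number of data rows written.
    """
    writer = csv.writer(out)
    writer.writerow(["text", "target"])
    rows = 0
    for obj in objects:
        label = obj.get(target_field)
        if not label:
            continue  # assumed: objects without a classification are skipped
        writer.writerow([render_value(obj), label])
        rows += 1
    return rows


# Usage with placeholder data.
out = io.StringIO()
write_training_csv(
    [{"name": "Trail shoe", "description": "Light, grippy", "category": "Shoes"}],
    "category",
    lambda o: f"{o['name']}: {o['description']}",
    out,
)
```

The csv module quotes values that contain commas, so rendered text can safely include punctuation.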
Starting Training
Starting training includes the following steps:
- Gets the training file from the job run context.
- Starts training at the Pimcore Fine-Tuning Service.
- Waits for training to finish.
Settings for the training include:
- project_name: Project name, also used as the name of the resulting model
- base_url: URL of the Pimcore Fine-Tuning Service
- access_token: Access token for the Pimcore Fine-Tuning Service. Needs to be the same token as the AUTHENTICATION_TOKEN defined in the Pimcore Fine-Tuning Service instance.
- source_model: Model to be used as a base for fine-tuning
- epochs: Number of epochs for training
- learning_rate: Learning rate for training
Sample Training Action Configuration
A typical training action for asset classification fine-tuning will consist of the following steps:
- Filter Assets
- Hugging Face Prepare Training Asset Classification
- (Optional) Hugging Face Start Hugging face space
- Hugging Face Start Training Asset Classification
- (Optional) Hugging Face Stop Hugging face space
- Cleanup Tmp Files