Training an AI for Text Classification

Updated on June 13, 2025
Tobias Bueck

Hello! I am Tobias Bueck, founder of Softoft. As a computer scientist, I share insights on my blog about software development, particularly the OTOBO ticket system, as well as automation tools like UiPath and Zapier.

This article describes how to train an Artificial Intelligence (AI) to classify tickets in a ticket system like OTOBO. This process involves data preparation, model training, and evaluation.

Requirements

  • Python 3.10+
  • Libraries: pandas, scikit-learn, datasets, transformers[torch], psutil, gputil, nvidia_smi, huggingface_hub, nlpaug, nltk, sentencepiece

Install the required packages with:

pip install pandas scikit-learn datasets "transformers[torch]" psutil gputil nvidia_smi huggingface_hub nlpaug nltk sentencepiece

or in a Jupyter Notebook:

!pip install pandas scikit-learn datasets "transformers[torch]" psutil gputil nvidia_smi huggingface_hub nlpaug nltk sentencepiece

Step 1: Data Preparation

First, the ticket data must be prepared. This includes loading the data, cleaning, and preprocessing the text. For this tutorial, we use the following data:

Example Data

subject        | body                          | priority | queue
---------------|-------------------------------|----------|-----------
Login Issue    | Unable to login to the system | High     | Software
Password Reset | Need to reset my password     | Medium   | Hardware
Email Problem  | Not receiving emails          | Low      | Accounting
Network Down   | Network is down in building 5 | High     | Software
Printer Issue  | Printer not working           | Medium   | Hardware

We use subject and body as features, and priority and queue are the labels we want to predict.

Features and Labels

Feature 1 (subject) | Feature 2 (body)              | Label 1 (priority) | Label 2 (queue)
--------------------|-------------------------------|--------------------|----------------
Login Issue         | Unable to login to the system | High               | Software
Password Reset      | Need to reset my password     | Medium             | Hardware
Email Problem       | Not receiving emails          | Low                | Accounting
Network Down        | Network is down in building 5 | High               | Software
Printer Issue       | Printer not working           | Medium             | Hardware

For sequence classification with BERT, the model receives a single text input, so we can only use one feature. Therefore, we combine subject and body. Since we want to give more weight to the subject, we concatenate the texts with the subject appearing twice, followed by the body once.

import pandas as pd

# Example Data
data = {
    'subject': ["Login Issue", "Password Reset", "Email Problem", "Network Down", "Printer Issue"],
    'body': ["Unable to login to the system", "Need to reset my password", "Not receiving emails",
             "Network is down in building 5", "Printer not working"],
    'priority': ["High", "Medium", "Low", "High", "Medium"],
    'queue': ["Software", "Hardware", "Accounting", "Software", "Hardware"]
}

df = pd.DataFrame(data)

# Create combined feature
df['combined_feature'] = df.apply(lambda row: f"{row['subject']} {row['subject']} {row['body']}", axis=1)

print(df[['combined_feature', 'priority', 'queue']])

Transformed Table

Combined Feature                                        | Label 1 (priority) | Label 2 (queue)
--------------------------------------------------------|--------------------|----------------
Login Issue Login Issue Unable to login to the system   | High               | Software
Password Reset Password Reset Need to reset my password | Medium             | Hardware
Email Problem Email Problem Not receiving emails        | Low                | Accounting
Network Down Network Down Network is down in building 5 | High               | Software
Printer Issue Printer Issue Printer not working         | Medium             | Hardware

To train the model, we need to convert the labels into numbers. Here is the code to do this:

from sklearn.preprocessing import LabelEncoder

# Initialize Label Encoder
le_priority = LabelEncoder()
le_queue = LabelEncoder()

# Convert labels to numbers
df['priority_encoded'] = le_priority.fit_transform(df['priority'])
df['queue_encoded'] = le_queue.fit_transform(df['queue'])

print(df[['combined_feature', 'priority_encoded', 'queue_encoded']])

Result:

Combined Feature                                        | priority_encoded | queue_encoded
--------------------------------------------------------|------------------|--------------
Login Issue Login Issue Unable to login to the system   | 0                | 2
Password Reset Password Reset Need to reset my password | 2                | 1
Email Problem Email Problem Not receiving emails        | 1                | 0
Network Down Network Down Network is down in building 5 | 0                | 2
Printer Issue Printer Issue Printer not working         | 2                | 1
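
The encoding is reversible: at prediction time, LabelEncoder's inverse_transform maps numeric outputs back to the original label names. A quick check with hypothetical encoded values:

# Map encoded values back to readable labels (the ids here are hypothetical predictions)
print(le_queue.inverse_transform([2, 1, 0]))     # ['Software' 'Hardware' 'Accounting']
print(le_priority.inverse_transform([0, 1, 2]))  # ['High' 'Low' 'Medium']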

Since a sequence classification model predicts only one label, we now have two options.

  1. Combine the two labels into one. This would result in a combined priority_queue label: High_Software, High_Hardware, High_Accounting, etc. The number of combined classes is the product of the number of unique values per label, in our case len(unique(priorities)) * len(unique(queues)) = 3 * 3 = 9 (see the sketch after this list).

    Advantages:

    • Simple implementation and management.
    • One model for the entire classification.

    Disadvantages:

    • Increased complexity and size of the classification problem.
    • Potentially worse performance when there is little data per combination.
  2. Train a separate model for each label, i.e. one model for the queue and one for the priority. This is the approach we use in this tutorial.
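
For comparison, option 1 could be implemented with a single combined column. A minimal sketch using the DataFrame from above (the "_" separator is only for readability):

# Option 1 (for comparison): merge both labels into one combined class
df['priority_queue'] = df['priority'] + "_" + df['queue']
print(df['priority_queue'].unique())  # at most 3 * 3 = 9 combined classes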

Code to Split the Table into Queue and Priority Tables

# Split into Queue and Priority Tables
queue_df = df[['combined_feature', 'queue_encoded']]
priority_df = df[['combined_feature', 'priority_encoded']]

print(queue_df)
print(priority_df)

Table for Queue Model

Combined Feature                                        | queue_encoded
--------------------------------------------------------|--------------
Login Issue Login Issue Unable to login to the system   | 2
Password Reset Password Reset Need to reset my password | 1
Email Problem Email Problem Not receiving emails        | 0
Network Down Network Down Network is down in building 5 | 2
Printer Issue Printer Issue Printer not working         | 1

Table for Priority Model

Combined Feature                                        | priority_encoded
--------------------------------------------------------|-----------------
Login Issue Login Issue Unable to login to the system   | 0
Password Reset Password Reset Need to reset my password | 2
Email Problem Email Problem Not receiving emails        | 1
Network Down Network Down Network is down in building 5 | 0
Printer Issue Printer Issue Printer not working         | 2

Tokenizer Explanation

A tokenizer converts text into smaller units called tokens. These tokens can be words, punctuation, or sentence components. Tokenizers are important because machine learning and NLP models require text in a form they can process. Through tokenization, models can analyze text and learn to recognize patterns.

Token Encoding

In token encoding, tokens are converted into numbers so they can be processed by machine learning models. Here is an example of what a tokenized and encoded text for our table might look like:

from transformers import BertTokenizer

# Initialize BERT Tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenization and encoding of an example
example_text = df['combined_feature'][0]
tokens = tokenizer.tokenize(example_text)
encoded_tokens = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)
print(encoded_tokens)

Example output for "Login Issue Login Issue Unable to login to the system":

Tokens:

['login', 'issue', 'login', 'issue', 'unable', 'to', 'login', 'to', 'the', 'system']

Encoded Tokens:

[2653, 3277, 2653, 3277, 3928, 2000, 2653, 2000, 1996, 2291]
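
Token ids can also be turned back into text; a quick sanity check with the tokenizer from above:

# Convert the ids back into text
print(tokenizer.decode(encoded_tokens))
# login issue login issue unable to login to the system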

Splitting the Tables into Train and Test Datasets

To train and test our models, we split the data into training and test datasets. Here is the code to do this:

from sklearn.model_selection import train_test_split

# Split the Queue table into train and test datasets
queue_train, queue_test, y_queue_train, y_queue_test = train_test_split(queue_df['combined_feature'],
                                                                        queue_df['queue_encoded'], test_size=0.2,
                                                                        random_state=42)

# Split the Priority table into train and test datasets
priority_train, priority_test, y_priority_train, y_priority_test = train_test_split(priority_df['combined_feature'],
                                                                                    priority_df['priority_encoded'],
                                                                                    test_size=0.2, random_state=42)

print(queue_train, queue_test, y_queue_train, y_queue_test)
print(priority_train, priority_test, y_priority_train, y_priority_test)

By splitting data, we ensure that we have enough data to train and test our models, allowing us to evaluate their performance.
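
On a realistically sized dataset, it is also worth keeping the label distribution similar in both splits. train_test_split supports this via its stratify parameter; a small sketch (note that the five-row toy data above is too small for this, since every class must appear in both splits):

# Stratified split: keeps label proportions in train and test sets
# (requires several examples per class, unlike our five-row toy data)
queue_train, queue_test, y_queue_train, y_queue_test = train_test_split(
    queue_df['combined_feature'], queue_df['queue_encoded'],
    test_size=0.2, random_state=42, stratify=queue_df['queue_encoded'])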

Step 2: Model Training

In this step, we train the models with our training data. We use the transformers library from Hugging Face together with torch to fine-tune BERT models.

BERT Model

BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model capable of capturing the context of words in a sentence. It is commonly used for tasks such as text classification, question answering, and many other NLP tasks.

Parameters for Training

  • batch_size: The number of examples processed in one pass through the model. Smaller batch sizes require less memory but result in more frequent updates of the model parameters.
  • epochs: The number of complete passes through the entire training dataset. More epochs can lead to a better model but risk overfitting.
  • learning_rate: The step size with which the model adjusts its parameters. Too high a learning rate can lead to unstable training processes, while too low a learning rate can result in slow learning.
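
These three parameters map directly onto Hugging Face's TrainingArguments. A minimal sketch with example values (the concrete numbers are starting points, not tuned recommendations):

from transformers import TrainingArguments

# Example hyperparameters; tune these for your own dataset
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=16,  # batch_size
    num_train_epochs=3,              # epochs
    learning_rate=2e-5,              # learning_rate
)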

Initializing the Model

We define a class TicketClassifier that initializes the tokenizer, the model, and the training parameters, together with a small Dataset wrapper so the Hugging Face Trainer can consume our texts and labels.

import torch
from transformers import BertForSequenceClassification, BertTokenizer, Trainer, TrainingArguments


class TicketDataset(torch.utils.data.Dataset):
    # Wraps tokenized texts and labels in the format the Trainer expects
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = list(labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)


class TicketClassifier:
    def __init__(self, model_name: str, num_labels: int):
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)

    def make_dataset(self, texts, labels):
        # Tokenize the texts and pair them with their labels
        encodings = self.tokenizer(list(texts), truncation=True, padding=True)
        return TicketDataset(encodings, labels)

    def train(self, train_texts, train_labels):
        training_args = TrainingArguments(output_dir='./results')
        trainer = Trainer(model=self.model, args=training_args,
                          train_dataset=self.make_dataset(train_texts, train_labels))
        trainer.train()
        return trainer

Training the Model

We use the prepared datasets queue_train, queue_test, priority_train, and priority_test for training and evaluation. Since each label gets its own model, we create a separate TicketClassifier for the queue and for the priority.

# Training the Queue model (3 classes: Accounting, Hardware, Software)
classifier_queue = TicketClassifier(model_name='bert-base-uncased', num_labels=3)
trainer_queue = classifier_queue.train(queue_train, y_queue_train)

# Training the Priority model (3 classes: High, Low, Medium)
classifier_priority = TicketClassifier(model_name='bert-base-uncased', num_labels=3)
trainer_priority = classifier_priority.train(priority_train, y_priority_train)

Model Evaluation

After training, we evaluate the model with the test data.

# Evaluating the Queue model
eval_queue_results = trainer_queue.evaluate(
    eval_dataset=classifier_queue.make_dataset(queue_test, y_queue_test))
print(eval_queue_results)

# Evaluating the Priority model
eval_priority_results = trainer_priority.evaluate(
    eval_dataset=classifier_priority.make_dataset(priority_test, y_priority_test))
print(eval_priority_results)

Through these steps, we ensure that our models are well trained and evaluated to successfully solve the classification task.
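
Once trained, a model can also classify new, unseen tickets. A minimal inference sketch, assuming the classifier_queue and le_queue objects from above (the ticket text is hypothetical):

import torch

# A hypothetical new ticket, weighted like the training data (subject twice, then body)
text = "VPN Issue VPN Issue Cannot connect to the company VPN"
inputs = classifier_queue.tokenizer(text, return_tensors='pt', truncation=True, padding=True)
inputs = {k: v.to(classifier_queue.model.device) for k, v in inputs.items()}

with torch.no_grad():
    logits = classifier_queue.model(**inputs).logits

predicted_id = int(logits.argmax(dim=-1))
print(le_queue.inverse_transform([predicted_id]))  # e.g. ['Software']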

After evaluating the models, we obtain various metrics that describe their performance. One of the most important metrics is accuracy, which indicates how many of the predictions are correct.

Prediction Accuracy

Accuracy is calculated by dividing the number of correct predictions by the total number of predictions. Here is a Python code to calculate the accuracy:

from sklearn.metrics import accuracy_score

# Example predictions and actual labels
y_true = [0, 2, 1, 0, 2]  # Actual labels
y_pred = [0, 2, 1, 0, 1]  # Predicted labels

# Calculating accuracy
accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

Evaluation with Ordered Numeric Labels

For labels with a natural order, such as priority levels, it is important to consider how close a prediction is to the actual value: a predicted value of 2 is closer to a true value of 1 than a predicted value of 3 would be. One way to evaluate this is by calculating the mean absolute error and the mean squared error.

Mean Absolute Error

The mean absolute error measures how far predictions are from the actual values. Here is a Python code to calculate the mean absolute error:

import numpy as np

# Example predictions and actual labels
y_true = np.array([0, 2, 1, 0, 2])  # Actual labels
y_pred = np.array([0, 2, 1, 0, 1])  # Predicted labels

# Calculating mean absolute error
mean_absolute_error = np.mean(np.abs(y_true - y_pred))
print(f"Mean Absolute Error: {mean_absolute_error}")

Mean Squared Error

The mean squared error measures the squared deviation of predictions from actual values, which weights larger errors more heavily. Here is a Python code to calculate the mean squared error:

# Calculating mean squared error
mean_squared_error = np.mean((y_true - y_pred) ** 2)
print(f"Mean Squared Error: {mean_squared_error}")

With these metrics, we can better understand and improve the performance of our models. Accuracy gives us an overall view of the performance, while mean absolute error and mean squared error allow a more fine-grained evaluation for ordered numeric labels.
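
Note that by default, Trainer.evaluate() only reports the loss. To have it report accuracy as well, a compute_metrics function can be passed to the Trainer; a small sketch:

import numpy as np
from sklearn.metrics import accuracy_score

# Passed as Trainer(..., compute_metrics=compute_metrics)
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {'accuracy': accuracy_score(labels, predictions)}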

Summary

In this article, we demonstrated how to train an AI to classify tickets. Using Python and libraries such as Hugging Face transformers and scikit-learn, we were able to create a simple yet effective model. This model can be further improved by using more advanced models and larger datasets.
