Introduction

In today’s data-driven world, effective customer service is a critical component of any successful business. Call centers, which handle customer inquiries and issues, generate vast amounts of data. This use case will guide you through analyzing call center data using ThanoSQL, a powerful tool for managing and querying data. We will classify call transcripts into categories using large language models (LLMs) and derive key performance metrics such as average call time, satisfaction score, and resolution rate for each category. This analysis can provide valuable insights into customer interactions, helping to improve service quality and overall customer satisfaction.

This tutorial can be executed both within ThanoSQL Lab and in a local Python/Jupyter environment. Whether you prefer to work directly within ThanoSQL Lab’s integrated environment or set up a local development environment on your machine, the instructions provided will guide you through the necessary steps.

If you want to try running the code from this use case, you can download the complete Jupyter notebook using this link: Download Jupyter Notebook. Alternatively, you can download it directly to your machine using the wget command below:

wget https://raw.githubusercontent.com/smartmind-team/mintlify-docs/main/examples/AICC_Scenario.ipynb

To run the models in this tutorial, you will need the following tokens:

  • OpenAI Token: Required when using OpenAI as an engine. This token enables the use of OpenAI’s language models for various natural language processing tasks.
  • Huggingface Token: Required only to access gated models such as Mistral on the Huggingface platform. Gated models have restricted access due to licensing or usage policies, and a token is necessary to authenticate and use them. For more information, check the Huggingface documentation.

Make sure to have these tokens ready before proceeding with the tutorial to ensure a smooth and uninterrupted workflow.
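Rather than pasting tokens directly into the notebook, you can keep them in environment variables. The sketch below checks that both are set before you start; the variable names OPENAI_API_KEY and HF_TOKEN are assumptions here, so adjust them to your own setup.

```python
import os

# Hypothetical environment variable names -- adjust to your own setup.
missing = [name for name in ("OPENAI_API_KEY", "HF_TOKEN") if not os.environ.get(name)]
if missing:
    print("Missing tokens:", ", ".join(missing))
else:
    print("All tokens found")
```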

Dataset

We will be working with the following datasets:

  • Counseling Staff Information Table (agents): Contains details about the call center agents.
    • agent_id: Unique identifier for each agent.
    • agent_name: Name of the agent.
  • Consultation Call Metadata Table (calls): Records metadata for each call.
    • call_id: Unique identifier for each call.
    • agent_id: Identifier linking the call to the agent.
    • satisfaction_score: Customer satisfaction score for the call.
    • call_duration: Duration of the call.
    • resolution_rate: Rate at which the call issue was resolved.
  • Consultation Call Transcription Table (transcript): Contains the text of the conversations.
    • call_id: Identifier linking the call to the transcription.
    • conversation: Text of the conversation during the call.
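The three tables link through call_id and agent_id. The sketch below builds tiny stand-in DataFrames with the same columns and joins them the way the later queries will; the sample rows are invented purely for illustration.

```python
import pandas as pd

# Invented sample rows mirroring the three table schemas.
agents = pd.DataFrame({"agent_id": [1], "agent_name": ["Alice"]})
calls = pd.DataFrame({
    "call_id": [101],
    "agent_id": [1],
    "satisfaction_score": [4.5],
    "call_duration": [320],
    "resolution_rate": [1.0],
})
transcript = pd.DataFrame({"call_id": [101], "conversation": ["Hi, my order never arrived."]})

# transcript joins calls via call_id; calls joins agents via agent_id.
joined = transcript.merge(calls, on="call_id").merge(agents, on="agent_id")
print(joined.columns.tolist())
```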

Goals

  1. Classify call transcripts into meaningful categories using LLMs.
  2. Calculate and analyze the average call time, satisfaction score, and resolution rate for each category.

Displaying ThanoSQL Query Results in Jupyter Notebooks

The check_query_result function is designed to handle and display the results of a database query executed via the ThanoSQL client. It ensures that any errors are reported, and successful query results are displayed in a user-friendly format.

Note: This function is specifically designed to work in Jupyter notebook environments.

from IPython.display import display

def check_query_result(query_result):
    # Report any error returned by the ThanoSQL engine.
    if query_result.error_result:
        print(query_result.error_result)
    # If the query produced rows, render them as a DataFrame.
    elif query_result.records is not None and len(query_result.records.data) > 0:
        df = query_result.records.to_df()
        display(df)
    else:
        print("Query executed successfully")

Procedure

  1. Import ThanoSQL Library:

    • Import the ThanoSQL library and create a client instance. This client will be used to interact with the ThanoSQL engine.
    from thanosql import ThanoSQL
    client = ThanoSQL(api_token="your_api_token", engine_url="engine_url")
    
  2. Upload Data to Tables:

    • Upload the agents table which contains details about the call center agents.
    table = client.table.upload('agents', 'agents.csv', if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the agents data to ThanoSQL and retrieves the first 10 records to confirm the upload.

    • Upload the calls table which records metadata for each call.

    table = client.table.upload('calls', 'calls.csv', if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the calls data to ThanoSQL and retrieves the first 10 records to confirm the upload.

    • Upload the transcript table which contains the text of the conversations.

    table = client.table.upload('transcript', 'transcript.csv', if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the transcript data to ThanoSQL and retrieves the first 10 records to confirm the upload.
  3. Classify Conversations and Aggregate Metrics:

    • Classify conversations using the Mistral LLM and calculate performance metrics.

    • Using Mistral LLM:

      query_result = client.query.execute("""
          -- This query categorizes call transcripts using a Huggingface model and computes average metrics for each category.
      
          -- Step 1: Cleanup resources before starting the main query
          SELECT thanosql.cleanup_resources();
      
          -- Step 2: Categorize call transcripts and compute average metrics
          SELECT 
              t.category,
              COUNT(*) AS call_count,
              AVG(c.satisfaction_score) AS avg_satisfaction_score,  
              AVG(c.call_duration) AS avg_call_duration,            
              AVG(c.resolution_rate) AS avg_resolution_rate         
          FROM 
              (
                  -- Generate categories for call transcripts using a Huggingface model
                  SELECT 
                      t.call_id,                  
                      thanosql.generate(
                          input := CONCAT(
                              '<s>[INST]You are a classification assistant. Your task is to analyze a conversation between an assistant and a human and determine the most appropriate category for the conversation <<<>>> into one of the following predefined categories: Technology, Refund, Shipping, ACCOUNT, CANCELLATION_FEE, CONTACT, DELIVERY, Feedback, Invoice, Newsletter, Order\n\nYou will only respond with one of the predefined categories. Reply with capital letters only. Do not start with category or categories. Do not provide explanations or notes. Do not change the line.[/INST]\n\n<<<\nConversation: ',
                              conversation,
                              '</s>\n>>>'
                          ),       
                          engine := 'thanosql', 
                          model := 'smartmind/Mistral-7B-Instruct-v0.2', 
                          model_args := '{"max_new_tokens": 7}' 
                      ) AS category              
                  FROM
                      transcript AS t                
              ) AS t
          JOIN 
              calls AS c ON t.call_id = c.call_id 
          GROUP BY 
              t.category            
      """)
      check_query_result(query_result)
      

      On execution, we get:
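      To make the aggregation in the query above concrete, here is a local pandas sketch of the same join-and-group-by, with invented categories standing in for the output of thanosql.generate:

```python
import pandas as pd

# Invented categories stand in for thanosql.generate output.
categorized = pd.DataFrame({
    "call_id": [1, 2, 3],
    "category": ["REFUND", "SHIPPING", "REFUND"],
})
calls = pd.DataFrame({
    "call_id": [1, 2, 3],
    "satisfaction_score": [4.0, 3.0, 5.0],
    "call_duration": [300, 180, 240],
    "resolution_rate": [1.0, 0.0, 1.0],
})

# Same shape as the SQL: join on call_id, then average per category.
metrics = (
    categorized.merge(calls, on="call_id")
    .groupby("category")
    .agg(
        call_count=("call_id", "count"),
        avg_satisfaction_score=("satisfaction_score", "mean"),
        avg_call_duration=("call_duration", "mean"),
        avg_resolution_rate=("resolution_rate", "mean"),
    )
)
print(metrics)
```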

    • Classify conversations using OpenAI GPT-4o and calculate performance metrics.

    • Using OpenAI GPT-4o:

      query_result = client.query.execute("""
          -- This query categorizes call transcripts using a model from OpenAI and computes average metrics for each category.
        
          -- Step 1: Cleanup resources before starting the main query
          SELECT thanosql.cleanup_resources();
      
          -- Step 2: Categorize call transcripts and compute average metrics
          SELECT 
              t.category,
              COUNT(*) AS call_count,
              AVG(c.satisfaction_score) AS avg_satisfaction_score,   
              AVG(c.call_duration) AS avg_call_duration,             
              AVG(c.resolution_rate) AS avg_resolution_rate          
          FROM 
              (
                  -- Step 3: Generate categories for call transcripts using a model from OpenAI
                  SELECT 
                      t.call_id,                  
                      thanosql.generate(
                          input := CONCAT(
                              '<s>[INST]You are a classification assistant. Your task is to analyze a conversation between an assistant and a human and determine the most appropriate category for the conversation <<<>>> into one of the following predefined categories: Technology, Refund, Shipping, ACCOUNT, CANCELLATION_FEE, CONTACT, DELIVERY, Feedback, Invoice, Newsletter, Order\n\nYou will only respond with the category. Do not include the word "Category". Do not provide explanations or notes. Do not change the line.[/INST]\n\n<<<\nConversation: ',
                              conversation,
                              '</s>\n>>>'
                          ),       
                          engine := 'openai',    
                          model := 'gpt-4o',     
                          token := 'your_openai_api_key', 
                          model_args := '{"temperature": 0}' 
                      ) AS category             
                  FROM
                      transcript AS t                
              ) AS t
          JOIN 
              calls AS c ON t.call_id = c.call_id 
          GROUP BY 
              t.category
          ORDER BY
              t.category;
      """)
      check_query_result(query_result)
      

      On execution, we get:
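      The two engines may return the same category with different casing or stray whitespace (the Mistral prompt asks for capital letters, while the GPT-4o prompt does not). If you want to compare the two runs, a small normalization helper such as the hypothetical one sketched below makes the labels comparable:

```python
def normalize_category(label: str) -> str:
    """Upper-case a model-returned label and strip whitespace, quotes, and spaces."""
    return label.strip().strip('"').upper().replace(" ", "_")

print(normalize_category(" Cancellation Fee\n"))
```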

Conclusion

By following these steps, you can effectively analyze call center data, classify call transcripts into meaningful categories using natural language processing, and derive valuable insights into call durations, customer satisfaction, and resolution rates for each category. These insights can help you understand the performance of your call center agents and identify areas for improvement, ultimately leading to enhanced customer satisfaction and operational efficiency.