Introduction

Understanding customer sentiment through product reviews is crucial for businesses aiming to enhance their product offerings and customer satisfaction. This use case will guide you through the process of analyzing product reviews, classifying them into various emotional categories using natural language processing (NLP), and linking these reviews with sales data. By the end of this use case, you will be able to uncover insights about how different emotional categories of reviews correlate with product sales, helping you make data-driven decisions to improve your products and marketing strategies.

This tutorial can be executed both within ThanoSQL Lab and in a local Python/Jupyter environment. Whether you prefer to work directly within ThanoSQL Lab’s integrated environment or set up a local development environment on your machine, the instructions provided will guide you through the necessary steps.

If you want to try running the code from this use case, you can download the complete Jupyter notebook using this link: Download Jupyter Notebook. Alternatively, you can download it directly to your machine using the wget command below:

wget https://raw.githubusercontent.com/smartmind-team/mintlify-docs/main/examples/Review_Analytics.ipynb

To run the models in this tutorial, you will need the following tokens:

  • OpenAI Token: Required to access all the OpenAI-related tasks when using OpenAI as an engine. This token enables the use of OpenAI’s language models for various natural language processing tasks.
  • Huggingface Token: Required only to access gated models such as Mistral on the Huggingface platform. Gated models are those that have restricted access due to licensing or usage policies, and a token is necessary to authenticate and use these models. For more information, check this Huggingface documentation. Make sure to have these tokens ready before proceeding with the tutorial to ensure a smooth and uninterrupted workflow.

Dataset

We will be working with the following datasets:

  • Review Comments Table (review_comments): Contains textual reviews of products.
    • ProductId: Unique identifier for each product.
    • UserId: Unique identifier for each user.
    • Text: Text of the product review.
  • Review Comments Sentiment Table (review_comments_sentiment): Contains product reviews along with their sentiment classification.
    • ProductId: Unique identifier for each product.
    • UserId: Unique identifier for each user.
    • Text: Text of the product review.
    • Sentiment: Sentiment classification of the review.
  • Sales Data Table (review_sales): Contains sales data for each product.
    • ProductId: Unique identifier for each product.
    • Score: Sales score for the product.

Goals

  1. Classify product reviews into emotional categories.
  2. Link classified reviews with corresponding sales data.
  3. Generate actionable insights based on the analysis of linked data.

Displaying ThanoSQL Query Results in Jupyter Notebooks

The check_result function is designed to handle and display the results of a database query executed via the ThanoSQL client. It ensures that any errors are reported, and successful query results are displayed in a user-friendly format.

Note: This function is specifically designed to work in Jupyter notebook environments.

from IPython.display import display

def check_query_result(query_result):
    if query_result.error_result:
        print(query_result.error_result)
    else:
        if query_result.records is not None and len(query_result.records.data) > 0:
            df = query_result.records.to_df()
            display(df)
        else:
            print("Query executed successfully")

Procedure

  1. Import ThanoSQL Library:

    • Import the ThanoSQL library and create a client instance. This client will be used to interact with the ThanoSQL engine.
    from thanosql import ThanoSQL
    client = ThanoSQL(api_token="your_api_token", engine_url="engine_url")
    
  2. Upload Data to Tables:

    • Upload the review_sales table which contains sales data for each product.
    table = client.table.upload("review_sales", "sales_data.csv", if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the review_sales data to ThanoSQL and retrieves the first 10 records to confirm the upload.

    • Upload the review_comments table which contains textual reviews of products.

    table = client.table.upload("review_comments", "comments_data.csv", if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the review_comments data to ThanoSQL and retrieves the first 10 records to confirm the upload.

    • Upload the review_comments_sentiment table which contains product reviews along with their sentiment classification.

    table = client.table.upload("review_comments_sentiment", "comments_data_sentiment.csv", if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the review_comments_sentiment data to ThanoSQL and retrieves the first 10 records to confirm the upload.
  3. Classify Reviews and Aggregate Sales Data:

    • Predict sentiment for reviews using a pre-trained model and aggregate the data to link reviews with sales.

    • Predict Sentiment for Reviews:

      query_result = client.query.execute("""
          -- This query predicts the sentiment of product review comments using a specified model
          -- and retrieves the sentiment along with the product ID and text of the review.
          SELECT
              "ProductId",                   
              "Text",                        
              (
                  jsonb_array_elements_text(
                      (
                          -- Step 1: Predict sentiment for the review comment
                          thanosql.predict(
                              input := "Text",                      
                              model := 'bhadresh-savani/distilbert-base-uncased-emotion', 
                              model_args := '{"top_k": 1, "device_map": "cpu"}'      
                          )::jsonb                                 
                          -- Cast the result to JSONB
                      )
                  )::jsonb ->> 'label'                            
                  -- Extract the 'label' field from the JSONB array
              ) AS "Sentiment"                                     
              -- Alias the result as 'Sentiment'
          FROM
              review_comments                                      
          LIMIT 10;         
      """)
      check_query_result(query_result)
      

    On execution, we get:

    • Link Reviews with Sales Data:
      • Group reviews by ProductId and Sentiment to show the count of each sentiment and the corresponding sales data for each product.
      query_result = client.query.execute("""
          -- This query calculates the total quantity sold for each product and the count of each sentiment from reviews.
          -- It uses a model to predict the sentiment of review comments and joins this information with sales data.
          SELECT 
              s."ProductId",                             
              SUM(s."Score") AS total_quantity_sold,     
              r."Sentiment",                             
              COUNT(r."Sentiment") AS sentiment_count    
          FROM 
              review_sales s                             
          JOIN
              (
                  -- Subquery to predict sentiment for review comments
                  SELECT
                      "ProductId",                       
                      "Text",                            
                      (
                          jsonb_array_elements_text(
                              (
                                  thanosql.predict(
                                      input := "Text",    
                                      model := 'bhadresh-savani/distilbert-base-uncased-emotion', 
                                      model_args := '{"top_k": 1, "device_map": "cpu"}'
                                  )::jsonb                
                                  -- Cast the result to JSONB
                              )
                          )::jsonb ->> 'label'            
                          -- Extract the 'label' field from the JSONB array
                      ) AS "Sentiment"                  
                  FROM
                      review_comments                   
                  LIMIT 100                             
              ) r
          ON 
              s."ProductId" = r."ProductId"             
                  -- Join condition: match product IDs in sales and review comments
          GROUP BY 
              s."ProductId", r."Sentiment"                
              -- Group the results by product ID and sentiment
          ORDER BY 
              s."ProductId", sentiment_count DESC;        
              -- Order the results by product ID and sentiment count in descending order
          """)
      check_query_result(query_result)
      

    On execution, we get:

    • Analyze Positive Reviews and Sales Performance:
      • Show the relationship between sales and positive reviews (joy, love, surprise, neutral) for each product.
      query_result = client.query.execute("""
          -- This query identifies products with positive reviews and compares their sales performance against the average sales.
          WITH positive_reviews AS (
              -- Step 1: Calculate the count of positive reviews for each product
              SELECT 
                  "ProductId",                              
                  COUNT(*) AS positive_count                
              FROM 
                  (
                      -- Subquery to predict sentiment for review comments
                      SELECT
                          "ProductId",                      
                          "Text",                           
                          (
                              jsonb_array_elements_text(
                                  (
                                      thanosql.predict(
                                          input := "Text", 
                                          model := 'bhadresh-savani/distilbert-base-uncased-emotion',
                                          model_args := '{"top_k": 1, "device_map": "cpu"}' 
                                      )::jsonb             
                                          -- Cast the result to JSONB
                                  )
                              )::jsonb ->> 'label'          
                                  -- Extract the 'label' field from the JSONB array
                          ) AS "Sentiment"                  
                      FROM
                          review_comments                   
                      LIMIT 100                             
                  ) AS subquery
              WHERE 
                  "Sentiment" IN ('joy', 'love', 'surprise', 'neutral') 
                      -- Filter for positive sentiments
              GROUP BY 
                  "ProductId"                               
                      -- Group the results by product ID
          ),
          total_sales AS (
              -- Step 2: Calculate the total quantity sold for each product
              SELECT 
                  "ProductId",                              
                  SUM("Score") AS total_quantity_sold       
              FROM 
                  review_sales                              
              GROUP BY 
                  "ProductId"                               
          )
          SELECT 
              ts."ProductId",                               
              ts.total_quantity_sold,                       
              COALESCE(pr.positive_count, 0) AS positive_count, 
                  -- Count of positive reviews, 0 if no positive reviews
              CASE 
                  WHEN pr.positive_count IS NULL THEN 'No Positive Reviews' 
                      -- No positive reviews case
                  WHEN ts.total_quantity_sold > (
                      SELECT AVG(total_quantity_sold) 
                      FROM total_sales
                  ) THEN 'Above Average Sales'              
                      -- Above average sales case
                  ELSE 'Below Average Sales'                
                      -- Below average sales case
              END AS sales_performance                      
                  -- Categorized sales performance
          FROM 
              total_sales ts                                
          LEFT JOIN 
              positive_reviews pr                           
                  -- Left join with positive reviews data
          ON 
              ts."ProductId" = pr."ProductId"               
                  -- Join condition: match product IDs
          ORDER BY 
              ts.total_quantity_sold DESC;                  
                  -- Order the results by total quantity sold in descending order
      """)
      check_query_result(query_result)
      

    On execution, we get:

Conclusion

This use case has guided you through the process of using ThanoSQL to analyze product reviews and their impact on sales performance. By classifying reviews into emotional categories and linking them with sales data, you can uncover valuable insights that help in understanding customer sentiment and its correlation with product success. This analysis can inform your marketing strategies, product development, and customer service approaches, ultimately contributing to improved customer satisfaction and business growth.