Introduction

Managing and retrieving large image datasets is a common challenge for photographers, art galleries, and companies involved in poster production or exhibitions. This use case will guide you through the process of using ThanoSQL to handle and query large datasets of images and their metadata. By the end of this use case, you will be able to search for images based on both image and text inputs and retrieve detailed metadata about the images, such as dimensions and photographer information.

This tutorial can be executed both within ThanoSQL Lab and in a local Python/Jupyter environment. Whether you prefer to work directly within ThanoSQL Lab’s integrated environment or set up a local development environment on your machine, the instructions provided will guide you through the necessary steps.

If you want to try running the code from this use case, you can download the complete Jupyter notebook using this link: Download Jupyter Notebook. Alternatively, you can download it directly to your machine using the wget command below:

wget https://raw.githubusercontent.com/smartmind-team/mintlify-docs/main/examples/Image_Retrieval.ipynb

To run the models in this tutorial, you will need the following tokens:

  • OpenAI Token: Required to access all the OpenAI-related tasks when using OpenAI as an engine. This token enables the use of OpenAI’s language models for various natural language processing tasks.
  • Huggingface Token: Required only to access gated models, such as Mistral, on the Huggingface platform. Gated models have restricted access due to licensing or usage policies, and a token is necessary to authenticate and use them. For more information, check this Huggingface documentation.

Make sure to have these tokens ready before proceeding with the tutorial to ensure a smooth and uninterrupted workflow.
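
If you are working in a local notebook, one convenient pattern is to keep the tokens in environment variables and read them at the start of the session. The snippet below is a minimal sketch: the variable names OPENAI_API_KEY and HF_TOKEN are assumptions rather than ThanoSQL requirements, and how you ultimately supply the tokens depends on your workspace configuration.

import os
from getpass import getpass

# Example only: read the tokens from environment variables, prompting if they are not set.
# OPENAI_API_KEY and HF_TOKEN are illustrative names, not a ThanoSQL convention.
openai_token = os.environ.get("OPENAI_API_KEY") or getpass("OpenAI token: ")
hf_token = os.environ.get("HF_TOKEN") or getpass("Huggingface token: ")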

Dataset

We will be working with two datasets:

  • unsplash_embed_sample.csv: a sample of Unsplash photo IDs and image URLs, used to compute and store the embedding vectors for similarity search.
  • unsplash_meta_sample.csv: detailed metadata for the same photos, such as dimensions, submission date, photographer information, and EXIF camera settings.

Goals

  1. Obtain photo files and their metadata (e.g., photo dimensions, photographer names).
  2. Perform advanced queries on both structured and unstructured data using ThanoSQL to retrieve the desired metadata.

Install Required Libraries

This command installs the Pillow and matplotlib libraries, which are necessary for image processing and display.

!pip install pillow matplotlib
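
If you are running this notebook locally rather than in ThanoSQL Lab, you may also need to install the ThanoSQL Python client and the other libraries imported later in this tutorial. The package name thanosql used below is an assumption; check the ThanoSQL documentation for the exact name for your release.

!pip install thanosql pandas requests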

Displaying ThanoSQL Query Results in Jupyter Notebooks

The check_query_result function is designed to handle and display the results of a database query executed via the ThanoSQL client. It ensures that any errors are reported and that successful query results are displayed in a user-friendly format.

Note: This function is specifically designed to work in Jupyter notebook environments.

from IPython.display import display

def check_query_result(query_result):
    # Report any error returned by the ThanoSQL engine
    if query_result.error_result:
        print(query_result.error_result)
    else:
        # If the query returned rows, render them as a pandas DataFrame
        if query_result.records is not None and len(query_result.records.data) > 0:
            df = query_result.records.to_df()
            display(df)
        else:
            # Statements such as ALTER TABLE or UPDATE return no rows
            print("Query executed successfully")

Displaying Images in Jupyter Notebooks

This function, show_images, displays images from URLs. It accepts either a single URL or a DataFrame containing multiple URLs. The function fetches each image using the URL, processes it using PIL, and then displays it using matplotlib.

import requests
from PIL import Image
from io import BytesIO
import matplotlib.pyplot as plt
import pandas as pd

def show_images(input, image_column_name=None):
    """
    Display images from URLs. The input can be either a single URL or a DataFrame containing multiple URLs.

    Args:
        input (str or DataFrame): Single image URL or DataFrame containing image URLs.
        image_column_name (str): The column name containing image URLs if input is a DataFrame.
    """
    # Check if the input is a DataFrame
    if isinstance(input, pd.DataFrame):
        df = input
        num_images = len(df)
        fig, axes = plt.subplots(1, num_images, figsize=(15, 5))
        
        if num_images == 1:
            axes = [axes]
        
        for image_url, ax in zip(df[image_column_name], axes):
            response = requests.get(image_url)
            response.raise_for_status()
            image = Image.open(BytesIO(response.content))
            ax.imshow(image)
            ax.axis('off')
    
    # If the input is a single URL
    elif isinstance(input, str):
        fig, ax = plt.subplots(figsize=(5, 5))
        response = requests.get(input)
        response.raise_for_status()
        image = Image.open(BytesIO(response.content))
        ax.imshow(image)
        ax.axis('off')

    else:
        raise ValueError("Input must be either a DataFrame or a single image URL (str).")
    
    plt.tight_layout()
    plt.show()

Procedure

  1. Download Datasets:

    • First, we will load the datasets into pandas DataFrames and display their shapes to understand the structure and size of the data.
    import pandas as pd
    unsplash_embed_sample = pd.read_csv("unsplash_embed_sample.csv")
    unsplash_meta_sample = pd.read_csv("unsplash_meta_sample.csv")   
    
    print("unsplash_meta_sample:", unsplash_meta_sample.shape, "unsplash_embed_sample:", unsplash_embed_sample.shape)
    

    On execution, we get:

    • This step loads the datasets into pandas DataFrames and prints their shapes to give an overview of the data size and structure.
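    • If you also want to see which columns each file contains, a quick look is enough. This is a minimal, optional inspection step and not required for the rest of the tutorial.
    print(list(unsplash_embed_sample.columns))
    print(list(unsplash_meta_sample.columns))
    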
  2. Import ThanoSQL Library:

    • Next, we need to import the ThanoSQL library and create a client instance. This client will be used to interact with the ThanoSQL engine.

    You can find your API Token and Engine URL by following these steps:

    1. Go to your workspace’s settings page.
    2. Navigate to the “Developer” tab.
    3. Locate and copy your API Token and Engine URL.
    from thanosql import ThanoSQL
    client = ThanoSQL(api_token="your_api_token", engine_url="your_engine_url")
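
    • Rather than hard-coding credentials in the notebook, you can read them from environment variables. The sketch below is one option; THANOSQL_API_TOKEN and THANOSQL_ENGINE_URL are example variable names, not a ThanoSQL convention.
    import os

    # Example variable names; set them however your environment manages secrets.
    client = ThanoSQL(
        api_token=os.environ["THANOSQL_API_TOKEN"],
        engine_url=os.environ["THANOSQL_ENGINE_URL"],
    )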
    
  3. Upload Data to Tables:

    • We upload the unsplash_embed_sample table to ThanoSQL. This table contains the image URLs; we will add and populate the embedding column in the following steps.
    table = client.table.upload('unsplash_embed_sample', 'unsplash_embed_sample.csv', if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the unsplash_embed_sample data to ThanoSQL and retrieves the first 10 records to confirm the upload.

    • Similarly, we upload the unsplash_meta_sample table, which contains detailed metadata about the images.

    table = client.table.upload('unsplash_meta_sample', 'unsplash_meta_sample.csv', if_exists='replace')
    table.get_records(limit=10).to_df()
    

    On execution, we get:

    • This step uploads the unsplash_meta_sample data to ThanoSQL and retrieves the first 10 records to confirm the upload.
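
    • As an additional sanity check, you can count the rows in each uploaded table through the query interface, reusing the check_query_result helper defined earlier. This verification step is optional.
    query_result = client.query.execute(
        """
        SELECT
            (SELECT COUNT(*) FROM unsplash_embed_sample) AS embed_rows,
            (SELECT COUNT(*) FROM unsplash_meta_sample) AS meta_rows
        """
    )
    check_query_result(query_result)
    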
  4. Create Embedding Column:

    • To store the embedding vectors, we need to create an embedding column in the unsplash_embed_sample table.
    query_result = client.query.execute(
        """
        CREATE EXTENSION IF NOT EXISTS vector;
    
        ALTER TABLE unsplash_embed_sample
        ADD COLUMN IF NOT EXISTS embedding vector(512)
        """
    )
    check_query_result(query_result)
    

    On execution, we get:

    • This step ensures that the vector extension is available and adds an embedding column to the unsplash_embed_sample table if it does not already exist.
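
    • If you want to confirm that the column was added, a standard information_schema lookup works. This assumes the usual PostgreSQL system catalogs are available in your workspace; it is an optional check.
    query_result = client.query.execute(
        """
        SELECT column_name, data_type
        FROM information_schema.columns
        WHERE table_name = 'unsplash_embed_sample'
        """
    )
    check_query_result(query_result)
    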
  5. Embed Images:

    • Now, we calculate and update the embedding vectors for the images in the unsplash_embed_sample table using a pre-trained model.
    query_result = client.query.execute(
        """
        -- This query calculates embeddings for each photo URL using a specified model and saves it to the specified table
        WITH calculated_embeddings AS (
            -- Step 1: Calculate embeddings
            SELECT
                photo_url,
                thanosql.embed(
                    engine := 'huggingface',           
                    input := photo_url,                
                    model := 'openai/clip-vit-base-patch32',
                    model_args := '{"device_map": "cpu"}'
                ) AS embedding                        
                    -- Alias the result of the embedding function as 'embedding'
            FROM
                unsplash_embed_sample                
        )
        -- Step 2: Update the original table with the calculated embeddings
        UPDATE unsplash_embed_sample
        SET 
            embedding = calculated_embeddings.embedding 
                -- Set the 'embedding' column with the calculated embeddings
        FROM 
            calculated_embeddings                    
                -- Use the 'calculated_embeddings' as the source for the new values
        WHERE
            unsplash_embed_sample.photo_url = calculated_embeddings.photo_url;
                -- Ensure that each row in 'unsplash_embed_sample' is matched with the corresponding row in 'calculated_embeddings' 
                -- by matching the 'photo_url' column in both tables
        """
    )
    check_query_result(query_result)
    

    On execution, we get:
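
    • Before moving on, you can optionally confirm that every row received an embedding. The query below simply counts rows whose embedding is still NULL; it is a minimal check and not part of the original workflow.
    query_result = client.query.execute(
        """
        SELECT COUNT(*) AS missing_embeddings
        FROM unsplash_embed_sample
        WHERE embedding IS NULL
        """
    )
    check_query_result(query_result)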

  6. Search by Image:

    • In this section, we’ll perform a similarity search using an example image. We will retrieve the embedding vector for an image of a coastal cliff and then use that vector to find similar images.
    • This calls the show_images function with the URL of a coastal cliff image, which we will then use to search for similar images in the sample dataset. The image is displayed inline in the notebook.
    show_images('https://images.unsplash.com/photo-1506887419028-47afaf38c27e')
    

    On execution, we get:

    • This query retrieves the three images most similar to the example image based on their embeddings, along with their detailed metadata.
    query_result = client.query.execute(
        """
        -- This query calculates the embedding for a specific image URL, finds the top 3 similar images in the database,
        -- and retrieves their metadata from the 'unsplash_meta_sample' table.
        WITH calculated_embeddings AS (
            -- Step 1: Calculate embedding for the specific image URL
            SELECT 
                thanosql.embed(
                    engine := 'huggingface',           
                    input := 'https://images.unsplash.com/photo-1506887419028-47afaf38c27e', 
                    model := 'openai/clip-vit-base-patch32',
                    model_args := '{"device_map": "cpu"}' 
                ) AS embedding                        
                    -- Alias the result of the embedding function as 'embedding'
        )
        -- Step 2: Select metadata of the top 3 similar images
        SELECT 
            photo_id,                      
            photo_image_url,               
            photo_submitted_at,            
            photo_width,                   
            photo_height,                  
            photographer_username,         
            photographer_first_name,       
            photographer_last_name,        
            exif_camera_make,              
            exif_camera_model,             
            exif_iso,                      
            exif_aperture_value,           
            exif_focal_length,             
            exif_exposure_time             
        FROM 
            unsplash_meta_sample
        WHERE 
            photo_id IN (
                -- Step 3: Retrieve the top 3 similar images based on embedding similarity
                SELECT 
                    photo_id
                FROM 
                    unsplash_embed_sample
                ORDER BY 
                    embedding <-> (
                        SELECT embedding FROM calculated_embeddings 
                            -- Calculate distance to the specific image's embedding
                    )
                LIMIT 3                    
                    -- Limit to the top 3 similar images
            );
        """
    )
    df = query_result.records.to_df()
    df
    

    On execution, we get:

    • We can now display the retrieved images using the metadata.
    show_images(df, 'photo_image_url')
    

    On execution, we get:
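
    • The similarity searches in this step order results by the <-> operator, which in pgvector-style syntax measures Euclidean (L2) distance. If your workspace's vector extension also exposes the cosine-distance operator <=> (as pgvector does), you can order by it instead; cosine distance is often a natural fit for CLIP embeddings. The following is a minimal sketch that assumes the operator is available.
    query_result = client.query.execute(
        """
        -- Assumes the cosine-distance operator <=> is provided by the vector extension
        SELECT photo_id
        FROM unsplash_embed_sample
        ORDER BY embedding <=> (
            SELECT thanosql.embed(
                engine := 'huggingface',
                input := 'https://images.unsplash.com/photo-1506887419028-47afaf38c27e',
                model := 'openai/clip-vit-base-patch32',
                model_args := '{"device_map": "cpu"}'
            )
        )
        LIMIT 3
        """
    )
    check_query_result(query_result)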

  7. Search by Text:

    • We can also perform a similarity search using a text description. Here, we search for the image that is most similar to the description “a photo of the desert”.
    query_result = client.query.execute(
        """
        -- This query calculates the embedding for a description of an image ('a photo of the desert'), finds the most similar image in the database,
        -- and retrieves its metadata from the 'unsplash_meta_sample' table.
        WITH calculated_embeddings AS (
            -- Step 1: Calculate embedding for the given description
            SELECT 
                thanosql.embed(
                    engine := 'huggingface',           
                    input := 'a photo of the desert',  
                    model := 'openai/clip-vit-base-patch32',
                    model_args := '{"device_map": "cpu"}' 
                ) AS embedding                        
                    -- Alias the result of the embedding function as 'embedding'
        )
        -- Step 2: Select metadata of the most similar image
        SELECT 
            photo_id,                      
            photo_image_url,               
            photo_submitted_at,            
            photo_width,                   
            photo_height,                  
            photographer_username,         
            photographer_first_name,       
            photographer_last_name,        
            exif_camera_make,              
            exif_camera_model,             
            exif_iso,                      
            exif_aperture_value,           
            exif_focal_length,             
            exif_exposure_time             
        FROM 
            unsplash_meta_sample
        WHERE 
            photo_id IN (
                -- Step 3: Retrieve the most similar image based on embedding similarity
                SELECT 
                    photo_id
                FROM 
                    unsplash_embed_sample
                ORDER BY 
                    embedding <-> (
                        SELECT embedding FROM calculated_embeddings 
                            -- Calculate distance to the embedding of the description
                    )
                LIMIT 1                    
                    -- Limit to the single most similar image
            );
        """
    )
    df = query_result.records.to_df()
    df
    

    On execution, we get:

    • Finally, we display the retrieved image.
    show_images(df, 'photo_image_url')
    

    On execution, we get:

    • This code fetches and displays the image using its URL from the metadata.

Conclusion

By following these steps, you will be able to efficiently manage and retrieve image metadata from large datasets using ThanoSQL. This capability is invaluable for various applications, including art exhibitions, poster production, and digital asset management. The ability to perform both image-based and text-based searches enables a flexible and powerful approach to handling visual data.