Objects from your S3 buckets can be used directly without having to be exported first. You can simply pass their URI as an input to ThanoSQL functions.

Prerequisites

Before you begin, ensure you have the following:

  • A basic understanding of SQL and AWS.
  • A valid AWS access key ID and secret access key.
  • Access to a ThanoSQL Workspace and an S3 object.
Make sure that you have set the access permissions of your object appropriately.
At the time of writing, S3 access to image and audio file types is supported, but that to video file types is not. Please keep this in mind when deciding where to implement S3 connections in your project.

Using a Credentials File

Overview

You have to keep a credentials file in the root folder of your workspace. You only need to set this up once, and then you can freely use S3 URIs in ThanoSQL functions just like any other paths or links. As long as as you have this file, no modifications are needed in your code and queries.

How to Set Up

Step 1: Prepare Your Environment

  1. Sign In: Log in to your ThanoSQL account.
  2. Choose a Workspace: Navigate to the dashboard and select a workspace you want to work in. For more information on workspaces, check out the workspace page.
  3. Start the Lab: Go to the Lab tab and wait for your Lab to load.

Step 2: Prepare Your Credentials

  1. Create File: In the root folder of your Lab, create a .aws folder if it does not exist already. Go to the .aws folder and create a file called credentials (without any extensions). You can either do this directly from a terminal using a Linux text editor or from the Lab UI by creating a file and then renaming it.
  2. Provide Credentials: Fill in the credentials file using the following format:
[default]
aws_access_key_id = YOUR_AWS_ACCESS_KEY_ID
aws_secret_access_key = YOUR_AWS_SECRET_ACCESS_KEY

Step 3: Test Run

You are all set! In order to check if your ThanoSQL Workspace-AWS S3 connection works smoothly, try running one of the examples provided below.

Example Usage

For more information on ThanoSQL functions, refer to the generate, embed, and predict pages.

Example 1: Classifying Audio

Here is an example of how to classify an audio file stored in S3, after credentials have been set:

SELECT 
    thanosql.predict(
        task := 'audio-classification',
        engine := 'huggingface',
        input := 's3://your-bucket/your-audio-file.mp3',
        model := 'superb/hubert-base-superb-ks',
        model_args := '{"device_map": "cpu"}'
    ) AS prediction

On execution, we get:

          prediction
-------------------------------
 {"label": "go", "score": NaN}

S3 URIs can also be stored in tables and used just like other objects and URLs. Suppose we have the following table, which contains a column with a mix of web URLs and S3 URIs:

ImageIdResourceLink
1https://unsplash.com/photos/bygTaBey1Xk
2s3://your-bucket/your-first-image-file.jpg
3https://unsplash.com/photos/gXSFnk2a9V4
4s3://your-bucket/your-second-image-file.png
5https://unsplash.com/photos/grg6-DNJuaU

If we want to calculate the embeddings of the images like we do in Use Case 2, we can use the following query:

SELECT
    imageid AS id,
    resourcelink AS link,
    thanosql.embed(
        engine := 'huggingface',           
        input := resourcelink,                
        model := 'openai/clip-vit-base-patch32',
        model_args := '{"device_map": "cpu"}'
    ) AS embedding                        
FROM
    your_image_table

On execution, we get:

| id |                     link                    |                       embedding                       |
|----|---------------------------------------------|-------------------------------------------------------|
|  1 | https://unsplash.com/photos/bygTaBey1Xk     | [0.16986924,0.16281517,0.21393582,-0.20016165, ...]   |
|  2 | s3://your-bucket/your-first-image-file.jpg  | [0.26234767,-0.22636145,0.1522088,-0.1634534, ...]    |
|  3 | https://unsplash.com/photos/gXSFnk2a9V4     | [0.03666824,0.049130432,0.04386332,-0.16477501, ...]  |
|  4 | s3://your-bucket/your-second-image-file.png | [0.051476493,-0.2522358,-0.5084932,0.14148024, ...]   |
|  5 | https://unsplash.com/photos/grg6-DNJuaU     | [0.12211017,0.055086724,-0.11496107,-0.15263805, ...] |