Generate

The generate function in ThanoSQL is designed to generate text based on a given input using a pre-trained text generation model. This function leverages the capabilities of the HuggingFace Transformers library to provide efficient and high-quality text generation.

Syntax

SELECT 
    [sequential_column,] [partition_column,] column_name, ...
    thanosql.generate(
        engine := 'engine_name',
        input := [column_name | 'input_text'],
        model := 'model_name',
        model_args := 'model_args_in_json',
        token := 'auth_token',
        base_url := 'base_url' -- Only applicable when the engine is 'openai'
    ) AS generated_text
FROM 
    table_name

Parameters

Parameter	Type	Default	Description	Options
`engine`	string	`'huggingface'`	The engine to use for text generation.	`'huggingface'`: Uses models from HuggingFace. `'thanosql'`: Uses ThanoSQL’s native models. `'openai'`: Uses models from OpenAI.
`input`	string		The input text based on which the text generation will occur.	N/A
`model`	string		The name or path of the pre-trained text generation model.	Example: `'meta-llama/Meta-Llama-3-8B'`
`model_args`	json	`None`	JSON string representing additional arguments for the model.	Example: `'{"max_new_tokens": 50}'` Common Parameters: `max_new_tokens`, `temperature`, `top_p`, `top_k`
`token`	string	`None`	Token for authentication if required by the model.	N/A
`base_url`	string	`None`	Base URL to point the client to a different endpoint than the default OpenAI API endpoint. This is only applicable when the engine is `openai`.	N/A

Returns

str: The generated text based on the input.

Example Usage

The following examples are provided to help you become familiar with the ThanoSQL syntax. To try out these queries in real scenarios, please visit the Use Cases section for detailed tutorials and practical applications.

To run the models in this tutorial, you will need the following tokens:

OpenAI Token: Required to access all the OpenAI-related tasks when using OpenAI as an engine. This token enables the use of OpenAI’s language models for various natural language processing tasks.
Huggingface Token: Required only to access gated models such as Mistral on the Huggingface platform. Gated models are those that have restricted access due to licensing or usage policies, and a token is necessary to authenticate and use these models. For more information, check this Huggingface documentation. Make sure to have these tokens ready before proceeding with the tutorial to ensure a smooth and uninterrupted workflow.

Using Hugging Face Generation Models

Here is an example of how to use the generate function using Hugging Face LLM:

SELECT 
    thanosql.generate(
        input := prompt,
        engine := 'huggingface',
        model := 'smartmind/Mistral-7B-Instruct-v0.2',
        token := 'huggingface_token',
        model_args := '{"max_new_tokens": 7}'
    ) AS summary
FROM
    transcript

On execution, we get:

| summary                                                                                                                                 |
|-----------------------------------------------------------------------------------------------------------------------------------------|
| This call was in regards to a customer inquiry about the status of their recent order. The customer expressed frustration about delays. |
| The quick brown fox jumps over the lazy dog near the river bank...                                                                      |

Please note that you would need to create an access token from Hugging Face and usage grant from the gated models like Mistral and Llama in order to use those models.

Using ThanoSQL Generation Models

Here is an example of how to use the generate function using ThanoSQL LLM:

SELECT 
    thanosql.generate(
        input := issue_description,
        engine := 'thanosql',
        model := 'smartmind/ThanoSQL-AICC-llm',
        model_args := '{"max_new_tokens": 50}'
    ) AS troubleshooting_steps
FROM
    support_tickets

On execution, we get:

| troubleshooting_steps                                                                                                 |
|---------------------------------------------------------------------------------------------------------------------- |
| 1. Verify internet connection. 2. Clear browser cache. 3. Reset password. 4. Contact support if the issue persists.   |
| 1. Reboot the system. 2. Check for software updates. 3. Reinstall the application. 4. Reach out to technical support. |

Please note that you don’t need to input token here.

Using OpenAI Generation Models

Here are different ways to use the generate function with various input formats for the OpenAI LLM:

Example 1: Simple Text Input

SELECT 
    thanosql.generate(
        input := product_details,
        engine := 'openai',
        model := 'gpt-4o',
        token := 'openai_api_key',
        model_args := '{"temperature": 0.7, "max_tokens": 50}'
    ) AS promotional_content
FROM
    product_catalog

On execution, we get:

| promotional_content                                                                                                                 |
|-------------------------------------------------------------------------------------------------------------------------------------|
| Discover the amazing features of our new smartphone! Experience cutting-edge technology and unparalleled performance.               |
| Get ready for summer with our latest collection of stylish and comfortable outdoor furniture. Perfect for any patio or garden.      |

Example 2: JSON Array Input Using json_build_array & json_build_object

SELECT 
    thanosql.generate(
        engine := 'openai',
        input := json_build_array(
            json_build_object(
                'role', 'user',
                'content', json_build_array(
                    json_build_object(
                        'type', 'text',
                        'text', conversation
                    ),
                    json_build_object(
                        'type', 'image_url',
                        'image_url', json_build_object(
                            'url', image_url,
                            'detail', 'high'
                        )
                    )
                )
            ) 
        ),
        model := 'gpt-4o',
        token := 'openai_api_key',
        model_args := '{"temperature": 0}'
    ) AS generated_text
FROM 
    product_catalog

On execution, we get:

| generated_content                                                                                                                                                                                                                            |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| The product details include a high-quality image available at [Image URL: https://example.com/image.jpg]. The product is described as top-of-the-line, featuring state-of-the-art technology and design. Users are excited about its launch. |

Example 3: JSON Array Input as Text

SELECT 
    thanosql.generate(
        engine := 'openai',
        input := '[
            {"role": "user", "content": "How''s the weather today?"},
            {"role": "assistant", "content": "It is a bit cold."},
            {"role": "user", "content": "What would you like to do then?"}
        ]',
        model := 'gpt-4o',
        token := 'openai_api_key',
        model_args := '{"temperature": 0.7}'
    ) AS generated_text

On execution, we get:

| generated_text                                                                                                                 |
|--------------------------------------------------------------------------------------------------------------------------------|
| I'm just a computer program, so I don't have feelings, but I'm here and ready to help you with whatever you need!              |

Please note that you would need to create an API key from OpenAI. For more information, check out the official OpenAI documentation.

Using the OpenAI Client

Here is an example of how to use the generate function with the base URL using the OpenAI Client:

SELECT 
    thanosql.generate(
        engine := 'openai',
        input := 'Once upon a time in a faraway land',
        model := 'your_model_name',
        base_url := 'your_base_url', -- Only applicable when the engine is 'openai'
        token := 'your_token'
    ) AS generated_text

On execution, we get:

| generated_text                                                                |
|-------------------------------------------------------------------------------|
| Once upon a time in a faraway land, there lived a wise old king who...        |

Standalone Usage

Here is an example of how to use the generate function as a standalone query:

SELECT 
    thanosql.generate(
        engine := 'openai',
        input := 'Once upon a time in a faraway land',
        model := 'gpt-4o',
        model_args := '{"temperature": 0.7, "max_tokens": 50}',
        token := 'openai_api_key'
    ) AS generated_text

On execution, we get:

| generated_text                                                                                           |
|----------------------------------------------------------------------------------------------------------|
| Once upon a time in a faraway land, there lived a wise old owl who had seen many seasons come and go...  |

Model Restrictions

When using the generate function with the huggingface engine, ensure that only models compatible with the HuggingFace pipeline are used. Verify that the selected model is supported by the HuggingFace library to avoid compatibility issues. Even with compatible models, some models might still not work. We are actively working on improving compatibility and functionality to provide a better user experience. For more information, refer to the official Hugging Face documentation.

Best Practices

To optimize resource management in a limited environment, especially when loading and using different LLM models, it is advisable to use the cleanup_resources function after loading up the LLM model. This helps prevent models from being loaded in CPU memory, which is particularly useful in a workspace with limited resources.

SELECT * FROM thanosql.cleanup_resources();

This function clears all loaded resources, including models, tokenizers, and pipelines, ensuring that resources are properly freed after use.

Advanced Configuration

The generate function allows for advanced configuration through the model_args parameter, which accepts additional model arguments in JSON format. This enables fine-tuning of the text generation process to better suit specific needs.

Common Model Arguments

Here are some common parameters you can use with model_args:

Parameter	Type	Description
`max_tokens`(OpenAI), `max_new_tokens`(Hugging Face)	integer	Maximum number of tokens to generate.
`temperature`	float	Sampling temperature. Lower values make output more focused, higher values make it more random.
`top_p`	float	Nucleus sampling probability. Controls diversity by sampling from the top probability mass. Used as alternative to `tempearture`

Example Configurations

Generating Short, Focused Text To generate short and focused text, you might want to use lower values for max_tokens and temperature:

SELECT 
    thanosql.generate(
        input := prompt,
        engine := 'openai',
        model := 'gpt-4o',
        token := 'openai_api_key',
        model_args := '{"max_tokens": 10, "temperature": 0.3}'
    ) AS summary
FROM
    transcript

Generating Creative Content For more creative and diverse outputs, you can increase the temperature or use top_p instead:

SELECT 
    thanosql.generate(
        input := product_details,
        engine := 'openai',
        model := 'gpt-4o',
        token := 'openai_api_key',
        model_args := '{"max_tokens": 50, "temperature": 0.9}'
    ) AS promotional_content
FROM
    product_catalog

Controlling Output Length If you need to control the length of the generated text, adjust the max_tokens/max_new_tokens parameter accordingly:

SELECT
    thanosql.generate(
        input := article_intro,
        engine := 'huggingface',
        model := 'smartmind/Mistral-7B-Instruct-v0.2',
        model_args := '{"max_new_tokens": 150, "temperature": 0.7}'
    ) AS expanded_intro
FROM
    news_articles

Get Started

ThanoSQL Functions

Workspace

Syntax

Parameters

Returns

Example Usage

Using Hugging Face Generation Models

Using ThanoSQL Generation Models

Using OpenAI Generation Models

Example 1: Simple Text Input

Example 2: JSON Array Input Using json_build_array & json_build_object

Example 3: JSON Array Input as Text

Using the OpenAI Client

Standalone Usage

Model Restrictions

Best Practices

Advanced Configuration

Common Model Arguments

Example Configurations

Get Started

ThanoSQL Functions

Workspace

​Syntax

​Parameters

​Returns

​Example Usage

​Using Hugging Face Generation Models

​Using ThanoSQL Generation Models

​Using OpenAI Generation Models

​Example 1: Simple Text Input

​Example 2: JSON Array Input Using json_build_array & json_build_object

​Example 3: JSON Array Input as Text

​Using the OpenAI Client

​Standalone Usage

​Model Restrictions

​Best Practices

​Advanced Configuration

​Common Model Arguments

​Example Configurations

Syntax

Parameters

Returns

Example Usage

Using Hugging Face Generation Models

Using ThanoSQL Generation Models

Using OpenAI Generation Models

Example 1: Simple Text Input

Example 2: JSON Array Input Using json_build_array & json_build_object

Example 3: JSON Array Input as Text

Using the OpenAI Client

Standalone Usage

Model Restrictions

Best Practices

Advanced Configuration

Common Model Arguments

Example Configurations