# Query

The Query (also known as Ask) model lets developers query databases in plain English. At its core is a fine-tuned FLAN-T5-base model that translates user questions into optimized PostgreSQL queries. The retrieved results are then combined with the original question to generate a clear, conversational response.

### **Architecture**

* **Base Model**: FLAN-T5-base (sequence-to-sequence transformer)
* **Fine-Tuning Objective**: Natural language → SQL translation
* **Tokenizer**: T5 tokenizer with custom prefix handling

### Training Pipeline

**Dataset Preparation**

* Curated pairs of questions and SQL statements
* Normalization: lowercase, remove commas, strip trailing punctuation
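The normalization step above can be sketched as a small helper. The exact rules in the original training script are not shown here; this assumes they match the description (lowercasing, comma removal, trailing-punctuation stripping):

```python
def normalize_question(text: str) -> str:
    """Normalize a question as in dataset preparation:
    lowercase, remove commas, strip trailing punctuation."""
    text = text.lower().replace(",", "")
    return text.rstrip("?!.")

print(normalize_question("What is the total sales, for last month?"))
# -> what is the total sales for last month
```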

**Data Split**

* 90% training | 10% validation

**Input Formatting**

* Prefix: `Translate the following text to PGSQL: `
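Input formatting then reduces to prepending the task instruction to the (already normalized) question. The prefix string below is taken from the section above; the example question is illustrative:

```python
PREFIX = "Translate the following text to PGSQL: "

# Assumes the question has already been normalized (see Dataset Preparation).
model_input = PREFIX + "what is the total sales for last month"
print(model_input)
# -> Translate the following text to PGSQL: what is the total sales for last month
```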

**Hyperparameters**

* Learning rate: 3e-5
* Batch size: 4 (with gradient accumulation over 4 steps, for an effective batch size of 16)
* Epochs: 10
* Weight decay: 0.01
* Scheduler: Cosine decay with 10% warmup
* Label smoothing: 0.1
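For reference, these hyperparameters map onto a Hugging Face Trainer-style configuration roughly as follows. This is a sketch: the keyword names follow the `transformers` training-arguments API, but the original training script is not shown here.

```python
# Hyperparameters expressed as transformers-style keyword arguments (sketch).
training_config = dict(
    learning_rate=3e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size: 4 x 4 = 16
    num_train_epochs=10,
    weight_decay=0.01,
    lr_scheduler_type="cosine",      # cosine decay
    warmup_ratio=0.1,                # 10% warmup
    label_smoothing_factor=0.1,
)

effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(effective_batch)  # -> 16
```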

**Evaluation**

* Metric: SacreBLEU on validation set

### Inference Workflow

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("AquilaX-AI/NL-PGSQL")
model = AutoModelForSeq2SeqLM.from_pretrained("AquilaX-AI/NL-PGSQL")
model.eval()

# Prepare and tokenize input (the question is normalized as during training)
input_text = "Translate the following text to PGSQL: what is the total sales for last month"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate SQL query (no gradients needed at inference time)
with torch.no_grad():
    outputs = model.generate(**inputs, max_length=256)
sql_query = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sql_query)
```

1. **Preprocess**: lowercase the question, remove commas and trailing punctuation.
2. **Prefix**: prepend the task instruction.
3. **Tokenize**: convert text to token IDs.
4. **Generate**: produce SQL tokens with the model.
5. **Decode**: convert tokens back to a string.

### Integration Guide

**A: Installation**

```bash
pip install transformers torch
```

**B: Model & Tokenizer**

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("AquilaX-AI/NL-PGSQL")
model = AutoModelForSeq2SeqLM.from_pretrained("AquilaX-AI/NL-PGSQL")
```

**C: Query Execution**

1. Sanitize generated SQL
2. Execute securely against your PostgreSQL database
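How sanitization is implemented is left to the integrator. As one conservative option (a hypothetical helper, not part of the model), generated SQL can be gated to single, read-only statements before being executed with a driver such as psycopg2:

```python
def is_safe_select(sql: str) -> bool:
    """Conservative gate for generated SQL: accept only a single
    SELECT statement and reject anything containing write keywords.
    A sketch only; it does not replace proper database authorization."""
    stmt = sql.strip().rstrip(";").lower()
    if ";" in stmt:  # more than one statement
        return False
    if not stmt.startswith("select"):
        return False
    forbidden = ("insert", "update", "delete", "drop", "alter", "truncate", "grant")
    return not any(word in stmt.split() for word in forbidden)

print(is_safe_select("SELECT sum(amount) FROM sales"))  # -> True
print(is_safe_select("DROP TABLE sales"))               # -> False
```

Restricting the database role's permissions to read-only access provides a stronger guarantee than any string-level check.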

**D: Response Formatting**

Post-process database output into readable text
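A minimal formatting pass might look like the following. The column names, row values, and phrasing are purely illustrative assumptions:

```python
def format_response(question: str, rows: list, columns: list) -> str:
    """Turn raw database rows into a short conversational answer.
    Phrasing and fallbacks here are illustrative, not prescribed."""
    if not rows:
        return "I couldn't find any matching records."
    if len(rows) == 1 and len(columns) == 1:
        return f"The answer to '{question}' is {rows[0][0]}."
    lines = [", ".join(f"{c}={v}" for c, v in zip(columns, row)) for row in rows]
    return "Here's what I found:\n" + "\n".join(lines)

print(format_response("total sales last month", [(12450,)], ["total_sales"]))
# -> The answer to 'total sales last month' is 12450.
```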

### Dataset

The training dataset comprises custom-curated pairs of natural language questions and SQL queries, designed to align with the target database schema. Its diversity enhances the model's applicability to varied use cases.

### Evaluation Metrics

Model performance is assessed with SacreBLEU, which measures n-gram overlap between generated and reference SQL queries on the validation set. A high score indicates close agreement with the reference queries, though it is a textual similarity measure and does not by itself guarantee that a query executes correctly.

#### API & Web Interface

* **AquilaX App**: Users can interact with the model directly via the AquilaX platform at [https://aquilax.ai/app/home](https://aquilax.ai/app/home), entering natural language queries and retrieving SQL output without managing infrastructure.
* **API Access**: The API, available at [https://developers.aquilax.ai/api-reference/genai/securitron](https://developers.aquilax.ai/api-reference/genai/securitron), supports programmatic integration for automation and scalability.

#### Considerations

* Optimized for a specific schema; may require adaptation for others.
* Implement error handling for unexpected inputs.

***

> Engineering credits: [Suriya](https://www.linkedin.com/in/suriya-s-83b25524a) & [Pachaiappan](https://www.linkedin.com/in/pachaiappan/)


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.aquilax.ai/ai-models/query.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
