# Query

The Query (also known as Ask) model lets developers query databases in plain English. At its core is a fine-tuned FLAN-T5-base model that translates user questions into PostgreSQL queries. The retrieved results are then combined with the original question to generate a clear, conversational response.

### Architecture

* **Base Model**: FLAN-T5-base (sequence-to-sequence transformer)
* **Fine-Tuning Objective**: Natural language → SQL translation
* **Tokenizer**: T5 tokenizer with custom prefix handling

### Training Pipeline

**Dataset Preparation**

* Curated pairs of questions and SQL statements
* Normalization: lowercase, remove commas, strip trailing punctuation

**Data Split**

* 90% training | 10% validation
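The normalization and split described above could be sketched as follows. This is a minimal illustration, not the team's actual pipeline: the function names, the example pairs, the fixed seed, and the choice to normalize only the question side (leaving the SQL untouched) are all assumptions.

```python
import random
import string


def normalize_question(text: str) -> str:
    """Lowercase, remove commas, and strip trailing punctuation."""
    text = text.lower().replace(",", "")
    return text.strip().rstrip(string.punctuation).strip()


def split_pairs(pairs, val_fraction=0.1, seed=42):
    """Shuffle (question, sql) pairs and split them into train/validation."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical example pairs; only the question side is normalized here.
raw_pairs = [
    (f"How many orders, in total, for customer {i}?",
     f"SELECT COUNT(*) FROM orders WHERE customer_id = {i};")
    for i in range(20)
]
pairs = [(normalize_question(q), sql) for q, sql in raw_pairs]
train_pairs, val_pairs = split_pairs(pairs)
print(len(train_pairs), len(val_pairs))
```

Seeding the shuffle keeps the validation set stable between runs, which makes SacreBLEU scores comparable across training experiments.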

**Input Formatting**

* Prefix: `Translate the following text to PGSQL:`

**Hyperparameters**

* Learning rate: 3e-5
* Batch size: 4, with 4 gradient-accumulation steps (effective batch size of 16 per device)
* Epochs: 10
* Weight decay: 0.01
* Scheduler: Cosine decay with 10% warmup
* Label smoothing: 0.1
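To make the scheduler and batch settings concrete, the sketch below computes the effective batch size and the learning rate at a given step. It assumes linear warmup followed by cosine decay to zero, which is one common form of this schedule; the source only states "cosine decay with 10% warmup". The function name and the total step count are hypothetical.

```python
import math

LEARNING_RATE = 3e-5
BATCH_SIZE = 4
GRAD_ACCUM_STEPS = 4
WARMUP_FRACTION = 0.10


def lr_at_step(step: int, total_steps: int) -> float:
    """Linear warmup to the peak rate, then cosine decay to zero (assumed form)."""
    warmup_steps = int(total_steps * WARMUP_FRACTION)
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Examples consumed per optimizer update on a single device.
effective_batch_size = BATCH_SIZE * GRAD_ACCUM_STEPS

total_steps = 1000  # hypothetical; the real value depends on dataset size and epochs
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

The rate climbs to its 3e-5 peak at the end of warmup (step 100 here), passes through half the peak midway through the decay phase, and reaches zero at the final step.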

**Evaluation**

* Metric: SacreBLEU on validation set

### Inference Workflow

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("AquilaX-AI/NL-PGSQL")
model = AutoModelForSeq2SeqLM.from_pretrained("AquilaX-AI/NL-PGSQL")

# Preprocess: lowercase, remove commas, strip trailing punctuation
# (the exact set of punctuation characters stripped here is an assumption)
question = "What is the total sales for last month?"
question = question.lower().replace(",", "").strip().rstrip("?.!")

# Prefix the task instruction and tokenize
input_text = f"Translate the following text to PGSQL: {question}"
inputs = tokenizer(input_text, return_tensors="pt")

# Generate SQL query
outputs = model.generate(**inputs, max_length=256)

# Decode token IDs back to a string
sql_query = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(sql_query)
```

1. **Preprocess**: Lowercase the question, remove commas, and strip trailing punctuation.
2. **Prefix**: Add the task instruction.
3. **Tokenize**: Convert the text to token IDs.
4. **Generate**: Produce SQL tokens with the model.
5. **Decode**: Convert the tokens back to a string.

### Integration Guide

**A: Installation**

```bash
pip install transformers torch
```

**B: Model & Tokenizer**

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("AquilaX-AI/NL-PGSQL")
model = AutoModelForSeq2SeqLM.from_pretrained("AquilaX-AI/NL-PGSQL")
```

**C: Query Execution**

1. Sanitize generated SQL
2. Execute securely against your PostgreSQL database
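As one possible take on the sanitization step, the sketch below accepts only a single read-only `SELECT` (or `WITH ... SELECT`) statement before anything reaches the database. The function name and keyword list are illustrative assumptions, and a keyword filter like this is deliberately conservative rather than a complete defense; running queries under a read-only database role remains the stronger safeguard.

```python
import re

# Keywords that modify data or schema; an illustrative list, not exhaustive.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant|revoke|copy)\b",
    re.IGNORECASE,
)


def is_read_only_select(sql: str) -> bool:
    """Return True if sql looks like a single read-only SELECT statement."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty, or more than one statement
    if not re.match(r"(?i)^(select|with)\b", statement):
        return False  # must start with SELECT or a WITH clause
    return FORBIDDEN.search(statement) is None


print(is_read_only_select("SELECT SUM(amount) FROM sales;"))
print(is_read_only_select("SELECT 1; DROP TABLE sales;"))
print(is_read_only_select("DELETE FROM sales"))
```

The check can produce false rejections, for example when a string literal contains a semicolon or a forbidden keyword. For model-generated SQL that trade-off is usually acceptable, since a rejected query can be surfaced as an error instead of being executed.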

**D: Response Formatting**

Post-process the database output into readable text.
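A simple template-based formatter is one way to do this. The sketch below is an assumption for illustration, not the method the AquilaX platform uses; the function name `format_answer` and its output wording are hypothetical.

```python
def format_answer(question: str, columns: list[str], rows: list[tuple]) -> str:
    """Turn query results into a short plain-text answer."""
    if not rows:
        return f"No results found for: {question}"
    if len(rows) == 1 and len(columns) == 1:
        # A single scalar, such as an aggregate, reads best on one line.
        return f"{columns[0]}: {rows[0][0]}"
    lines = [f"Found {len(rows)} result(s) for: {question}"]
    for row in rows:
        lines.append(", ".join(f"{col}={val}" for col, val in zip(columns, row)))
    return "\n".join(lines)


print(format_answer("total sales last month", ["total_sales"], [(1250.5,)]))
```

Column names and rows are passed separately because that is how most PostgreSQL drivers expose a result set.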

### Dataset

The training dataset consists of custom-curated pairs of natural language questions and SQL queries, designed to align with the target database schema. The pairs are deliberately diverse so that the model handles varied use cases within that schema.

### Evaluation Metrics

Model performance is assessed with the SacreBLEU metric, which measures n-gram overlap between generated and reference SQL queries. A high score indicates output close to the reference text, but it does not by itself confirm that a query executes or returns the correct rows.

### API & Web Interface

* **AquilaX App**: Users can interact with the model directly on the AquilaX platform at [https://aquilax.ai/app/home](https://aquilax.ai/app/home), entering natural language queries and retrieving SQL output without managing infrastructure.
* **API Access**: The API, available at [https://developers.aquilax.ai/api-reference/genai/securitron](https://developers.aquilax.ai/api-reference/genai/securitron), supports programmatic integration for automation and scalability.

### Considerations

* Optimized for a specific schema; may require adaptation for others.
* Implement error handling for unexpected inputs.
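For the error-handling point, a small guard in front of the model can reject obviously unusable input early. The sketch below is illustrative only: the function name and the 500-character limit are assumptions, not values from the model's documentation.

```python
MAX_QUESTION_CHARS = 500  # hypothetical limit; tune to your own use case


def validate_question(question) -> str:
    """Return the trimmed question, or raise if the input is unusable."""
    if not isinstance(question, str):
        raise TypeError("question must be a string")
    cleaned = question.strip()
    if not cleaned:
        raise ValueError("question is empty")
    if len(cleaned) > MAX_QUESTION_CHARS:
        raise ValueError(f"question exceeds {MAX_QUESTION_CHARS} characters")
    return cleaned


try:
    validate_question("   ")
except ValueError as err:
    print(f"Rejected input: {err}")
```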

***

> Credit to the engineering team: [Suriya](https://www.linkedin.com/in/suriya-s-83b25524a) & [Pachaiappan](https://www.linkedin.com/in/pachaiappan/)
