Security Assistant

AI model designed to simplify the identification and remediation of security vulnerabilities within codebases

The Security Assistant is an AI model designed to simplify the identification and remediation of security vulnerabilities within codebases. It integrates with the AquilaX scanning system and gives developers and security analysts clear, actionable guidance for mitigating risks efficiently.

Key Features

  • Remediation Details: Explains why a vulnerability is a concern.

  • Fix & Code: Offers precise code recommendations or configuration changes.

  • More Info: Highlights potential impacts and risks if left unaddressed.

Model Details

  • Name: Security Assistant

  • Architecture: Qwen2.5-Coder

  • Parameters: 0.5 billion

  • Purpose: Explain and resolve security vulnerabilities in codebases

  • Integration: Processes AquilaX vulnerability reports

  • Input Format: Structured JSON containing fields such as cwe_id, cwe_name, affected_line, partial_code, file_name, status, reason, and remediation_action.

  • Output Format: Natural-language responses addressing:

    • Why this is a security risk.

    • How to fix it (code or configuration).

    • Consequences of inaction.

  • Fine-Tuning Technique: Unsloth + LoRA on a custom dataset of prompts and expert-crafted responses.

  • Deployment: Hosted on Hugging Face under AquilaX-AI/security_assistant.
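As a rough illustration of the input format, the sketch below parses a JSON report and checks it for the fields named in the Input Format entry. The helper names (`missing_fields`, `load_report`) are hypothetical, and treating every listed field as required is an assumption; the source only says the report contains fields "such as" these.

```python
import json

# Field names copied from the "Input Format" entry above.
# Assumption: all of them are treated as required here.
REPORT_FIELDS = (
    "cwe_id", "cwe_name", "affected_line", "partial_code",
    "file_name", "status", "reason", "remediation_action",
)


def missing_fields(report: dict) -> list[str]:
    """Return the documented fields that are absent or empty in a report."""
    return [name for name in REPORT_FIELDS if not report.get(name)]


def load_report(raw_json: str) -> dict:
    """Parse a JSON report and fail early if documented fields are missing."""
    report = json.loads(raw_json)
    missing = missing_fields(report)
    if missing:
        raise ValueError(f"report is missing fields: {', '.join(missing)}")
    return report


# Values taken from the Example Usage section of this page.
example = json.dumps({
    "cwe_id": "CWE-20",
    "cwe_name": "Improper Input Validation",
    "affected_line": "Pattern Undefined (v3)",
    "partial_code": "name:\n  type: string\n  example: Toolbox",
    "file_name": "itemit_openapi.yaml",
    "status": "True Positive",
    "reason": "There is no pattern property that could lead to "
              "insufficient input validation.",
    "remediation_action": "Always define a pattern to ensure strict "
                          "input validation.",
})

report = load_report(example)
print(report["cwe_id"], report["file_name"])
```

Checking the report before it reaches the model matters because, as noted under Limitations, the assistant depends on the accuracy of the reports it receives.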

Functionality

  1. Explain Vulnerabilities: Clarifies root causes and technical impacts (e.g., CWE-20: Improper Input Validation).

  2. Provide Remediation: Suggests practical fixes, such as input validation patterns or dependency updates.

  3. Highlight Risks: Details security, compliance, and operational consequences if issues persist.

The assistant supports a broad range of Common Weakness Enumeration (CWE) categories, including but not limited to input validation, authentication failures, and insecure configurations.

Training Process

  • Environment Setup: GPU-accelerated instance with PyTorch, Unsloth, and CUDA.

  • Model Configuration:

    • 4-bit quantization

    • Sequence length: 2,048 tokens

    • LoRA applied to transformer projections (q_proj, k_proj, v_proj, o_proj, etc.)

    • LoRA hyperparameters: rank=256, alpha=64

  • Dataset: Shuffled JSON records of vulnerability reports paired with expert responses.

  • Training Loop:

    • Trainer: SFTTrainer (TRL)

    • Batch size: 4

    • Gradient accumulation: 8 steps

    • Optimizer: AdamW (8-bit)

    • Learning rate: 2e-4

    • Epochs: 3

    • Strategy: Response-only fine-tuning

  • Resource Management: Continuous GPU memory monitoring and automated instance shutdown post-training.

  • Model Publishing: Deployed to the Hugging Face Hub for public access.
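The hyperparameters listed above imply two derived quantities: the effective batch size and the LoRA scaling factor. The sketch below collects the stated values in a hypothetical `TrainingConfig` dataclass (not the team's actual training script) and derives both. It assumes a single GPU, that "Batch size: 4" is the per-device batch size, and that standard LoRA scaling (alpha / rank) is used rather than a rank-stabilized variant. The dataset size is not given in this document, so the step estimate takes it as a parameter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values from the Training Process list; the class itself is illustrative."""
    max_seq_length: int = 2048
    load_in_4bit: bool = True
    lora_rank: int = 256
    lora_alpha: int = 64
    per_device_batch_size: int = 4        # assumption: "Batch size" is per device
    gradient_accumulation_steps: int = 8
    learning_rate: float = 2e-4
    num_epochs: int = 3

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer step (single GPU assumed).
        return self.per_device_batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank

    def approx_optimizer_steps(self, num_examples: int) -> int:
        # Rough estimate only; trainers differ in how they round partial batches.
        steps_per_epoch = math.ceil(num_examples / self.effective_batch_size)
        return steps_per_epoch * self.num_epochs


config = TrainingConfig()
print("effective batch size:", config.effective_batch_size)
print("LoRA scaling factor:", config.lora_scaling)
```

With these values each optimizer step sees 4 × 8 = 32 examples, and the LoRA update is scaled by 64 / 256 = 0.25.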

Inference Workflow

  1. Model Initialization: Load the AquilaX-AI/security_assistant checkpoint and tokenizer; enable GPU if available.

  2. Prompt Structure:

<|im_start|>system
You are Securitron, a helpful AI assistant.
<|im_end|>
<|im_start|>user
{<structured JSON report>} + question
<|im_end|>
  3. Response Generation: Stream output up to 1,024 tokens using TextStreamer.

  4. Performance: Optimized for CPU inference with sub-second response times; faster on GPU.

This setup ensures fast, accurate, and accessible responses across hardware configurations.
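The workflow above can be sketched with the Hugging Face `transformers` library as follows. This is a minimal sketch under several assumptions: that the published checkpoint loads directly with `AutoModelForCausalLM`, that generation should begin after a trailing `<|im_start|>assistant` turn (standard for ChatML-style prompts but not shown in this document), and that the system prompt is the one given under Prompt Structure. The function names `build_prompt` and `stream_answer` are hypothetical.

```python
MODEL_ID = "AquilaX-AI/security_assistant"
SYSTEM_PROMPT = "You are Securitron, a helpful AI assistant."


def build_prompt(report_text: str, question: str) -> str:
    """Wrap a vulnerability report and a question in the ChatML-style template."""
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{report_text}\n{question}<|im_end|>\n"
        # Assumption: an open assistant turn cues the model to answer.
        "<|im_start|>assistant\n"
    )


def stream_answer(prompt: str, max_new_tokens: int = 1024) -> None:
    """Load the checkpoint and stream the generated answer to stdout."""
    # Imported lazily so build_prompt can be used without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    streamer = TextStreamer(tokenizer, skip_prompt=True)
    with torch.no_grad():
        model.generate(**inputs, max_new_tokens=max_new_tokens, streamer=streamer)


prompt = build_prompt(
    "cwe_id:CWE-20\ncwe_name:Improper Input Validation",
    "How to fix this?",
)
# Downloads the model on first use, so it is left commented out here:
# stream_answer(prompt)
```

Keeping prompt construction separate from model loading makes the template easy to test without downloading the checkpoint.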

Example Usage

Input Prompt:

<|im_start|>system
You are a helpful AI assistant named Securitron<|im_end|>
<|im_start|>user
cwe_id:CWE-20
cwe_name:Improper Input Validation
affected_line:Pattern Undefined (v3)
partial_code:example: c4d5ea2f-81a2-4a05-bcd3-202126ae21df
        name:
          type: string
          example: Toolbox
        serial:
file_name:itemit_openapi.yaml
status:True Positive
reason: There is no pattern property that could lead to insufficient input validation.
remediation_action: Always define a pattern to ensure strict input validation.
How to fix this?<|im_end|>

Model Response: To address CWE-20 (Improper Input Validation) in itemit_openapi.yaml, add a pattern property to the name and serial fields:

name:
  type: string
  example: Toolbox
  pattern: ^[a-zA-Z0-9\s]{1,50}$
serial:
  type: string
  pattern: ^[a-f0-9\-]{36}$
  • The name pattern allows alphanumeric characters and spaces, up to 50 characters.

  • The serial pattern enforces a UUID format.

  • Test the schema to ensure invalid inputs are rejected.

This validates inputs strictly, mitigating risks of malicious data.
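The final bullet of the response suggests testing the schema. One way to sanity-check the two patterns before adding them is to run them against sample values, as in the sketch below. It uses Python's `re` module, whose behavior matches the regex dialect used by OpenAPI for these particular patterns. The `STRICT_UUID_PATTERN` is an addition for comparison and is not part of the model response: the suggested serial pattern accepts any 36 characters drawn from lowercase hex digits and hyphens, so it is looser than a true UUID layout.

```python
import re

# Patterns copied from the model response above.
NAME_PATTERN = re.compile(r"^[a-zA-Z0-9\s]{1,50}$")
SERIAL_PATTERN = re.compile(r"^[a-f0-9\-]{36}$")

# Assumption, not from the response: a stricter 8-4-4-4-12 lowercase layout.
STRICT_UUID_PATTERN = re.compile(
    r"^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$"
)


def matches(pattern: re.Pattern, value: str) -> bool:
    """Return True when the whole value satisfies the pattern."""
    return pattern.fullmatch(value) is not None


samples = {
    "name": ["Toolbox", "", "<script>alert(1)</script>"],
    "serial": ["c4d5ea2f-81a2-4a05-bcd3-202126ae21df", "-" * 36, "not-a-uuid"],
}

for value in samples["name"]:
    print(repr(value), "name ok:", matches(NAME_PATTERN, value))

for value in samples["serial"]:
    print(
        repr(value),
        "serial ok:", matches(SERIAL_PATTERN, value),
        "strict ok:", matches(STRICT_UUID_PATTERN, value),
    )
```

A string of 36 hyphens passes the suggested serial pattern but fails the stricter one, which is the kind of context-specific detail that the human review mentioned under Limitations is meant to catch.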

Usage Guidelines

Limitations:

  • Depends on the accuracy of input reports; it does not perform code scanning itself.

  • May require human review for complex or context-specific cases.

Future Roadmap

  • Expand coverage to additional CWE categories.

  • Integrate real-time static and dynamic code analysis.

  • Offer multilingual support.

  • Reduce inference latency for resource-constrained environments.

Support & Contact

For support or updates, contact the AquilaX team or visit the model’s Hugging Face repository (AquilaX-AI/security_assistant).


Engineering team credits: Suriya & Pachaiappan
