XiYan-SQL promotional banner with "A Multi-Generator Ensemble Framework for Text-to-SQL" tagline on a purple-blue gradient background, featuring the Amity Solutions logo.
Generative AI
Boonyawee Sirimaya
4
min read
April 4, 2025

A Multi-Generator Ensemble Framework For Text-to-SQL

A MULTI-GENERATOR ENSEMBLE FRAMEWORK FOR TEXT-TO-SQL

XiYan-SQL is a framework designed to enhance the ability of large language models (LLMs) to translate natural language into SQL queries. Unlike traditional text-to-SQL models that rely on a single model, XiYan-SQL utilizes a multi-generator ensemble approach, combining multiple SQL generators to generate, refine, and select the most precise SQL query for a given input. By incorporating various techniques and essential components, XiYan-SQL achieves outstanding score across multiple benchmarks, outperforming other existing methods.

Traditional text-to-SQL models

Most traditional text-to-SQL models rely on a single-model approach, such as rule-based methods, sequence-to-sequence (Seq2Seq) architectures, transformer-based models, or large language model (LLM) prompting techniques. However, these approaches often attempt to handle all tasks within a single model, which can limit flexibility and accuracy in complex query generation.

XiYan-SQL workflow diagram showing Schema Linking, Candidate Generation, and Candidate Selection stages
XiYan-SQL workflow

Framework

Xi-Yan framework proposed a framework with 3 main components as follows:

1. Schema linking

It is built to connect natural language with the data stored in a database. It functions by interpreting the user's input in natural language and linking it to the corresponding data, ensuring that the selected information is closely aligned with the query or request.

2. Candidate Generation

This component includes SQL generation, which involves multiple generators, and a refiner that addresses and corrects logical or syntactical errors in the SQL queries. The generator generate potential SQL queries, while the refiner ensures the accuracy and correctness of the generated queries by fixing errors related to logic or syntax.

3. Candidate selection

The SQL queries generated in the candidate generation step, after refinement, will be evaluated by a judgment model. This model selects the most accurate and relevant queries that align with the user's request. The selection process aims to ensure that the chosen SQL queries are both logical and correct in response to the user's input query.

Advantages of XIYAN-SQL

1. Increased Accuracy & Reduced Errors

Unlike traditional text-to-SQL models that rely on a single model, XiYan-SQL utilizes a multi-generator ensemble approach, incorporating fine-tuned models and in-context learning (ICT SQL Generator) to generate multiple query candidates. This approach increases the chances of producing an accurate SQL query while reducing execution errors. By selecting the best query, XiYan-SQL enhances precision and ensures more reliable execution compared to single-model methods.

2. Minimal Effort for New Database Schemas

The M-Schema representation and schema linking module help adapt to different database schemas, making the model more flexible across various structures.

3. Using it as a Framework for Adapting Other Text-to-SQL Models

XiYan can serve as a framework for adapting other tasks, including fine-tuning or implementing text-to-SQL models. Key concepts such as schema linking, M-Schema, and multi-generator selection can be applied to improve model performance during fine-tuning. Additionally, the ICT SQL Generator provides a prompt-based approach that optimizes query generation without requiring fine-tuning, offering valuable insights for enhancing other text-to-SQL models.

Key components

M-schema

Xiyan-SQL introduces M-schema as a hierarchical relationship of data within a database. The strength of M-schema is that it provides a more detailed description of columns and introduces a new format for displaying values. Based on the results, it was found that M-schema boosts performance compared to DDL schema and performs similarly to MAC-SQL, with results that are occasionally higher or lower, but generally close, as demonstrated in experiments with 4 LLMs as the NL2SQL generator in end-to-end SQL generation, as shown in the following table.

Comparison table of DDL Schema, MAC-SQL Schema, and M-Schema representations
Example of the M-schema
Performance comparison table of different schema representations across multiple models
M-schema performance for end-to-end SQL generation

Schema linking

As mentioned above, schema linking refers to the mapping of the natural language (NL) query to the database schema. It plays a crucial role in the framework, ensuring that the data is retrieved correctly and aligns with the natural language input. This ensures that the SQL generated in the next step uses the correct data relevant to the input. Schema linking consists of two steps:

Retrieval Module – This step aims to retrieve the top-k columns and values from the database. The column retriever uses semantic similarity, while the values retriever combines both semantic similarity and Locality-Sensitive Hashing (LSH). To search for columns and values, keywords are extracted and identified from the natural language input by prompting the model with a few-shot example technique. A set of data (columns and values) is then retrieved in this step.

Column Selector – After retrieving the data in the previous step, it is formatted in the M-schema format with the help of a large language model (LLM). Then, by prompting the model with the few-shot example technique, it selects only the relevant and necessary data to generate the SQL query for the natural language input, minimizing the data schema.

Candidate Generation

Candidate generation incorporates multiple generators to produce diverse SQL outputs. It comprises two primary processes. The first is the SQL generator, responsible for generating SQL candidates using two techniques: Fine-tuned models and In-Context Learning. The second process is the Refiner, which enhances, corrects, or resolves errors in the generated SQL queries.

1. SQL Generator
  • Fine-tuned SQL Generator – This method follows a two-phase fine-tuning approach:
    • Basic Syntax Training: This step aims to fine-tune the pre-trained model to understand fundamental SQL syntax and patterns.
    • Generation-Enhanced Training: This step improves the model’s ability to understand different contexts by training it on multi-task data and syntactic preference data. The syntactic preference data is generated using various LLMs to create syntactic variations while maintaining the original meaning.
  • In-Context Learning SQL Generator - It utilizes few-shot prompting to generate SQL queries, allowing up to five few-shot examples per question. Before embedding, NLTK’s tools extract entities from the input question. These identified entities are then replaced with special tokens, such as <surname>, to minimize the model’s reliance on specific entity values. Once modified, the questions are embedded, and the top-K most semantically similar examples are selected based on the input query.
2. Refiner

This step ensures that the generated queries are accurate and functional by utilizing a prompting approach to analyze SQL candidates. It checks whether the generated SQL aligns with the given question and identifies any errors. If issues are found, the model can correct, optimize, or regenerate the SQL candidate as needed.

Candidate Selection

After a set of candidates is generated, this step aims to select the best candidate for the input question. The model used as the evaluator in this step is fine-tuned to evaluate all SQL query candidates based on the schema, conditions, and the question, returning the most reliable and well-aligned candidate that best matches the input question.

Results

XiYan-SQL demonstrates accuracy based on the approach that integrates each component into the framework, as tested on the Bird benchmark dataset. It highlights the performance of the framework with and without each component. The complete version of the framework, which assembles all components, achieves a performance of 73.34% as shown in the following table.

Performance table of XiYan-SQL with different component configurations
Performance of XiYan-SQL on the Bird benchmark