Mercari AI's research paper "LLMs as an Interactive Database Interface for Designing Large Queries" accepted to the HILDA Workshop at SIGMOD 2024
Overview
We are pleased to announce that the paper "LLMs as an Interactive Database Interface for Designing Large Queries" by engineers Yilin Li and Deddy Jobson of the Mercari AI team has been accepted to the HILDA Workshop at SIGMOD 2024, an international conference on the management of data.
SIGMOD is one of the most prestigious international conferences in data management. It is held annually and attracts researchers from around the world. This year's conference, the 49th, was held in Santiago, Chile, from June 9 to 14, 2024.
Key points of the presentation
- Current Text2SQL methods tend to produce inaccurate queries when applied to databases as large as Mercari's.
- We built a system that incorporates human feedback to iteratively refine SQL queries.
- We plan to leverage knowledge graphs in the next iteration to improve schema linking.
Background
In large companies, writing SQL queries can be time-consuming. The required data is often scattered across many tables that are constantly updated, which makes it difficult for non-specialists to write queries for one-off data analysis tasks.
Summary of paper
To advance data democratization at Mercari, we implemented an LLM-based Text2SQL solution. Although a number of Text2SQL methods already exist, they tend to produce errors that are hard to catch. For that reason, instead of treating Text2SQL as a one-shot generation process, we treat it as an interactive problem with a human in the loop. Our system combines validation stages with human feedback to polish each query before returning it to the end user. This approach results in fewer errors and, therefore, more useful queries.
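The paper's exact pipeline is not described in this post, so the following is only a minimal sketch, under our own assumptions, of what an interactive Text2SQL loop with a validation stage and a human reviewer could look like. The `generate` and `review` callables are hypothetical placeholders for an LLM call and for the end user's feedback. Validation here compiles each draft against an empty in-memory SQLite copy of the schema using `EXPLAIN`; this is a stand-in we chose for illustration, not necessarily what Mercari uses, and a production data warehouse would need a validator for its own SQL dialect.

```python
import sqlite3
from typing import Callable, List, Optional

# Hypothetical LLM wrapper: (question, previous draft or None, feedback so far) -> SQL.
Generator = Callable[[str, Optional[str], List[str]], str]
# Hypothetical human reviewer: returns None to accept the query, else a comment.
Reviewer = Callable[[str], Optional[str]]


def validate(sql: str, schema_ddl: str) -> Optional[str]:
    """Compile `sql` against an empty in-memory copy of the schema.

    Returns the SQLite error message if compilation fails, otherwise None.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)
        # EXPLAIN compiles the statement without running it, so syntax errors
        # and unknown tables or columns are reported here.
        conn.execute("EXPLAIN " + sql)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()


def interactive_text2sql(
    question: str,
    schema_ddl: str,
    generate: Generator,
    review: Reviewer,
    max_rounds: int = 5,
) -> str:
    """Alternate automatic validation and human review until a query is accepted."""
    feedback: List[str] = []
    draft: Optional[str] = None
    for _ in range(max_rounds):
        draft = generate(question, draft, feedback)
        error = validate(draft, schema_ddl)
        if error is not None:
            # Validation stage: send compile errors back to the generator
            # without involving the human reviewer.
            feedback.append(f"validation error: {error}")
            continue
        comment = review(draft)
        if comment is None:  # the reviewer accepted the query
            return draft
        feedback.append(comment)
    raise RuntimeError(f"no accepted query after {max_rounds} rounds")


# Usage with scripted stand-ins for the LLM and the reviewer (hypothetical data).
schema = "CREATE TABLE items (id INTEGER, price INTEGER, category TEXT);"
scripted_drafts = iter([
    "SELECT avg(cost) FROM items",  # unknown column: caught by validation
    "SELECT avg(price) FROM items",  # valid, but the reviewer wants a filter
    "SELECT avg(price) FROM items WHERE category = 'books'",
])


def fake_generate(question: str, draft: Optional[str], feedback: List[str]) -> str:
    return next(scripted_drafts)


def fake_review(sql: str) -> Optional[str]:
    return None if "WHERE" in sql else "please restrict to the books category"


# prints: SELECT avg(price) FROM items WHERE category = 'books'
print(interactive_text2sql("Average price of books?", schema, fake_generate, fake_review))
```

In this sketch, routing compile errors straight back to the generator keeps the human reviewer focused on semantic issues, such as a missing filter, that an automatic validator cannot detect.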
About the Customer Understanding Team
The Customer Understanding team is dedicated to analyzing the behavior of Mercari Group customers and proposing strategies to optimize their journey and lifetime value. We employ a variety of techniques, including statistics, mathematical optimization, and large language models.