Mercari AI Team Research Paper “Improving Visual Recommendation on E-commerce Platforms Using Vision-Language Models” Accepted for Presentation at RecSys 2025

Overview

Mercari, Inc. (“Mercari”) announces that a paper co-written by Mercari AI Team engineers has been accepted as an industry track spotlight paper at the ACM Conference on Recommender Systems (RecSys) 2025. The paper, titled “Improving Visual Recommendation on E-commerce Platforms Using Vision-Language Models,” is the work of Mercari engineers Yuki Yada, Sho Akiyama, Ryo Watanabe, Yuta Ueno, Yusuke Shido, and Andre Rusli.

RecSys is the world’s top conference in the field of recommender systems and brings together researchers from all over the world annually. RecSys 2025, the 19th edition of the conference, will be held September 22–26, 2025, in Prague, Czech Republic.

For this year’s conference, RecSys accepted 55 of the 149 papers submitted to its industry track (an acceptance rate of roughly 37%). In addition, the Mercari AI Team’s paper was one of only 15 chosen for a spotlight oral presentation, a distinction reserved for particularly outstanding research (about the top 10.1% of all 149 submissions).

Key points of the presentation

The presentation will discuss a project that improved the image AI model behind Mercari’s similar-item recommendations, a feature that searches the platform’s listings for items with a similar appearance and displays them together.

To achieve this, the engineers trained an AI model on the extensive archive of item images that Mercari has accumulated, which substantially improved the platform’s visual similarity-based recommendations.
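A similar-items feature of this kind is commonly built as a nearest-neighbor search over image embeddings. The minimal sketch below assumes each item already has an embedding vector produced by an image model and ranks the other items by cosine similarity. The function names, item IDs, and toy vectors are hypothetical, and a production system would typically use an approximate nearest-neighbor index rather than the brute-force scan shown here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_items(query_id: str, embeddings: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return up to k item IDs whose embeddings are most similar to the query item's.

    Hypothetical helper: a brute-force scan over every item, for illustration only.
    """
    query = embeddings[query_id]
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in embeddings.items()
        if item_id != query_id  # never recommend an item to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scored[:k]]


# Hypothetical 3-dimensional embeddings; real image embeddings typically have hundreds of dimensions.
embeddings = {
    "jacket_a": [0.9, 0.1, 0.0],
    "jacket_b": [0.8, 0.2, 0.1],
    "sneaker": [0.3, 0.9, 0.1],
    "mug": [0.0, 0.1, 0.9],
}
print(similar_items("jacket_a", embeddings, k=2))  # ['jacket_b', 'sneaker']
```

In this framing, a better-trained image model improves recommendations by placing visually similar items closer together in embedding space, while the retrieval step itself stays the same.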

Research background

“I really like the design of this clothing item. I’d like to see more clothing designs like this.” “I wonder if Mercari sells items with my favorite cartoon character on it.” Comments like these from Mercari users indicate demand for a discovery experience based on the visual characteristics of items.

Mercari had previously released discovery features based on visual characteristics, such as image search and visual recommendations. However, the AI model behind these features had not been trained on Mercari’s item data, so it could not adequately capture the visual traits of the items listed on the platform.

Paper summary

The paper describes a vision-language model, a type of AI model that understands both images and text, trained on Mercari’s data for more than one million items.
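This announcement does not specify the training objective. One widely used way to train a vision-language model on paired data, such as an item photo and its title, is a CLIP-style contrastive loss that rewards each image for matching its own text more closely than the other texts in the batch. The minimal sketch below assumes precomputed embeddings; the function names, the pairing of images with titles, and the temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy(logits: list[float], target: int) -> float:
    """Softmax cross-entropy for one row, computed with the log-sum-exp trick."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(
    image_embs: list[list[float]],
    text_embs: list[list[float]],
    temperature: float = 0.07,
) -> float:
    """Symmetric InfoNCE loss (CLIP-style, assumed here for illustration).

    Row i of image_embs and row i of text_embs describe the same item,
    so the correct match for image i is text i, and vice versa.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [sum(a * b for a, b in zip(images[i], texts[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Image-to-text direction: each image should pick out its own text.
    image_to_text = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each text should pick out its own image.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    text_to_image = sum(cross_entropy(columns[j], j) for j in range(n)) / n
    return (image_to_text + text_to_image) / 2


# Toy batch of three items: matched pairs vs. the same texts shuffled.
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shuffled_texts = [images[1], images[2], images[0]]
print(clip_style_loss(images, images))          # near zero: every image matches its own text
print(clip_style_loss(images, shuffled_texts))  # large: every image matches the wrong text
```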

After building and deploying the model, the team observed the following improvements:

Increased the click-through rate of the “visually similar items” feature by 50%
Increased the number of purchases made via item page recommendations by 14%
The AI model discussed in the presentation has also been used in the platform’s image search feature.

About Mercari’s Recommendation ML Team

The Recommendation ML Team aims to enhance the user experience through recommendation features across the Mercari app, including the home page and the item details page. Beyond the similar-item recommendations featured in this paper, the team applies advanced AI techniques throughout the app, for example using a two-tower model to generate item recommendations on the home page and bandit algorithms to optimize page layout.

About Mercari’s AI/LLM Team

The AI/LLM Team aims to improve the user experience through product enhancements built on two cutting-edge AI technologies: LLMs and embeddings. The team leads the development of features such as image search and AI listing.
