Mercari AI Team Research Paper “Visual Recommendation on E-commerce Platforms Via Fine-Tuned Vision-Language Models” Accepted for Presentation at MIRU 2025

Mercari, Inc. (“Mercari”) announces that a paper co-written by Mercari AI Team engineers has been accepted for presentation at the Meeting on Image Recognition and Understanding (MIRU) 2025. The paper, titled “Visual Recommendation on E-commerce Platforms Via Fine-Tuned Vision-Language Models,” is the work of Mercari engineers Yuki Yada, Sho Akiyama, Ryo Watanabe, Yuta Ueno, Andre Rusli, and Yusuke Shido. Held annually, MIRU is one of the largest symposiums in Japan on technology related to image recognition and understanding. MIRU will hold its 28th meeting in Kyoto City from July 29 to August 1, 2025. This year, MIRU selected 88 papers for presentation from among the 255 submissions it received (acceptance rate: 34.5%).

The presentation is scheduled for Thursday, July 31, 2025, from 3:00 to 4:20 PM.

Key points of presentation

The presentation will discuss a project that improved the image AI model behind Mercari’s recommendations feature, which surfaces visually similar items on the platform’s item listing pages.

To achieve these improvements, engineers trained an AI model on the extensive item image data that Mercari retains. This resulted in a substantial improvement to the platform’s visual similarity-based recommendations feature.

Background of research

“I really like the design of this clothing item. I’d like to see more clothing designs like this.” “I wonder if Mercari sells items with my favorite cartoon character on them.” These are some of the things Mercari hears from its users, indicating a clear demand for a discovery experience grounded in the visual characteristics of items.

Mercari has previously released discovery features that leverage image-based characteristics, including image search and recommendations. However, the AI model behind these features had not been trained on Mercari’s item data, and as a result it could not sufficiently capture the visual traits of items.

Summary of paper

The presentation will discuss a vision-language model—an AI model that understands both images and text—trained using the data of over one million items that Mercari retains.
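As a rough illustration only (the paper’s actual architecture and training method are not described here), visual recommendation with such a model typically works by encoding each listing image into an embedding vector and retrieving the nearest neighbors of a query item. The sketch below assumes this embedding-plus-retrieval pattern; `embed_image` is a hypothetical stand-in for a fine-tuned vision-language model’s image encoder, faked here with random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_image(item_id: int, dim: int = 8) -> np.ndarray:
    """Placeholder for a VLM image encoder; returns a unit-norm embedding.

    In a real system this would run the item's photo through the
    fine-tuned model's image tower.
    """
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Index embeddings for a small catalog of listings (item_id -> vector).
catalog = {item_id: embed_image(item_id) for item_id in range(100)}

def visually_similar(query_id: int, k: int = 5) -> list[int]:
    """Return the k catalog items closest to the query by cosine similarity.

    Embeddings are unit-norm, so the dot product equals cosine similarity.
    """
    q = catalog[query_id]
    scores = {i: float(q @ v) for i, v in catalog.items() if i != query_id}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(visually_similar(0))
```

At Mercari’s scale (millions of items), the brute-force scan above would be replaced by an approximate nearest-neighbor index, but the retrieval idea is the same.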

As a result of building the AI model, the team has seen the following striking improvements:
- Increased the click-through rate of the “visually similar items” feature by 50%
- Increased the number of purchases made via item page recommendations by 14%

The AI model discussed in the presentation has also been applied to the platform’s image search feature.

About Mercari’s AI/LLM Team

The goal of the AI/LLM Team is to contribute to the improvement of the user experience through product improvements that leverage two cutting-edge AI technologies: LLMs and embeddings. The team leads the development of features such as image search and AI listing.
