Mercari AI Team Research Paper “Visual Recommendation on E-commerce Platforms Via Fine-Tuned Vision-Language Models” Accepted for Presentation at MIRU 2025

Mercari, Inc. (“Mercari”) announces that a paper co-written by Mercari AI Team engineers has been accepted for presentation at the Meeting on Image Recognition and Understanding (MIRU) 2025. The paper, titled “Visual Recommendation on E-commerce Platforms Via Fine-Tuned Vision-Language Models,” is the work of Mercari engineers Yuki Yada, Sho Akiyama, Ryo Watanabe, Yuta Ueno, Andre Rusli, and Yusuke Shido. Held annually, MIRU is one of the largest symposiums in Japan on technology related to image recognition and understanding. MIRU will hold its 28th meeting in Kyoto City from July 29 to August 1, 2025. This year, MIRU selected 88 papers for presentation from among the 255 submissions it received (acceptance rate: 34.5%).

The presentation is scheduled for Thursday, July 31, 2025, from 3:00 to 4:20 PM.

Key points of presentation

The presentation will discuss a project that improved the image AI model behind Mercari’s recommendations feature, which surfaces visually similar items on the platform’s item listing pages.

To achieve these improvements, engineers trained an AI model on the extensive item image data that Mercari retains. This resulted in a substantial improvement to the platform’s visual similarity-based recommendations feature.

Background of research

“I really like the design of this clothing item. I’d like to see more clothing designs like this.” “I wonder if Mercari sells items with my favorite cartoon character on them.” These are some of the things Mercari hears from its users, indicating a clear demand for a discovery experience grounded in the visual characteristics of items.

Mercari has previously released discovery features that leverage image-based characteristics, including image search and recommendations. However, the AI model behind these features had not been trained on Mercari’s item data, and as a result it could not sufficiently capture the visual traits of items.

Summary of paper

The presentation will discuss a vision-language model—an AI model that understands both images and text—trained using the data of over one million items that Mercari retains.
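As a rough illustration only (the paper’s actual architecture and training method are not described here), visual recommendation with such a model typically works by encoding each listing image into an embedding vector and retrieving the nearest neighbors of a query item. The sketch below assumes this embedding-plus-retrieval pattern; `embed_image` is a hypothetical stand-in for a fine-tuned vision-language model’s image encoder, faked here with random unit vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

def embed_image(item_id: int, dim: int = 8) -> np.ndarray:
    """Placeholder for a VLM image encoder; returns a unit-norm embedding.

    In a real system this would run the item's photo through the
    fine-tuned model's image tower.
    """
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Index embeddings for a small catalog of listings (item_id -> vector).
catalog = {item_id: embed_image(item_id) for item_id in range(100)}

def visually_similar(query_id: int, k: int = 5) -> list[int]:
    """Return the k catalog items closest to the query by cosine similarity.

    Embeddings are unit-norm, so the dot product equals cosine similarity.
    """
    q = catalog[query_id]
    scores = {i: float(q @ v) for i, v in catalog.items() if i != query_id}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(visually_similar(0))
```

At Mercari’s scale (millions of items), the brute-force scan above would be replaced by an approximate nearest-neighbor index, but the retrieval idea is the same.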

As a result of building the AI model, the team has seen the following striking improvements:
- Increased the click-through rate of the “visually similar items” feature by 50%
- Increased the number of purchases made via item page recommendations by 14%

The AI model discussed in the presentation has also been applied to the platform’s image search feature.

About Mercari’s AI/LLM Team

The goal of the AI/LLM Team is to contribute to the improvement of the user experience through product improvements that leverage two cutting-edge AI technologies: LLMs and embeddings. The team leads the development of features such as image search and AI listing.
