Mercari AI’s research paper “Textual Content Moderation in C2C Marketplace” accepted to the ACL 2022 ECNLP Workshop
Earlier this year, the CRE ML Team's research paper on detecting violations in chat messages was accepted to ECNLP (the Workshop on e-Commerce and NLP), a workshop held at ACL (the annual meeting of the Association for Computational Linguistics).
Textual Content Moderation in C2C Marketplace - ACL Anthology
ACL is one of the most prestigious conferences in natural language processing. ACL 2022, the 60th annual meeting, was held in Dublin, Ireland.
Key points of presentation
- Implemented a machine learning system to detect violations of the terms of service written in chat messages
- Improved violation detection accuracy while significantly reducing the number of manual checks by customer support
- Created a system that is easy to maintain and continuously improves detection accuracy
After an item is purchased on the Mercari marketplace, the seller and buyer can freely communicate with each other via a chat system. To facilitate smooth transactions, Mercari’s terms of service prohibit harassing behavior such as making defamatory or threatening comments. However, it is difficult for customer support to review all such communications manually, and therefore Mercari uses a system that automatically screens for violations.
In the past, the system used heuristics such as keyword searches to detect violations, but the detection accuracy was limited. Because trouble can easily arise in communications between sellers and buyers, and overlooking a violation carries significant risk, the system must keep its recall high (recall is the share of actual violations that are detected). As a result, the heuristic screening flagged many conversations that were not actually violations (false positives), and customer support representatives had to check huge volumes of conversations manually.
In this study, researchers implemented a system that used a machine learning model specialized for the aforementioned screening process.
The study reported that implementing the system reduced the number of false positives while maintaining recall, resulting in a significant reduction in the number of conversations that needed to be checked manually.
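The trade-off described above can be made concrete with a small sketch. It assumes the model outputs a violation score per conversation; the scores, labels, and function names below are illustrative and are not taken from the paper. One way to keep recall high while cutting manual checks is to pick the highest score threshold that still meets a recall target on labeled data:

```python
from typing import List, Tuple


def recall_precision(scores: List[float], labels: List[bool],
                     threshold: float) -> Tuple[float, float]:
    """Return (recall, precision) when every score >= threshold is flagged."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    fn = sum(1 for f, y in zip(flagged, labels) if not f and y)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def highest_threshold_for_recall(scores: List[float], labels: List[bool],
                                 target_recall: float) -> float:
    """Return the highest threshold whose recall meets the target.

    A higher threshold flags fewer conversations, so it means fewer
    manual checks for customer support.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    for t in sorted(set(scores), reverse=True):
        recall, _ = recall_precision(scores, labels, t)
        if recall >= target_recall:
            return t
    return min(scores)  # fall back to flagging everything


# Hypothetical scores and moderator labels (True = violation).
scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.10]
labels = [True, True, False, True, False, False]

threshold = highest_threshold_for_recall(scores, labels, target_recall=1.0)
print(threshold)                                    # 0.4
print(recall_precision(scores, labels, threshold))  # (1.0, 0.75)
```

In this toy data, four of six conversations are sent for manual review and every violation is still caught; a better-separated score distribution would allow a higher threshold and fewer checks.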
Summary of paper
After a buyer purchases an item on the Mercari marketplace, the seller and buyer are able to freely send chat messages to each other. However, there are cases where the content of these messages only makes sense in the context of a conversation the parties started earlier on the separate public chat message board used prior to the item purchase. We therefore devised a machine learning model that makes inferences based on the context of the previous conversation, rather than estimating whether or not the text in isolation is in violation of Mercari’s terms of service.
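As a rough illustration of context-based inference, the sketch below builds a single model input from the preceding conversation plus the message being judged. The `Message` type, the `build_model_input` helper, the separator string, and the context window size are all assumptions made for this example; the paper's actual input representation may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    sender: str  # e.g. "buyer" or "seller"
    text: str


def build_model_input(context: List[Message], target: Message,
                      max_context: int = 3, sep: str = " [SEP] ") -> str:
    """Join the most recent context messages with the target message.

    The classifier then sees the target message together with the
    conversation that preceded it, not the target text in isolation.
    """
    # context[-0:] would return the whole list, so handle 0 explicitly.
    recent = context[-max_context:] if max_context > 0 else []
    parts = [f"{m.sender}: {m.text}" for m in recent]
    parts.append(f"{target.sender}: {target.text}")
    return sep.join(parts)


# Hypothetical conversation: public comments before purchase, then a chat message.
public_comments = [
    Message("buyer", "Can you lower the price?"),
    Message("seller", "No, the price is fixed."),
]
chat_message = Message("buyer", "You will regret that.")

print(build_model_input(public_comments, chat_message))
```

Read alone, the last message is ambiguous; with the earlier exchange attached, a model has more evidence to judge whether it is a threat.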
The model itself is designed to be extremely lightweight, and can make inferences at a cost that is reasonable in relation to the vast number of chat messages exchanged on Mercari. In addition, because the violations stipulated in the terms of service may be revised in the future to adapt to changing social norms, the architecture allows individual violation types to be added, modified, or deleted with ease. The moderation staff then checks the conversations that the machine learning system has flagged, and the decisions they make are added to the training data as ground-truth labels. In this way, a feedback loop is created within the violation detection system, which is expected to improve detection accuracy continuously.
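The maintainability and feedback-loop ideas above can be sketched as a registry with one detector per violation type, plus a log of moderator decisions that becomes new training data. This is a minimal sketch under stated assumptions: the `ModerationPipeline` class, its method names, and the keyword-based stand-in detectors are hypothetical and do not describe Mercari's implementation.

```python
from typing import Callable, Dict, List, Tuple

# A detector maps text to a violation score in [0, 1].
Detector = Callable[[str], float]


class ModerationPipeline:
    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self.detectors: Dict[str, Detector] = {}
        # (text, violation_type, moderator_decision) triples for retraining.
        self.training_data: List[Tuple[str, str, bool]] = []

    def add_detector(self, violation_type: str, detector: Detector) -> None:
        """Add or replace the detector for one violation type."""
        self.detectors[violation_type] = detector

    def remove_detector(self, violation_type: str) -> None:
        """Drop a violation type without touching the others."""
        self.detectors.pop(violation_type, None)

    def screen(self, text: str) -> List[str]:
        """Return the violation types whose score reaches the threshold."""
        return [name for name, detect in self.detectors.items()
                if detect(text) >= self.threshold]

    def record_review(self, text: str, violation_type: str,
                      is_violation: bool) -> None:
        """Store a moderator's decision as a ground-truth label."""
        self.training_data.append((text, violation_type, is_violation))


# Toy stand-ins for trained models (assumption: real detectors are ML models).
pipeline = ModerationPipeline()
pipeline.add_detector("threat", lambda t: 1.0 if "regret" in t.lower() else 0.0)
pipeline.add_detector("defamation", lambda t: 1.0 if "liar" in t.lower() else 0.0)

message = "You will regret that."
for violation_type in pipeline.screen(message):
    # A moderator reviews the flagged message; the decision feeds retraining.
    pipeline.record_review(message, violation_type, is_violation=True)

print(pipeline.training_data)
```

Keeping each violation type behind its own entry means a change to the terms of service touches one detector, and the recorded decisions can be used to retrain that detector periodically.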
In addition, Mercari reviews chat messages for the purpose of preventing fraud and trouble, in compliance with laws and regulations, and with the consent of each user.
About CRE ML Team
The CRE ML team operates several other machine learning services and provides technical support for the safety and security of our users.