Essential Virtual Try-On Research Papers For Machine Learning Engineers

Kailash Ahirwar
Published in Tryon Labs
8 min read · Feb 21, 2024


Virtual Try-On is revolutionizing the way customers shop online. It allows them to experience products virtually before making a purchase, boosting their confidence in and satisfaction with their choices.

In this article, I’ll share the top 6 research papers to help you with your Virtual Try-On AI research and development.

Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person

Abstract

Outfit Anyone is a Virtual Try-On AI model from HumanAIGC, designed for virtual clothing outfitting. The system enables ultra-high-quality virtual try-on for any clothing and any person. It has also been mentioned in the context of integrating with other AI technologies to dress and animate models…

Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

Abstract

As online shopping grows, the ability for buyers to virtually visualize products in their own settings, a phenomenon we define as “Virtual Try-All,” has become crucial. Recent diffusion models inherently contain a world model, rendering them suitable for this task within an inpainting context. However, traditional image-conditioned diffusion models often fail to capture the fine-grained details of products. In contrast, personalization-driven models such as DreamPaint are good at preserving the item’s details, but they are not optimized for real-time applications. We present “Diffuse to Choose,” a novel diffusion-based image-conditioned inpainting model that efficiently balances fast inference with the retention of high-fidelity details in a given reference item, while ensuring accurate semantic manipulations in the given scene content. Our approach incorporates fine-grained features from the reference image directly into the latent feature maps of the main diffusion model, alongside a perceptual loss that further preserves the reference item’s details. We conduct extensive testing on both in-house and publicly available datasets, and show that Diffuse to Choose is superior to existing zero-shot diffusion inpainting methods as well as few-shot diffusion personalization algorithms like DreamPaint…

https://diffuse2choose.github.io/static/videos/Diffuse_to_Choose_DemoReel.mp4
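
To make the core idea concrete, here is a minimal PyTorch sketch of the mechanism the abstract describes: injecting fine-grained reference-image features directly into a UNet latent feature map. The encoder architecture, channel sizes, and additive fusion below are my own illustrative assumptions, not the paper’s implementation.

```python
# Hypothetical sketch of fine-grained feature injection (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReferenceFeatureInjector(nn.Module):
    """Extracts fine-grained features from a reference item image and
    adds them into a latent feature map of the main diffusion UNet."""

    def __init__(self, in_channels: int = 3, latent_channels: int = 320):
        super().__init__()
        # A shallow conv encoder stands in for the paper's feature
        # extractor; the real architecture may differ.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU(),
            nn.Conv2d(128, latent_channels, 3, stride=2, padding=1),
        )

    def forward(self, unet_feats: torch.Tensor, ref_image: torch.Tensor) -> torch.Tensor:
        ref_feats = self.encoder(ref_image)
        # Resize reference features to the UNet feature resolution and
        # inject them additively (one plausible fusion choice).
        ref_feats = F.interpolate(ref_feats, size=unet_feats.shape[-2:], mode="bilinear")
        return unet_feats + ref_feats

injector = ReferenceFeatureInjector()
unet_feats = torch.randn(1, 320, 32, 32)   # latent feature map from the main UNet
ref_image = torch.randn(1, 3, 256, 256)    # reference product image
fused = injector(unet_feats, ref_image)
print(fused.shape)  # torch.Size([1, 320, 32, 32])
```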

TryOnDiffusion: A Tale of Two UNets

Abstract

Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic, detail-preserving visualization of the garment while warping it to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet are: 1) the garment is warped implicitly via a cross-attention mechanism, and 2) garment warping and person blending happen as part of a unified process rather than a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively…
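
The key trick here, implicit warping via cross-attention, can be sketched in a few lines of PyTorch: person features act as queries and garment features as keys/values, so each person location pulls in the garment content that should appear there. The block below is illustrative only; Parallel-UNet’s actual layers, dimensions, and conditioning are considerably more involved.

```python
# Hedged sketch of implicit garment warping via cross-attention.
import torch
import torch.nn as nn

class GarmentCrossAttention(nn.Module):
    """Person features attend to garment features, so garment detail is
    'warped' into place by attention rather than by an explicit flow field."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, person_tokens: torch.Tensor, garment_tokens: torch.Tensor) -> torch.Tensor:
        # Queries come from the person branch; keys/values from the garment
        # branch, implicitly handling pose and shape change.
        out, _ = self.attn(query=person_tokens, key=garment_tokens, value=garment_tokens)
        return self.norm(person_tokens + out)

block = GarmentCrossAttention()
person = torch.randn(2, 1024, 256)   # flattened person-UNet feature tokens
garment = torch.randn(2, 1024, 256)  # flattened garment-UNet feature tokens
print(block(person, garment).shape)  # torch.Size([2, 1024, 256])
```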

StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On

Abstract

Given a clothing image and a person image, image-based virtual try-on aims to generate a customized image that appears natural and accurately reflects the characteristics of the clothing image. In this work, we aim to expand the applicability of the pre-trained diffusion model so that it can be utilized independently for the virtual try-on task. The main challenge is to preserve the clothing details while effectively utilizing the robust generative capability of the pre-trained model. To tackle these issues, we propose StableVITON, which learns the semantic correspondence between the clothing and the human body within the latent space of the pre-trained diffusion model in an end-to-end manner. Our proposed zero cross-attention blocks not only preserve the clothing details by learning the semantic correspondence but also generate high-fidelity images by utilizing the inherent knowledge of the pre-trained model in the warping process. Through our proposed novel attention total variation loss and the application of augmentation, we achieve sharp attention maps, resulting in a more precise representation of clothing details. StableVITON outperforms the baselines in qualitative and quantitative evaluation, showing promising quality on arbitrary person images…
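
Below is a rough PyTorch sketch of what an attention total-variation penalty could look like, written in the spirit of the abstract: compute each query’s attention “center of mass” over the garment feature map, then penalize incoherent jumps between neighboring queries so the correspondence stays sharp and spatially consistent. The tensor layout and exact formulation are assumptions; the paper’s loss may differ.

```python
# Hedged sketch of an attention total-variation penalty (illustrative only).
import torch

def attention_tv_loss(attn: torch.Tensor) -> torch.Tensor:
    """attn: (B, Hq, Wq, Hk*Wk) attention from person queries to garment keys.
    Computes each query's attention center of mass over the garment map,
    then penalizes non-smooth jumps between neighboring queries."""
    B, Hq, Wq, N = attn.shape
    Hk = Wk = int(N ** 0.5)
    # Coordinate grid over garment key locations, normalized to [0, 1].
    ys, xs = torch.meshgrid(
        torch.linspace(0, 1, Hk), torch.linspace(0, 1, Wk), indexing="ij")
    coords = torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=-1)  # (N, 2)
    centers = attn @ coords  # (B, Hq, Wq, 2): expected garment location per query
    # Total variation of the center map along both spatial axes.
    tv_h = (centers[:, 1:, :, :] - centers[:, :-1, :, :]).abs().mean()
    tv_w = (centers[:, :, 1:, :] - centers[:, :, :-1, :]).abs().mean()
    return tv_h + tv_w

attn = torch.softmax(torch.randn(2, 16, 16, 64), dim=-1)  # toy attention maps
print(attention_tv_loss(attn))
```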

High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions

Abstract

Image-based virtual try-on aims to synthesize an image of a person wearing a given clothing item. To solve the task, existing methods warp the clothing item to fit the person’s body and generate a segmentation map of the person wearing the item, before fusing the item with the person. However, when the warping and segmentation-generation stages operate individually without information exchange, misalignment between the warped clothes and the segmentation map occurs, which leads to artifacts in the final image. The information disconnection also causes excessive warping near the clothing regions occluded by body parts, so-called pixel-squeezing artifacts. To address these issues, we propose a novel try-on condition generator as a unified module for the two stages (i.e., the warping and segmentation-generation stages). A newly proposed feature fusion block in the condition generator implements the information exchange, and the condition generator does not create any misalignment or pixel-squeezing artifacts. We also introduce discriminator rejection, which filters out incorrect segmentation map predictions and assures the performance of virtual try-on frameworks. Experiments on a high-resolution dataset demonstrate that our model successfully handles the misalignment and occlusion, and significantly outperforms the baselines.
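
The discriminator-rejection idea lends itself to a short sketch: score each predicted segmentation map with a trained discriminator and drop the low-scoring ones. The network, class count, and threshold below are illustrative assumptions, not the paper’s exact setup.

```python
# Hedged sketch of discriminator rejection for segmentation predictions.
import torch
import torch.nn as nn

class SegDiscriminator(nn.Module):
    def __init__(self, num_classes: int = 13):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),
        )

    def forward(self, seg_logits: torch.Tensor) -> torch.Tensor:
        # Returns a realism score per segmentation map (higher = more real).
        return self.net(seg_logits)

def reject_bad_segmentations(disc, seg_batch, threshold=0.0):
    """Keep only segmentation maps the discriminator scores above threshold."""
    with torch.no_grad():
        scores = disc(seg_batch).squeeze(1)
    keep = scores > threshold
    return seg_batch[keep], keep

disc = SegDiscriminator()
seg_batch = torch.randn(4, 13, 64, 64)  # toy predicted segmentation logits
kept, mask = reject_bad_segmentations(disc, seg_batch)
print(kept.shape, mask)
```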

Taming the Power of Diffusion Models for High-Quality Virtual Try-On with Appearance Flow

Abstract

Virtual try-on is a critical image synthesis task that aims to transfer clothes from one image to another while preserving the details of both humans and clothes. While many existing methods rely on Generative Adversarial Networks (GANs) to achieve this, flaws can still occur, particularly at high resolutions. Recently, the diffusion model has emerged as a promising alternative for generating high-quality images in various applications. However, simply using clothes as a condition for guiding the diffusion model’s inpainting is insufficient to maintain the details of the clothes. To overcome this challenge, we propose an exemplar-based inpainting approach that leverages a warping module to guide the diffusion model’s generation effectively. The warping module performs initial processing on the clothes, which helps to preserve their local details. We then combine the warped clothes with a clothes-agnostic person image and add noise to form the input of the diffusion model. Additionally, the warped clothes are used as a local condition at each denoising step to ensure that the resulting output retains as much detail as possible. Our approach, namely Diffusion-based Conditional Inpainting for Virtual Try-ON (DCI-VTON), effectively utilizes the power of the diffusion model, and the incorporation of the warping module helps to produce high-quality and realistic virtual try-on results. Experimental results on VITON-HD demonstrate the effectiveness and superiority of our method. Source code and trained models will be publicly released at: https://github.com/bcmi/DCIVTON-Virtual-Try-On.
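
The input-assembly step the abstract describes, compositing warped clothes onto a clothes-agnostic person image and then noising the result, can be sketched as follows. The function names, mask convention, and the simple linear beta schedule are assumptions for illustration, not the authors’ code; the noising formula itself is the standard forward-diffusion step.

```python
# Hedged sketch of assembling a DCI-VTON-style diffusion input.
import torch

def make_diffusion_input(agnostic_person, warped_clothes, clothes_mask, t, alphas_cumprod):
    """agnostic_person, warped_clothes: (B, C, H, W); clothes_mask: (B, 1, H, W) in {0, 1}.
    t: (B,) integer timesteps. Returns the noised composite x_t."""
    # Paste the warped clothes into the clothes region of the agnostic image.
    composite = agnostic_person * (1 - clothes_mask) + warped_clothes * clothes_mask
    # Standard forward diffusion: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps.
    noise = torch.randn_like(composite)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * composite + (1 - a_bar).sqrt() * noise

# Toy usage with a linear beta schedule.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
person = torch.randn(1, 3, 64, 64)
clothes = torch.randn(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
x_t = make_diffusion_input(person, clothes, mask, torch.tensor([500]), alphas_cumprod)
print(x_t.shape)  # torch.Size([1, 3, 64, 64])
```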

If you’re working on Virtual Try-On and want to chat about AI research and development in this area, please reach out to me on Twitter: https://twitter.com/Ahkailash1

About Tryon Labs:

We are building Tryon AI, which empowers online apparel stores with Generative AI for Virtual Try-On and cataloging. Online fashion stores spend heavily on photographers, models, and studios to create catalogs, and online shoppers sometimes struggle to pick clothes that will look good on them.

Tryon AI cuts cataloging costs for online fashion stores and improves the shopping experience for customers, who get a seamless, immersive way to try on clothes from the comfort of their homes.

Check out our open-source implementation of TryOnDiffusion: https://github.com/tryonlabs/tryondiffusion

Visit our website: https://www.tryonlabs.ai or contact us at contact@tryonlabs.ai

Join our discord server: https://discord.gg/FuBXDUr3
