StyleDreamer: Make Your 3D Style Avatar From a Single View with Multi-View Consistency Score Distillation


TL;DR: StyleDreamer creates a high-fidelity 3D style avatar from a single view.

Overview

In this paper, we investigate a practical One-to-Style task: generating a 3D style avatar from a single view. The task presents two challenges across the multiple views of the 3D style avatar: 1) content consistency and 2) style consistency. We propose StyleDreamer to address both. For content consistency, StyleDreamer employs a 3D GAN to preserve identity in each view. For style consistency, we propose a novel Multi-View Consistency Score Distillation (MV-CSD) that enforces consistent stylization across views. In this way, the style of the images rendered from all views is supervised to match the style of the given view, as specified by the edit instruction. Experimental results show that our approach outperforms existing methods in terms of stability and quality, indicating its potential for real-world applications.

How it works

StyleDreamer: Our method generates a 3D head avatar whose content and style are consistent with the given portrait image and style prompts, respectively.

  1. Sampling. Given a front-view portrait image, a 3D GAN synthesizes novel views, which serve as reference images.
  2. Rendering. Given a sampled primary view and auxiliary view, one image is rendered from the NeRF for each view.
  3. Gradient of CSD. The rendered images are sent to an image-editing diffusion model, conditioned on the corresponding reference images and the style prompt, together with the LoRA, to compute the gradient of CSD.
  4. Gradient of MV-CSD. The auxiliary gradient is warped using the two views and the primary depth map. The primary gradient and the warped auxiliary gradient are then mixed to compute the gradient of MV-CSD.
  5. Optimization. The NeRF is updated with the gradient of MV-CSD. The LoRA is updated on the rendered images.
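
The warping and mixing in step 4 can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. It assumes a pinhole camera with intrinsics shared by both views, nearest-neighbor sampling, warping of the auxiliary gradient into the primary view, and a fixed mixing weight `lam`. The function names `warp_aux_to_primary` and `mix_gradients` and all parameters are hypothetical, and gradients are treated as single-channel H x W maps for brevity.

```python
import math


def warp_aux_to_primary(aux_grad, depth, K, R, t):
    """Resample an auxiliary-view gradient map into the primary view.

    aux_grad: H x W nested list, gradient computed on the auxiliary render.
    depth:    H x W nested list, primary-view depth along the camera z axis.
    K:        (fx, fy, cx, cy) pinhole intrinsics, assumed shared by both views.
    R, t:     3x3 rotation and 3-vector translation mapping primary-camera
              coordinates to auxiliary-camera coordinates (an assumption).
    Returns (warped, valid); valid marks pixels that land inside the auxiliary image.
    """
    fx, fy, cx, cy = K
    H, W = len(depth), len(depth[0])
    warped = [[0.0] * W for _ in range(H)]
    valid = [[False] * W for _ in range(H)]
    for v in range(H):
        for u in range(W):
            z = depth[v][u]
            if z <= 0:
                continue  # no surface along this ray
            # Back-project the primary pixel to a 3D point using its depth.
            p = ((u - cx) / fx * z, (v - cy) / fy * z, z)
            # Move the point into the auxiliary camera frame.
            q = [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]
            if q[2] <= 0:
                continue  # point is behind the auxiliary camera
            # Project into the auxiliary image and take the nearest pixel.
            ua = int(math.floor(fx * q[0] / q[2] + cx + 0.5))
            va = int(math.floor(fy * q[1] / q[2] + cy + 0.5))
            if 0 <= ua < W and 0 <= va < H:
                warped[v][u] = aux_grad[va][ua]
                valid[v][u] = True
    return warped, valid


def mix_gradients(primary_grad, warped, valid, lam=0.5):
    """Blend the two gradients; fall back to the primary one where the warp is invalid."""
    return [
        [(1.0 - lam) * g + lam * w if ok else g
         for g, w, ok in zip(g_row, w_row, ok_row)]
        for g_row, w_row, ok_row in zip(primary_grad, warped, valid)
    ]
```

In this sketch, pixels whose reprojection falls outside the auxiliary image keep the primary gradient unchanged. A fuller version would likely also handle occlusion and use bilinear sampling, neither of which is shown here.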

More Results
