Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2024 Vol.25 No.1 P.149-159

Deep3DSketch-im: rapid high-fidelity AI 3D model generation by single freehand sketches

Author(s): Tianrun CHEN, Runlong CAO, Zejian LI, Ying ZANG, Lingyun SUN
Affiliation(s): 1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): tianrun.chen@zju.edu.cn, crl1567@163.com, zejianlee@zju.edu.cn, 02750@zjhu.edu.cn, sunly@zju.edu.cn
Key Words: Content creation, Sketch, Three-dimensional (3D) modeling, 3D reconstruction, Shape from X, Artificial intelligence (AI)


Tianrun CHEN, Runlong CAO, Zejian LI, Ying ZANG, Lingyun SUN. Deep3DSketch-im: rapid high-fidelity AI 3D model generation by single freehand sketches[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(1): 149-159.

@article{FITEE.2300314,
title="Deep3DSketch-im: rapid high-fidelity AI 3D model generation by single freehand sketches",
author="Tianrun CHEN and Runlong CAO and Zejian LI and Ying ZANG and Lingyun SUN",
journal="Frontiers of Information Technology \& Electronic Engineering",
volume="25",
number="1",
pages="149-159",
year="2024",
publisher="Zhejiang University Press \& Springer",
doi="10.1631/FITEE.2300314"
}

%0 Journal Article
%T Deep3DSketch-im: rapid high-fidelity AI 3D model generation by single freehand sketches
%A Tianrun CHEN
%A Runlong CAO
%A Zejian LI
%A Ying ZANG
%A Lingyun SUN
%J Frontiers of Information Technology & Electronic Engineering
%V 25
%N 1
%P 149-159
%@ 2095-9184
%D 2024
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.2300314

TY  - JOUR
T1  - Deep3DSketch-im: rapid high-fidelity AI 3D model generation by single freehand sketches
A1  - Tianrun CHEN
A1  - Runlong CAO
A1  - Zejian LI
A1  - Ying ZANG
A1  - Lingyun SUN
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 25
IS  - 1
SP  - 149
EP  - 159
SN  - 2095-9184
Y1  - 2024
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.2300314
ER  - 


Abstract: Artificial intelligence generated content (AIGC) has advanced remarkably in the language and image fields, but artificial intelligence (AI)-generated three-dimensional (3D) models remain under-explored because of their complexity and the lack of training data. The conventional approach of creating 3D content through computer-aided design (CAD) is labor-intensive and requires expertise, which makes it challenging for novice users. To address this issue, we propose a sketch-based 3D modeling approach, Deep3DSketch-im, which uses a single freehand sketch for modeling. The task is challenging because sketches are sparse and ambiguous. Deep3DSketch-im uses a novel data representation, the signed distance field (SDF), which improves the sketch-to-3D-model process by incorporating an implicit continuous field instead of voxels or points, together with a specially designed neural network that captures point and local features. Extensive experiments demonstrate the effectiveness of the approach, which achieves state-of-the-art (SOTA) performance on both synthetic and real datasets. In addition, a user study reports that users are more satisfied with the results generated by Deep3DSketch-im. We believe that Deep3DSketch-im has the potential to revolutionize 3D modeling by providing an intuitive and easy-to-use solution for novice users.
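
To make the contrast between an implicit continuous field and a voxel grid concrete, the following minimal sketch evaluates an analytic SDF for a sphere and then discretizes it into occupancy cells. It is illustrative only and is not the Deep3DSketch-im network: the function names `sphere_sdf` and `occupancy_grid`, the sign convention (negative inside, positive outside), and the grid bounds are all assumptions chosen for the example.

```python
import math


def sphere_sdf(point, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from `point` to a sphere's surface.

    Assumed convention: negative inside, zero on the surface, positive outside.
    """
    return math.dist(point, center) - radius


def occupancy_grid(sdf, resolution, bound=1.5):
    """Discretize an SDF into a boolean voxel grid over [-bound, bound]^3.

    A cell is marked occupied when its centre lies inside the shape (sdf <= 0).
    The continuous distance values are lost in this step, which is the
    information an implicit field keeps and a voxel grid does not.
    """
    step = 2 * bound / resolution
    coords = [-bound + (i + 0.5) * step for i in range(resolution)]
    return [
        [[sdf((x, y, z)) <= 0.0 for z in coords] for y in coords]
        for x in coords
    ]


if __name__ == "__main__":
    # The SDF can be queried at any continuous location.
    for p in [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]:
        print(p, sphere_sdf(p))

    # The voxel version only answers inside/outside at a fixed resolution.
    grid = occupancy_grid(sphere_sdf, resolution=4)
    occupied = sum(cell for plane in grid for row in plane for cell in row)
    print("occupied cells:", occupied)
```

The surface of the shape is the zero level set of the field, so resolution is chosen at query time rather than fixed when the shape is stored.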

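Deep3DSketch-im: rapid AI-based generation of high-fidelity 3D models from a single freehand sketch

Tianrun CHEN¹, Runlong CAO³, Zejian LI², Ying ZANG³, Lingyun SUN¹
¹College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
²School of Software Technology, Zhejiang University, Hangzhou 310027, China
³School of Information Engineering, Huzhou University, Huzhou 313000, China
Abstract: The rise of artificial intelligence generated content (AIGC) in the language and image fields is noteworthy, but AI-based generation of 3D models remains under-explored because of its complexity and the lack of training data. The traditional way of creating 3D content through computer-aided design (CAD) demands considerable labor and expertise, which is challenging for novice users. To solve this problem, a sketch-based 3D modeling method named Deep3DSketch-im is proposed, which models from a single freehand sketch. This is a challenging task because sketches are sparse and ambiguous. Deep3DSketch-im uses a novel data representation called the signed distance field (SDF); it improves the sketch-to-3D-model process by integrating an implicit continuous field into that process, together with a specially designed neural network that can capture point and local features. Extensive experiments demonstrate the effectiveness of the method, which achieves better performance on both synthetic and real datasets. In addition, a user study shows that users are more satisfied with the results generated by Deep3DSketch-im. We believe that Deep3DSketch-im has the potential to fundamentally change the 3D modeling process by offering novice users an intuitive and easy-to-use solution.

Key words: Content creation; Sketch; 3D modeling; 3D reconstruction; Shape from X; Artificial intelligence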


