CLC number: TP391.41
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2023-10-13
Shanshan HUANG, Yuanhao WANG, Zhili GONG, Jun LIAO, Shu WANG, Li LIU. Controllable image generation based on causal representation learning[J]. Frontiers of Information Technology & Electronic Engineering, 2024, 25(1): 135-148.
@article{huang2024controllable,
  title={Controllable image generation based on causal representation learning},
  author={Huang, Shanshan and Wang, Yuanhao and Gong, Zhili and Liao, Jun and Wang, Shu and Liu, Li},
  journal={Frontiers of Information Technology \& Electronic Engineering},
  volume={25},
  number={1},
  pages={135--148},
  year={2024},
  publisher={Zhejiang University Press \& Springer},
  doi={10.1631/FITEE.2300303}
}
Abstract: Artificial intelligence-generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges: existing AI methods often struggle to produce images that are both flexible and controllable while accounting for the causal relationships within them. To address this issue, we develop a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach enables humans to control image attributes while preserving the rationality and interpretability of the generated images, and it also allows counterfactual images to be generated. The key to CCIG lies in a causal structure learning module that learns the causal relationships among image attributes and is jointly optimized with the encoder, generator, and joint discriminator of the image generation module. In this way, we learn causal representations in the image's latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA, and the results demonstrate the effectiveness of CCIG.
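The abstract describes two mechanisms that can be made concrete: a causal structure learning module that learns a directed acyclic graph (DAG) over image attributes, and do-style causal interventions on the resulting representation before it is decoded by the GAN generator. Below is a minimal, illustrative sketch of those two ideas under a linear structural causal model assumption; it is not the authors' code, it omits the encoder, generator, and joint discriminator that CCIG optimizes jointly, and every name in it (CausalStructure, n_attrs, intervene, and so on) is a hypothetical placeholder introduced here. The acyclicity penalty uses the standard NOTEARS-style continuous constraint.

import torch
import torch.nn as nn


class CausalStructure(nn.Module):
    """Learnable weighted adjacency matrix A over n_attrs attributes.

    Linear SCM simplification: z = A^T z + eps, where A[i, j] != 0 means
    attribute i is a cause of attribute j.
    """

    def __init__(self, n_attrs: int):
        super().__init__()
        self.A = nn.Parameter(torch.zeros(n_attrs, n_attrs))

    def adjacency(self) -> torch.Tensor:
        # Zero the diagonal so no attribute is its own cause.
        n = self.A.size(0)
        return self.A * (1.0 - torch.eye(n, device=self.A.device))

    def forward(self, eps: torch.Tensor) -> torch.Tensor:
        # Solve z = A^T z + eps for a batch of row vectors: z = eps (I - A)^{-1}.
        n = self.A.size(0)
        I = torch.eye(n, device=eps.device)
        return eps @ torch.linalg.inv(I - self.adjacency())

    def acyclicity_penalty(self) -> torch.Tensor:
        # NOTEARS-style constraint h(A) = tr(exp(A o A)) - n (o = elementwise product),
        # which is 0 iff A is a DAG; added to the training loss to keep the structure acyclic.
        A = self.adjacency()
        return torch.trace(torch.matrix_exp(A * A)) - self.A.size(0)

    def intervene(self, eps: torch.Tensor, idx: int, value: float) -> torch.Tensor:
        # do(z_idx = value): cut all edges into attribute idx, clamp its exogenous
        # input to the target value, then re-propagate effects to its descendants.
        A = self.adjacency().clone()
        A[:, idx] = 0.0
        eps = eps.clone()
        eps[:, idx] = value
        n = A.size(0)
        I = torch.eye(n, device=eps.device)
        return eps @ torch.linalg.inv(I - A)


# Usage sketch: build a causal representation from noise, then produce a
# counterfactual code by intervening on one attribute; both codes would be
# decoded with the same GAN generator G(z), which is omitted here.
structure = CausalStructure(n_attrs=6)
eps = torch.randn(4, 6)                              # exogenous noise for 4 samples
z_obs = structure(eps)                               # observational representation
z_cf = structure.intervene(eps, idx=0, value=3.0)    # counterfactual: do(z_0 = 3.0)

Decoding z_obs and z_cf with the same generator is what yields an image and its counterfactual counterpart, with the intervention's effects propagated only to causal descendants of the edited attribute.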