Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2019 Vol.20 No.12 P.1657-1664

http://doi.org/10.1631/FITEE.1900580

Intelligent design of multimedia content in Alibaba

Author(s): Kui-long Liu, Wei Li, Chang-yuan Yang, Guang Yang
Affiliation(s): 1. Alibaba Group, Hangzhou 311121, China
Corresponding email(s): kuilong.lkl@alibaba-inc.com, pangeng.lw@alibaba-inc.com, changyuan.yangcy@alibaba-inc.com, qingyun@taobao.com
Key Words: Multimedia content, Alibaba, Artificial intelligence, Design, Business application


Kui-long Liu, Wei Li, Chang-yuan Yang, Guang Yang. Intelligent design of multimedia content in Alibaba[J]. Frontiers of Information Technology & Electronic Engineering, 2019, 20(12): 1657-1664.

@article{Liu2019,
title="Intelligent design of multimedia content in Alibaba",
author="Kui-long Liu and Wei Li and Chang-yuan Yang and Guang Yang",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="20",
number="12",
pages="1657-1664",
year="2019",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1900580"
}

%0 Journal Article
%T Intelligent design of multimedia content in Alibaba
%A Kui-long Liu
%A Wei Li
%A Chang-yuan Yang
%A Guang Yang
%J Frontiers of Information Technology & Electronic Engineering
%V 20
%N 12
%P 1657-1664
%@ 2095-9184
%D 2019
%I Zhejiang University Press & Springer
%R 10.1631/FITEE.1900580

TY  - JOUR
T1  - Intelligent design of multimedia content in Alibaba
A1  - Kui-long Liu
A1  - Wei Li
A1  - Chang-yuan Yang
A1  - Guang Yang
JO  - Frontiers of Information Technology & Electronic Engineering
VL  - 20
IS  - 12
SP  - 1657
EP  - 1664
SN  - 2095-9184
Y1  - 2019
PB  - Zhejiang University Press & Springer
DO  - 10.1631/FITEE.1900580
ER  -


Abstract: Multimedia content is an integral part of Alibaba's business ecosystem and is in great demand. Producing it usually requires substantial technical expertise and funding. With the rapid development of artificial intelligence (AI) in recent years, many AI-assisted tools for multimedia content production have emerged to meet these design requirements, and they are increasingly widely used across Alibaba's business ecosystem. The main applications are auxiliary design, graphic design, video generation, and page production. This report introduces a general pipeline of the AI-assisted tools and then presents four representative tools used in the Alibaba Group, one for each of these applications. Through these tools, the value of combining multimedia content design with AI technology has been verified in business practice, which reflects the significant role AI plays in promoting multimedia content production. The application prospects of combining multimedia content design with AI are also discussed.

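Chinese summary (translated): Application of intelligent multimedia content design in Alibaba

Abstract: Multimedia content is an indispensable part of Alibaba's business ecosystem, and demand for it is enormous. Multimedia content production usually has high technical and financial requirements. With the rapid development of artificial intelligence technology in recent years, many tools that assist multimedia content production have emerged, and the combination of artificial intelligence technology and multimedia content design is applied ever more widely in Alibaba's business ecosystem, in areas including auxiliary design, graphic design, video generation, and page production. This paper first introduces the general processing pipeline of AI-assisted design tools in Alibaba's business ecosystem, and then selects one representative tool from each of the four application areas above for detailed introduction. Through the use of these tools, the value brought by combining multimedia content design with artificial intelligence has been well verified in business, which reflects the great role of artificial intelligence technology in promoting multimedia content production and also indicates its broad application prospects.

Key words: Multimedia content; Alibaba; Artificial intelligence; Design; Business application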


