CLC number: TP391

On-line Access: 2016-05-04

Received: 2015-11-07

Revision Accepted: 2016-02-19

Crosschecked: 2016-04-11

ORCID: Tian-ran Hu, http://orcid.org/0000-0003-0086-2447

Frontiers of Information Technology & Electronic Engineering  2016 Vol.17 No.5 P.389-402

http://doi.org/10.1631/FITEE.1500385


Home location inference from sparse and noisy data: models and applications


Author(s):  Tian-ran Hu, Jie-bo Luo, Henry Kautz, Adam Sadilek

Affiliation(s):  Computer Science Department, University of Rochester, NY 14623, USA

Corresponding email(s):   thu@cs.rochester.edu, jluo@cs.rochester.edu, kautz@cs.rochester.edu, sadilek@cs.rochester.edu

Key Words:  Home location, Mobility patterns, Healthcare


Tian-ran Hu, Jie-bo Luo, Henry Kautz, Adam Sadilek. Home location inference from sparse and noisy data: models and applications[J]. Frontiers of Information Technology & Electronic Engineering, 2016, 17(5): 389-402.

%0 Journal Article
%T Home location inference from sparse and noisy data: models and applications
%A Tian-ran Hu
%A Jie-bo Luo
%A Henry Kautz
%A Adam Sadilek
%J Frontiers of Information Technology & Electronic Engineering
%V 17
%N 5
%P 389-402
%@ 2095-9184
%D 2016
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1500385


Abstract: 
Accurate home location is increasingly important for urban computing. Existing methods either rely on continuous (and expensive) Global Positioning System (GPS) data or suffer from poor accuracy. In particular, the sparse and noisy nature of social media data poses serious challenges in pinpointing where people live at scale. We revisit this research topic and infer home location within 100 m×100 m squares at 70% accuracy for 76% and 71% of active users in New York City and the Bay Area, respectively. To the best of our knowledge, this is the first time home location has been detected at such a fine granularity using sparse and noisy data. Since people spend a large portion of their time at home, our model enables novel applications. As an example, we focus on modeling people’s health at scale by linking their home locations with publicly available statistics, such as education disparity. Results in multiple geographic regions demonstrate both the effectiveness and added value of our home localization method and reveal insights that eluded earlier studies. In addition, we are able to discover the real buzz in the communities where people live.
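To make the 100 m × 100 m granularity concrete, the following is a minimal sketch of how a geotagged point could be snapped to such a grid cell. It is not taken from the paper: the function name `grid_cell`, the `ref_lat` parameter, and the flat-earth approximation of about 111,320 m per degree of latitude are all assumptions made for illustration.

```python
import math

CELL_SIZE_M = 100.0        # cell edge length, from the abstract
M_PER_DEG_LAT = 111_320.0  # approximate metres per degree of latitude (assumption)


def grid_cell(lat, lon, ref_lat):
    """Map a (lat, lon) point to integer (row, col) indices of a ~100 m cell.

    ref_lat is one fixed reference latitude for the study region; it scales
    longitude so that cells are roughly square (hypothetical parameter).
    """
    m_per_deg_lon = M_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * M_PER_DEG_LAT / CELL_SIZE_M)
    col = math.floor(lon * m_per_deg_lon / CELL_SIZE_M)
    return row, col
```

With this kind of discretization, a prediction would count as correct when the predicted cell equals the cell containing the true home. The approximation is only reasonable over a city-sized region; a projected coordinate system would be the more careful choice.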

This is an interesting paper that makes an important contribution to the literature. The authors propose a method for detecting users' homes from geo-located tweets and demonstrate several applications of the inferred home locations, including the analysis of mobility patterns, topics of Twitter conversation, and health states.

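Home location inference from sparse and noisy data: models and applications

Objective: Home is the center of people's lives. Because of this special significance, determining home location is particularly important in studies of human activity. This paper aims to accurately predict the specific location of a person's home (to within 100 m) from that person's check-in records.
Innovation: Home location is private, so we cannot, and should not, directly use users' private data for research. Collecting and approximating the ground-truth data is therefore the first challenge. Our solution rests on the assumption that what people say at home differs from what they say elsewhere. When checking in at home, people tend to use particular words, such as "sleep" and "shower". We collected check-ins containing such words and had several people screen them. If all of them agreed that a check-in was posted from home, we took the location of that check-in to be the sender's home location.
Method: Key features are extracted from people's check-ins and then combined by data mining algorithms into an overall judgment. The features considered include how often and at what times a person appears at a location, and whether the person appears there at night.
Conclusion: Experiments show that the home location of more than 70% of active social network users can be predicted with more than 70% accuracy, at a resolution of 100 m or finer.

Key words: Home location; Mobility patterns; Healthcare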
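The labeling rule and the features named in the summary above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the word list contains only the two examples the summary gives, the night window (20:00 to 06:00) is a guess, and all function names and the `(cell_id, datetime)` input shape are hypothetical. The step that combines the features into a final prediction is not shown, since the summary does not specify the algorithm.

```python
from collections import defaultdict
from datetime import datetime

# Only the two example words from the summary; a real list would be longer.
HOME_WORDS = {"sleep", "shower"}


def is_home_candidate(text):
    """Keyword filter: does the check-in text contain a home-related word?"""
    tokens = text.lower().split()
    return any(word in tokens for word in HOME_WORDS)


def unanimous_home(votes):
    """Accept a check-in as posted from home only if every annotator agreed."""
    return len(votes) > 0 and all(votes)


def cell_features(checkins, night_start=20, night_end=6):
    """Per-location features for one user.

    checkins: iterable of (cell_id, datetime) pairs (assumed input shape).
    night_start / night_end: hours bounding "night" (assumed values).
    """
    by_cell = defaultdict(list)
    for cell, stamp in checkins:
        by_cell[cell].append(stamp)
    total = sum(len(stamps) for stamps in by_cell.values())

    features = {}
    for cell, stamps in by_cell.items():
        night = sum(
            1 for t in stamps if t.hour >= night_start or t.hour < night_end
        )
        features[cell] = {
            "visits": len(stamps),                    # how often the user appears
            "share": len(stamps) / total,             # fraction of all check-ins
            "night_share": night / len(stamps),       # fraction posted at night
            "days": len({t.date() for t in stamps}),  # distinct days seen here
        }
    return features
```

Requiring unanimous agreement trades label quantity for label quality, which matters when the labels serve as ground truth for training and evaluation.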

Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952783; E-mail: cjzhang@zju.edu.cn
Copyright © 2000 - 2022 Journal of Zhejiang University-SCIENCE