Frontiers of Information Technology & Electronic Engineering
ISSN 2095-9184 (print), ISSN 2095-9230 (online)
2016 Vol.17 No.5 P.389-402
Home location inference from sparse and noisy data: models and applications
Abstract: Accurate home location is increasingly important for urban computing. Existing methods either rely on continuous (and expensive) Global Positioning System (GPS) data or suffer from poor accuracy. In particular, the sparse and noisy nature of social media data poses serious challenges in pinpointing where people live at scale. We revisit this research topic and infer home location within 100 m×100 m squares at 70% accuracy for 76% and 71% of active users in New York City and the Bay Area, respectively. To the best of our knowledge, this is the first time home location has been detected at such a fine granularity using sparse and noisy data. Since people spend a large portion of their time at home, our model enables novel applications. As an example, we focus on modeling people’s health at scale by linking their home locations with publicly available statistics, such as education disparity. Results in multiple geographic regions demonstrate both the effectiveness and added value of our home localization method and reveal insights that eluded earlier studies. In addition, we are able to discover the real buzz in the communities where people live.
Key words: Home location, Mobility patterns, Healthcare
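The abstract reports home locations at the granularity of 100 m×100 m squares. The sketch below shows one simple way to snap raw check-in coordinates to such squares. It assumes an equirectangular approximation and a fixed reference latitude per city. The names `grid_cell`, `ref_lat` and `METERS_PER_DEG_LAT` are hypothetical, and the paper does not say which projection it uses.

```python
import math

# Approximate metres per degree of latitude on a spherical Earth (assumption).
METERS_PER_DEG_LAT = 111_320.0


def grid_cell(lat: float, lon: float, ref_lat: float, cell_m: float = 100.0) -> tuple[int, int]:
    """Return integer (x, y) indices of the cell_m x cell_m square containing (lat, lon).

    Equirectangular approximation: longitude degrees are scaled by
    cos(ref_lat), a single reference latitude chosen for the whole city,
    so every point in that city is placed on one consistent grid.
    """
    y = lat * METERS_PER_DEG_LAT
    x = lon * METERS_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    return (math.floor(x / cell_m), math.floor(y / cell_m))
```

For New York City, for example, one might pass `ref_lat=40.7`. Two check-ins are then treated as the same place when `grid_cell` returns the same index pair. At city scale, the distortion of this approximation is small compared with the 100 m cell size.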
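Innovation points: Home location is private information, so we cannot, and should not, use users' private data directly in this research. Collecting and approximating ground-truth data is therefore the first challenge. Our solution rests on the observation that what people say at home differs from what they say elsewhere: check-ins posted at home tend to contain specific words, such as "sleep" and "shower". We collected check-ins containing such words and had several annotators screen each check-in sentence. If all annotators agreed that a check-in was posted from home, we took that check-in's location as the poster's home location.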
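The ground-truth procedure above has two parts: a keyword filter on check-in text, followed by a unanimous vote among human annotators. A minimal sketch of both steps follows, assuming check-ins arrive as simple records. Only "sleep" and "shower" come from the text. The full keyword list, the `CheckIn` record, and the function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Illustrative keywords: "sleep" and "shower" are from the text; any real list would be longer.
HOME_KEYWORDS = ("sleep", "shower")


@dataclass
class CheckIn:
    checkin_id: str
    user: str
    text: str
    lat: float
    lon: float


def candidate_home_checkins(checkins, keywords=HOME_KEYWORDS):
    """Keep check-ins whose text contains a home keyword.

    Case-insensitive word-prefix match, so "Sleeping" also matches "sleep".
    """
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keywords)) + r")", re.IGNORECASE)
    return [c for c in checkins if pattern.search(c.text)]


def label_homes(candidates, votes):
    """Turn unanimously approved candidates into home-location labels.

    votes maps checkin_id -> list of annotator judgments (True = "posted from home").
    A check-in is kept only if it has at least one vote and every vote is True.
    Returns {user: [(lat, lon), ...]}.
    """
    homes = {}
    for c in candidates:
        judgments = votes.get(c.checkin_id, [])
        if judgments and all(judgments):
            homes.setdefault(c.user, []).append((c.lat, c.lon))
    return homes
```

Requiring unanimity, as the text describes, trades label quantity for label precision: a single dissenting annotator drops the check-in.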
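Method: We extract key features from people's check-ins and combine them into an overall judgment using a data mining algorithm. The features include how often and at what times a person appears at a location, and whether the person appears there at night.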
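The summary names the feature families (visit frequency, visit times, night-time presence) but not the data mining algorithm. The sketch below computes per-location versions of those features for one user. It then ranks locations with a hand-weighted linear score. That score is a placeholder standing in for a trained model, not the paper's method. The night window (20:00 to 05:59), the weights, and all names are hypothetical. A location can be any hashable key, such as an integer grid-cell pair.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical night window: 20:00 to 05:59 local time.
NIGHT_START, NIGHT_END = 20, 6

# Hypothetical weights; a real system would learn these from labeled homes.
WEIGHTS = {"freq": 1.0, "days": 0.1, "night": 1.0}


def is_night(ts: datetime) -> bool:
    return ts.hour >= NIGHT_START or ts.hour < NIGHT_END


def cell_features(visits):
    """visits: iterable of (cell, timestamp) pairs for a single user.

    Returns {cell: {"freq": share of the user's check-ins at the cell,
                    "days": number of distinct days seen there,
                    "night": fraction of visits there that fall at night}}.
    """
    visits = list(visits)
    total = len(visits)
    count = defaultdict(int)
    days = defaultdict(set)
    night = defaultdict(int)
    for cell, ts in visits:
        count[cell] += 1
        days[cell].add(ts.date())
        if is_night(ts):
            night[cell] += 1
    return {
        cell: {
            "freq": count[cell] / total,
            "days": len(days[cell]),
            "night": night[cell] / count[cell],
        }
        for cell in count
    }


def predict_home(visits, weights=WEIGHTS):
    """Return the highest-scoring cell, or None if the user has no check-ins."""
    feats = cell_features(visits)
    if not feats:
        return None
    return max(feats, key=lambda c: sum(weights[k] * v for k, v in feats[c].items()))
```

In practice, the linear score would be replaced by a classifier trained on the unanimously labeled homes. The feature extraction step would stay the same.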
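Conclusion: Experiments show that, for more than 70% of active social network users, home location can be predicted with over 70% accuracy at a precision of within 100 m.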
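Results of the form "X% accuracy for Y% of users" are commonly read as a coverage/accuracy trade-off. The model answers only for users whose prediction confidence clears a threshold, and accuracy is measured on that answered subset. The paper's exact evaluation protocol is not stated here, so this reading is an assumption. The function below is a hypothetical sketch of how both numbers would be computed under it.

```python
def coverage_and_accuracy(predictions, truth, threshold):
    """Coverage and accuracy when predicting only above a confidence threshold.

    predictions: {user: (predicted_cell, confidence)}
    truth:       {user: true_cell} for all evaluated users
    Returns (coverage, accuracy):
      coverage = share of evaluated users that receive a prediction,
      accuracy = share of those predictions that hit the true cell
                 (0.0 by convention when no user is answered).
    """
    answered = [
        u for u in truth
        if u in predictions and predictions[u][1] >= threshold
    ]
    coverage = len(answered) / len(truth) if truth else 0.0
    if not answered:
        return coverage, 0.0
    correct = sum(predictions[u][0] == truth[u] for u in answered)
    return coverage, correct / len(answered)
```

Raising the threshold typically shrinks coverage and raises accuracy. Reporting one operating point, such as 70% accuracy at 76% coverage, picks a single threshold on that curve.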
DOI: 10.1631/FITEE.1500385
CLC number: TP391
On-line Access: 2024-08-27
Received: 2023-10-17
Revision Accepted: 2024-05-08
Crosschecked: 2016-04-11