
Journal of Zhejiang University SCIENCE C

ISSN 1869-1951 (Print), 1869-196X (Online), Monthly

Semantics and the crowd

Abstract: One of the principal scientific challenges that drives my group is to understand the character of formal knowledge on the Web. By formal knowledge, I mean information that is represented on the Web in something other than natural language text—typically, as machine-readable Web data with a formal syntax and a specific, intended semantics. The Web provides a major counterpoint to our traditional artificial intelligence (AI) based accounts of formal knowledge. Most symbolic AI systems are designed to address sophisticated logical inference over coherent conceptual knowledge, and thus the underlying research is focused on characterizing formal properties such as entailment relations, time/space complexity of inference, monotonicity, and expressiveness. In contrast, the Semantic Web allows us to explore formal knowledge in a very different context, where data representations exist in a constantly changing, large-scale, highly distributed network of loosely-connected publishers and consumers, and are governed by a Web-derived set of social practices for discovery, trust, reliability, and use. We are particularly interested in understanding how large-scale Semantic Web data behaves over longer time periods: the way by which its producers and consumers shift their requirements over time; how uniform resource identifiers (URIs) are used to dynamically link knowledge together; and the overall lifecycle of Web data, from publication through use, integration with other knowledge, and evolution, to eventual deprecation. We believe that understanding formal knowledge in this Web context is the key to bringing existing AI insights and knowledge bases to the level of scale and utility of the current hypertext Web.
Technically, the scalability of the Semantic Web is rooted in a large number of independently-motivated participants with a shared vision, each following a set of carefully-designed common protocols and representation languages (principally dialects of the Resource Description Framework (RDF), the Web Ontology Language (OWL), and the SPARQL Protocol and RDF Query Language (SPARQL)) that run on top of the standard Web server and browser infrastructure. This strategy builds on the familiar hypertext Web, and has been incredibly successful. The Semantic Web now encompasses more than 50 billion Semantic Web assertions (triples) shared across the world via large numbers of autonomous Web servers, processed by situation-specific combinations of local and remote logic engines, and consumed by a shifting collection of software and users. However, this kind of loosely-coupled scalability strategy comes at a technical price: the Semantic Web is by far the largest formal knowledge base on the planet, and certainly one of the broadest, but also one of the messiest. Semantic coherence can be guaranteed only locally if at all, performance is spotty, data updates are unpredictable, and the raw data can be problematic in many ways. These problems impact the overall scalability of the Semantic Web; beyond simply exchanging large quantities of data, we also want the Semantic Web to scalably support queries, integration, rules, and other data processing tools. If we can solve these problems, though, the Semantic Web promises an exciting new kind of data Web, with practical scaling properties beyond what federated database technology can achieve. In the full Semantic Web vision, massive amounts of partially-integrated data form a dynamically shifting fabric of on-demand information, able to be published and consumed by clients around the world, with transformational impact.
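To make the triple model concrete, the following sketch implements a minimal in-memory triple store in plain Python. The URIs are invented for illustration, and a real deployment would use an RDF library and a SPARQL engine rather than this toy; here `None` in a query pattern plays roughly the role of a SPARQL variable:

```python
# Minimal in-memory triple store illustrating the RDF data model.
# All URIs below are hypothetical examples, not real Semantic Web resources.

triples = {
    ("http://example.org/alice", "http://example.org/knows", "http://example.org/bob"),
    ("http://example.org/alice", "http://example.org/name", "Alice"),
    ("http://example.org/bob",   "http://example.org/name", "Bob"),
}

def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None behaves like a SPARQL variable."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Who does Alice know?
# (analogous to: SELECT ?o WHERE { ex:alice ex:knows ?o })
for s, p, o in match("http://example.org/alice", "http://example.org/knows"):
    print(o)  # http://example.org/bob
```

The point of the sketch is only that assertions are uniform subject–predicate–object statements, which is what lets billions of triples from autonomous servers be merged into one queryable graph.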
Our current work is inspired by two properties of the Semantic Web: how existing Internet social (‘crowd’) phenomena can apply to data on the Semantic Web, and how we can use these social Web techniques to improve the dynamic scalability of the Semantic Web. Most data currently published on the Semantic Web is originally sourced from existing relational databases, either via front-end systems like the D2R server (http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/), or by offline loading of the relational data into an associated high-performance triplestore to support Semantic Web access and processing. In each case, the core information is usually acquired by conventional means, cleansed and structured into a relational store by a database administrator, and imbued with a particular data semantics that is eventually reflected in the Semantic Web republication. Thus, much of the data presently on the Semantic Web relies heavily on the traditional computer science discipline of database construction.
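The relational-to-RDF republication step described above can be sketched as follows. This is a simplified, hypothetical mapping in the spirit of D2R-style tools; the table, column names, and URIs are invented for illustration. Each row becomes a subject URI, and each non-key column value becomes one triple:

```python
# Hypothetical relational rows, as a database administrator might curate them.
people_rows = [
    {"id": 1, "name": "Alice", "email": "alice@example.org"},
    {"id": 2, "name": "Bob",   "email": "bob@example.org"},
]

BASE = "http://example.org/people/"   # invented base URI for the dataset
VOCAB = "http://example.org/vocab#"   # invented vocabulary namespace

def rows_to_triples(rows):
    """Map each row to a subject URI, and each non-key column to one triple."""
    out = []
    for row in rows:
        subject = BASE + str(row["id"])
        for column, value in row.items():
            if column != "id":
                out.append((subject, VOCAB + column, value))
    return out

for triple in rows_to_triples(people_rows):
    print(triple)
```

The semantics the administrator built into the schema (which columns exist, what they mean) is thus carried over into the choice of predicate URIs in the republished data.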



DOI: 10.1631/jzus.C1101003


On-line Access: 2012-04-07


Journal of Zhejiang University-SCIENCE, 38 Zheda Road, Hangzhou 310027, China
Tel: +86-571-87952276; Fax: +86-571-87952331; E-mail: jzus@zju.edu.cn
Copyright © 2000~ Journal of Zhejiang University-SCIENCE