Introduction

This tutorial introduces the Semantic Web and each of the pieces that together form its basis. The Semantic Web is a series of specifications designed to link data and databases together into a vast, ever-growing body of knowledge. From the W3C Semantic Web home page: "The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where on the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing."

Contents

The Semantic Web Stack

URI/IRI - Uniform and Internationalized Resource Identifiers
Unicode - Multiple Language Text Encoding
XML - Extensible Markup Language
Namespaces - Sets of URIs
XML Query - Navigation, Selection and Sorting
XML Schema - Structure
RDF - Resource Description Framework
Ontology - OWL - Web Ontology Language
Rules/Query -
Logic -
Proof -
Trust -

URIs, URLs and IRIs

A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. In other words, a URI is a string that gives a resource a name.

A Uniform Resource Locator (URL) is a URI that, in addition to naming a resource, says where to find it, for example a web address. Not every URI is a web address.

Internationalized Resource Identifiers (IRIs) extend URIs to allow Unicode characters and can contain Internationalized Domain Names (IDNs):
http://点心和烤鸭.w3.mag.keio.ac.jp




What is a URI?
A URI usually follows this form:
scheme ":" hier-part [ "?" query ] [ "#" fragment ]

Examples of URIs


You can also use a "relative reference" like
./images/redbox.png

Relative URIs are useful to refer to resources that are located in the same website as the current resource or document.
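As a rough illustration of the general form and of relative references, the sketch below uses Python's standard urllib.parse module. The absolute URI is a hypothetical example chosen only so that every component is present; it does not come from this tutorial.

```python
from urllib.parse import urlsplit, urljoin

# Hypothetical absolute URI, chosen only to show every component.
uri = "http://example.com/tutorial/uri.html?lang=en#what"

parts = urlsplit(uri)
print(parts.scheme)    # the scheme
print(parts.netloc)    # authority portion of the hier-part
print(parts.path)      # path portion of the hier-part
print(parts.query)     # optional query
print(parts.fragment)  # optional fragment

# Resolve the relative reference against the URI of the current document.
resolved = urljoin(uri, "./images/redbox.png")
print(resolved)
```

The relative reference only becomes a full URI once it is combined with the URI of the document that contains it, which is why the same reference can point to different resources on different sites.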

Resources and URIs
Examples of resources are web pages, images, RSS feeds, etc. Anything, real or imaginary, is a resource and can have a name. The name of each of your resources should be a unique identifier, a URI.

⃘ Any resource anywhere can be given a URI.
⃘ Any resource of significance should be given a URI.
⃘ A URI has the same meaning no matter who uses it or where it is used.

Unicode

The Unicode standard provides a large number of characters covering most of the currently used scripts in the world. Unicode also covers notational systems for science, technology, music, and scholarship.

Unicode in XHTML
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type"
content="text/html; charset=UTF-8" />
</head>




What is Unicode?
We use Unicode to assist with the internationalization and localization of computer software. Unicode is typically implemented using character encodings such as UTF-8. Unicode encoding allows the content of a resource to be in any language and makes the Semantic Web universal.

UTF-8
UTF-8 is the most common Unicode encoding. It is backwards compatible with ASCII, a character-encoding scheme with a heavy English-language bias. UTF-8 can represent every character in the Unicode standard using between one and four bytes. The 1-byte encodings cover the 128 US-ASCII characters. For example (bytes shown in hexadecimal):
A - 41 LATIN CAPITAL LETTER A
Б - D0 91 CYRILLIC CAPITAL LETTER BE
ཌྷ - E0 BD 8D TIBETAN LETTER DDHA
∩ - E2 88 A9 INTERSECTION
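The byte sequences listed above can be reproduced with a few lines of Python. This is only a small sketch; the helper name utf8_hex is made up for the illustration.

```python
# The four sample characters: Latin A, Cyrillic BE, Tibetan DDHA, intersection.
samples = ["A", "\u0411", "\u0f4d", "\u2229"]

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

for ch in samples:
    # Show the code point next to its UTF-8 byte sequence.
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
```

Note how the number of bytes grows with the code point: one byte for the ASCII letter, two for the Cyrillic letter, and three each for the Tibetan letter and the mathematical symbol.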

Fonts
Unicode-encoded resources such as web pages require a compatible font, for example Arial Unicode MS or Bitstream Cyberbit, to display correctly. Fortunately, most web browsers come with Unicode-compatible fonts pre-installed.

XML

XML stands for Extensible Markup Language. The XML standard provides a way to store data in a form that is easy to read by both people and machines.

XML is said to be a markup language in that the semantics or meaning of the stored data are "marked up" using "tags", similar to HTML tags.

XML is said to be extensible because the markup tags are definable by anyone.

XML is considered a meta-language in that other computer languages such as XHTML and RDF/XML are created with it.



XML Design Goals
XML was designed to describe data (structure) and to focus on what the data is (semantics). Unlike HTML, where the tags are all pre-defined, XML allows authors to define their own tags, what the tags mean and how the document's structure works.

What is a Tag?
An XML element is made up of a start and end tag with data in between. The tags describe the data. The data is called the value of the element. In the following example, the element's name (or tag) is "product" and the value is "widget":

<product>widget</product>


Valid XML
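XML documents must be well-formed and should also be valid. Well-formed means that the syntax of the XML is correct: for example, every start tag has a matching end tag and elements are nested properly. Valid means that the document has additionally been checked against a document type definition (DTD) or schema and uses only the elements and attributes declared there, in the structure it allows.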

Example of a Well-Formed XML Document
<?xml version="1.0"?>
<message>
<to>Dave</to>
<from>Scott</from>
<subject>Reminder</subject>
<text>Please pick up the mail on the way home.</text>
</message>
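One way to see what "well-formed" means in practice is to hand a document to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper is_well_formed and the deliberately broken second document are made up for the illustration. It checks well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET

well_formed = """<?xml version="1.0"?>
<message>
  <to>Dave</to>
  <from>Scott</from>
  <subject>Reminder</subject>
  <text>Please pick up the mail on the way home.</text>
</message>"""

# Hypothetical broken input: <to> is closed by the wrong end tag.
malformed = "<message><to>Dave</from></message>"

def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the document's syntax."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(well_formed))
print(is_well_formed(malformed))
```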

Namespaces

A primary benefit of the URI is that an author may specify an exact reference to a specific location in some other document instead of making a vague literary reference like "See Chapter 11 Section 2". In XHTML, for instance, we use an anchor element to accomplish this:

<p>See <a href="./ontdes.xml#ClsVsInd">Class vs Individual</a></p>


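Note the relative reference "./ontdes.xml#ClsVsInd". The "." means the reference is resolved relative to the base URI of the current document. Namespaces apply the same naming idea to markup: a namespace is a set of element and attribute names identified by a URI. In XHTML, an author declares the default namespace with the xmlns attribute on the html element.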





What is a Namespace?
Consider what would happen if we attempt to combine two XML-based languages, XHTML and SVG. SVG is a powerful language that provides elements for drawing boxes, circles and lines. The Semantic Web layers diagram at the top of this page is an example of SVG. If we could combine SVG with XHTML, we could embed drawings in our web pages with nothing more than markup; no need for our readers to download big JPGs or PNGs.

Both XHTML and SVG define a <style> element, but each language uses it differently. How can people or computer programs tell the two apart? XHTML and SVG are independent languages, and each has its own namespace. We can tell the elements apart by using a prefix to specify the namespace of each element. Here's an example of an SVG drawing embedded in an XHTML table cell:

<td><svg:svg
xmlns:svg="http://www.w3.org/2000/svg" version="1.0"
width="10.370368" height="54.1021" id="layerDiag">
<svg:defs id="layerDiagDefs">
<svg:style type="text/css"> ... </svg:style>
</svg:defs>
...
</svg:svg></td>


Namespaces are a fundamental part of any XML-based language, allowing different languages to be combined in powerful ways.
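To show how a program can tell the two <style> elements apart, here is a minimal sketch using Python's standard xml.etree.ElementTree. The small combined document is made up for the illustration; only the two namespace URIs are the real XHTML and SVG namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML page with an embedded SVG drawing.
document = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <head><style type="text/css">...</style></head>
  <body>
    <svg:svg version="1.0">
      <svg:defs><svg:style type="text/css">...</svg:style></svg:defs>
    </svg:svg>
  </body>
</html>"""

root = ET.fromstring(document)

# ElementTree expands every tag to "{namespace-uri}local-name", so two
# elements with the same local name but different namespaces stay distinct.
style_tags = [el.tag for el in root.iter() if el.tag.endswith("}style")]
for tag in style_tags:
    print(tag)
```

The prefix itself (svg: here) is only a local shorthand; what identifies the language is the namespace URI the prefix is bound to.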

XML Query

XML Query (XQuery) is a computer language that provides a way to read, extract and transform XML documents. XQuery includes language constructs to read XML, select and order specific values and return the result in the form of another XML document, for instance an XHTML web page.

XSLT can perform the same tasks, but many people find it difficult to learn. People with an SQL background in particular tend to struggle with XSLT's XML-based syntax, while XQuery's keyword-based syntax tends to feel more familiar.

XQuery depends on XPath. XPath is an expression language that addresses parts of an XML document by treating the document as a tree of nodes (elements, attributes and text).



XPath Examples
/bookstore/book/title
/bookstore/book[1]/title
/bookstore/book[price>35]/price
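The three path expressions above can be tried out roughly as follows with Python's standard xml.etree.ElementTree, which supports only a subset of XPath. The bookstore data is invented for this sketch, and because ElementTree has no numeric comparison in predicates, the third expression is approximated with an ordinary Python filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore data, invented for this sketch.
bookstore = ET.fromstring("""
<bookstore>
  <book><title>Learning XML</title><price>39.95</price></book>
  <book><title>XML Basics</title><price>29.99</price></book>
  <book><title>Advanced XQuery</title><price>49.99</price></book>
</bookstore>
""")

# /bookstore/book/title: the title of every book.
all_titles = [t.text for t in bookstore.findall("./book/title")]

# /bookstore/book[1]/title: the title of the first book (positions start at 1).
first_title = bookstore.find("./book[1]/title").text

# /bookstore/book[price>35]/price: the price of every book costing more than 35.
# ElementTree cannot evaluate "price>35", so the test is written in Python.
expensive = [b.findtext("price") for b in bookstore.findall("./book")
             if float(b.findtext("price")) > 35]

print(all_titles, first_title, expensive)
```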


XQuery Example
For each book in an XML document, list only those books published by Addison-Wesley after 1991. Include the book's year and title in the output:

for $b in doc("www.bn.com/bib.xml")/bib/book
where $b/publisher="Addison-Wesley" and $b/@year>1991
return <book year="{$b/@year}">{$b/title}</book>


In the example, $b is a variable that is bound in turn to each book element in the XML document. We use XPath expressions such as $b/publisher to refer to other elements relative to it. The result of the XQuery (specified in the return clause) is a sequence of book elements, each containing the original title element, that might look like:
<book year="1992"><title>2010 The Odyssey Continues</title></book>
<book year="1994"><title>Radiohead - The band that saved Rock</title></book>


FLWOR Expressions
XQuery queries are usually written as FLWOR expressions, named after their five clauses:
for - iterates over the items in a sequence, binding each one to a variable
let - binds a variable to a value for use in later clauses
where - filters the items, keeping only those that satisfy a condition
order by - specifies the sort order of the results
return - specifies the structure of the data returned for each item
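For readers more at home in a general-purpose language, the five clauses can be mimicked step by step in Python. This is an analogy, not XQuery, and the bibliography data is sample data invented for the sketch; it applies the same publisher and year conditions as the XQuery example.

```python
import xml.etree.ElementTree as ET

# Sample bibliography, invented for this sketch.
bib = ET.fromstring("""
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison-Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix Environment</title><publisher>Addison-Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann Publishers</publisher></book>
  <book year="1990"><title>An Older Book</title><publisher>Addison-Wesley</publisher></book>
</bib>
""")

selected = []
for b in bib.findall("book"):                                        # for
    year = int(b.get("year"))                                        # let
    if b.findtext("publisher") == "Addison-Wesley" and year > 1991:  # where
        selected.append((year, b.findtext("title")))
selected.sort()                                                      # order by

# return: build one <book> element per selected item.
results = []
for year, title in selected:
    book = ET.Element("book", year=str(year))
    ET.SubElement(book, "title").text = title
    results.append(ET.tostring(book, encoding="unicode"))

for line in results:
    print(line)
```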

XML Schema

An XML Schema document declares the elements and attributes that may appear in a class of XML documents, their datatypes, and the relationships between them.

A computer program might use an XML Schema document to validate the documents that the schema describes. An author can use a schema document as a guide for producing well-formed and valid data documents for use by computer programs.

XML Schema provides a namespace-aware and datatype-aware alternative to XML's native Document Type Definitions (DTDs). Although DTDs are still in widespread use today, XML Schema has many more features and is much more flexible.



XML Schema Example
<xsd:complexType name="USAddress" >
<xsd:sequence>
<xsd:element name="street" type="xsd:string"/>
<xsd:element name="city" type="xsd:string"/>
<xsd:element name="state" type="xsd:string"/>
<xsd:element name="zip" type="xsd:decimal"/>
</xsd:sequence>
<xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
</xsd:complexType>


xsd is the prefix of the XML Schema namespace. The <complexType> element defines a <sequence> of <element>s that will appear together, in the order given, in the XML data document. Each data value will be interpreted as having the type named in its type attribute (xsd:string, xsd:decimal, etc.).

<shipTo country="US">
<street>123 Anystreet</street>
<city>Oxford</city><state>PA</state><zip>19363</zip>
</shipTo>


The complex type on its own does not declare the <shipTo> element. In XML Schema, we define complex types like USAddress separately and reference them from element declarations:

<xsd:element name="shipTo" type="USAddress"/>


Modular, reusable types are a fundamental feature of XML Schema.
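To give a feel for what validating against the USAddress type involves, here is a hand-rolled check in Python. It is not an XML Schema validator (the Python standard library does not include one); it only approximates the three rules the example type states, and the function name conforms_to_us_address is made up for the sketch. Python's Decimal is also more permissive than xsd:decimal, so treat the type check as a rough stand-in.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal, InvalidOperation

# Child elements of USAddress, in the order the <xsd:sequence> declares them,
# each paired with a Python stand-in for its declared type.
US_ADDRESS_SEQUENCE = [("street", str), ("city", str), ("state", str), ("zip", Decimal)]

def conforms_to_us_address(element: ET.Element) -> bool:
    # The country attribute is declared fixed="US": if present, it must be "US".
    if element.get("country", "US") != "US":
        return False
    # The children must be exactly the declared elements, in sequence order.
    children = list(element)
    if [c.tag for c in children] != [name for name, _ in US_ADDRESS_SEQUENCE]:
        return False
    # Each value must be convertible to its declared type.
    for child, (_, convert) in zip(children, US_ADDRESS_SEQUENCE):
        try:
            convert(child.text or "")
        except InvalidOperation:
            return False
    return True

ship_to = ET.fromstring(
    '<shipTo country="US"><street>123 Anystreet</street>'
    '<city>Oxford</city><state>PA</state><zip>19363</zip></shipTo>'
)
print(conforms_to_us_address(ship_to))
```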

RDF Model & Syntax

RDF is a data model, commonly written in XML, based on the premise that any statement can be expressed as a set of Triples: Subject, Predicate and Object.

Consider the statement "Sony is a corporation headquartered in Minato, Tokyo, Japan." If we were to represent this, we might use the following XML fragment:

<company>Sony</company>
<addr country="JP"><city>Minato, Tokyo</city></addr>


But this representation fails to capture the fact that Sony is a corporation with a headquarters, so we would need to add yet more elements or attributes to get closer to our intended meaning.



What is a Triple?
All of the statements in an RDF document are in the form of a uniform pattern called a Triple. A Triple dramatically simplifies how we can think about the information in our documents. Triples have three parts:
Subject - the thing the statement describes
Predicate - a specific property of the thing the statement describes
Object - the thing the statement says is the value of this property for the thing the statement describes

RDF identifies things using URIs and describes these things in terms of simple properties and property values. If ex: is a prefix for the namespace http://example.com/companies# then we might express our statement as:

ex:company1 ex:hasCompanyName "Sony"
ex:company1 ex:hasCompanyType "Corporation"
ex:company1 ex:hasLocation ex:address1
ex:address1 ex:hasCity "Minato, Tokyo"
ex:address1 ex:hasCountry "JP"
ex:address1 ex:hasLocationType "Headquarters"


All Triples work in the same way, so as we gain more knowledge, we can simply append more triples to our documents. What we place into RDF documents is based on URIs, so the data is linked to other data. Consider that there may be many millions of documents spread across the web all working the same way and linked together.
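The uniformity of triples is easy to see in code. The sketch below stores the Sony statements as plain Python tuples and looks them up by pattern. It is a toy model, not an RDF library: URIs and literal values are both represented as strings, and the match helper and the hasIndustry property are made up for the illustration.

```python
EX = "http://example.com/companies#"

# Each statement is one (subject, predicate, object) tuple.
triples = [
    (EX + "company1", EX + "hasCompanyName", "Sony"),
    (EX + "company1", EX + "hasCompanyType", "Corporation"),
    (EX + "company1", EX + "hasLocation", EX + "address1"),
    (EX + "address1", EX + "hasCity", "Minato, Tokyo"),
    (EX + "address1", EX + "hasCountry", "JP"),
    (EX + "address1", EX + "hasLocationType", "Headquarters"),
]

def match(store, s=None, p=None, o=None):
    """Return every triple that agrees with the given parts; None matches anything."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Follow the link from the company to its address, then read the city.
address = match(triples, s=EX + "company1", p=EX + "hasLocation")[0][2]
city = match(triples, s=address, p=EX + "hasCity")[0][2]
print(city)

# New knowledge is added by appending another triple (hypothetical property).
triples.append((EX + "company1", EX + "hasIndustry", "Electronics"))
```

Because the object of one triple can be the subject of another, following URIs from statement to statement is what links the data together.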

RDF is fundamental to the structure of the Semantic Web.

Ontology

Using simple triples in RDF, we can identify a thing by assigning it a URI and describe a set of relationships by associating the thing's URI with properties and property values. Web Ontology Language (OWL) extends the RDF language with constructs we can use to formally describe the meaning (or semantics) behind the properties.

We use the OWL language to do the following:
⃘ Formally describe a system by defining classes and properties of those classes.
⃘ Define individuals and assert properties about them, for example which individuals belong to which classes.
⃘ Reason about these classes and individuals using inference to uncover new facts about them.



What is an Ontology?

An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. A typical Ontology includes two types of statements:

TBOX - Descriptions of a system in terms of its vocabulary, for example, a set of classes and properties

All Students are Persons
There are two types of Persons: Students and Teachers


ABOX - Descriptions associated with instances of those classes

John is a Person
Mary is a Teacher
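The split between the two kinds of statements, and the kind of inference a reasoner performs over them, can be sketched in a few lines of Python. This is a toy illustration of subclass reasoning only, not OWL; the dictionary layout and the function name inferred_classes are made up for the example.

```python
# TBOX: the class hierarchy, mapping each subclass to its superclass.
subclass_of = {"Student": "Person", "Teacher": "Person"}

# ABOX: the classes each individual is asserted to belong to.
asserted = {"John": {"Person"}, "Mary": {"Teacher"}}

def inferred_classes(individual: str) -> set:
    """Return the asserted classes plus every superclass reachable from them."""
    classes = set(asserted.get(individual, ()))
    frontier = list(classes)
    while frontier:
        parent = subclass_of.get(frontier.pop())
        if parent is not None and parent not in classes:
            classes.add(parent)
            frontier.append(parent)
    return classes

print(sorted(inferred_classes("Mary")))
```

Nobody stated that Mary is a Person; the fact follows from combining an ABOX statement with a TBOX statement.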


Using constructs based on First Order Logic (FOL), an OWL document describes an Ontology with RDF and RDF's schema language, RDFS. OWL documents are usually quite complicated and difficult to follow. Consequently, many people use an editor such as Protégé. An OWL editor allows you to use logical expressions to describe your ontologies:

∀ hasChild.Doctor
∃ citizenOf.{USA}