KnowMED Inc.


Knowledge Management, Engineering and Design


What is the Semantic Web?


Semantic systems, and the Semantic Web in particular, are believed to provide appropriate frameworks for information integration and interoperability in ultra-large-scale distributed environments. The Semantic Web is an extension of current Internet technology in which information is given well-defined meaning by making its underlying structure explicit and formal. The Resource Description Framework (RDF), together with structured ontology languages such as the Web Ontology Language (OWL), provides the constructs needed to represent information and knowledge as globally unique resources with clear, unambiguous, precise, and computationally interpretable (formal) semantics. Formal representation of information and knowledge supports computer reasoning for automatic classification, enables greater interoperability, integration, and repurposing of information in large-scale and complex settings, and allows data to be shared and reused across the boundaries of applications, enterprises, and communities. It also makes the information 'understandable' in the same way to both humans and machines.

Semantic Web technology is generally viewed as a stack of layered technological frameworks, as depicted in the figure below. Each layer extends and builds on the layer below it and tends to be progressively more specialized and more expressive.

 

Resource Description Framework (RDF): RDF provides a general-purpose framework for representing a web of information. An RDF statement comprises three elements (nodes): subject, predicate, and object, much like a simple English sentence. RDF statements are often called RDF triples, and the triple is essentially the only structure used to construct RDF documents.

Put together, all statements (triples) within an RDF document or dataset form a directed labeled graph. These graphs can be given a URI on the Web and referred to like any other RDF resource. Automated agents can then use these URIs to find RDF datasets and use them on demand:

An RDF Document is a graph representation of RDF statements
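As a rough illustration of how a set of triples forms a directed labeled graph, the following Python sketch indexes triples by subject, so that each subject node carries a list of labeled outgoing edges. The prefixed names (umls:, sn:, skos:) are shorthand used for readability only, and the helper name build_graph is an assumption made for this sketch, not part of any RDF library.

```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples. The prefixed names are
# illustrative shorthand, not fully resolved URIs.
triples = [
    ("umls:C0027515", "rdf:type", "sn:Antibiotic"),
    ("umls:C0027515", "skos:prefLabel", '"Neamine"'),
]


def build_graph(triples):
    """Index triples as a directed labeled graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        # Each triple is one edge, labeled by its predicate.
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = build_graph(triples)
for predicate, obj in graph["umls:C0027515"]:
    print(f"umls:C0027515 --{predicate}--> {obj}")
```

Subjects and objects become nodes, and predicates become edge labels. A production triple store would typically also index by predicate and object, which this sketch leaves out.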


RDF identifies each resource in a statement with a Uniform Resource Identifier (URI) that uniquely and globally identifies that resource in a web of distributed information. For example, a URI such as <http://umls.nlm.nih.gov/C0027515> can uniquely identify a specific antibiotic on a network as distributed and complex as the Internet. Since URIs are globally resolvable, they can be used to link, mash up, or integrate nodes (RDF resources), statements (triples), documents (collections of triples forming a graph), and databases (collections of graphs) in novel ways and on widely distributed network architectures such as the Web. Because it is built on top of the Internet architecture, RDF data also makes the underlying network and storage infrastructure transparent and resilient to change. Since all layers of the Semantic Web layer cake follow well-defined formal semantics, RDF documents are machine processable and can participate directly in automated reasoning processes, with no additional translation middleware necessary.

OWL (Web Ontology Language): The Web Ontology Language (OWL) provides a rich and expressive language for defining structured ontologies. OWL extends the RDF Schema Language (RDFS) with modeling primitives for defining relationships between properties and resources and for constraining their interpretation by reasoning engines. There are three major flavors of the OWL language, OWL Lite, OWL DL, and OWL Full, which support the construction of ontologies with increasing levels of complexity and expressivity.

SPARQL: SPARQL (pronounced "sparkle") is an RDF query language; its name is a recursive acronym that stands for SPARQL Protocol and RDF Query Language. SPARQL is a syntactically SQL-like language for querying RDF graphs via pattern matching. The language's features include basic conjunctive patterns, value filters, optional patterns, and pattern disjunction. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. The output of a SPARQL query can be a result set (represented as tuples, XML, or JSON) or another RDF graph.
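To make "querying via pattern matching" more concrete, here is a minimal Python sketch of how a conjunctive pattern could be evaluated over an in-memory list of triples. It is not a SPARQL engine and covers none of the filters, optional patterns, or disjunction mentioned above. The function names, the ex: resource, and the convention that terms starting with "?" are variables are assumptions made for this illustration.

```python
def match_pattern(triples, pattern):
    """Yield variable bindings for one triple pattern ('?x' terms are variables)."""
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable must bind consistently within one pattern.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


def query(triples, patterns):
    """Evaluate a conjunction of triple patterns by joining their bindings."""
    solutions = [{}]
    for pattern in patterns:
        joined = []
        for solution in solutions:
            # Substitute variables that are already bound, then match the rest.
            bound = tuple(solution.get(term, term) for term in pattern)
            for new in match_pattern(triples, bound):
                joined.append({**solution, **new})
        solutions = joined
    return solutions


triples = [
    ("umls:C0027515", "rdf:type", "sn:Antibiotic"),
    ("umls:C0027515", "skos:prefLabel", "Neamine"),
    ("ex:Item42", "rdf:type", "ex:Other"),  # hypothetical extra resource
]

# Roughly: SELECT ?drug ?name WHERE { ?drug rdf:type sn:Antibiotic .
#                                     ?drug skos:prefLabel ?name }
results = query(triples, [
    ("?drug", "rdf:type", "sn:Antibiotic"),
    ("?drug", "skos:prefLabel", "?name"),
])
print(results)
```

Each pattern narrows the set of candidate bindings, which is the sense in which the patterns are conjunctive.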

The SPARQL protocol is a method for remote invocation of SPARQL queries. It specifies a simple interface, which can be supported via HTTP or SOAP, that a client can use to issue SPARQL queries against an endpoint. The combination of SPARQL as a language and SPARQL as a protocol creates an extensible query and retrieval mechanism that can retrieve information from a distributed network.
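Over HTTP, a SPARQL query can be sent as a GET request with the query text carried in a URL-encoded "query" parameter. The sketch below only builds such a request URL and does not contact any server. The endpoint address is hypothetical, and the helper name build_request_url is an assumption made for this example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint; substitute the address of a real SPARQL service.
ENDPOINT = "http://example.org/sparql"

QUERY = """
SELECT ?drug WHERE { ?drug a <http://semanticnetwork.nlm.nih.gov/Antibiotic> }
"""


def build_request_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_request_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(decoded)
```

A client would then issue this request with any HTTP library and ask for a result format (for example XML or JSON) through the Accept header.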

  •  Granular and atomic

RDF provides the most fine-grained data representation possible. Informational resources can be represented without any schema at all and can be bound to one or more schemas at a later time (late binding). A URI such as <http://umls.nlm.nih.gov/C0023451> is the most atomic form of data that can be published and resolved uniquely and unambiguously on a widely distributed network. RDF statements (<subject> <predicate> <object>) are the most atomic unit of information and can be constructed by mashing up URIs:

<http://umls.nlm.nih.gov/C0027515>   <http://w3.org/…/rdf-syntax#type>     <http://semanticnetwork.nlm.nih.gov/Antibiotic>

 

or to make it easier to read:

<umls:C0027515>     <rdf:type>      <sn:Antibiotic>

Notice that the three nodes (URIs) in this statement are mashed up from three different locations on the Web: <umls:C0027515> is a concept defined in the NLM UMLS Metathesaurus, <rdf:type> is an RDF primitive defined by the RDF specifications published by the W3C, and <sn:Antibiotic> is another concept, this time from the UMLS Semantic Network. That is, new information is generated by a mash-up of existing data.

This mechanism can be used to contextualize any data on the Web in the most fine-grained, distributed, and extensible way possible.
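The short, prefixed form of a statement is only an abbreviation of the full URIs. A minimal Python sketch of that abbreviation follows, assuming the umls and sn prefixes map to the namespaces that appear in the full statement above. The rdf prefix is deliberately left out because its namespace is elided in the example, and the function names expand and compact are assumptions made for this sketch.

```python
# Prefix table taken from the namespaces used in this article's example.
PREFIXES = {
    "umls": "http://umls.nlm.nih.gov/",
    "sn": "http://semanticnetwork.nlm.nih.gov/",
}


def expand(name, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full URI; unknown prefixes are left as-is."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


def compact(uri, prefixes=PREFIXES):
    """Abbreviate a full URI to 'prefix:local' when a namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri


full = expand("umls:C0027515")
print(full)
print(compact(full))
print(expand("rdf:type"))  # no 'rdf' entry in the table, so it is unchanged
```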

  • Inherently Extensible

Using triples as the single method of data representation, together with the modularity of RDF content, makes RDF an inherently extensible language. New vocabularies or constructs can be added to an RDF document dynamically. In this example we add <skos:prefLabel> to our data to extend it and associate a name with an existing object:

<umls:C0027515> <skos:prefLabel> "Neamine"

Extending RDF graphs to capture new types of information is as easy as adding a new triple and does not require any change to the underlying database schema.

 

Existing vocabulary can be mashed up (by adding new RDF statements about existing resources) to impose new semantics, or to add new information to the model:

<skos:prefLabel>  <rdf:type> <owl:AnnotationProperty>

In this example, by assigning the <owl:AnnotationProperty> type to <skos:prefLabel>, we inform computer programs that this property points to literal data intended for human use rather than for computation. This can reduce the cost of certain types of search and computation in large datasets and improve indexing of the data.

RDFS (RDF Schema Language) and OWL are RDF vocabularies that introduce primitives with well-defined formal semantics, extending RDF documents in ways that computer programs can automatically interpret and process. These include primitives for transitivity; subclass and subproperty relations between concepts and properties; existential and universal quantifiers that constrain properties; sets, intersections, unions, disjointness, and complements; cardinality restrictions; property chaining; and symmetric, asymmetric, inverse, reflexive, and irreflexive relations. Together they allow any given RDF document to be extended in many different ways, up to a full-blown ontology with strong semantics that can be processed by a computer reasoning engine.
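As a small illustration of what formal semantics buys, the following Python sketch forward-chains two RDFS-style rules over a set of triples: subclass relations are transitive, and an instance of a subclass is also an instance of its superclasses. This is a toy under stated assumptions, not a reasoning engine. The ex:Drug and ex:Chemical classes and the subclass links are hypothetical, and the function name infer is invented for this example.

```python
def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if o != s2 or p2 != "rdfs:subClassOf":
                    continue
                if p == "rdfs:subClassOf":
                    # Subclass relations are transitive.
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":
                    # Instances of a class are instances of its superclasses.
                    new.add((s, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


asserted = {
    ("umls:C0027515", "rdf:type", "sn:Antibiotic"),
    # Hypothetical class hierarchy, for illustration only.
    ("sn:Antibiotic", "rdfs:subClassOf", "ex:Drug"),
    ("ex:Drug", "rdfs:subClassOf", "ex:Chemical"),
}

entailed = infer(asserted)
for triple in sorted(entailed - asserted):
    print(triple)
```

The three derived triples were never stated explicitly; they follow from the meaning assigned to the subclass and type primitives.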

  •  Modular and Highly distributable

Every RDF triple is meaningful on its own and carries no implied meaning based on where it is located in the document or on the network. That is, the order in which triples are recorded in an RDF document does not matter. Therefore, an RDF document can be segmented arbitrarily into several smaller subsets of RDF triples, and each subset can be maintained in a different document, database, or network and under a different governance, access, and protection structure. RDF documents can be linked to each other by making RDF statements about RDF documents:

<a:RDFDocument_1> <owl:imports> <b:RDFDocument_3>

This graph shows how new RDF models can be generated by reusing and sharing other RDF documents on a distributed network (through linking). RDF 1, for example, is a dataset that consists of all contents of RDF A, B, 3, and 5, plus its own unique triples. RDF 2 consists of all triples contained in RDF 3, 4, and 5, plus its own triples. Each RDF document may reside in a different network and be linked to the others only through an import statement (each arrow represents an import, for example a:RDF1 owl:imports b:RDF3). Notice that this whole figure is itself an RDF graph that represents the distribution of information in a distributed network!
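The way an importing document accumulates the triples of the documents it links to can be sketched in Python as a set union over the import graph. The import lists below follow the description above (RDF 1 imports A, B, 3, and 5; RDF 2 imports 3, 4, and 5), while the triples inside each document are hypothetical placeholders and the function name resolve is an assumption made for this example.

```python
# Import links as described for the figure.
imports = {
    "RDF1": ["RDFA", "RDFB", "RDF3", "RDF5"],
    "RDF2": ["RDF3", "RDF4", "RDF5"],
}

# One hypothetical placeholder triple per document.
own_triples = {
    name: {(f"ex:{name}_subject", "ex:property", "ex:value")}
    for name in ["RDF1", "RDF2", "RDFA", "RDFB", "RDF3", "RDF4", "RDF5"]
}


def resolve(doc, own_triples, imports, seen=None):
    """Return a document's own triples plus those of everything it imports."""
    seen = set() if seen is None else seen
    if doc in seen:  # guard against circular imports
        return set()
    seen.add(doc)
    triples = set(own_triples.get(doc, ()))
    for imported in imports.get(doc, ()):
        # Triples are order-independent, so merging is a plain set union.
        triples |= resolve(imported, own_triples, imports, seen)
    return triples


rdf1 = resolve("RDF1", own_triples, imports)
rdf2 = resolve("RDF2", own_triples, imports)
print(len(rdf1), len(rdf2), len(rdf1 & rdf2))
```

The two datasets share the triples of RDF 3 and RDF 5 without either one holding a private copy of them.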

 

RDF documents can also be linked by RDF statements between URIs within each document (<a:URI1> <rdf:type> <b:URI2>). This creates a highly distributed and modular architecture in which a dataset can be modularized arbitrarily and reused, linked, shared, mixed, and matched with many other modules on a widely distributed network, in novel and ad hoc ways:

In this example, two RDF documents are linked through two statements within RDF B that link to URIs defined by the RDF A document (red arrows).


  • Globally resolvable, sound, provable, consistent semantics

Semantic Web URIs are based on current Web and Internet protocols, which enables distribution and retrieval of information in an ultra-large, dynamically changing, and expanding network. OWL and RDFS primitives have computer-interpretable meaning that enables reasoning engines to derive the logical entailments of RDF content. This is a critical feature of RDF/OWL representations: it can change the interpretation of any given RDF document from a dataset for data retrieval into a full-blown ontology for classification, knowledge discovery, and higher-level abstractions and simulations. In addition, the same functionality can be leveraged for automatic consistency and integrity checking of RDF content across a widely distributed network.

  • Scalable:

The Semantic Web is designed to enable automated processing of large volumes of information distributed across large networks such as the Web. Hence, the design of the underlying representation and retrieval frameworks takes into account the fact that information on the Web can grow to a massive scale and can be distributed across a vast, dynamically changing, and evolving network. There are several reasons why Semantic Web representations are massively scalable.

First, the representation framework is extremely simple, fine-grained, and flat. This reduces the overhead associated with complex information models and schema representation frameworks, and with the middleware required to parse, manipulate, and interpret them. RDF provides several syntaxes (dialects) with semantically identical representations to suit different purposes, such as human readability, indexing, transport, and serialization. While the N3 representation is most suitable for readability by human beings and automated parsers,

 

An N3 RDF representation

 

the RDF/XML format exists to facilitate transport and communication of RDF documents through existing communication infrastructure that is already optimized for XML:

 

RDF/XML representation of the same information 

Furthermore, the use of URIs that can be resolved automatically across a large interconnected network supports reuse of existing information by linking or referring to it instead of unnecessarily repeating and copying it inside many datasets. For example, demographic information about a patient can reside in a single dataset, and other records (medical, financial, social) that need that information can link to it remotely on a just-in-time basis. Links can be hard links in RDF statements or can be established through SPARQL queries.

This single sourcing of information eliminates redundancy, improves reuse, reduces error, improves the quality of the information, and makes it easy to update and maintain high quality information.

More significantly, all representation, interpretation, and transformation layers associated with the use of RDF data are, or can be virtualized as, RDF graphs: the data is stored as RDF (in an RDF triple store) either natively or by virtualization, retrieved as RDF (using SPARQL), represented in memory as RDF graphs, and serialized and transported across the network as RDF (using the RDF/XML format). This is in contrast with traditional infrastructure that stores data as tuples, represents it in memory as objects and proprietary data structures, serializes it as XML, and transports it as HL7 messages. Each of these transformations requires different skill sets, middleware, and technology platforms that may or may not work natively with the other layers. This adds overhead and creates many more points of failure.