IKHarvester

Some time ago I wrote about Didaskon, a framework for composing a curriculum for a specific user based on their profile, using both formal and informal knowledge. I am one of its developers.

At the moment, I am developing one of its components, IKHarvester (Informal Knowledge Harvester). It aims at collecting (harvesting) data from Social Semantic Information Sources (SSIS) and providing it to Didaskon as informal Learning Objects (LOs). By SSIS, I mean community sites (blogs, wikis, social semantic digital libraries, bookmark sharing, video sharing, etc.) enriched with semantic annotations. The prototype will use only wikis based on the MediaWiki engine, blogs that support SIOC, and JeromeDL. For the general idea, have a look at the poster I presented earlier.

In this post, I will focus only on blog posts.


Metadata for blog posts will be collected with SIOC data exporters. A blog that supports SIOC includes additional information in a link element (inside the head element) of its HTML code. For instance, my blog, which is available at http://dobrzanski.net, contains the following element:

<link rel="meta" type="application/rdf+xml" title="SIOC" href="http://dobrzanski.net/index.php?sioc_type=site" />

The href attribute holds the URL of the RDF representation of the data on the current page. This value changes as you browse the blog, so it always points to up-to-date RDF output. In general, the output contains some information about the blog itself and about its posts.
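Finding that URL is a plain HTML-parsing task. Below is a minimal sketch using Python's standard html.parser module, assuming the attribute values match the example above exactly; the names SiocLinkFinder and find_sioc_url are hypothetical and not part of IKHarvester.

```python
from html.parser import HTMLParser
from typing import Optional


class SiocLinkFinder(HTMLParser):
    """Remembers the href of the first SIOC <link> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sioc_url: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Self-closing <link ... /> also ends up here via handle_startendtag.
        if tag != "link" or self.sioc_url is not None:
            return
        a = dict(attrs)
        rel_tokens = (a.get("rel") or "").split()
        if ("meta" in rel_tokens
                and a.get("type") == "application/rdf+xml"
                and a.get("title") == "SIOC"):
            self.sioc_url = a.get("href")


def find_sioc_url(html: str) -> Optional[str]:
    """Return the SIOC RDF export URL advertised by a page, or None."""
    finder = SiocLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.sioc_url


page = ('<html><head><link rel="meta" type="application/rdf+xml" '
        'title="SIOC" href="http://dobrzanski.net/index.php?sioc_type=site" />'
        '</head><body></body></html>')
print(find_sioc_url(page))
```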

IKHarvester has to create metadata describing a blog post so that the post can be used as an informal learning object. For that purpose, it employs the SIOC exporter. Given the SIOC URL of a post, it invokes the exporter and receives a set of RDF triples. There are quite a few of them, and some carry no information that matters to IKHarvester, so the system filters the output and saves only the relevant triples in its repository. When its providing feature is called, IKHarvester retrieves those triples from the repository and transforms them so that they describe the post in a way compatible with the LOM standard.
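The filtering step can be pictured as a whitelist of predicates. This is a minimal Python sketch, assuming the exporter output has already been parsed into (subject, predicate, object) string tuples with full URIs, and assuming the kept predicates are those that have a LOM counterpart in the mapping table below (with the creation date exported as dcterms:created); KEPT_PREDICATES and filter_triples are hypothetical names.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), full URIs

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SIOC = "http://rdfs.org/sioc/ns#"
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"

# Assumed whitelist: predicates that feed some LOM attribute.
# content:encoded has no LOM target, so it is not listed and gets dropped.
KEPT_PREDICATES = {
    RDF + "type",
    DC + "title",
    SIOC + "has_creator",
    DCTERMS + "created",  # assumption: creation date exported as dcterms:created
    SIOC + "content",
    SIOC + "topic",
    SIOC + "has_reply",
    SIOC + "links_to",
}


def filter_triples(triples: Iterable[Triple]) -> List[Triple]:
    """Keep only the triples worth saving in the repository."""
    return [t for t in triples if t[1] in KEPT_PREDICATES]
```

Filtering by predicate alone keeps the triples about the post's creator and replies, which the LOM record needs later; a stricter filter could also check the subject.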

The following table shows how post attributes (first column) are mapped to SIOC ontology predicates (second column) and then to LOM attributes (third column). Some LOM attributes cannot be derived from the SIOC exporter output, so they are set to default values.

Blog posts

Attribute | SIOC predicate | LOM attribute(s)
- | sioc:Post | Educational.LearningResourceType="BlogPost"
URI | - | Technical.Location; General.Identifier.Catalog="URI"; General.Identifier.Entry; Meta-Metadata.Identifier.Catalog="URI"; Meta-Metadata.Identifier.Entry
title | dc:title | General.Title
creator | sioc:has_creator | LifeCycle.Contribute.Role="Author"; LifeCycle.Contribute.Entity="Personal info."; LifeCycle.Contribute.Date="Date of creation"; Meta-Metadata.Contribute.Role="Author"; Meta-Metadata.Contribute.Entity="Personal info."; Meta-Metadata.Contribute.Date="Date"
creation date | dcterms:created | LifeCycle.Version="Date"
description | sioc:content | General.Description; Educational.Description; Classification.Description
rich content (HTML) | content:encoded | -
topic | sioc:topic | General.Keyword; Classification.Keyword
reply | sioc:has_reply | Annotation.Entity="About author"; Annotation.Date="Date"; Annotation.Description="Content"
external link* | sioc:links_to | Relation.Kind="references"; Relation.Resource.Identifier.Catalog="URI"; Relation.Resource.Identifier.Entry; Relation.Resource.Description="references"
language | - | General.Language; Educational.Language; Meta-Metadata.Language
- | - | Educational.InteractivityType="expositive"
- | - | Educational.InteractivityLevel="medium"
- | - | Educational.SemanticDensity="medium"
- | - | Educational.IntendedEndUserRole="learner"
- | - | Educational.Context="school"; Educational.Context="higher education"; Educational.Context="training"; Educational.Context="other"
- | - | Educational.Difficulty="easy"
- | - | Rights.Cost="no"
- | - | General.Structure="atomic"
- | - | General.AggregationLevel="1"
- | - | Meta-Metadata.MetadataSchema="LOMv1.0"
- | - | Technical.Requirement.OrComposite.Type="operating system", .Name="multi-os"; Technical.Requirement.OrComposite.Type="browser", .Name="any"
- | - | LifeCycle.Status="revised"
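The table can be read as a lookup from predicates to LOM elements plus a set of constants. Below is a sketch of that transformation in Python covering only part of the table (identifiers, title, description, topics and the constant values). The triple layout, the flat "dotted path to list of values" record, and the names PREDICATE_TO_LOM, LOM_DEFAULTS and to_lom are assumptions for illustration, not IKHarvester's actual data model.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), full URIs

SIOC = "http://rdfs.org/sioc/ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Part of the table: predicate -> LOM elements it fills.
PREDICATE_TO_LOM: Dict[str, List[str]] = {
    DC + "title": ["General.Title"],  # LOM keeps the title in General.Title
    SIOC + "content": ["General.Description", "Educational.Description",
                       "Classification.Description"],
    SIOC + "topic": ["General.Keyword", "Classification.Keyword"],
}

# Constant values from the table (the sioc:Post row and the "- -" rows).
LOM_DEFAULTS: Dict[str, List[str]] = {
    "Educational.LearningResourceType": ["BlogPost"],
    "Educational.InteractivityType": ["expositive"],
    "Educational.InteractivityLevel": ["medium"],
    "Educational.SemanticDensity": ["medium"],
    "Educational.IntendedEndUserRole": ["learner"],
    "Educational.Context": ["school", "higher education", "training", "other"],
    "Educational.Difficulty": ["easy"],
    "Rights.Cost": ["no"],
    "General.Structure": ["atomic"],
    "General.AggregationLevel": ["1"],
    "Meta-Metadata.MetadataSchema": ["LOMv1.0"],
    "LifeCycle.Status": ["revised"],
}


def to_lom(post_uri: str, triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Build a flat LOM record (dotted element path -> values) for one post."""
    lom: Dict[str, List[str]] = defaultdict(list)
    # The post URI identifies both the resource and its metadata record.
    for path in ("Technical.Location", "General.Identifier.Entry",
                 "Meta-Metadata.Identifier.Entry"):
        lom[path].append(post_uri)
    lom["General.Identifier.Catalog"].append("URI")
    lom["Meta-Metadata.Identifier.Catalog"].append("URI")
    # Copy values of mapped predicates whose subject is this post.
    for subject, predicate, obj in triples:
        if subject != post_uri:
            continue
        for path in PREDICATE_TO_LOM.get(predicate, []):
            lom[path].append(obj)
    # Add the values that the SIOC output cannot provide.
    for path, values in LOM_DEFAULTS.items():
        lom[path].extend(values)
    return dict(lom)
```

Keeping the mapping as data lets the code follow the table row by row, so adding the remaining rows means extending the two dictionaries rather than the function.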



So, for blog posts I use the SIOC ontology, and I use it for wiki articles as well. For JeromeDL resources, I employ the JeromeDL and MarcOnt ontologies. There is no point in showing a table for each resource type, since they are similar. I am looking forward to hearing some feedback :)
