GSoC 2009: Hackystat

[GSoC] The first three Linked Data principles

Posted by: iammyr on: June 23, 2009

This week I implemented a Restlet server that provides useful information in RDF when someone looks up a URI, using HTTP URIs as names for things, as stated in the first three Linked Data principles:

  1. Use URIs as names for things
  2. Use HTTP URIs
  3. Provide useful information in RDF when someone looks up a URI

For the basic resource types (Hackystat user, project and sensor data), I used the same URI patterns as the Sensorbase.
The so-called “useful information in RDF” follows the schema published online. The raw sensor data are dynamically declared as a sub-class of one specific sensor data type (highlighted in green within that schema). Because the Sensorbase DB structure allows a sensor data (SD) instance to be associated with a sensor data type (SDT) of any kind, the server may have to deal with an unforeseen SDT. In that case I dynamically create a class for the SDT, with a URI that follows the same pattern as every other SDT resource.
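
A minimal sketch of the unforeseen-SDT case described above, written in Python for brevity. The base URI, the URI path patterns and the set of known SDTs are hypothetical placeholders, not taken from the actual service, and the sketch links a sensor data instance to its SDT class with rdf:type, which is only one possible reading of the declaration described above.

```python
# Hypothetical base URI; the real service's host and path are not given in the post.
BASE = "http://example.org/linkedservicedata"
# SDT classes assumed to be already present in the published schema.
KNOWN_SDTS = {"Commit", "DevEvent", "Build"}


def sdt_uri(sdt_name):
    """URI of a sensor data type; every SDT follows the same (assumed) pattern."""
    return f"{BASE}/sensordatatypes/{sdt_name}"


def sensor_data_uri(user_email, timestamp):
    """URI of one sensor data instance (assumed pattern, modelled on Sensorbase)."""
    return f"{BASE}/sensordata/{user_email}/{timestamp}"


def describe_sensor_data(user_email, timestamp, sdt_name):
    """Return N3 that types a sensor data instance with its SDT class."""
    lines = [
        "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    ]
    if sdt_name not in KNOWN_SDTS:
        # Unforeseen SDT: declare a class for it on the fly, same URI pattern.
        lines.append(f"<{sdt_uri(sdt_name)}> rdf:type owl:Class .")
    lines.append(
        f"<{sensor_data_uri(user_email, timestamp)}> rdf:type <{sdt_uri(sdt_name)}> ."
    )
    return "\n".join(lines)
```

Calling `describe_sensor_data("a@example.org", "2009-06-23T10:00:00", "Perf")` adds a class declaration for `Perf` because it is not in `KNOWN_SDTS`, while a known type such as `Commit` yields only the typing triple.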

The requested RDF data are created dynamically at run time if they are not already in the cache (managed by Apache JCS, as in the DPD service) and are then stored there (a ‘.hackystat/cache/linkedservicedata’ folder is created). I don’t know yet whether the system will end up too slow because it lacks a triple store, but one could easily be added at any time in the future, so I’m postponing the question.
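
The build-on-miss behaviour can be sketched as follows. This is not the Apache JCS setup used by the service: it swaps in plain files in a directory, and the function names and the example URI are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def cached_rdf(uri, build, cache_dir):
    """Return the N3 document for `uri`, building it only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URI so that any URI maps to a safe file name.
    name = hashlib.sha256(uri.encode("utf-8")).hexdigest() + ".n3"
    entry = cache_dir / name
    if entry.exists():
        return entry.read_text(encoding="utf-8")
    document = build(uri)  # created at run time on a miss
    entry.write_text(document, encoding="utf-8")
    return document


# Usage: the second lookup is served from the cache, so the builder runs once.
calls = []


def build_description(uri):
    calls.append(uri)
    return f"<{uri}> a <http://example.org/Thing> ."


with tempfile.TemporaryDirectory() as tmp:
    first = cached_rdf("http://example.org/x", build_description, tmp)
    second = cached_rdf("http://example.org/x", build_description, tmp)
```

A triple store added later could sit behind the same `build` callback without changing the callers.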

During the RDF model construction I used the following external vocabularies:
@prefix owl: .
@prefix xsd: .
@prefix rdfs: .
@prefix rdf: .
@prefix foaf: .
@prefix iswc: .
@prefix doap: .
@prefix evoont: .
@prefix owls_process: .
@prefix sec: .
@prefix sioc: .

The major features missing from the current implementation are:

  • the fourth Linked Data principle, that is, ‘links to external datasets’. The implementation also does not yet cover data coming from Hackystat services other than the Sensorbase. I’m planning to realize all the Linked Data principles first over Sensorbase data, because all the other data are abstractions built upon them and because I want to be sure to have something complete and working by the end of GSoC. After that I’m going to extend the implementation and schema to cover data from as many Hackystat services as possible, starting with the DPD.
  • the server doesn’t yet provide RDF data for all the existing URIs (that is, not all the URIs are dereferenceable), but I’m working on this. The unofficial published REST API specification already lists all the existing URIs; of these, only the URIs for resource types such as user, project, sensor data type and sensor data are dereferenceable.

Those are the major features that are going to be added soon.
In addition, the server does not yet support every RDF serialization (it supports only N3), it lacks content negotiation to also handle requests for HTML or XML, and of course it lacks an interesting interface.
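
As an illustration of what the missing content negotiation could look like, here is a small sketch that picks a response format from an HTTP Accept header. It is not part of the current server; the list of supported media types (including `text/n3` as the N3 media type) is an assumption made for the example.

```python
# Assumed media types, in server preference order (used to break ties).
SUPPORTED = ["text/n3", "application/rdf+xml", "text/html"]


def parse_accept(header):
    """Parse an Accept header into a list of (media_type, q) pairs."""
    result = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        result.append((media_type, q))
    return result


def quality(candidate, accepted):
    """q value of the most specific Accept entry that matches `candidate`."""
    main = candidate.split("/")[0]
    for pattern in (candidate, main + "/*", "*/*"):
        for media_type, q in accepted:
            if media_type == pattern:
                return q
    return 0.0


def negotiate(header, supported=SUPPORTED):
    """Return the best supported media type, or None if nothing is acceptable."""
    accepted = parse_accept(header or "*/*")
    best, best_q = None, 0.0
    for candidate in supported:
        q = quality(candidate, accepted)
        if q > best_q:
            best, best_q = candidate, q
    return best
```

With this sketch a browser-style header that prefers `text/html` selects HTML, a client that sends no Accept header gets N3, and a client that only accepts `image/png` gets `None`, which a server would normally answer with 406 Not Acceptable.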

I assumed that sensors may optionally (it is not required) send the following information, in addition to the basic fields, within the ‘Properties’ field of sensor data belonging to the SDTs listed below:

  • Commit
    key=branch ; value=repositoryUrl branchUrl
    key=log ; value=logMessage
    key=version ; value=versionNumber
    key=fromFile ; value=fileFullPath
    key=linesAdded ; value=fileFullPath lineNumber lineNumber lineNumber …
    key=linesDeleted ; value=fileFullPath lineNumber lineNumber lineNumber …
    key=totalLines ; value=numTotLines
    key=author ; value=userEMail
  • File Metric
    key=fromFile ; value=fileFullPath
    key=totalLines ; value=numTotLines
    key=commentLines ; value=numTotCommentLines
    key=codeLines ; value=numTotCodeLines
    key=classCount ; value=numTotClass
    key=functionCount ; value=numTotFunction
    key=functionSizeList ; value=fileFullPath functionName-LOC functionName-LOC functionName-LOC …
    key=classSizeList ; value=fullPathClassName-LOC fullPathClassName-LOC fullPathClassName-LOC …
    key=majorClassName ; value=className
    key=fileType ; value=fileTypeName
  • DevEvent
    key=fromFile ; value=fileFullPath
    key=type ; value=devEventTypeName (e.g. Refactor or Edit or Compile or Execute or Test or Build or Debug)
  • ReviewIssue
    key=phase ; value=phaseId
    key=module ; value=moduleName
    key=line ; value=lineNumber
    key=author ; value=userEMail
    key=reviewerId ; value=id
    key=runtimeId ; value=id
    key=type ; value=issueType (e.g. Bug or Defect or Enhancement)
    key=priority ; value=priorityValue
    key=status ; value=statusValue
    key=summary ; value=summaryMessage
  • ReviewActivity
    key=author ; value=userEmail
    key=phase ; value=phaseId
    key=phaseItems ; value=phaseId phaseId phaseId …
    key=issueItems ; value=issueId issueId issueId …
    key=module ; value=moduleName
  • CodeIssue
    key=message ; value=msg
    key=priority ; value=priorityValue
    key=runtimeId ; value=id
    key=fromFile ; value=fileFullPath
    key=line ; value=lineNumber
    key=type ; value=issueType
    key=project ; value=userEmail projectName
    key=operatingSystem ; value=os
    key=status ; value=statusValue
  • Activity
    key=fromFile ; value=fileFullPath
    key=editedLines ; value=fileFullPath lineNumber lineNumber lineNumber …
  • BufferTransition
    key=fromFile ; value=fileFullPath
    key=toFile ; value=fileFullPath
    key=modified ; value=TrueFalse
  • Dependency
    key=path ; value=sourceFullPath
    key=granularity ; value=package|class| swComponentFullPath | method fileFullPath methodName
    key=inbound ; value=swComponentFullPath swComponentFullPath swComponentFullPath …
    key=outbound ; value=swComponentFullPath swComponentFullPath swComponentFullPath …
  • Command
    key=path ; value=sourceFullPath
    key=machine ; value=machineName
    key=arguments ; value=ParameterType ParameterName=ParameterValue, ParameterType ParameterName=ParameterValue , …
    key=operatingSystem ; value=os
  • Build
    key=path ; value=sourceFullPath
    key=target ; value=sourceFullPath
    key=result ; value=resultValue
    key=arguments ; value=ParameterType ParameterName=ParameterValue, ParameterType ParameterName=ParameterValue , …
  • UnitTest
    key=fromFile ; value=fileFullPath
    key=result ; value=resultValue
    key=testName ; value=name
  • Coverage
    key=fromFile ; value=fileFullPath
    key=granularity ; value=package|class| swComponentFullPath | method fileFullPath methodName
    key=covered ; value=numCovered
    key=uncovered ; value=numUncovered
  • Perf
    key=testName ; value=name
    key=result ; value=resultValue
    key=outputType ; value=outputTypeName
    key=performanceMeasure ; value=measureName-value-unit measureName-value-unit measureName-value-unit …
  • Issue
    key=summary ; value=summaryMessage
    key=priority ; value=priorityValue
    key=runtimeId ; value=id
    key=fromFile ; value=fileFullPath
    key=line ; value=lineNumber
    key=type ; value=issueType
    key=project ; value=userEmail projectName
    key=operatingSystem ; value=os
    key=status ; value=statusValue
  • ReviewActivity
    key=fromFile ; value=fileFullPath
    key=editedLines ; value=fileFullPath lineNumber lineNumber lineNumber …
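
To show how a consumer might interpret these space-separated values, here is a minimal parsing sketch. The function names are hypothetical, only a few of the keys listed above are covered, and it assumes that file paths contain no spaces.

```python
def parse_file_lines(value):
    """Split 'fileFullPath lineNumber lineNumber ...' into (path, [line numbers])."""
    tokens = value.split()
    if not tokens:
        raise ValueError("empty property value")
    return tokens[0], [int(t) for t in tokens[1:]]


def parse_name_loc_list(value):
    """Split 'name-LOC name-LOC ...' (e.g. classSizeList) into {name: LOC}."""
    sizes = {}
    for token in value.split():
        # Split on the last '-' so that names may themselves contain '-'.
        name, _, loc = token.rpartition("-")
        sizes[name] = int(loc)
    return sizes


def parse_commit_properties(properties):
    """Interpret the optional Commit properties that are present; skip the rest."""
    parsed = {}
    for key in ("linesAdded", "linesDeleted"):
        if key in properties:
            parsed[key] = parse_file_lines(properties[key])
    if "totalLines" in properties:
        parsed["totalLines"] = int(properties["totalLines"])
    for key in ("log", "version", "fromFile", "author"):
        if key in properties:
            parsed[key] = properties[key]
    return parsed
```

Because every property is optional, `parse_commit_properties` only returns the keys it actually finds, so sensors that do not send the extra information are still handled.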

You can find the code hosted on my Google Code project.

Finally, I decided on a name for my project: Linked Service Data, whose acronym is “LiSeD” (not LSD ehehe XD)
During the next week:
I’m going to make all the existing URIs dereferenceable and start linking external datasets (possibly searching for an algorithm to automate everything).

P.S.
I’m really sorry for the delay in posting this weekly report :p
——————UPDATE————————-
THE FIRST MILESTONE
The first milestone release will be published by the 6th of July (the mid-term evaluation date) and will consist of a Restlet server able to provide information in RDF for all the URIs identifying resources described in the REST API specification: all the ones coming from the Sensorbase plus some resource types that I created (since I assume the Sensorbase will supply the additional data described above). Every URI will be dereferenceable (currently some of them can already be requested). The RDF model provided won’t yet contain links to external datasets; it will be created dynamically at run time (if not already cached and if the logged-in user is authorized) and then cached locally on the server host. The server will handle only requests for the N3 RDF serialization format.

These features are guaranteed to be ready by the mid-term evaluation date. Since there is not much time left before I start linking external datasets (as required by the fourth and last Linked Data principle), I may add that feature by that date too, but this is not guaranteed.
