PyRDF

Yesterday's post and comments got me thinking. It is still fairly hard to manipulate and generate RDF data, and I don't think it really has to be. ActiveRDF (a Ruby RDF API) takes an interesting approach, and I thought I'd build something similar in Python. I started on that, and after a couple of hours I already have something quite neat. I've called it PyRDF for now, and here's a sample piece of code to give you a feel for how it works.


import pyrdf
from pyrdf import RdfStore, RdfResource, RdfType
from rdflib.Namespace import Namespace

NS_P = Namespace('http://www.zefhemel.com/ont/person#')
NS_J = Namespace('http://www.zefhemel.com/ont/job#')

store = RdfStore(defaultNS = NS_P)
store.prefix_mapping('p', NS_P)
store.prefix_mapping('j', NS_J)
pyrdf.setDefaultStore(store)

Person = RdfType(NS_P['Person'])
Website = RdfType(NS_P['Website'])
Job = RdfType(NS_J['Job'])

zef = RdfResource(NS_P['zef'], rdf_type = Person)
zef.name = 'Zef Hemel'
zef.age = 22
zef.country = 'Ireland'
zef.city = 'Dublin'

job1 = RdfResource(NS_J['job1'], defaultNS=NS_J, rdf_type = Job)
job1.name = 'Student System Administrator'
job1.description = 'Fiddling around with Linux servers'
job1.startYear = 2003
job1.endYear = 2005

job2 = RdfResource(NS_J['job2'], rdf_type = Job)
# And without the defaultNS set:
job2.j_name = 'Writing website'
job2.j_description = 'Writing own weblogs, not that well paid.'
job2.j_startYear = 2003

zef.hadJob = [job1, job2]

zef.website = []

zefhemelcom = RdfResource(NS_P['zefhemelcom'], rdf_type = Website)
zefhemelcom.title = 'ZefHemel.com'
zefhemelcom.url = 'http://www.zefhemel.com'
zef.website.append(zefhemelcom)

zefnu = RdfResource(NS_P['zefnu'], rdf_type = Website)
zefnu.title = 'Zef.Nu'
zefnu.url = 'http://zef.nu'
zef.website.append(zefnu)

print(store.serialize(format="pretty-xml"))


Here is the output of that. Saves quite some typing, eh? OK, you probably need an understanding of XML and XML namespaces to fully understand this, but even if you don't, it should be pretty obvious. PyRDF right now has three classes:

  1. RdfStore, which stores RDF triples as described before. You don't have to do much with this except register some prefixes. Later on you will also be able to use this class to serialize your data into RDF/XML and to save it to and load it from files, but that doesn't work yet.
  2. RdfResource, which represents a resource; you can simply see this as an object. When instantiating an RdfResource you have to give it at least a URI. Additionally you can pass it:
    • store, a place to store the resource's data. By default everything is stored in the defaultStore, and usually that's fine.
    • defaultNS, the default namespace that's used for the property names. More on this later.
    • A number of initial properties and values. This is the same as writing resourcename.property = value, and is just there for convenience.
  3. RdfType, a direct subclass of RdfResource. It does hardly anything at the moment. Later it could potentially be used to enforce correct typing and property use.

RdfResources have properties, just like objects. Properties can have other resources, literals (strings, integers etc.) or lists (of resources or literals) as values. PyRDF tries to automatically guess what kind of value a property holds. If you start using it as a list, it will function as a list; if you put literals or RdfResources in it, it will (hopefully) act as expected.
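To make the idea a bit more concrete, here is a rough sketch of how attribute assignment could be turned into triples. This is not PyRDF's actual code: the class name SketchResource, the plain Python list used as a triple store and the string URIs are all made up for illustration, and it only handles assignment (calling append on a property would need a list proxy on top of this).

```python
# Hypothetical sketch, not PyRDF's implementation: attribute writes become triples.
class SketchResource(object):
    def __init__(self, uri, default_ns, triples):
        # Bypass our own __setattr__ for the internal bookkeeping fields.
        object.__setattr__(self, 'uri', uri)
        object.__setattr__(self, '_ns', default_ns)
        object.__setattr__(self, '_triples', triples)

    def __setattr__(self, name, value):
        predicate = self._ns + name
        # Drop any earlier values of this property before storing the new ones.
        self._triples[:] = [t for t in self._triples
                            if not (t[0] == self.uri and t[1] == predicate)]
        # A list yields one triple per item; anything else yields a single triple.
        values = value if isinstance(value, list) else [value]
        for item in values:
            # Resources are stored by their URI, everything else as a literal.
            obj = item.uri if isinstance(item, SketchResource) else item
            self._triples.append((self.uri, predicate, obj))


NS = 'http://www.zefhemel.com/ont/person#'
triples = []
zef = SketchResource(NS + 'zef', NS, triples)
zefnu = SketchResource(NS + 'zefnu', NS, triples)

zef.age = 22               # literal value
zefnu.title = 'Zef.Nu'     # literal value
zef.website = [zefnu]      # list of resources

for triple in triples:
    print(triple)
```

The only real decision in there is the isinstance check on the value: lists are unrolled into several triples with the same subject and predicate, which is how RDF represents a multi-valued property.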

By default the property name is combined with the default namespace of the resource (or store), so for example if your default namespace is http://www.zefhemel.com/ont/person# and your property name is age, then the URI of the property will be http://www.zefhemel.com/ont/person#age. If you use a prefix followed by an underscore in the property name, like j_description, the default namespace will be overridden by the namespace associated with the j prefix. So in this case the URI will be http://www.zefhemel.com/ont/job#description.
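The naming rule above can be sketched in a few lines. Again, this is only an illustration under my own assumptions: resolve_property_uri is a made-up helper, not a function in PyRDF, and namespaces are plain strings here rather than rdflib Namespace objects. I also assume that an underscore only counts as a prefix separator when the part before it is a registered prefix.

```python
# Hypothetical helper, not part of PyRDF: maps a property name to its full URI.
def resolve_property_uri(name, default_ns, prefixes):
    """Return the URI for a property name such as 'age' or 'j_description'."""
    prefix, separator, local_name = name.partition('_')
    # Only treat the part before the first underscore as a prefix if it is registered.
    if separator and prefix in prefixes:
        return prefixes[prefix] + local_name
    return default_ns + name


NS_P = 'http://www.zefhemel.com/ont/person#'
NS_J = 'http://www.zefhemel.com/ont/job#'
prefixes = {'p': NS_P, 'j': NS_J}

age_uri = resolve_property_uri('age', NS_P, prefixes)
description_uri = resolve_property_uri('j_description', NS_P, prefixes)

print(age_uri)          # http://www.zefhemel.com/ont/person#age
print(description_uri)  # http://www.zefhemel.com/ont/job#description
```

With this rule a name like start_year would still end up in the default namespace, as long as no prefix called start has been registered.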

That's it, that's all there is to it, and I think it's pretty neat. I will now work on the querying capabilities, but I think it's already quite nice like this.

If you want to play around with it you can do a Subversion check-out from http://svn.zefhemel.com/pyrdf or you can just visit that address and download it with your browser. You need rdflib to run it. Note that rdflib is a separate package and is not part of Python's standard library, so you may have to install it first.