Mulgara RDF Triple-Store

December 30, 2008 | by Omar

Over the past four months I have intermittently been looking into the Mulgara RDF triple store. Mulgara is an open-source RDF triple store that claims to handle up to 7 billion nodes and was developed by some big players in the semantic web space (Zepheira, Topaz and Fedora Commons). My intention was to see whether it could be of use in future projects as an alternative to a traditional database.

RDF

For those who are unfamiliar with RDF, a short description would be:

The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. This Primer is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents.

For more information about RDF, read the RDF Primer hosted on the W3C’s website.

RDF Stores

An RDF store allows for flexible definition of content, which is a shortcoming of the traditional relational database. An example of this would be the need to add a field. In a relational database one would have to define a new column in the database table, which is not usually achievable from a program given that most systems are not foolish enough to allow schema changes from code. In an RDF data store, however, things are stored as nodes and relationships between nodes. Adding a new field therefore simply requires creating a new node and establishing a relationship between the existing nodes and the new node. This flexibility is extremely handy in cases where the data being captured by a system changes often. Add to this an RDF schema or an ontology and one has the ability to query the data semantically and to make use of metadata in a way that cannot be done within a relational database.
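
To make the "new field" point concrete, here is a minimal sketch in plain Java, with no Mulgara or JRDF involved. The `TripleSketch` and `Triple` names, the `http://example.org/...` URIs and the values are all made up for illustration, and a real store would index its statements rather than keep them in a list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a toy in-memory triple list, not Mulgara's storage model.
class TripleSketch {

    // One statement: subject, predicate, object (all plain strings here).
    static final class Triple {
        final String subject;
        final String predicate;
        final String object;

        Triple(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Collect the distinct predicates ("fields") recorded for one subject.
    static Set<String> fieldsOf(List<Triple> store, String subject) {
        Set<String> fields = new LinkedHashSet<>();
        for (Triple t : store) {
            if (t.subject.equals(subject)) {
                fields.add(t.predicate);
            }
        }
        return fields;
    }

    // Build a small store, then add a "field" that was never declared up front.
    static List<Triple> buildStore() {
        String person = "http://example.org/staffid#85740";
        List<Triple> store = new ArrayList<>();
        store.add(new Triple(person, "http://example.org/terms#name", "Jane"));
        store.add(new Triple(person, "http://example.org/terms#city", "Bedford"));
        // The new "column": just one more statement, no schema change required.
        store.add(new Triple(person, "http://example.org/terms#nickname", "JJ"));
        return store;
    }

    public static void main(String[] args) {
        List<Triple> store = buildStore();
        System.out.println(fieldsOf(store, "http://example.org/staffid#85740"));
    }
}
```

The equivalent change in a relational schema would be an ALTER TABLE; here it is simply one more row of data.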

I was thinking that this type of flexibility would be of great benefit for a project we hope to be working on in the new year. Given Zepheira’s backing of Mulgara, I thought I would investigate whether it is ready for prime time and how much complexity is involved in using it.

Mulgara

Mulgara was initiated in 2006 as a fork of the Kowari project, which died or became unsupported in 2005. The project claims that Mulgara has the following features:

  • Native RDF support
  • Multiple databases (models) per server
  • Simple SQL-like query language
  • Small footprint
  • Full text search functionality
  • Datatype support
  • Supports and tracks W3C Specifications and guidelines
  • Large storage capacity
  • Optimized for metadata storage and retrieval
  • Multi-processor support
  • Independently tuned for both 64-bit and 32-bit architectures
  • Low memory requirements
  • On-disk joins
  • Streamed query results

The remainder of this post will cover my experiences with using Mulgara.

Rough Start

Unfortunately things didn’t start off well. Though downloading and installing the Mulgara server was painless, I came across a number of issues that would deter many from considering it for a production-level system:

Connectors are not included in the default download

Currently there are only a few means of accessing Mulgara to perform CRUD operations:

  • A JRDF connector
  • A Jena connector
  • Straight RMI (remote method invocation)

In the default download from the Mulgara website, the JRDF and Jena connectors are not provided. After a lot of struggling, I resorted to downloading the source and building the server from scratch to ensure that the connectors were included within the jar files.

After getting to this stage I tried to follow the tutorials to connect to and interact with the server. The documentation on the Mulgara website appears to be stale and the tutorials have not been kept up to date: the code provided in them does not work unless one uses an in-memory database, which the tutorial does not specify. I posted questions to the various user group mailing lists and received no responses.

Once I figured those aspects out things went more smoothly, though a significant amount of boilerplate code is needed for simple CRUD operations.

Creating a connection to the server

As noted above, one cannot simply use JDBC to connect to the server; instead one has the option of using most of the popular RDF frameworks such as JRDF, Jena or Sesame. I initially tried to use Jena, but it seems that support for the JenaMulgara connector died quite a while ago and the connector does not work with the latest version of Mulgara. I therefore moved on to the JRDF connector, given that the Sesame connector is quite immature.

// Create the URI of the server
java.net.URI serverURI = new java.net.URI("rmi", hostName, "/" + serverName, null);

// Create a new session factory, ensure that it’s local
SessionFactory sessionFactory = SessionFactoryFinder.newSessionFactory(serverURI, false);

// Get a JRDF session from the factory
org.mulgara.server.JRDFSession session = (JRDFSession) sessionFactory.newJRDFSession();

Create a model/database

The above gives you a connection to the Mulgara server. Creating a model/database requires the following section of code:

java.net.URI modelURI = new URI("rmi", hostName, "/" + serverName, graphName);
java.net.URI modelType = new URI("http://mulgara.org/mulgara#Model");
session.createModel(modelURI, modelType);
org.jrdf.graph.Graph graph = org.mulgara.client.jrdf.AbstractGraphFactory.createGraph(serverURI, modelURI);
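
It helps to see what the URI constructor calls above actually produce. The following sketch uses only `java.net.URI`; the `MulgaraUris` class and its methods are hypothetical helpers, `localhost` is a placeholder host, and `server1` and `exampleGraph` are the names used in the sample at the end of this post.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper: shows how the server and model URIs are assembled.
class MulgaraUris {

    // Builds rmi://<host>/<server>
    static URI serverUri(String hostName, String serverName) {
        try {
            return new URI("rmi", hostName, "/" + serverName, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad server URI parts", e);
        }
    }

    // Builds rmi://<host>/<server>#<graph>; the graph name becomes the URI fragment.
    static URI modelUri(String hostName, String serverName, String graphName) {
        try {
            return new URI("rmi", hostName, "/" + serverName, graphName);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad model URI parts", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serverUri("localhost", "server1"));
        System.out.println(modelUri("localhost", "server1", "exampleGraph"));
    }
}
```

In other words, a model is addressed as a fragment of the server URI, so the server and model URIs differ only in the part after the `#`.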

Create nodes/relationships

In order to create RDF triples one needs to specify a subject, a predicate and an object. There are three types of Java objects that can be used in building these relationships:

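  1. BlankNode – a node with no URI of its own, typically used to group other relationships. A BlankNode can be used as either a subject or an object
  2. Literal – a literal value of some sort (String, number, etc.). A Literal can only be used as an object and not as a subject or predicate
  3. URIReference – a node identified by a URI. A URIReference is used to define the predicate, or relationship, between the subject and the object, and can also serve as a subject or an object

// Obtain the factory that creates nodes and triples for this graph
org.jrdf.graph.GraphElementFactory elementFactory = graph.getElementFactory();
// No-argument createResource() yields a blank node
org.jrdf.graph.BlankNode blankNode = elementFactory.createResource();
// createResource(URI) yields a URI reference (example URI taken from the sample below)
org.jrdf.graph.URIReference predicate = elementFactory.createResource(new URI("http://example.org/terms#address"));
// value is a placeholder for the String you want to store
org.jrdf.graph.Literal literal = elementFactory.createLiteral(value);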

Lastly to create and insert a triple one simply needs to do the following:

org.jrdf.graph.Triple triple = elementFactory.createTriple(subject, predicate, object); // Create the triple object
graph.add(triple); // Store the triple in Mulgara server
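
The position rules for the three node types listed above can be captured in a few lines of plain Java. This is a sketch of the RDF rules themselves, not of how JRDF enforces them; `TriplePositions`, `NodeKind`, `Position` and `isAllowed` are hypothetical names. Note that a URI reference is also valid as a subject or an object, as with `person` in the sample at the end of this post.

```java
// Illustrative only: encodes which kind of node may appear in which triple position.
class TriplePositions {

    enum NodeKind { BLANK_NODE, LITERAL, URI_REFERENCE }

    enum Position { SUBJECT, PREDICATE, OBJECT }

    static boolean isAllowed(NodeKind kind, Position position) {
        switch (kind) {
            case URI_REFERENCE:
                return true; // valid as subject, predicate or object
            case BLANK_NODE:
                return position != Position.PREDICATE; // subject or object only
            case LITERAL:
                return position == Position.OBJECT; // object only
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // Print the full table of node kind versus position.
        for (NodeKind kind : NodeKind.values()) {
            for (Position position : Position.values()) {
                System.out.println(kind + " as " + position + ": " + isAllowed(kind, position));
            }
        }
    }
}
```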

For sample code covering everything from connecting to querying, see the end of this post.

Performance

I wrote a simple application that took the contents of a relational database, converted it into RDF and stored it in Mulgara. The database I was using was big but not huge, roughly …. All these numbers are based on running the application on a MacBook Pro with a 2 GHz Core Duo processor and 2 GB of RAM. I initially wrote the app to do inserts one at a time, which was obviously inefficient, but I wanted to test the speed of a single insert. Each insert of a single triple took roughly 0.17 seconds. In round two I started doing inserts in batches, and in batch mode each insert of a node took roughly 0.08 seconds. Neither speed is particularly fast but, like I said, this is running off my laptop so one can’t expect superb performance.
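
To put those per-insert times in perspective, the arithmetic is simple enough to sketch. The one-million-triple load below is a hypothetical figure chosen for illustration, not the size of my database, and both timings are treated as per-triple costs.

```java
// Illustrative arithmetic based on the measured per-insert times above.
class InsertThroughput {

    // Inserts per second for a given time per insert (in seconds).
    static double insertsPerSecond(double secondsPerInsert) {
        return 1.0 / secondsPerInsert;
    }

    // Hours needed to load tripleCount triples at the given time per insert.
    static double hoursToLoad(long tripleCount, double secondsPerInsert) {
        return tripleCount * secondsPerInsert / 3600.0;
    }

    public static void main(String[] args) {
        long hypotheticalTriples = 1_000_000L; // assumption, for illustration only
        System.out.printf("single: %.1f inserts/s, %.1f hours%n",
                insertsPerSecond(0.17), hoursToLoad(hypotheticalTriples, 0.17));
        System.out.printf("batch:  %.1f inserts/s, %.1f hours%n",
                insertsPerSecond(0.08), hoursToLoad(hypotheticalTriples, 0.08));
    }
}
```

At those rates a million triples would take roughly 47 hours one at a time and roughly 22 hours in batch mode, which is why the batch figure still worries me for large data sets.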

Conclusion

Given that RDF stores are competing with regular relational databases, I would have hoped for an easier means of connecting to the database, and that RMI wasn’t being used under the hood. Given the push for RESTful interfaces within the semantic world, I was surprised to see that the Mulgara server does not have a RESTful interface, which would remove the need for RMI completely. Once I got over my dependence on the abhorrent tutorials and documentation things went fairly well, though I have performance concerns when dealing with large volumes of data. The main area of concern for me is the requirement of RMI, which has been well documented as not being very performant. It pains me to have to use RMI when running both the server and the application on the same machine. What is nice is that the server is completely transactional and any failure results in a roll-back. The last concern I have is that the user community around Mulgara does not seem to be very active or dedicated to ensuring that the product is supported to the extent that one can rely on getting up-to-date documentation and a steady stream of bug fixes.

Sample Code

The following code sample shows how to connect to the server, create a model, insert triples and query them.

package org.twopaths.jrdf;

import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

import org.jrdf.graph.BlankNode;
import org.jrdf.graph.Graph;
import org.jrdf.graph.GraphElementFactory;
import org.jrdf.graph.GraphElementFactoryException;
import org.jrdf.graph.GraphException;
import org.jrdf.graph.Literal;
import org.jrdf.graph.Triple;
import org.jrdf.graph.URIReference;
import org.jrdf.util.ClosableIterator;
import org.mulgara.client.jrdf.AbstractGraphFactory;
import org.mulgara.query.QueryException;
import org.mulgara.server.JRDFSession;
import org.mulgara.server.NonRemoteSessionException;
import org.mulgara.server.SessionFactory;
import org.mulgara.server.driver.SessionFactoryFinder;
import org.mulgara.server.driver.SessionFactoryFinderException;

public class Sandbox {
	public static void main(String[] args) {
		Graph graph = null;
		try {
			// Create the host name
			String hostname = InetAddress.getLocalHost().getCanonicalHostName();

			// Create the URI of the server
			URI serverURI = new URI("rmi", hostname, "/" + "server1", null);

			// Create a new session factory, ensure that it's local
			SessionFactory sessionFactory = SessionFactoryFinder.newSessionFactory(serverURI, false);

			// Get a JRDF session from the factory
			JRDFSession session = (JRDFSession) sessionFactory.newJRDFSession();

			//create a new Model
			URI modelURI = new URI("rmi", hostname, "/" + "server1", "exampleGraph");
			URI modelType = new URI("http://mulgara.org/mulgara#Model");
			session.createModel(modelURI, modelType);

			//create a JRDF Graph for the model
//			graph = new JRDFGraph(session, modelURI);
			graph = AbstractGraphFactory.createGraph(serverURI, modelURI);

			//get the Factory
			GraphElementFactory elementFactory = graph.getElementFactory();

			//create resources
			URIReference person = elementFactory.createResource(new URI("http://example.org/staffid#85740"));
			BlankNode address = elementFactory.createResource();

			//create properties
			URIReference hasAddress = elementFactory.createResource(new URI("http://example.org/terms#address"));
			URIReference hasStreet = elementFactory.createResource(new URI("http://example.org/terms#street"));
			URIReference hasCity = elementFactory.createResource(new URI("http://example.org/terms#city"));
			URIReference hasState = elementFactory.createResource(new URI("http://example.org/terms#state"));
			URIReference hasPostCode = elementFactory.createResource(new URI("http://example.org/terms#postalCode"));

			//create values
			Literal street = elementFactory.createLiteral("1501 Grant Avenue");
			Literal city = elementFactory.createLiteral("Bedford");
			Literal state = elementFactory.createLiteral("Massachusetts");
			Literal postCode = elementFactory.createLiteral("01730");

			//create statements
			Triple addressStatement = elementFactory.createTriple(person, hasAddress, address);
			Triple streetStatement = elementFactory.createTriple(address, hasStreet, street);
			Triple cityStatement = elementFactory.createTriple(address, hasCity, city);
			Triple stateStatement = elementFactory.createTriple(address, hasState, state);
			Triple postCodeStatement = elementFactory.createTriple(address, hasPostCode, postCode);

			// Add triples to graph
			graph.add(addressStatement);
			graph.add(streetStatement);
			graph.add(cityStatement);
			graph.add(stateStatement);
			graph.add(postCodeStatement);

			//get all Triples (null acts as a wildcard in each position)
			Triple findAll = elementFactory.createTriple(null, null, null);
			ClosableIterator allTriples = graph.find(findAll);
			while (allTriples.hasNext()) {
				System.out.println(allTriples.next().toString());
			}
			allTriples.close(); // release the iterator's resources

			//search for address (as a subject)
			Triple findAddress = elementFactory.createTriple(address, null, null);
			ClosableIterator addressSubject = graph.find(findAddress);
			while (addressSubject.hasNext()) {
				System.out.println(addressSubject.next().toString());
			}
			addressSubject.close();

			//search for the city: "Bedford"
			Triple findCity = elementFactory.createTriple(null, null, city);
			ClosableIterator bedfordCity = graph.find(findCity);
			while (bedfordCity.hasNext()) {
				System.out.println(bedfordCity.next().toString());
			}
			bedfordCity.close();

			//search for any subject that has an address
			Triple findAddresses = elementFactory.createTriple(null, hasAddress, null);
			ClosableIterator addresses = graph.find(findAddresses);
			while (addresses.hasNext()) {
				System.out.println(addresses.next().toString());
			}
			addresses.close();

		} catch (UnknownHostException uhe) {
			uhe.printStackTrace();
		} catch (URISyntaxException urise) {
			urise.printStackTrace();
		} catch (SessionFactoryFinderException sffe) {
			sffe.printStackTrace();
		} catch (NonRemoteSessionException nrse) {
			nrse.printStackTrace();
		} catch (QueryException qe) {
			qe.printStackTrace();
		} catch (GraphException ge) {
			ge.printStackTrace();
		} catch (GraphElementFactoryException gefe) {
			gefe.printStackTrace();
		} finally {
			// graph is still null if connecting or creating the model failed
			if (graph != null) {
				try {
					graph.close();
				} catch (Exception e) {
					e.printStackTrace();
				}
			}
		}
		System.out.println("DONE");
	}
}
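
The `find` calls in the sample pass `null` for any position that should match anything. The sketch below mimics that wildcard behaviour over a plain list; it illustrates the idea only and says nothing about how JRDF or Mulgara evaluate such patterns internally. `PatternFind`, `Statement`, `matches` and `find` are hypothetical names, and the data is a cut-down version of the address example.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: null-as-wildcard matching over a plain in-memory list.
class PatternFind {

    static final class Statement {
        final String subject;
        final String predicate;
        final String object;

        Statement(String subject, String predicate, String object) {
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }

        @Override
        public String toString() {
            return subject + " " + predicate + " " + object;
        }
    }

    // A null pattern component matches any value in that position.
    static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // Return every statement that matches the (possibly null) pattern parts.
    static List<Statement> find(List<Statement> store, String s, String p, String o) {
        List<Statement> hits = new ArrayList<>();
        for (Statement st : store) {
            if (matches(s, st.subject) && matches(p, st.predicate) && matches(o, st.object)) {
                hits.add(st);
            }
        }
        return hits;
    }

    // A cut-down version of the address data from the sample.
    static List<Statement> sampleStore() {
        List<Statement> store = new ArrayList<>();
        store.add(new Statement("staffid#85740", "address", "_:address"));
        store.add(new Statement("_:address", "street", "1501 Grant Avenue"));
        store.add(new Statement("_:address", "city", "Bedford"));
        return store;
    }

    public static void main(String[] args) {
        List<Statement> store = sampleStore();
        System.out.println(find(store, null, null, null));        // everything
        System.out.println(find(store, "_:address", null, null)); // address as subject
        System.out.println(find(store, null, null, "Bedford"));   // by object value
        System.out.println(find(store, null, "address", null));   // by predicate
    }
}
```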

2 Responses to “Mulgara RDF Triple-Store”

  1. David Wood Says:

    Hi,

    Thanks for trying Mulgara. Unfortunately, you came to the project with several misconceptions that hampered your initial review. I’d like to try to clear those up, while also admitting the legitimate criticisms you provided.

    Mulgara is not a user-friendly system. You are absolutely right about that. The institutions that funded its development had requirements that didn’t include an easy path toward broader acceptance. That is a shame, but fact. It can also be difficult to keep documentation up to date in Open Source projects and Mulgara has definitely suffered from that.

    However, Mulgara has never been designed to compete with relational databases. It is useful in specific cases where the traditional RDBMS data model is not efficient (that is different from saying where “relational systems are not efficient” because Mulgara is a relational system – just of a different sort).

    The Jena and JRDF connectors are deprecated and not recommended for use. At the time Mulgara’s early code was laid down (2000, in a company called Tucana, which spawned Kowari), Jena was the most common way to deal with RDF. That situation has changed over the years with the introduction of RDFa, GRDDL, RDBMS connectors, SPARQL standardization, etc. JRDF was a project started at Tucana, but went its own way as well and is no longer useful with Mulgara. Poor performance can and does result from the use of those connectors, which is why they are not distributed. The best “modern” ways to talk to Mulgara are its included Web interface, command line, SPARQL endpoint and Java API.

    Mulgara’s backing store, a transactional native quad store, is also undergoing significant changes. If and when we can get enough funding to implement our current designs, Mulgara will be able to store and efficiently query literally huge quantities of RDF. Until that time, you may have to play at a deep level to see the benefits.

    Lastly, did you post messages to the mulgara-dev or mulgara-general mailing lists? I didn’t see any and the community surely would have responded. A quick check of the archives shows interest, activity and decent response times. I hope you look at Mulgara more closely.

  2. Paul Gearon Says:

    Hi Omar,
    I felt bad when I realized that your email hadn’t been responded to. I looked it up, and realized that your question was about JRDF. I don’t really know this interface, and so I privately forwarded it on to someone more familiar with it, but he never responded, and I never chased it up. Mea culpa, I’m afraid. :-(

    The Jena interface has been removed completely (it had to use too much of Jena’s internals, and suffered badly because of it). The JRDF interface wasn’t doing ACID correctly at one point, but I believe this has been fixed, so I don’t think it’s exactly “deprecated” anymore. All the same, these are not the interfaces I’d recommend.

    The 3 remaining interfaces (for the moment) are RMI, embedded, and REST. Fedora uses Mulgara as an embedded system. This is an identical interface to the RMI interface. The only difference is that the SessionFactory is a Database instance, and not an RmiSessionFactory.

    REST is relatively recent (October 2008), and I’m still working on it (when I have time). It grew out of the SPARQL endpoint. You can do everything in it…. except transactions. I know you can do everything in it, because this is the only programmatic interface I use now. :-) (I hate RMI). I’m hoping to get time to do REST transactions in a few months, but I have a lot of performance work to get to first, plus I need to learn more about how they’re done in REST. In the meantime, I should document it a little more.

    Initially I didn’t want to call the interface “REST”, as a lot of people were very particular about what can and can’t be called REST, so I called it the “HTTP” interface. You’ll see some of my discussion of it on my (infrequently updated) blog: http://gearon.blogspot.com/2009/02/resting-ive-had-couple-of-drinks-this.html

    I’m glad you got the JRDF interface up and running. If you want to use any other interfaces then please let me know, and this time I promise to be more responsive.

    Paul
