<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>2Paths &#187; Omar</title>
	<atom:link href="http://www.2paths.com/author/okhan/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.2paths.com</link>
	<description>Custom Software Technical Architecture, Design and Development in Vancouver, BC, Canada</description>
	<lastBuildDate>Mon, 27 Sep 2010 01:15:46 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Cassandra Key-Value Store Primer</title>
		<link>http://www.2paths.com/2010/06/07/cassandra-key-value-store-primer/</link>
		<comments>http://www.2paths.com/2010/06/07/cassandra-key-value-store-primer/#comments</comments>
		<pubDate>Mon, 07 Jun 2010 17:56:55 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Featured]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[cassandra]]></category>
		<category><![CDATA[Grails]]></category>
		<category><![CDATA[key-value store]]></category>
		<category><![CDATA[nosql]]></category>
		<category><![CDATA[thrift]]></category>

		<guid isPermaLink="false">http://www.2paths.com/?p=1286</guid>
		<description><![CDATA[The &#8220;no-sql&#8221; movement has been gaining strength over recent years. There is no denying that relational databases have their role and are highly effective in many cases. However, there are times when the scalability and availability of a relational database can become an issue. The &#8220;no-sql&#8221; movement is hoping to provide an alternative option to [...]]]></description>
			<content:encoded><![CDATA[<p>The &#8220;no-sql&#8221; movement has been gaining strength over recent years. There is no denying that relational databases have their role and are highly effective in many cases. However, there are times when the scalability and availability of a relational database can become an issue. The &#8220;no-sql&#8221; movement is hoping to provide an alternative option to relational databases. These options allow for the creation of data stores that do not necessarily require the creation of fixed schemas or the joining of tables and tend to focus on scaling horizontally. </p>
<p><strong>Apache Cassandra</strong><br />
<a href="http://cassandra.apache.org/">Cassandra</a> is a fairly recent addition to the &#8220;no-sql&#8221; movement. Initially developed by Facebook, Cassandra was open sourced in 2008 and is now housed and maintained by Apache.</p>
<blockquote><p>Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo  and the data model from Google&#8217;s BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.</p>
<p>Cassandra was open sourced by Facebook in 2008, where it was designed by Avinash Lakshman (one of the authors of Amazon&#8217;s Dynamo) and Prashant Malik ( Facebook Engineer ). In a lot of ways you can think of Cassandra as Dynamo 2.0 or a marriage of Dynamo and BigTable. Cassandra is in production use at Facebook but is still under heavy development.</p></blockquote>
<p><strong>Data Model</strong><br />
The key difference between Cassandra and many of the other key-value stores is that it possesses the concept of a &#8220;Super Column.&#8221; For more information on what exactly a super column is (and an overview of the Cassandra data model in general) read the following blog post &#8220;<a href="http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model">WTF is a SuperColumn? An Intro to the Cassandra Data Model</a>&#8221;</p>
<p><strong>Performance</strong><br />
Readers may be curious about how these key-value stores perform. We found an article that compares <a href="http://blog.medallia.com/2010/05/choosing_a_keyvalue_storage_sy.html">Cassandra to Voldemort</a>.</p>
<p><strong>Interacting with Cassandra</strong><br />
Cassandra is written in Java, but it does not allow access via JDBC. Instead it provides access to its repository via the &#8220;<a href="http://incubator.apache.org/thrift/">Apache Thrift</a>&#8221; framework, which allows developers in any major language to call a given service (in this case the Cassandra data store). For those like me who are used to JDBC, the Thrift interface can be a bit confusing, given that there isn&#8217;t much documentation for it. The Java client made available on the Cassandra website is rather rudimentary, with no support for things like connection pooling, so many people choose to use a library called <a href="http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/">Hector</a>, which provides many of the features necessary for any production-worthy application that uses Cassandra.</p>
<p><strong>The Project </strong><br />
What I wanted to do was use the Cassandra data store on a pet project that I&#8217;m working on in my spare time. I intend to use the data store to do the following: </p>
<ol>
<li>Store messages being sent from one user to another</li>
<li>Track queries made by a given user</li>
<li>Store user complaints</li>
</ol>
<p><strong>Installation</strong><br />
Installing the server is easy enough. Install Java 1.6 if you haven&#8217;t already. Edit <code>bin/cassandra.in.sh</code> (relative to the Cassandra install directory) and set <code>JAVA_HOME</code> if your default <code>JAVA_HOME</code> points to a different version of Java. By default the server&#8217;s JMX port (which runs over RMI) is set to 8080; you can change that value in <code>bin/cassandra.in.sh</code> as well.</p>
<p><strong>Configuration</strong><br />
Edit <code>conf/storage-conf.xml</code> to define your keyspaces and column families. A Keyspace is similar to a schema in a relational database, whereas a ColumnFamily is analogous to a table. In defining the ColumnFamilies you are defining the key groupings and how they are sorted. I defined the following:</p>
<pre class="brush: xml">
  &lt;Keyspaces&gt;
    &lt;Keyspace Name=&quot;Nikahfied&quot;&gt;

      &lt;ColumnFamily Name=&quot;Queries&quot;
                    Comment=&quot;User queries&quot;/&gt;

      &lt;ColumnFamily Name=&quot;Complaints&quot;
                    Comment=&quot;User complaints&quot;/&gt;

      &lt;ColumnFamily Name=&quot;Messages&quot;
                    ColumnType=&quot;Super&quot;
                    CompareWith=&quot;UTF8Type&quot;
                    CompareSubcolumnsWith=&quot;UTF8Type&quot;
                    RowsCached=&quot;10000&quot;
                    KeysCached=&quot;50%&quot;
                    Comment=&quot;Message threads between users of the system&quot;/&gt;

    &lt;/Keyspace&gt;
  &lt;/Keyspaces&gt;
</pre>
<p>This creates a schema named Nikahfied with three tables: Queries, Complaints and Messages. As you will notice, there is no real structure to the ColumnFamilies other than the column type and the CompareWith strategy, as outlined in the &#8220;<a href="http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model">WTF is a SuperColumn? An Intro to the Cassandra Data Model</a>&#8221; article. In this case we have one super ColumnFamily (Messages) and two standard ColumnFamilies (Queries and Complaints). In code I will ensure that the rows of each ColumnFamily contain the same kind of content; however, nothing forces you to do so. Each ColumnFamily can also be configured individually. Messages will need to handle heavy reads and writes, whereas Queries and Complaints will be more write-intensive. Messages therefore has a caching scheme defined (RowsCached and KeysCached), whereas the others do not, as they will primarily handle writes rather than reads.</p>
<p>When initially defining the column structure I was a bit confused. I was taking the definition of Cassandra as a &#8220;key value store&#8221; literally and was confused about there being a primary key and secondary keys. When defining a basic Column I thought it would simply have a key and a value, so that things would be defined as <code>Nikahfied.&lt;ColumnFamily&gt;.key = value</code>. This is not the case. For a basic ColumnFamily, the structure is as follows: <code>Nikahfied.&lt;ColumnFamily&gt;.&lt;Primary key&gt;.&lt;Secondary Key&gt; = value</code>. In the case of a super column, this goes one level further: <code>Nikahfied.&lt;ColumnFamily&gt;.&lt;Primary key&gt;.&lt;Secondary Key&gt;.&lt;Tertiary Key&gt; = value</code>. That had me puzzled for some time&#8230; Upon re-reading the &#8220;<a href="http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model">WTF is a SuperColumn? An Intro to the Cassandra Data Model</a>&#8221; article I realized my mistake, so I thought I&#8217;d point it out for anyone else who is confused about this construct.</p>
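<p>To make the extra level concrete, here is a minimal sketch in plain Java. It does not use Cassandra, Thrift or Hector at all; the class name, method names and sample keys are all hypothetical, and it uses <code>String.join</code>, so it assumes Java 8 or later. It only spells out how many keys are needed to address a value in each kind of ColumnFamily.</p>
<pre class="brush: java">
public class ColumnPathSketch {

    // Address of a value in a standard ColumnFamily:
    // keyspace.columnFamily.rowKey.columnName
    public static String standardPath(String keyspace, String columnFamily,
                                      String rowKey, String columnName) {
        return String.join(".", keyspace, columnFamily, rowKey, columnName);
    }

    // Address of a value in a super ColumnFamily, one level deeper:
    // keyspace.columnFamily.rowKey.superColumnName.subColumnName
    public static String superPath(String keyspace, String columnFamily, String rowKey,
                                   String superColumnName, String subColumnName) {
        return standardPath(keyspace, columnFamily, rowKey, superColumnName)
                + "." + subColumnName;
    }

    public static void main(String[] args) {
        // Hypothetical keys, for illustration only
        System.out.println(standardPath("Nikahfied", "Complaints", "user42", "user7"));
        System.out.println(superPath("Nikahfied", "Messages", "user42", "threads", "thread-1"));
    }
}
</pre>
<p>The standard path needs a primary and a secondary key before it reaches a value; the super path needs a tertiary key as well.</p>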
<p><strong>Data Structures:</strong><br />
<span style="text-decoration:underline">Queries:</span><br />
The structure of queries is simple enough. I store each query by date/time in order to track which queries a user made and when.</p>
<p><code>Nikahfied.Queries.UserId.&lt;Date/Time&gt; = &lt;Query&gt;</code></p>
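<p>One way the date/time column name could be built is sketched below. This is an assumption on my part rather than the project&#8217;s actual code: it assumes column names are compared as strings or raw bytes, and it uses <code>java.time</code>, which needs Java 8 or later rather than the Java 1.6 used elsewhere in this post. The class and method names are hypothetical. The point is that a fixed-width UTC timestamp sorts in the same order as the moments it represents.</p>
<pre class="brush: java">
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class QueryColumnNames {

    // Fixed-width UTC pattern: every name has the same length, so comparing
    // the strings character by character matches chronological order.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss'Z'").withZone(ZoneOffset.UTC);

    public static String columnNameFor(Instant when) {
        return FORMAT.format(when);
    }

    public static void main(String[] args) {
        String earlier = columnNameFor(Instant.parse("2010-06-07T17:56:55Z"));
        String later = columnNameFor(Instant.parse("2010-06-08T09:00:00Z"));
        System.out.println(earlier);
        System.out.println(later);
        // A negative sign means the earlier name sorts first
        System.out.println(Integer.signum(earlier.compareTo(later)));
    }
}
</pre>
<p>A variable-width format (for example one that only sometimes prints fractional seconds) would not have this property, which is why the pattern is fixed.</p>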
<p><span style="text-decoration:underline">Complaints:</span><br />
There is only one type of complaint wherein a user can complain about another user. So I wanted to track who received the complaint, who complained and when. The structure of a complaint would be defined as follows:<br />
<code>Nikahfied.Complaints.&lt;Violator Id&gt;.&lt;Complainant Id&gt; = &lt;Date/Time&gt;</code></p>
<p><span style="text-decoration:underline">Messages:</span><br />
If there was ever a case to use a super column, messages would be it. A user has a number of threads, each containing one or more messages. A user can both receive and send messages. The messages themselves are the same either way; I simply need to track which threads contain messages that a user sent. I also need to keep track of which threads have unread messages within them. My initial design was the following:<br />
<code>Nikahfied.Messages.RecipientId.threads = A JSON Object that contains Threads defined in JSON Objects themselves. A thread being a group of messages<br />
                              .sent_ids = An array of thread ids that contain messages that the user has sent<br />
                              .unread_ids = An array of thread ids that contain unread messages</code></p>
<p>However, with this structure there was no need for a super column and I was determined to make use of one <img src='http://www.2paths.com/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . So I expanded out the thread structure as follows:</p>
<p><code>Nikahfied.Messages.RecipientId.threads.&lt;thread id&gt; = A JSON Object that contains an individual Thread<br />
                              .sent_ids = An array of thread ids that contain messages that the user has sent<br />
                              .unread_ids = An array of thread ids that contain unread messages</code></p>
<p>Now that the content structure is defined, I need a hook between my application and the Cassandra key-value store. I have decided to use the Grails framework to speed up development. I have included the Hector library to access Cassandra, but I needed to define a CassandraService to be the central point of contact between the various controllers and services and the Cassandra store. Both Cassandra&#8217;s Thrift interface and Hector have pretty poor documentation, so I created a Grails service that makes the interaction with Cassandra a bit clearer (at least to me). <span style="text-decoration:underline;font-weight:bold">Note:</span> You will need to be using Java 1.6.</p>
<pre class="brush: groovy">
import me.prettyprint.cassandra.service.CassandraClientPool
import me.prettyprint.cassandra.service.CassandraClientPoolFactory
import me.prettyprint.cassandra.service.CassandraClient
import me.prettyprint.cassandra.service.Keyspace
import org.apache.cassandra.thrift.Column;
import org.apache.cassandra.thrift.ColumnPath;
import org.apache.cassandra.thrift.NotFoundException;

import org.apache.cassandra.thrift.SuperColumn

class CassandraService {

    boolean transactional = true

    def servers=[&quot;localhost:9160&quot;]
    def defaultKeyspace=&quot;Nikahfied&quot;
    private static final String NOT_FOUND = &quot;&quot;

    private execute(keyspaceName=defaultKeyspace,block){
        CassandraClientPool pool = CassandraClientPoolFactory.INSTANCE.get();
        CassandraClient client = pool.borrowClient(servers);

        try {
            Keyspace keyspace = client.getKeyspace(keyspaceName)
            return block(keyspace)
        } finally {
            pool.releaseClient(client);
        }
    }

    /**
     * Get a single super column
     * @param columnFamily
     * @param secondaryKey
     * @param key primary key
     * @return Matching super column
     */
    public SuperColumn getSuperColumn(String columnFamily, String key, String secondaryKey) {
        ColumnPath cp = new ColumnPath(columnFamily)
        cp.setSuper_column(secondaryKey.bytes)
        return execute {Keyspace keyspace -&gt;
            SuperColumn sc = null;
            try {
                sc = keyspace.getSuperColumn(key, cp)
            } catch (NotFoundException nfe) {
                sc = null;
            }
            return sc
        }
    }

    /**
     * Get multiple super columns
     * @param columnFamily
     * @param secondaryKey
     * @param keys primary keys
     * @return matching super columns
     */
    public Map multigetSuperColumn(String columnFamily, List keys, String secondaryKey) {
        ColumnPath cp = new ColumnPath(columnFamily)
        cp.setSuper_column(secondaryKey.bytes)
        return execute {Keyspace keyspace -&gt;
            Map scMap = keyspace.multigetSuperColumn(keys, cp)
            return scMap
        }
    }

    /**
     * Get multiple columns
     * @param secondaryKey secondaryKey
     * @param keys primary keys
     * @param columnFamily column family
     * @return results if any
     */
    public Map multigetColumn(List keys, String secondaryKey, String columnFamily) {
        ColumnPath cp = new ColumnPath(columnFamily)
        cp.setColumn(secondaryKey.bytes)
        return execute {Keyspace keyspace -&gt;
            Map cMap = keyspace.multigetColumn(keys, cp)
            return cMap
        }
    }

    /**
     * Get a single column and its value
     * @param columnFamily
     * @param key primary key
     * @param secondaryKey column name (secondary key)
     * @return matching column, or null if it does not exist
     */
    public Column getColumn(String columnFamily, String key, String secondaryKey) {
        def cp = new ColumnPath(columnFamily)
        cp.setColumn(secondaryKey.bytes)
        return execute {Keyspace keyspace -&gt;
            Column c = null;
            try {
                c = keyspace.getColumn(key, cp)
            } catch (NotFoundException nfe) {
                c = null;
            }
            return c
        }
    }

    /**
     * Sets the new value for this column path
     * @param cf Column family
     * @param secondaryKey secondary key
     * @param key primary key
     * @param value value
     */
    def setColumnPathValue(String cf, String key, String secondaryKey, String value){
        def cp = new ColumnPath(cf)
        cp.setColumn(secondaryKey.bytes)
        return setColumnValue(cp, key, value)
    }

    /**
     * Sets a new sub-column value inside a super column
     * @param cf Column family
     * @param sc super column name (secondary key)
     * @param key primary (row) key
     * @param secondaryKey sub-column name within the super column (tertiary key)
     * @param value value
     */
    def setColumnPathValue(String cf, String sc, String key, String secondaryKey, String value){
        def cp = new ColumnPath(cf)
        cp.setSuper_column(sc.bytes)
        cp.setColumn(secondaryKey.bytes)
        return setColumnValue(cp, key, value)
    }

    /**
     * Set a regular column
     * @param cp Column path
     * @param key primary key
     * @param value value
     */
    private setColumnValue(ColumnPath cp, String key, String value){
        return execute{ Keyspace keyspace -&gt;
            keyspace.insert(key, cp, value.bytes)
        }
    }

    /**
     * Batch insert content for a given key and its values
     * @param key primary key
     * @param columnMap column key values
     * @param superColumnMap super column key values
     */
    public batchInsert(String key, Map&lt;String, List&gt; columnMap, Map&lt;String, List&gt; superColumnMap) {
        return execute{ Keyspace keyspace -&gt;
            keyspace.batchInsert(key, columnMap, superColumnMap)
        }
    }

}
</pre>
<p>Using this class I have created a MessageService, a ComplaintsService and a QueriesService that interact with the CassandraService to retrieve and store data. There are adjoining Message and Complaints controllers, whereas the QueriesService is referenced by the SearchController. Hopefully this blog post gives you a better understanding of how to get started with Cassandra, and the CassandraService class gives you a jump start on integrating Cassandra into an existing Grails or Java project.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2010/06/07/cassandra-key-value-store-primer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Mulgara RDF Triple-Store</title>
		<link>http://www.2paths.com/2008/12/30/mulgara-rdf-store/</link>
		<comments>http://www.2paths.com/2008/12/30/mulgara-rdf-store/#comments</comments>
		<pubDate>Tue, 30 Dec 2008 20:45:29 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[jena]]></category>
		<category><![CDATA[jrdf]]></category>
		<category><![CDATA[mulgara]]></category>
		<category><![CDATA[rdf store]]></category>
		<category><![CDATA[rmi]]></category>
		<category><![CDATA[semantic web]]></category>
		<category><![CDATA[triples]]></category>

		<guid isPermaLink="false">http://www.2paths.com/?p=437</guid>
		<description><![CDATA[Over the past 4 months I have intermittently been looking into the use of the Mulgara RDF triple store. Mulgara is an open-source RDF triple store that claims to handle up to 7 billion nodes and was developed by some big players in the semantic web space (Zepheira, Topaz and Fedora Commons). My [...]]]></description>
			<content:encoded><![CDATA[<p>Over the past 4 months I have intermittently been looking into the use of the Mulgara RDF triple store. Mulgara is an open-source RDF triple store that claims to handle up to 7 billion nodes and was developed by some big players in the semantic web space (Zepheira, Topaz and Fedora Commons). My intention was to see if it could be of use in future projects as an alternative to a traditional database.</p>
<h3>RDF</h3>
<p>For those who are unfamiliar with RDF, a short description would be:</p>
<blockquote><p>The Resource Description Framework (RDF) is a language for representing information about resources in the World Wide Web. This Primer is designed to provide the reader with the basic knowledge required to effectively use RDF. It introduces the basic concepts of RDF and describes its XML syntax. It describes how to define RDF vocabularies using the RDF Vocabulary Description Language, and gives an overview of some deployed RDF applications. It also describes the content and purpose of other RDF specification documents.</p></blockquote>
<p>For more information about RDF read the <a title="RDF Primer" href="http://www.w3.org/TR/rdf-primer/">following primer</a> hosted on the W3C&#8217;s website.</p>
<h3>RDF Stores</h3>
<p>An RDF store allows for flexible definition of content, which is a shortcoming of traditional relational databases. An example of this would be the need to add a field. In a relational database one would have to define a new column in the database table, which is not usually achievable from a program, given that most systems are not foolish enough to allow schema changes from code. In an RDF data store, however, things are stored as nodes and relationships between nodes. Adding a new field therefore simply requires creating a new node and establishing a relationship between the existing nodes and the new node. This flexibility is extremely handy in cases where the data being captured by a system changes often. Add to this an RDF schema or an ontology and one gains the ability to query the data semantically and to make use of metadata in a way that cannot be done within a relational database.</p>
<p>I was thinking that this type of flexibility would be of great benefit for a project we hope to be working on in the new year. Given Zepheira&#8217;s backing of Mulgara, I thought I would investigate whether it is ready for prime time and how much complexity is involved in using it.</p>
<h3>Mulgara</h3>
<p>Mulgara was initiated in 2006 as a fork of the <a href="http://sourceforge.net/projects/kowari/">Kowari project</a>, which died or became unsupported as of 2005. The project claims that Mulgara has the following features:</p>
<ul>
<li>Native RDF support</li>
<li>Multiple databases (models) per server</li>
<li>Simple SQL-like query language</li>
<li>Small footprint</li>
<li>Full text search functionality</li>
<li>Datatype support</li>
<li>Supports and tracks W3C Specifications and guidelines</li>
<li>Large storage capacity</li>
<li>Optimized for metadata storage and retrieval</li>
<li>Multi-processor support</li>
<li>Independently tuned for both 64-bit and 32-bit architectures</li>
<li>Low memory requirements</li>
<li>On-disk joins</li>
<li>Streamed query results</li>
</ul>
<p>The remainder of this post will cover my experiences with using Mulgara.</p>
<h3>Rough Start</h3>
<p>Unfortunately things didn&#8217;t start off well.  Though the downloading and installation of the Mulgara server was painless there were a number of issues that I came across that would deter many from considering the use of it in a production level system:</p>
<h4>Connectors are not included in the default download</h4>
<p>Currently there are only a few means of accessing Mulgara to perform CRUD operations:</p>
<ul>
<li>A <a title="JRDF" href="http://jrdf.sourceforge.net/">JRDF</a> connector</li>
<li>A <a title="Jena Semantic Framework" href="http://jena.sourceforge.net/">Jena</a> connector</li>
<li>Straight RMI (remote method invocation)</li>
</ul>
<p>In the default download from the Mulgara website, the JRDF and Jena connectors are not provided. After a lot of struggling, I resorted to downloading the source and building the server from scratch to ensure that the connectors were included within the jar files.</p>
<p>After getting to this stage I tried to follow the tutorials to connect to and interact with the server. It seems that the documentation on the Mulgara website is stale and that the tutorials have not been kept up to date: the code provided in them does not work unless one uses an in-memory database, which is not specified in the tutorial. I posted questions to the various user group email lists and received no responses.</p>
<p>Once I figured those aspects out things went more smoothly, though there is a significant amount of boilerplate code that needs to be written to allow for simple CRUD operations.</p>
<h3>Creating a connection to the server</h3>
<p>As noted above, one cannot simply use JDBC to connect to the server; instead one has the option of using one of the popular RDF frameworks such as JRDF, Jena or Sesame. I initially tried to use Jena, but it seems that support for the <a title="JenaMulgara connector" href="http://jena.hpl.hp.com/wiki/JenaMulgara">JenaMulgara connector</a> died quite a while ago and the connector does not work with the latest version of Mulgara. Therefore, I moved on to the JRDF connector, given that the Sesame connector is quite immature.</p>
<pre class="brush: java">
// Create the URI of the server (hostName and serverName are defined elsewhere)
java.net.URI serverURI = new java.net.URI(&quot;rmi&quot;, hostName, &quot;/&quot; + serverName, null);

// Create a new session factory, ensure that it&#039;s local
SessionFactory sessionFactory = SessionFactoryFinder.newSessionFactory(serverURI, false);

// Get a JRDF Session
org.mulgara.server.JRDFSession session = (JRDFSession) sessionFactory.newJRDFSession();
</pre>
<h3>Create a model/database</h3>
<p>The above gives you a connection to the Mulgara server. Creating a model/database requires the following section of code:</p>
<pre class="brush: java">
java.net.URI modelURI = new URI(&quot;rmi&quot;, hostName, &quot;/&quot; + serverName, graphName);
java.net.URI modelType = new URI(&quot;http://mulgara.org/mulgara#Model&quot;);
session.createModel(modelURI, modelType);
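// AbstractGraphFactory is org.mulgara.client.jrdf.AbstractGraphFactory, as used in the sample code at the end of this post
org.jrdf.graph.Graph graph = AbstractGraphFactory.createGraph(serverURI, modelURI);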
</pre>
<h3>Create nodes/relationships</h3>
<p>In order to create RDF triples one needs to specify a subject, a predicate and an object. There are three types of Java objects that can be used in building these relationships:</p>
<ol>
<li>BlankNode &#8211; a node without a URI of its own, used to group other relationships. A BlankNode can be used as either a subject or an object</li>
<li>Literal &#8211; a literal value of some sort (String, number, etc.). A literal can only be used as an object and not as a subject or predicate</li>
<li>URIReference &#8211; a node identified by a URI. A URIReference is used to define the predicate (the relationship between the subject and the object), and it can also appear as a subject or an object, as the person resource does in the sample code at the end of this post</li>
</ol>
<pre class="brush: java">
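org.jrdf.graph.GraphElementFactory elementFactory = graph.getElementFactory();
// A blank node: createResource() with no arguments
org.jrdf.graph.BlankNode blanknode = elementFactory.createResource();
// A URI reference: createResource() with the URI of the term
org.jrdf.graph.URIReference predicate = elementFactory.createResource(new URI(&quot;http://example.org/terms#address&quot;));
// A literal: value is the String to store
org.jrdf.graph.Literal literal = elementFactory.createLiteral(value);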
</pre>
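<p>The position rules from the list above can be written down as a small check. This is a sketch in plain Java, not part of the JRDF or Mulgara API; the class, enum and method names are hypothetical. It follows the list (a literal may only be an object, a blank node may be a subject or an object) plus the general RDF rule that a predicate must be a URI reference.</p>
<pre class="brush: java">
public class TriplePositions {

    public enum NodeKind { BLANK_NODE, LITERAL, URI_REFERENCE }

    // Blank nodes and URI references may be subjects; literals may not.
    public static boolean canBeSubject(NodeKind kind) {
        return kind != NodeKind.LITERAL;
    }

    // Only URI references may be predicates.
    public static boolean canBePredicate(NodeKind kind) {
        return kind == NodeKind.URI_REFERENCE;
    }

    // Any kind of node may be an object.
    public static boolean canBeObject(NodeKind kind) {
        return true;
    }

    public static boolean isValidTriple(NodeKind subject, NodeKind predicate, NodeKind object) {
        if (!canBeSubject(subject)) {
            return false;
        }
        if (!canBePredicate(predicate)) {
            return false;
        }
        return canBeObject(object);
    }

    public static void main(String[] args) {
        // person (URI) hasAddress (URI) address (blank node)
        System.out.println(isValidTriple(NodeKind.URI_REFERENCE, NodeKind.URI_REFERENCE, NodeKind.BLANK_NODE));
        // address (blank node) hasCity (URI) "Bedford" (literal)
        System.out.println(isValidTriple(NodeKind.BLANK_NODE, NodeKind.URI_REFERENCE, NodeKind.LITERAL));
        // a literal in the subject position is rejected
        System.out.println(isValidTriple(NodeKind.LITERAL, NodeKind.URI_REFERENCE, NodeKind.BLANK_NODE));
    }
}
</pre>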
<p>Lastly to create and insert a triple one simply needs to do the following:</p>
<pre class="brush: java">
org.jrdf.graph.Triple triple = elementFactory.createTriple(subject, predicate, object); // Create the triple object
graph.add(triple); // Store the triple in Mulgara server
</pre>
<p>For sample code on how to do everything from connect to query and delete see the end of this post.</p>
<h3>Performance</h3>
<p>I wrote a simple application that took the contents of a relational database, converted it into RDF and stored it in Mulgara. The database I was using was big but not huge, roughly &#8230;. All these numbers are based on running the application on a MacBook Pro with a 2 GHz Core Duo processor and 2GB of RAM. I initially wrote the app to do inserts one at a time, which was obviously inefficient, but I wanted to test the speed of a single insert. Each insert of a single triple took roughly 0.17 seconds. In round 2 I started doing inserts in batches. In batch mode, inserts took roughly 0.08 seconds per node. Neither speed is particularly fast but, like I said, this is running off my laptop so one can&#8217;t expect superb performance.</p>
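<p>To put those per-insert timings in perspective, here is a back-of-the-envelope calculation in Java. The load of one million triples is purely hypothetical (it is not the size of the database I tested with), and the class and method names are made up for this sketch. It also assumes the timings stay constant as the store grows, which I did not verify.</p>
<pre class="brush: java">
import java.util.Locale;

public class InsertTimeEstimate {

    public static final double SECONDS_PER_SINGLE_INSERT = 0.17; // measured, one triple at a time
    public static final double SECONDS_PER_BATCH_INSERT = 0.08;  // measured, batch mode

    // Total load time in hours for a given number of inserts
    public static double totalHours(long inserts, double secondsPerInsert) {
        return inserts * secondsPerInsert / 3600.0;
    }

    public static void main(String[] args) {
        long inserts = 1_000_000L; // hypothetical load size, for illustration only
        System.out.printf(Locale.ROOT, "one at a time: %.1f hours%n",
                totalHours(inserts, SECONDS_PER_SINGLE_INSERT));
        System.out.printf(Locale.ROOT, "batch mode:    %.1f hours%n",
                totalHours(inserts, SECONDS_PER_BATCH_INSERT));
    }
}
</pre>
<p>At these rates a million inserts works out to roughly 47 hours one at a time versus roughly 22 hours in batch mode, so batching about halves the load time but still leaves it in the range of a full day.</p>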
<h3>Conclusion</h3>
<p>Given that RDF stores are competing with regular relational databases, I would have hoped that there was an easier means of connecting to the database and that RMI wasn&#8217;t being used under the hood. Given the push for RESTful interfaces within the semantic world, I was surprised to see that the Mulgara server does not have a RESTful interface, which would remove the need for RMI completely. Once I got over my dependence on the abhorrent tutorials and documentation, things went fairly well, though I have performance concerns when dealing with large volumes of data. The main area of concern for me is the requirement of RMI, which has been well documented as not being very performant. It pains me to have to use RMI when running both the server and the application on the same machine. What is nice is that the server is completely transactional and any failure results in a roll-back. The last concern I have is that the user community around Mulgara does not seem to be very active or dedicated to ensuring that the product is supported to the extent that one can rely on getting up-to-date documentation and a steady stream of bug fixes.</p>
<h3>Sample Code</h3>
<p>The following code sample shows how to perform inserts, updates, deletes and selection.</p>
<pre class="brush: java">
package org.twopaths.jrdf;

import java.net.InetAddress;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.UnknownHostException;

import org.jrdf.graph.BlankNode;
import org.jrdf.graph.Graph;
import org.jrdf.graph.GraphElementFactory;
import org.jrdf.graph.GraphElementFactoryException;
import org.jrdf.graph.GraphException;
import org.jrdf.graph.Literal;
import org.jrdf.graph.Triple;
import org.jrdf.graph.URIReference;
import org.jrdf.util.ClosableIterator;
import org.mulgara.client.jrdf.AbstractGraphFactory;
import org.mulgara.query.QueryException;
import org.mulgara.server.JRDFSession;
import org.mulgara.server.NonRemoteSessionException;
import org.mulgara.server.SessionFactory;
import org.mulgara.server.driver.SessionFactoryFinder;
import org.mulgara.server.driver.SessionFactoryFinderException;

public class Sandbox {
	public static void main(String[] args) {
                Graph graph = null;
		try {
			// Create the host name
			String hostname = InetAddress.getLocalHost().getCanonicalHostName();

			// Create the URI of the server
			URI serverURI = new URI(&quot;rmi&quot;, hostname, &quot;/&quot; + &quot;server1&quot;, null);

			// Create a new session factory, ensure that it&#039;s local
			SessionFactory sessionFactory = SessionFactoryFinder.newSessionFactory(serverURI, false);

			// Get a JRDF session and print its concrete class
			JRDFSession session = (JRDFSession) sessionFactory.newJRDFSession();
			System.out.println(session.getClass().getName());

			//create a new Model
			URI modelURI = new URI(&quot;rmi&quot;, hostname, &quot;/&quot; + &quot;server1&quot;, &quot;exampleGraph&quot;);
			URI modelType = new URI(&quot;http://mulgara.org/mulgara#Model&quot;);
			session.createModel(modelURI, modelType);

			//create a JRDF Graph for the model
//			graph = new JRDFGraph(session, modelURI);
			graph = AbstractGraphFactory.createGraph(serverURI, modelURI);

			//get the Factory
			GraphElementFactory elementFactory = graph.getElementFactory();

			//create resources
			URIReference person = elementFactory.createResource(new URI(&quot;http://example.org/staffid#85740&quot;));
			BlankNode address = elementFactory.createResource();

			//create properties
			URIReference hasAddress = elementFactory.createResource(new URI(&quot;http://example.org/terms#address&quot;));
			URIReference hasStreet = elementFactory.createResource(new URI(&quot;http://example.org/terms#street&quot;));
			URIReference hasCity = elementFactory.createResource(new URI(&quot;http://example.org/terms#city&quot;));
			URIReference hasState = elementFactory.createResource(new URI(&quot;http://example.org/terms#state&quot;));
			URIReference hasPostCode = elementFactory.createResource(new URI(&quot;http://example.org/terms#postalCode&quot;));

			//create values
			Literal street = elementFactory.createLiteral(&quot;1501 Grant Avenue&quot;);
			Literal city = elementFactory.createLiteral(&quot;Bedford&quot;);
			Literal state = elementFactory.createLiteral(&quot;Massachusetts&quot;);
			Literal postCode = elementFactory.createLiteral(&quot;01730&quot;);

			//create statements
			Triple addressStatement = elementFactory.createTriple(person, hasAddress, address);
			Triple streetStatement = elementFactory.createTriple(address, hasStreet, street);
			Triple cityStatement = elementFactory.createTriple(address, hasCity, city);
			Triple stateStatement = elementFactory.createTriple(address, hasState, state);
			Triple postCodeStatement = elementFactory.createTriple(address, hasPostCode, postCode);

			// Add triples to graph
			graph.add(addressStatement);
			graph.add(streetStatement);
			graph.add(cityStatement);
			graph.add(stateStatement);
			graph.add(postCodeStatement);

			//get all Triples
			Triple findAll = elementFactory.createTriple(null, null, null);
			ClosableIterator allTriples = graph.find(findAll);
			while (allTriples.hasNext()) {
				System.out.println(allTriples.next().toString());
			}

			//search for address (as a subject)
			Triple findAddress = elementFactory.createTriple(address, null, null);
			ClosableIterator addressSubject = graph.find(findAddress);
			while (addressSubject.hasNext()) {
				System.out.println(addressSubject.next().toString());
			}

			//search for the city: &quot;Bedford&quot;
			Triple findCity = elementFactory.createTriple(null, null, city);
			ClosableIterator bedfordCity = graph.find(findCity);
			while (bedfordCity.hasNext()) {
				System.out.println(bedfordCity.next().toString());
			}

			//search for any subject that has an address
			Triple findAddresses = elementFactory.createTriple(null, hasAddress, null);
			ClosableIterator addresses = graph.find(findAddresses);
			while (addresses.hasNext()) {
				System.out.println(addresses.next().toString());
			}

		} catch (UnknownHostException uhe) {
			uhe.printStackTrace();
		} catch (URISyntaxException urise) {
			urise.printStackTrace();
		} catch (SessionFactoryFinderException sffe) {
			sffe.printStackTrace();
		} catch (NonRemoteSessionException nrse) {
			nrse.printStackTrace();
		} catch (QueryException qe) {
			qe.printStackTrace();
		} catch (GraphException ge) {
			ge.printStackTrace();
		} catch (GraphElementFactoryException gefe) {
			gefe.printStackTrace();
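		} finally {
			// Close the graph only if it was actually created; otherwise
			// graph.close() would throw a NullPointerException here.
			if (graph != null) {
				try {
					graph.close();
				} catch (Exception e) {
					e.printStackTrace();
				}
			}
		}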
		System.out.println(&quot;DONE&quot;);
	}
}
</pre>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/12/30/mulgara-rdf-store/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>QCon SF 2008 &#8211; Technology Of Interest</title>
		<link>http://www.2paths.com/2008/11/26/qcon-sf-2008-technology-of-interest/</link>
		<comments>http://www.2paths.com/2008/11/26/qcon-sf-2008-technology-of-interest/#comments</comments>
		<pubDate>Wed, 26 Nov 2008 19:33:24 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[drizzle]]></category>
		<category><![CDATA[gearman]]></category>
		<category><![CDATA[memcached]]></category>
		<category><![CDATA[memcachedb]]></category>
		<category><![CDATA[mogilefs]]></category>
		<category><![CDATA[qcon]]></category>

		<guid isPermaLink="false">http://www.2paths.com/?p=299</guid>
		<description><![CDATA[Part of the purpose of attending QCon was to get in sync with the tech community regarding the technologies they have experimented with and found useful. The following is a list of technologies, frameworks, etc. of interest that I took away from the conference:

CouchDB &#8211; Document oriented database

Memcached &#8211; Distributed Hashmap
MemcacheDB &#8211; A database [...]]]></description>
			<content:encoded><![CDATA[<p>Part of the purpose of attending QCon was to get in sync with the tech community regarding the technologies they have experimented with and found useful. The following is a list of technologies, frameworks, etc. of interest that I took away from the conference:</p>
<ul>
<li><a href="http://incubator.apache.org/couchdb/">CouchDB</a> &#8211; Document-oriented database</li>
<li><a href="http://www.danga.com/memcached/">Memcached</a> &#8211; Distributed hash map</li>
<li><a href="http://memcachedb.org/">MemcacheDB</a> &#8211; A database built on Memcached and Berkeley DB</li>
<li><a href="https://launchpad.net/drizzle">Drizzle</a> &#8211; Lightweight SQL database for the cloud/web</li>
<li><a href="http://www.danga.com/gearman/">Gearman</a> &#8211; Distributed processing framework</li>
<li><a href="http://www.danga.com/mogilefs/">MogileFS</a> &#8211; Distributed file system</li>
</ul>
<p>I intend to spend some time investigating each of these pieces of technology in the coming months with the hope of making use of them when and where applicable.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/11/26/qcon-sf-2008-technology-of-interest/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>QCon SF 2008 &#8211; Day 3</title>
		<link>http://www.2paths.com/2008/11/24/qcon-sf-2008-day-3/</link>
		<comments>http://www.2paths.com/2008/11/24/qcon-sf-2008-day-3/#comments</comments>
		<pubDate>Mon, 24 Nov 2008 22:08:47 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[conclusion]]></category>
		<category><![CDATA[couchdb]]></category>
		<category><![CDATA[day 3]]></category>
		<category><![CDATA[digg]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[qcon]]></category>
		<category><![CDATA[san francisco]]></category>

		<guid isPermaLink="false">http://www.2paths.com/?p=289</guid>
		<description><![CDATA[I was pretty excited about the &#8220;Architectures you&#8217;ve always wondered about&#8221; stream for day 3 of the conference. It started off horribly. Dan Pritchett of MySpace was up first and basically took the chance to show off his personal little project that allows events from their thousands of servers to notify them what&#8217;s going on within [...]]]></description>
			<content:encoded><![CDATA[<p>I was pretty excited about the &#8220;Architectures you&#8217;ve always wondered about&#8221; stream for day 3 of the conference. It started off horribly. Dan Pritchett of MySpace was up first and basically took the chance to show off his personal little project, which lets their thousands of servers report what&#8217;s going on within them in a centralized manner. How he figured people were interested in hearing about the log files of MySpace.com baffles me&#8230;not only were people rolling their eyes but some were falling asleep (myself included). He really didn&#8217;t do anything to give the tech community a better view of MySpace. He finished his session in 25 minutes and then opened the floor for questions&#8230;crickets for the most part. The moderator tried valiantly to get people to ask questions, but what the hell do you want us to ask about log files? There went 30 minutes of my life I will never get back.</p>
<p>Joe Stump from Digg.com was up next and his session was awesome. He talked about the evolution of their architecture from the point he joined to what they are planning on rolling out early next year, along with some of the bumps they&#8217;ve encountered. MySQL&#8217;s scalability had major issues, so they moved to IDDB and are considering MemcacheDB, which is Memcached mixed with Berkeley DB. Joe pointed out their use of <a href="http://www.danga.com/mogilefs/" target="_blank">MogileFS</a> for the management of images on <a href="http://digg.com/images" target="_blank">digg images</a>, which I will be taking a look at for our current project. Joe, you rock, maybe you can give some pointers to Dan&#8230;then again, hopefully MySpace will send someone with a clue next year.</p>
<p>Aditya Agarwal from Facebook was great. He talked about all the different aspects of Facebook and about PHP/MySQL being the underlying architecture. They tend to use MySQL simply as a key-value store, and profile content is randomly scattered across their thousands of servers. They use LAMP for the most part but have created a framework that allows them to use services written in other languages within their application.</p>
<p>The discovery of the day was <a href="http://incubator.apache.org/couchdb/" target="_blank">CouchDB</a>. One of the lead developers/designers was sitting at the same table as me during the Digg session. Tim Bray was quite excited about the technology as well, having mentioned it in his keynote on day 2. I attended one of the two sessions about it, which covered what it&#8217;s about from a high level. It has its roots in the Lotus Notes database design, given that it was created by one of the people who worked on Lotus Notes in the past. It has some very cool features:</p>
<ul>
<li>RESTful, so it doesn&#8217;t require any connectors</li>
<li>Stores all content as JSON</li>
<li>Querying the database is done via JavaScript functions that perform map-reduce-like operations</li>
<li>Very efficient and performs versioning and synchronization when distributed</li>
</ul>
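<p>To make the map-reduce style of querying a bit more concrete, here is a minimal sketch in Java. It is not the CouchDB API (CouchDB views are written as JavaScript functions and queried over HTTP); the <code>ViewSketch</code> class, the <code>MapFunction</code> interface and the sample documents are all made up for illustration. It only shows the idea: a map function is run over every document and emits key/value pairs, and a reduce step (here a simple sum) combines the values per key.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical in-memory stand-in for a map-reduce view; not the CouchDB API.
class ViewSketch {

    // "Map" step: called once per document, may emit zero or more (key, value) pairs.
    interface MapFunction {
        void apply(Map<String, Object> doc, BiConsumer<String, Integer> emit);
    }

    // Runs the map function over every document and reduces by summing values per key.
    static Map<String, Integer> query(List<Map<String, Object>> docs, MapFunction mapFn) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map<String, Object> doc : docs) {
            mapFn.apply(doc, (key, value) -> result.merge(key, value, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        // Schema-less documents: each one is just a bag of fields.
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(Map.of("type", "post", "author", "alice"));
        docs.add(Map.of("type", "post", "author", "bob"));
        docs.add(Map.of("type", "comment", "author", "alice"));
        docs.add(Map.of("type", "post", "author", "alice"));

        // Count posts per author: emit (author, 1) for every post document.
        Map<String, Integer> postsPerAuthor = query(docs, (doc, emit) -> {
            if ("post".equals(doc.get("type"))) {
                emit.accept((String) doc.get("author"), 1);
            }
        });
        System.out.println(postsPerAuthor); // {alice=2, bob=1}
    }
}
```

<p>Note that nothing here declares a schema up front; the map function simply skips documents that do not have the shape it cares about.</p>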
<p>Jan made some good arguments as to why it has some strengths that traditional RDBs don&#8217;t, such as no need for schema management, etc. It&#8217;s a technology that I plan on taking a serious look at and may possibly use in the future.</p>
<p>In conclusion, QCon was a great experience. The hosts tried their best to make sure things were as tech focused as possible. There is a need to ensure that speakers tailor their presentations to the audience and keep to the subject of the presentation. The lagging economy did have an effect on the conference: there were 200+ people who had registered but didn&#8217;t attend (you could see all of the uncollected badges at the registration desk), and if the economy improves I think that next year will be even better. It&#8217;s truly a conference for developers by developers, which I really enjoyed. It was great to meet several of the authors of my favourite tech books. I would have preferred more opportunities to network with the people at the conference. An overall guest list or a formal networking event would have helped with that, given that a large percentage of the nerds there were stereotypical introverted nerds who didn&#8217;t like having a strange person walk up to them and make conversation. Maybe I came off as sleazy or creepy or a threat to homeland security <img src='http://www.2paths.com/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> . Regardless, I hope to attend next year.</p>
<p><strong>Update:</strong> This blog post has been <a title="QConSF 2008 Summary" href="http://www.infoq.com/articles/qconsf-2008-summary" target="_blank">featured on the infoq.com website</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/11/24/qcon-sf-2008-day-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QCon SF 2008 &#8211; Day 2</title>
		<link>http://www.2paths.com/2008/11/20/qcon-sf-2008-day-2/</link>
		<comments>http://www.2paths.com/2008/11/20/qcon-sf-2008-day-2/#comments</comments>
		<pubDate>Fri, 21 Nov 2008 06:17:16 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[Bob Lee]]></category>
		<category><![CDATA[concurrency]]></category>
		<category><![CDATA[open-source]]></category>
		<category><![CDATA[qcon]]></category>
		<category><![CDATA[Rod Johnson]]></category>
		<category><![CDATA[scalability]]></category>
		<category><![CDATA[Tim Bray]]></category>

		<guid isPermaLink="false">http://www.2paths.com/?p=270</guid>
		<description><![CDATA[As I mentioned in my post yesterday, I was expecting more from the conference today than I experienced yesterday. What I&#8217;m expecting at a tech conference is to talk tech such that the speaker presents their expertise or lessons learned. I didn&#8217;t get much of that yesterday but today was another story. We kicked off [...]]]></description>
			<content:encoded><![CDATA[<p>As I mentioned in <a title="QCon SF 2008 - Day 1" href="http://www.2paths.com/2008/11/19/qcon-sf-2008-day-1/">my post yesterday</a>, I was expecting more from the conference today than I experienced yesterday. What I&#8217;m expecting at a tech conference is to talk tech, with speakers presenting their expertise or lessons learned. I didn&#8217;t get much of that yesterday but today was another story. We kicked off the day with a talk by Tim Bray about databases and data access, how the database has evolved, and what one needs to keep in mind when designing a system that uses one. He introduced techniques such as Memcached, which all the big-name web apps use as a distributed cache of data retrieved from the database. He talked about Drizzle, a lightweight database that strips out the features of the major databases that almost no one uses and that slow down the database server, and lastly about column-oriented databases such as Google&#8217;s Bigtable. All in all he provided some great insight into tech in the database layer, which warrants further investigation.</p>
<p>Next was a panel discussion on scalability, which emphasized the need to understand the update requirements of the system you are designing. The panel proposed that there are two types of design:</p>
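<p>As a rough illustration of the caching technique mentioned above, here is a minimal cache-aside sketch in Java. It makes several assumptions: a <code>ConcurrentHashMap</code> stands in for a Memcached client, a plain function stands in for the database query, and the <code>CacheAside</code> class and its method names are hypothetical. Reads check the cache first and only fall through to the database on a miss; writes invalidate the cached entry so the next read reloads it.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Cache-aside sketch. The map is a stand-in for a Memcached client and the
// loader function is a stand-in for a database query; both are assumptions.
class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database;
    private final AtomicInteger databaseReads = new AtomicInteger();

    CacheAside(Function<String, String> database) {
        this.database = database;
    }

    // Read path: serve from the cache when possible, otherwise load and remember.
    String get(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached; // cache hit, the database is not touched
        }
        databaseReads.incrementAndGet();
        String value = database.apply(key);
        if (value != null) {
            cache.put(key, value);
        }
        return value;
    }

    // Write path: after changing a row in the database, drop the stale cache entry.
    void invalidate(String key) {
        cache.remove(key);
    }

    int databaseReads() {
        return databaseReads.get();
    }

    public static void main(String[] args) {
        CacheAside users = new CacheAside(id -> "user-row-" + id);
        users.get("42");
        users.get("42"); // second read is served from the cache
        System.out.println(users.databaseReads()); // 1
    }
}
```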
<ol>
<li>Design for scalability up front</li>
<li>Design for simplicity and deal with bottlenecks as they pop up</li>
</ol>
<p>I&#8217;m not sure if I&#8217;d ever go with option #2, but they made a good point about how many architects design things to be bulletproof when there really isn&#8217;t a requirement for that. I fully agree with that principle. Several websites, such as Flickr, were mentioned as having made some very good decisions in finding that happy medium.</p>
<p>The next talk was about the Java 7 concurrency library and the need for making things concurrent, given that processors are not getting faster but are moving towards multiple cores. Therefore, we as developers need to make sure our applications run as much as possible in parallel. One might feel that if you are running a website, your job is complete since every request is handled by a different thread. However, how many of us receive enough requests 24/7 to keep all the cores busy? Therefore, we need to decompose our applications so that everything that can be done concurrently is done concurrently, to ensure maximum CPU utilization, decreased response time and therefore an improved user experience.</p>
<p>The last session of interest was a panel discussion on the effect that the open-source movement has had on Java. The consensus was that open source saved Java from mediocrity and J2EE from certain death. The focus of the discussion then moved to open source during the current economic conditions. The panel felt that slashed budgets within most companies would allow open-source projects to flourish, at least those that are well established. The last point of interest was how they decide within their companies what to open-source and what to charge for. Bob Lee of Google said that they open-source low-level tools and frameworks and charge for things they build upon them. Others, such as a guy from MuleSource and Rod Johnson from SpringSource, said that they charge when people are already likely to be paying other vendors for their products/services. For example, the quote of the night was from Rod Johnson:</p>
<blockquote><p>If you use MySQL, Tomcat and Apache as your application stack then you can use Spring for free and that&#8217;s great. However, if you are using Oracle RAC for a database and BEA WebLogic for an app server then you have no right to complain when we charge you for a Spring to Oracle RAC connector.</p></blockquote>
<p>The point being: if you&#8217;re already spending money, then why shouldn&#8217;t we ask to be paid for what we are offering?</p>
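<p>To show what decomposing work for multiple cores can look like, here is a small sketch using the fork/join classes in <code>java.util.concurrent</code>. I am assuming this is the kind of library the talk had in mind; the <code>ParallelSum</code> class and its threshold are made up for the example. The task splits an array in half until the pieces are small, sums the pieces on different worker threads, and then combines the partial results.</p>

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sums a slice of an array by splitting it until the pieces are small enough.
class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // arbitrary cut-off, tune per workload

    private final long[] values;
    private final int from; // inclusive
    private final int to;   // exclusive

    ParallelSum(long[] values, int from, int to) {
        this.values = values;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small enough: just do the work sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += values[i];
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        ParallelSum left = new ParallelSum(values, from, mid);
        ParallelSum right = new ParallelSum(values, mid, to);
        left.fork();                     // schedule the left half on another worker
        long rightSum = right.compute(); // compute the right half in this thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long sum(long[] values) {
        ForkJoinPool pool = new ForkJoinPool(); // defaults to one worker per core
        try {
            return pool.invoke(new ParallelSum(values, 0, values.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] values = new long[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(sum(values)); // 500000500000
    }
}
```

<p>Even a single web request that does this kind of work can then keep several cores busy, instead of relying on there being enough simultaneous requests.</p>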
<p>Tomorrow has another set of very interesting streams and discussions. I&#8217;m looking forward to the stream called &#8220;Architectures you&#8217;ve always wondered about.&#8221;</p>
<p><strong>Update:</strong> This blog post has been <a title="QConSF 2008 Summary" href="http://www.infoq.com/articles/qconsf-2008-summary" target="_blank">featured on the infoq.com website</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/11/20/qcon-sf-2008-day-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>QCon SF 2008 &#8211; Day 1</title>
		<link>http://www.2paths.com/2008/11/19/qcon-sf-2008-day-1/</link>
		<comments>http://www.2paths.com/2008/11/19/qcon-sf-2008-day-1/#comments</comments>
		<pubDate>Wed, 19 Nov 2008 18:37:51 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Conferences]]></category>
		<category><![CDATA[beck]]></category>
		<category><![CDATA[cloud computing]]></category>
		<category><![CDATA[day 1]]></category>
		<category><![CDATA[fowler]]></category>
		<category><![CDATA[hohpe]]></category>
		<category><![CDATA[qcon]]></category>
		<category><![CDATA[san francisco]]></category>

		<guid isPermaLink="false">http://www.2paths.com/?p=265</guid>
		<description><![CDATA[Coming into this conference I had some very high expectations due to the calibre of the speakers. That being said, it started off with a bang: the opening keynote featured Martin Fowler and Rebecca Parsons of ThoughtWorks on the topic of how an architect and the agile process are complementary and not adversaries, as many think.]]></description>
			<content:encoded><![CDATA[<p>Coming into this conference I had some very high expectations due to the calibre of the speakers. That being said, it started off with a bang: the opening keynote featured Martin Fowler and Rebecca Parsons of ThoughtWorks on the topic of how an architect and the agile process are complementary and not adversaries, as many think. The talk was aimed more at large corporations that have various hierarchies of architects, and at how companies set up the architect position to fail by asking architects to &#8220;oversee&#8221; the development of several projects at once. This task leads to the architect being &#8220;pushed&#8221; away from the daily activities of each team and from the application development itself. As a result, the architects tend to retreat to an ivory tower and issue all sorts of decrees and standards that don&#8217;t get implemented. They made an interesting point about how companies push experience away from development by not being willing to pay a developer with 10+ years of experience to code. If you want to be paid for your experience you have to become an architect, and then you can&#8217;t touch the code.</p>
<div class="mceTemp mceIEcenter">
<dl>
<dt><a href="http://www.2paths.com/wp-content/uploads/2008/11/conf_hall.png"><img class="size-full wp-image-272" src="http://www.2paths.com/wp-content/uploads/2008/11/conf_hall.png" alt="QCon Keynote Hall" width="500" height="154" /></a></dt>
</dl>
</div>
<p>It made me think about our policy here at 2paths: everyone codes to some extent. As an architect myself, I may not write large volumes of code but I do develop code. This makes my opinion relevant when we make design decisions. Ironically, their suggestion to fix the situation was just this: architects must be part of the dev team, need to look at the code developed/checked in on a daily/weekly basis and, in the ideal case, code.</p>
<p style="center;"><a href="http://www.2paths.com/wp-content/uploads/2008/11/dsc01267.jpg"><img class="size-medium wp-image-282 aligncenter" src="http://www.2paths.com/wp-content/uploads/2008/11/dsc01267.jpg" alt="" width="300" height="225" /></a></p>
<p>The great thing about this conference is the big-name people they have as speakers. The speaker to attendee ratio is 1:4. Every big-name author of the various tech books is here:</p>
<ul>
<li>Martin Fowler</li>
<li>Rod Johnson</li>
<li>Kent Beck</li>
<li>Gregor Hohpe</li>
<li>etc&#8230;</li>
</ul>
<p>These guys are treated like rock stars here. At the keynote, the guy sitting next to me was giddy to be in the same room as Martin Fowler. The thing is that just because they are smart and wrote great books, it doesn&#8217;t mean that they give good presentations/talks. Case in point: Kent Beck. I went to two talks by Kent and though he&#8217;s funny and witty, the talks were very convoluted and definitely not to the point. That being said, both sessions were packed&#8230;people were sitting on the floor&#8230;it was surreal.</p>
<p style="center;"><a href="http://www.2paths.com/wp-content/uploads/2008/11/dsc012521.jpg"><img class="alignnone size-medium wp-image-281" src="http://www.2paths.com/wp-content/uploads/2008/11/dsc012521.jpg" alt="" width="225" height="300" /></a><a href="http://www.2paths.com/wp-content/uploads/2008/11/dsc01269.jpg"> <img class="alignnone size-medium wp-image-283" src="http://www.2paths.com/wp-content/uploads/2008/11/dsc01269.jpg" alt="" width="225" height="300" /></a></p>
<p>Gregor Hohpe gave a great talk about the principles underlying the creation of applications that use a &#8220;cloud&#8221; architecture. Cloud computing is basically the use of services from other organizations/companies within your application. Examples of this would be an application that uses Google Maps or Amazon&#8217;s EC2, etc. The main take-aways from his talk were:</p>
<ol>
<li>Learn to live with uncertainty<br />
The services you are using are not controlled by you so you have to design for the component being down/unavailable because your customers don&#8217;t care what services you are using under the covers.</li>
<li>Keep things simple and small</li>
<li>Learn to properly design for asynchronous communication</li>
<li>Embrace the new programming model</li>
<li>Resist applying traditional patterns</li>
</ol>
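<p>The first point, designing for a service being down or slow, can be sketched roughly as follows. This is only an illustration under my own assumptions: the <code>ResilientCall</code> class, the <code>callWithFallback</code> name and the fallback values are hypothetical, and a real application would also want retries, logging and so on. The idea is simply that the caller never waits forever and never lets a remote failure reach the customer; it falls back to something reasonable instead.</p>

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Wraps a call to a service we do not control with a timeout and a fallback.
class ResilientCall {

    // Returns the remote result, or the fallback if the call fails or takes too long.
    static <T> T callWithFallback(Supplier<T> remoteCall, T fallback, long timeoutMillis) {
        return CompletableFuture.supplyAsync(remoteCall)
                .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
                .exceptionally(error -> fallback)
                .join();
    }

    public static void main(String[] args) {
        // A healthy service: the live answer is used.
        System.out.println(callWithFallback(() -> "live map tiles", "cached map tiles", 500));

        // A service that is down: the failure is absorbed and the fallback is used.
        System.out.println(callWithFallback(() -> {
            throw new IllegalStateException("service unavailable");
        }, "cached map tiles", 500));
    }
}
```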
<p>Point 1 was the most relevant and leads to a redefinition of ACID transactions:</p>
<p>A &#8211; Associative<br />
Service calls/requests need to be associative, i.e. (A + B) + C = A + (B + C)</p>
<p>C &#8211; Commutative<br />
Service calls/requests need to be commutative, i.e. A + B = B + A</p>
<p>I &#8211; Idempotent<br />
One needs to design services to be able to handle a request or response that is received more than once because of re-sends caused by delays, etc&#8230;</p>
<p>D &#8211; Distributed<br />
He admitted that this was a filler since applications built on the cloud are distributed by their very nature.</p>
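<p>Here is a small sketch of what these properties can look like in code. It is my own illustration, not something from the talk: the <code>IdempotentCounter</code> class and the request ids are hypothetical. Each request carries a unique id, so a re-sent request is recognised and ignored (idempotent), and because the operation is plain addition, the order and grouping in which requests arrive does not change the result (commutative and associative).</p>

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// A service that tolerates duplicated and re-ordered requests.
class IdempotentCounter {
    private final Set<String> seenRequestIds = ConcurrentHashMap.newKeySet();
    private final AtomicLong total = new AtomicLong();

    // Applies the request once; returns false if this request id was already handled.
    boolean add(String requestId, long amount) {
        if (!seenRequestIds.add(requestId)) {
            return false; // duplicate delivery, safely ignored
        }
        total.addAndGet(amount);
        return true;
    }

    long total() {
        return total.get();
    }

    public static void main(String[] args) {
        IdempotentCounter first = new IdempotentCounter();
        first.add("req-1", 10);
        first.add("req-2", 5);
        first.add("req-1", 10); // re-send of req-1, ignored

        IdempotentCounter second = new IdempotentCounter();
        second.add("req-2", 5); // same requests, different arrival order
        second.add("req-2", 5); // re-send of req-2, ignored
        second.add("req-1", 10);

        System.out.println(first.total() + " " + second.total()); // 15 15
    }
}
```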
<p>All in all, day one didn&#8217;t entirely meet my expectations, though there were flashes of what I expected, and I have hope that tomorrow and Friday will be better.</p>
<p><strong>Update:</strong> This blog post has been <a title="QConSF 2008 Summary" href="http://www.infoq.com/articles/qconsf-2008-summary" target="_blank">featured on the infoq.com website</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/11/19/qcon-sf-2008-day-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Old habits die hard</title>
		<link>http://www.2paths.com/2008/11/06/old-habits-die-hard/</link>
		<comments>http://www.2paths.com/2008/11/06/old-habits-die-hard/#comments</comments>
		<pubDate>Thu, 06 Nov 2008 22:13:10 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Blog]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[certification]]></category>
		<category><![CDATA[ken schwaber]]></category>
		<category><![CDATA[old habits]]></category>
		<category><![CDATA[scrum]]></category>

		<guid isPermaLink="false">http://www.2paths.com/?p=249</guid>
		<description><![CDATA[I had the privilege of attending a "Certified ScrumMaster" course being taught by Ken Schwaber who is one of the co-founders of the Scrum process. I've been using the scrum process for a few years and felt fairly confident in my ability to "live" the principles of Scrum but all it took was one exercise to smack me back to reality.]]></description>
			<content:encoded><![CDATA[<p>I had the privilege of attending a &#8220;Certified ScrumMaster&#8221; course being taught by Ken Schwaber who is one of the co-founders of the Scrum process. I&#8217;ve been using the scrum process for a few years and felt fairly confident in my ability to &#8220;live&#8221; the principles of Scrum but all it took was one exercise to smack me back to reality.</p>
<p>Scrum is about providing transparency in the development process to all involved parties. What is emphasized is that you can&#8217;t really know how long it will take you to do something because the software development process is such that we can only provide best guesses. However, clients being clients, they want predictability. Unfortunately, we have given our clients the false expectation that we can provide predictability when we all know that there is no way we can. One can never know exactly what will take place in a project. Features, requirements, human issues, etc. are never 100% known so how can you say you can get this done by a specific date guaranteed unless you pad your date like crazy and even then there is no guarantee?</p>
<p>Therefore, the point of agile as a whole is to educate and inform all parties involved (especially the client) about this reality. We can reasonably say that it is feasible to complete the project by a given date, but we can&#8217;t give a 100% guarantee. Therefore, if it is mission critical that the application be complete by that date, the client should make contingency plans just in case we cannot make it.</p>
<p>Anyway, 1.5 days into the 2-day training course we went through an exercise. We had gone over the various concepts behind Scrum, and Ken had guided us through several exercises about them. So we began a mock project. We received our requirements, created high-level stories, provided story points and person-day estimates for each of the stories and then proceeded to determine if we could get the work done by March 31st, which was the hard deadline for the mock client.</p>
<p>In the end we were asked to present our estimates team by team to our mock client. This being a test, Ken started with the first team who put into practice what we had learned over the past 1.5 days.</p>
<blockquote><p>Given what we know about the application and the requirements provided, we think we could possibly provide you with a complete solution by the March 31st deadline. We will need to hire a few more developers to ensure we meet the deadline but it seems possible.</p></blockquote>
<p>Ken began to push them by asking questions and prodding us to commit to completing by March 31st, 2009. Many resisted and others gave absolutely no resistance. However, in the end, the vast majority of the teams relented and committed to getting it done by the given date. At which point Ken took the shovel we had used to dig our graves and hit us over the head with it. Paraphrasing Ken&#8217;s words:</p>
<blockquote><p>Even after one and a half days of learning and making fun of people that made stupid mistakes with just a little bit of pushing you all forgot everything! You went back to making promises that you know you can&#8217;t keep.</p></blockquote>
<p>The room went silent. Others, who stupidly chose to defend their actions, were hit over the head again and again until they accepted their error. Ken himself admitted that even after 16 years of using and teaching Scrum, he still falls back on these deeply ingrained bad habits and that it takes conscious effort not to regress to old habits.</p>
<p>The course was the best course outside of university that I have ever taken. If you plan on doing a ScrumMaster certification course, make sure you take it with him regardless of the cost/location. I learned an immense amount from him and have found a greater passion for promoting the principles behind Scrum to the people within our company and abroad.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/11/06/old-habits-die-hard/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>You know you&#8217;re about to make it big when&#8230;</title>
		<link>http://www.2paths.com/2008/06/19/you-know-youre-about-to-make-it-big-when/</link>
		<comments>http://www.2paths.com/2008/06/19/you-know-youre-about-to-make-it-big-when/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 19:04:16 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[funny]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://blog.2paths.com/you-know-youre-about-to-make-it-big-when.html</guid>
		<description><![CDATA[the defense industry uses your technology. The funny thing about the semantic web conference was the vendor showcases. I went to a few, where a given semantic technology toolkit vendor would showcase how one can make use of their tool or tools. The thing that struck me was that each and every one that I [...]]]></description>
			<content:encoded><![CDATA[<p>the defense industry uses your technology. The funny thing about the semantic web conference was the vendor showcases. I went to a few, where a given semantic technology toolkit vendor would showcase how one can make use of their tool or tools. The thing that struck me was that each and every one that I attended focused on how they were helping the US Department of Defense catch terrorists or extract evidence of links to terrorism.</p>
<p>There was one vendor showcase in particular that I got a great laugh out of. I was sitting in the very front row, right in front of the presenter, who was about 4 feet away from me. Before I go on I should give you some information about myself: I am a Muslim who looks very Muslim&#8230;i.e. the full beard, Middle Eastern looking, etc&#8230; I, like all the other attendees, had a name tag on with my full name written in large print, visible from several feet away. So he starts his talk about the tool that they are selling. Midway through the presentation he starts a demonstration of how the tool was used to prove that two people who claimed not to know each other did in fact know each other and that they were smuggling funds between them.</p>
<p>As he went through this presentation, he paused, saw me and my name and became really uncomfortable. You see, the people he was talking about had the exact same last name as me and I don&#8217;t think he really saw me sitting there until that point. At that point he tried really, really hard to come up with another example to demonstrate the tool but for the life of him he could not. I had to fight the urge to stir the pot to see how much more uncomfortable I could make him feel.</p>
<p>Regardless, you know this technology is going to take off when the US Department of Defense is interested in it. It seems that they are dumping large sums of money into its development and use.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/06/19/you-know-youre-about-to-make-it-big-when/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Semantic Web Conference 2008</title>
		<link>http://www.2paths.com/2008/06/19/semantic-web-conference-2008/</link>
		<comments>http://www.2paths.com/2008/06/19/semantic-web-conference-2008/#comments</comments>
		<pubDate>Thu, 19 Jun 2008 18:48:34 +0000</pubDate>
		<dc:creator>Omar</dc:creator>
				<category><![CDATA[Conferences]]></category>
		<category><![CDATA[semantic web]]></category>

		<guid isPermaLink="false">http://blog.2paths.com/semantic-web-conference-2008.html</guid>
		<description><![CDATA[I had the chance to attend the semantic web conference that was held in San Jose May 18th to 22nd. Before I go any further, there may be many of you who are scratching your head thinking &#8220;What is the semantic web?&#8221; Dictionary.com defines semantics as the following:


Linguistics.

the study of meaning.
the study of linguistic development [...]]]></description>
			<content:encoded><![CDATA[<p>I had the chance to attend the semantic web conference that was held in San Jose May 18th to 22nd. Before I go any further, there may be many of you who are scratching your head thinking &#8220;What is the semantic web?&#8221; Dictionary.com defines semantics as the following:</p>
<ol>
<li>
Linguistics.
<ul>
<li>the study of meaning.</li>
<li>the study of linguistic development by classifying and examining changes in meaning and form.</li>
</ul>
</li>
<li>Also called significs. the branch of semiotics dealing with the relations between signs and what they denote.</li>
<li>the meaning, or an interpretation of the meaning, of a word, sign, sentence, etc.: Let&#8217;s not argue about semantics.</li>
</ol>
<p>What we are talking about here is devising a means for a computer to be able to understand the meaning of language. Today&#8217;s search engines do not. When you enter a question into Google such as &#8220;What is the best way to exercise your lower abs?&#8221; Google analyzes the phrase or sentence you entered into the search box and breaks it up into keywords. It then uses these keywords to search its repository for the web pages that <em>best match</em> the <em>keywords</em> you entered. In the example above, it will extract words like &#8216;best&#8217;, &#8216;exercise&#8217;, &#8216;lower&#8217; and &#8216;abs&#8217; and return matches that have a <em>high likelihood</em> of containing the information you&#8217;re looking for. Google has little to no idea what the sentence &#8220;What is the best way to exercise your lower abs?&#8221; means.</p>
<p>What semantics hopes to achieve is to extract information from the internet such that a computer or set of computers could perform a search understanding exactly what &#8220;What is the best way to exercise your lower abs?&#8221; means and would therefore return search results matching exactly what you are looking for. In the above example it&#8217;s fairly clear what I&#8217;m looking for, but what if I entered something like &#8220;paris hilton&#8221;? What exactly am I looking for? Am I interested in the Hilton hotel located in Paris, France or am I interested in information about the trash TV celebrity? By now I&#8217;m sure you get the idea.</p>
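<p>To make the keyword-matching idea concrete, here is a minimal sketch in Python. It is only an illustration of the general approach described above, not how Google actually works: the stop-word list, the <code>keywords</code> and <code>rank</code> helpers and the three sample pages are all made up for this example.</p>

```python
import re

# Assumed stop-word list, chosen so the example query reduces to the
# keywords named in the post; real engines use far larger lists.
STOP_WORDS = {"what", "is", "the", "way", "to", "your"}


def keywords(text):
    """Lowercase the text, split it into words and drop the stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def rank(query, pages):
    """Order page titles by how many query keywords each page contains."""
    terms = set(keywords(query))
    scored = []
    for title, body in pages.items():
        overlap = terms & set(keywords(body))
        scored.append((len(overlap), title))
    # Highest overlap first; ties broken alphabetically by title.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in scored if score > 0]


# Hypothetical pages standing in for the search engine's repository.
PAGES = {
    "ab-workout": "The best lower abs exercise routine for beginners",
    "hotel-review": "The best hotel in Paris",
    "gardening": "How to grow tomatoes",
}

query = "What is the best way to exercise your lower abs?"
print(keywords(query))
print(rank(query, PAGES))
```

<p>Notice that the hotel review still comes back as a match simply because it contains the word &#8216;best&#8217;. The code counts shared words; it has no notion of what the question means.</p>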
<p>Needless to say, getting a computer to understand meaning isn&#8217;t an easy task and therefore the community that is hoping to make semantics a reality holds two conferences a year on the topic to gather like minds and to showcase how the technology has progressed along with any real world uses of the technology.  The conference has been held for the past four years and has grown steadily each and every year. This year there were over 1100 attendees which is remarkable given the downturn in the US economy.  The remainder of this post will outline the technologies of interest and some real implementations of the technology.</p>
<h3>Technologies and Specifications</h3>
<p><a href="http://www.w3.org/2004/01/rdxh/spec"><b>Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</b></a><br />
GRDDL is a W3C specification to allow people to make their content semantically accessible. For example, one can have a website containing all sorts of data within it that may or may not be structured. The GRDDL specification allows one to embed a link to an XSLT transform that will take the XHTML of the webpage as an input and output a set of RDF triples representing the content of your page.</p>
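<p>As a rough sketch of the first step a GRDDL-aware client takes, the Python below looks for the transformation link in an XHTML page. The profile URI and the <code>rel="transformation"</code> link come from the GRDDL specification; the sample page, the stylesheet URL and the <code>find_transformations</code> helper are hypothetical. Actually running the XSLT to produce RDF triples is left out, since Python&#8217;s standard library has no XSLT processor.</p>

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Hypothetical page; the stylesheet URL is made up for illustration.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Conference notes</title>
    <link rel="transformation" href="http://example.org/notes-to-rdf.xsl" />
  </head>
  <body><p>Some content.</p></body>
</html>"""


def find_transformations(xhtml_text):
    """Return the href of every GRDDL transformation link in an XHTML page."""
    root = ET.fromstring(xhtml_text)
    head = root.find(XHTML_NS + "head")
    if head is None:
        return []
    # The profile attribute is a space-separated list of URIs.
    profiles = head.get("profile", "").split()
    if GRDDL_PROFILE not in profiles:
        return []
    hrefs = []
    for link in head.findall(XHTML_NS + "link"):
        # rel is also a space-separated list of tokens.
        rel_tokens = link.get("rel", "").split()
        if "transformation" in rel_tokens and link.get("href"):
            hrefs.append(link.get("href"))
    return hrefs


transforms = find_transformations(PAGE)
print(transforms)
```

<p>A client would then fetch each stylesheet it found and apply it to the page to get the RDF.</p>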
<p><a href="http://www.openrdf.org"><b>Sesame</b></a><br />
Sesame is an open source RDF framework with support for RDF Schema inferencing and querying. It can be deployed on top of a variety of storage systems (relational databases, in-memory, file systems, keyword indexers, etc.), and offers  tools to developers to leverage the power of RDF and RDF Schema.</p>
<p><a href="http://www.openrdf.org"><b>ELMO</b></a><br />
Elmo is a toolkit for developing Semantic Web applications using Sesame. Elmo wraps Sesame, providing a dedicated API for a number of well known web ontologies including Dublin Core, RSS and FOAF. The dedicated API makes it easier to work with RDF data for the supported ontologies. Elmo also offers a set of tools related to the supported ontologies, including an RDF crawler, a FOAF smusher and a FOAF validator.</p>
<p><a href="http://jena.sourceforge.net/"><b>JENA</b></a><br />
JENA is a Semantic Web framework for working with RDF, RDFS, OWL and SPARQL. Included within the framework are an RDF API, an OWL API, a SPARQL query engine and a rule-based inference engine.</p>
<p><a href="http://www.topazproject.org"><b>Topaz</b></a><br />
Topaz is an RDF persistence and querying service framework.</p>
<p><a href="http://www.1060.org/"><b>NetKernel</b></a><br />
1060 NetKernel is a resource oriented microkernel and RESTful application server created from the convergence and unification of the powerful fundamental concepts found in the World Wide Web and Unix. NetKernel is being used by a large number of major semantic applications.</p>
<p><a href="http://www.purl.org"><b>Persistent URLs</b></a><br />
A PURL is a Persistent Uniform Resource Locator. Functionally, a PURL is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. The client can then complete the URL transaction in the normal fashion. In Web parlance, this is a standard HTTP redirect.</p>
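<p>The resolution step can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the actual PURL service: the lookup table, the example paths and the <code>resolve</code> function are invented, and the choice of a 302 status is an assumption about which redirect code a resolver would send.</p>

```python
# Hypothetical registry: PURL path -> current location of the resource.
PURL_TABLE = {
    "/docs/primer": "http://example.org/2008/archive/primer.html",
}


def resolve(purl_path):
    """Return (status, headers) the way a redirecting resolver might."""
    target = PURL_TABLE.get(purl_path)
    if target is None:
        # Unknown PURL: nothing to redirect to.
        return 404, {}
    # Known PURL: point the client at the resource's current URL.
    return 302, {"Location": target}


status, headers = resolve("/docs/primer")
print(status, headers["Location"])
```

<p>If the resource moves, only the table entry changes. Everyone who published the PURL keeps a working link.</p>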
<h3>Implementations</h3>
<p>The following is a list of applications that have been created using semantic technologies in part or as a whole. Implementations such as these show where this technology can be used to solve real world problems.</p>
<p><a href="http://www.powerset.com"><b>Powerset</b></a><br />
Powerset is a website that has sucked in the content from <a href="http://www.wikipedia.org">Wikipedia</a> and <a href="http://www.freebase.com">Freebase</a>. They have taken this content and translated it into a very, very large RDF store using NetKernel (or so I believe). Using their interface, one can search the data using semantic technology. If you are doing research on a given topic and have confidence in the data contained within Wikipedia, then Powerset would be of great use. Search results and content related to a particular item are easily retrievable. For example, if you were researching the pouring of concrete and entered &#8220;How do you pour concrete&#8221;, you would see all the content that matches what you are looking for, based on their best understanding of the meaning of what you are asking. Upon selecting the first result you would be given not only information about concrete but also a slew of related content.</p>
<p><a href="http://www.twine.com"><b>Twine</b></a><br />
Twine is a social network focused on sharing your interests. In other words, creating a grouping of people interested in the same stuff. It is currently in private beta but hopes to go into public beta sometime in the late summer/early fall. The concept revolves around the idea that if you are interested in a certain topic, twine can offer information/articles/etc that fit within that area of interest. This is where the use of semantic technology fits in. They use semantic technology to search for and filter content related to what you are discussing and so on within your group. I was extremely disappointed with the fact that they made no attempt to showcase their technology by providing a walk-through of their application, but rather spent the entire time talking high-level about the application and showing a stupid video created about the website by some pre-teen. A real waste of an opportunity.</p>
<p><a href="http://www.garlik.com"><b>Garlik</b></a><br />
Garlik is the darling of the semantic world. They are an example of where semantic technology can be used to solve real world problems. In this case, they use semantic technology to help protect your identity online. Currently they offer their services within the UK. Information is sketchy as to how exactly they do this using semantic technologies. I was really disappointed that they sent a non-techy to speak about their product, but that seemed to be the running theme for these types of companies at the conference: keep your secrets guarded regardless of whether sharing them would further the proliferation of the technology.</p>
<p><a href="http://remix.zepheira.com/"><b>Zepheira Remix</b></a><br />
Now this tool is something that follows along the lines of Garlik in that it provides a solution to a real world problem. In this case they are focusing on sharing data. They have created a tool that significantly reduces the complexity of using semantics to share data. Remix allows you to import two files and merge, query and generate content between them with no need to understand anything about semantics or even the underlying technology. Remix features an AJAX front end that allows users to massage the data as they see fit and extract a meaningful summary of the data in an HTML format. Zepheira&#8217;s goal is to advocate for the use of semantic technologies while encouraging others to reduce the barriers of entry to harnessing the power of the technology, i.e. let&#8217;s make it easier for people to use semantics through the use of frameworks and applications.</p>
<p>All in all, the conference was great. I have to admit, before attending the conference I was a skeptic about the technology, but after the conference I think there are some great possibilities for this technology in the very near future. I look forward to attending next year. The only thing that I believe would improve the conference is more detail about the technology and its implementation and usage, rather than just skimming over things at a very high level.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.2paths.com/2008/06/19/semantic-web-conference-2008/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

