Sunday, July 10, 2011

OrientDB: Connecting, writing and reading data

In my last post, I made a quick comparison of the performance of OrientDB, MongoDB and CouchDB. Since then, I have received some requests for the source code. Readers seemed most interested in OrientDB. This might be explained by the fact that there is already a lot of information available on MongoDB. The attention MongoDB gets is understandable. It is a great and efficient product.

If you like MongoDB, you might also like OrientDB. It shares many of MongoDB's characteristics and adds extra features, like a friendly REST interface and a graph database. As a first step toward giving OrientDB some extra attention, here is the source code for the last post. It is a bit crude, but it gives a good impression of how simple it is to work with OrientDB in its most basic form.

This code reads a semicolon-separated text file (CSV), breaks it up without taking care of escaped characters (semicolons within the text fields) and stores all fields in OrientDB. It starts by reading the first line of the CSV, which serves as the list of field names. It translates the field names from PascalCasing to camelCasing and removes a vendor-specific prefix. This step might need some changes for your own CSV file. I cannot provide my test data for legal reasons.
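Because the header step is the part most likely to need changes for another CSV file, it can help to pull it into a small helper. The sketch below is an illustration, not the code used in the test: HeaderNames and normalize are hypothetical names, the prefix is passed in as a parameter, and unlike the replaceAll call in the listing it removes the prefix only at the start of a name and tolerates empty column names.

```java
final class HeaderNames {

    // Strips the given vendor prefix (only when the name starts with it)
    // and lowercases the first character: "w3s_TitleText" -> "titleText".
    static String normalize(String rawName, String vendorPrefix) {
        String name = rawName.startsWith(vendorPrefix)
                ? rawName.substring(vendorPrefix.length())
                : rawName;
        if (name.isEmpty()) {
            return name; // nothing to lowercase
        }
        return Character.toLowerCase(name.charAt(0)) + name.substring(1);
    }

    public static void main(String[] args) {
        // Example header line; the column names are made up for illustration
        String[] columns = "w3s_Id;w3s_TitleText;City".split(";");
        for (int i = 0; i < columns.length; i++) {
            columns[i] = normalize(columns[i], "w3s_");
        }
        System.out.println(String.join(",", columns)); // prints id,titleText,city
    }
}
```

Passing the prefix as an argument keeps the one vendor-specific detail in a single place.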

After that, it reads every line from the file and stores all the values under the field names from the first row. If there is an extra semicolon on a line, the values shift and end up under the wrong field names. In a more realistic example, such errors could be detected by comparing the number of fields in the split line with the number of field names from the first line. The code in this example should be used with a clean CSV file to prevent these errors.
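The field-count check described above could look roughly like this. It is a minimal sketch and not part of the original test code; CsvLineChecker and its methods are hypothetical names. It assumes split(";", -1), which keeps trailing empty fields, whereas the plain split(";") in the listing drops them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CsvLineChecker {

    // True when the line splits into exactly as many fields as the header has.
    // The limit -1 keeps trailing empty fields, so "a;b;" counts as three fields.
    static boolean hasExpectedFieldCount(String line, int expectedFields) {
        return line.split(";", -1).length == expectedFields;
    }

    // Returns the 1-based numbers of the data lines whose field count
    // differs from the number of field names in the header.
    static List<Integer> findSuspectLines(String header, List<String> dataLines) {
        int expected = header.split(";", -1).length;
        List<Integer> suspects = new ArrayList<>();
        for (int i = 0; i < dataLines.size(); i++) {
            if (!hasExpectedFieldCount(dataLines.get(i), expected)) {
                suspects.add(i + 1);
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Made-up sample data; the second line contains an extra semicolon
        List<String> lines = Arrays.asList(
                "1;Zoo;Amsterdam",
                "2;Museum;Den;Haag",
                "3;Park;");
        System.out.println(findSuspectLines("Id;Name;City", lines)); // prints [2]
    }
}
```

A check like this only detects that a line is suspect. It cannot tell which semicolon is the stray one, so suspect lines still need to be skipped or repaired by hand.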

package eu.adriaandejonge.orient;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;

public class WriteOrient {

    public static void main(String[] args) {
        try {
            long start = System.currentTimeMillis();

            File file = new File("E:/Development/uitjes.csv");

            FileReader reader = new FileReader(file);
            BufferedReader bufferedReader = new BufferedReader(reader);

            // The first line of the CSV holds the field names
            String firstLine = bufferedReader.readLine() + "";
            String[] columns = firstLine.split(";");
            int length = columns.length;
            for (int i = 0; i < length; i++) {
                // Remove the vendor-specific prefix, then PascalCasing -> camelCasing
                columns[i] = columns[i].replaceAll("w3s_", "");
                columns[i] =
                    columns[i].substring(0, 1).toLowerCase() +
                    columns[i].substring(1);
            }

            // OrientDB: create a new local database
            ODatabaseDocumentTx db =
                new ODatabaseDocumentTx("local:/tmp/demo").create();
            int cnt = 0;
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                cnt++;
                if (cnt % 100 == 0) System.out.println("cnt=" + cnt);

                // OrientDB: one document of class "uitje" per CSV line
                ODocument uitje = new ODocument(db, "uitje");
                String[] values = line.split(";");

                for (int i = 0; i < length; i++) {
                    // Lines with fewer values than columns leave the remaining fields unset
                    if (i < values.length)
                        uitje.field(columns[i], values[i]);
                }
                uitje.save();
            }
            db.close();
            bufferedReader.close();
            System.out.println("DONE in " +
                (System.currentTimeMillis() - start) + "ms");

        } catch (Exception e) {
            e.printStackTrace();
        }
    }

}


To estimate the amount of code needed to communicate with OrientDB, focus on the few lines that use ODatabaseDocumentTx and ODocument. The rest of the code only serves to read the CSV file. The OrientDB-specific code is comparable to code for similar NoSQL databases, like MongoDB and CouchDB. Even though there is no standardized API for these databases yet, you do not have to worry about lock-in too much. If you isolate the database-specific code, you can migrate to a different datastore with relative ease, provided it shares the same characteristics. OrientDB, MongoDB and CouchDB can all be characterized as NoSQL document stores that are particularly well suited for storing JSON documents with nested key-value pairs.
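One way to isolate the database-specific code is to let the import logic talk to a small interface. The sketch below is only an illustration of that idea: DocumentStore, InMemoryDocumentStore and CsvImporter are hypothetical names and not part of OrientDB or any other product. The in-memory implementation is a stand-in; an OrientDB-backed implementation would wrap the ODocument field and save calls from the listing above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction: the importer only depends on this interface
interface DocumentStore {
    void save(String className, Map<String, String> fields);
    void close();
}

// Stand-in implementation that keeps documents in memory, grouped by class name
final class InMemoryDocumentStore implements DocumentStore {
    final Map<String, List<Map<String, String>>> documentsByClass = new HashMap<>();
    boolean closed = false;

    @Override
    public void save(String className, Map<String, String> fields) {
        List<Map<String, String>> docs = documentsByClass.get(className);
        if (docs == null) {
            docs = new ArrayList<>();
            documentsByClass.put(className, docs);
        }
        docs.add(new LinkedHashMap<>(fields)); // store a copy of the fields
    }

    @Override
    public void close() {
        closed = true;
    }
}

// Database-independent import logic, same splitting rules as the listing
final class CsvImporter {
    static int importLines(String[] columns, List<String> lines,
                           String className, DocumentStore store) {
        int count = 0;
        for (String line : lines) {
            String[] values = line.split(";");
            Map<String, String> fields = new LinkedHashMap<>();
            for (int i = 0; i < columns.length && i < values.length; i++) {
                fields.put(columns[i], values[i]);
            }
            store.save(className, fields);
            count++;
        }
        return count;
    }
}
```

With this split, switching datastores would mean writing one new DocumentStore implementation, while the CSV handling stays untouched.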

Reading data from OrientDB is somewhat similar. You can do a lot more than the code example demonstrates: querying and reading specific fields, to name the most basic examples. What the code does demonstrate is that if you simply want to serve JSON documents to the outside world for client-side processing, you don't need to write a lot of code.

package eu.adriaandejonge.orient;

import java.io.FileWriter;
import java.io.IOException;

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;

public class ReadOrient {

    public static void main(String[] args) {
        try {
            tryOrient();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void tryOrient() throws IOException {
        long startTime = System.currentTimeMillis();
        // OrientDB: open the database created by WriteOrient
        ODatabaseDocumentTx db =
            new ODatabaseDocumentTx("local:/tmp/demo")
                .open("admin", "admin");
        try {
            readCollection(db);
        } finally {
            db.close();
        }

        System.out.println("DONE in " +
            (System.currentTimeMillis() - startTime) + "ms");
    }

    private static void readCollection(ODatabaseDocumentTx db)
            throws IOException {
        int count = 0;
        FileWriter fileWriter = new FileWriter("E:/orient.txt");
        try {
            // OrientDB: iterate over all documents of class "uitje"
            for (ODocument doc : db.browseClass("uitje")) {
                count++;
                fileWriter.write(doc.toJSON() + "\n");
            }
        } finally {
            // Closing flushes buffered output; without it the file may be incomplete
            fileWriter.close();
        }
        System.out.println("# " + count);
    }
}


To test these examples, you need to set up a local instance of OrientDB with a default setup. You also need to copy the JAR files called orientdb-core.jar and orient-commons.jar to your /lib folder. When connecting to a remote server, you need two additional JARs, orientdb-client.jar and orientdb-enterprise.jar. More details on the libraries required to connect can be found in the OrientDB documentation.

This is just a small first step towards an actual application. Let me know what you think of it. Suggestions for improvement and follow-up posts are welcome.

10 comments:

vinod_garje said...
This comment has been removed by the author.

Luca Garulli said...

Hi,
nice post! By the way if you want to go faster on massive insertion just follow these simple 2 suggestions:

http://code.google.com/p/orient/wiki/PerformanceTuningDocument#Massive_Insertion

Adriaan de Jonge said...

@Luca: thanks! That is an even better mechanism to import lots of data. This code was actually only written for a simple performance test to compare with CouchDB and MongoDB. I'll experiment with OrientDB some more. There are still a lot more features to try.

@vinod: I must admit that I do not completely understand your question. If these exceptions occur while running the code, you should check if all the right JAR files are on the class path. It seems like these exceptions occur at some other point. I think you will get better help by posting this to an OrientDB forum. For example http://groups.google.com/group/orient-database

Sorry I could not be of more help. I myself am still only experimenting with OrientDB and do not know the product well enough yet.

vinod_garje said...

Thank you, Adriaan de Jonge! I found a way to run my database in OrientDB Studio by connecting to the server as remote, and right now I use the Studio to insert, update, delete and select records (without joins), just like with other database servers!
And it is very easy using SQL queries.

vinod_garje said...

Is it possible to use OrientDB with JSP and servlets to access data, just like with other databases such as MySQL?
If so, how? (An example or an idea of that.)

Luca Garulli said...

It would be interesting to compare MongoDB and CouchDB performance with OrientDB using this enhanced source (+3 lines of code):

public static void main(String[] args) {
    try {
        long start = System.currentTimeMillis();

        File file = new File("E:/Development/uitjes.csv");

        FileReader reader = new FileReader(file);
        BufferedReader bufferedReader = new BufferedReader(reader);

        String firstLine = bufferedReader.readLine() + "";
        String[] columns = firstLine.split(";");
        int length = columns.length;
        for (int i = 0; i < length; i++) {
            columns[i] = columns[i].replaceAll("w3s_", "");
            columns[i] = columns[i].substring(0, 1).toLowerCase() + columns[i].substring(1);
        }

        ODatabaseDocumentTx db = new ODatabaseDocumentTx("local:/tmp/demo").create();
        db.declareIntent(new OIntentMassiveInsert());

        int cnt = 0;
        String line;

        ODocument uitje = new ODocument(db, "uitje");

        while ((line = bufferedReader.readLine()) != null) {
            cnt++;
            if (cnt % 100 == 0)
                System.out.println("cnt=" + cnt);

            uitje.reset();
            uitje.setClassName("uitje");

            String[] values = line.split(";");

            for (int i = 0; i < length; i++) {
                if (i < values.length)
                    uitje.field(columns[i], values[i]);
            }
            uitje.save();
        }
        db.close();
        System.out.println("DONE in " + (System.currentTimeMillis() - start) + "ms");

    } catch (Exception e) {
        e.printStackTrace();
    }
}

Adriaan de Jonge said...

Hi Luca,

Thank you for your suggestion. I hate to say this, but to be honest, it turns out that this code takes longer in my test setup: roughly 6200ms.

Should I upgrade OrientDB for this to work?

Adriaan.

Luca Garulli said...

Hi Adriaan,
sounds strange. Can you share the CSV file to execute it locally?

Adriaan de Jonge said...

Hi Luca,

If you allow me a few days, I will send you a representative data set to reproduce this without sending things I am not allowed to send.

I just sent you a LinkedIn invitation in order to exchange contact information.

Adriaan.

orientDB-beginer said...

It's a nice example of OrientDB usage. Could you maybe add a sample for a sequence of queries?