Apache TinkerPop™ is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP).
Gremlin is the graph traversal language of Apache TinkerPop. Gremlin is a functional, data-flow language that enables users to succinctly express complex traversals on (or queries of) their application's property graph. Every Gremlin traversal is composed of a sequence of (potentially nested) steps. A step performs an atomic operation on the data stream. Every step is either a map-step (transforming the objects in the stream), a filter-step (removing objects from the stream), or a sideEffect-step (computing statistics about the stream). The Gremlin step library extends on these 3-fundamental operations to provide users a rich collection of steps that they can compose in order to ask any conceivable question they may have of their data for Gremlin is Turing Complete.
Gremlin was designed according to the "write once, run anywhere"-philosophy. This means that not only can all TinkerPop-enabled graph systems execute Gremlin traversals, but also, every Gremlin traversal can be evaluated as either a real-time database query or as a batch analytics query. The former is known as an online transactional process (OLTP) and the latter as an online analytics process (OLAP). This universality is made possible by the Gremlin traversal machine. This distributed, graph-based virtual machine understands how to coordinate the execution of a multi-machine graph traversal. Moreover, not only can the execution either be OLTP or OLAP, it is also possible for certain subsets of a traversal to execute OLTP while others via OLAP. The benefit is that the user does not need to learn both a database query language and a domain-specific BigData analytics language (e.g. Spark DSL, MapReduce, etc.). Gremlin is all that is required to build a graph-based application because the Gremlin traversal machine will handle the rest.
g.V().has("name","gremlin").as("a").
out("created").in("created").
where(neq("a")).
in("manages").
groupCount().by("name")
A Gremlin traversal can be written in either an imperative (procedural) manner, a declarative (descriptive) manner, or in a hybrid manner containing both imperative and declarative aspects. An imperative Gremlin traversal tells the traversers how to proceed at each step in the traversal. For instance, the imperative traversal in the first box first places a traverser at the vertex denoting Gremlin. That traverser then splits itself across all of Gremlin's collaborators that are not Gremlin himself. Next, the traversers walk to the managers of those collaborators to ultimately be grouped into a manager name count distribution. This traversal is imperative in that it tells the traversers to "go here and then go there" in an explicit, procedural manner.
A declarative Gremlin traversal does not tell the traversers the order in which to execute their walk, but instead, allows each traverser to select a pattern to execute from a collection of (potentially nested) patterns. The declarative traversal in the second box yields the same result as the imperative traversal above. However, the declarative traversal has the added benefit that it leverages not only a compile-time query planner (like imperative traversals), but also a runtime query planner that chooses which traversal pattern to execute next based on the historic statistics of each pattern -- favoring those patterns which tend to reduce/filter the most data.
g.V().match(
as("a").has("name","gremlin"),
as("a").out("created").as("b"),
as("b").in("created").as("c"),
as("c").in("manages").as("d"),
where("a",neq("c"))).
select("d").
groupCount().by("name")
Classic database query languages, like SQL, were conceived as being fundamentally different from the programming languages that would ultimately use them in a production setting. For this reason, classical databases require the developer to code both in their native programming language as well as in the database's respective query language. An argument can be made that the difference between "query languages" and "programming languages" are not as great as we are taught to believe. Gremlin unifies this divide because traversals can be written in any programming language that supports function composition and nesting (which every major programming language supports). In this way, the user's Gremlin traversals are written along side their application code and benefit from the advantages afforded by the host language and its tooling (e.g. type checking, syntax highlighting, dot completion, etc.). Various Gremlin language variants exist including: Gremlin-Java, Gremlin-Groovy, Gremlin-Python, Gremlin-Scala, etc.
public Class GremlinTinkerPopExample{
public void run(String name, String property){
Graph graph = GraphFactory.open(...);
GraphTraversalSource g = traversal().withEmbedded(graph);
double avg = g.V().has("name",name).
out("knows").out("created").
values(property).mean().next();
System.out.printIn("Average rating :" +avg);
}
}
public Class Sql3dbcExample {
public void run(String name, String property){
Connection connection = DriverManager.getConnection(...)
Statement statement = connection.createStatement();
ResultSet result = statement.executeQuery(
"SELECT AVG(pr." + property + ") as AVERAGE FROM PERSONS p1" +
"INNER JOIN KNOWS K ON k. person1 pl.id +
"INNER JOIN PERSONS P2 ON p2.id k.person2 +
"INNER JOIN CREATED C ON c.person = p2.id " +
"INNER JOIN PROJECTS Pr ON pr.id c.project
"WHERE p.name " + name + "');
System.out.println("Average rating"+result.next().getDouble("AVERAGE"));
}
}
Graph graph = GraphFactory.open(...);
GraphTraversalSource g;
g = traversal().withEmbedded(graph); //local OLTP
g = traversal().withRemote(DriverRemoteConnection.using("Localhost",8182)); //remote
g = traversal().withEmbedded(graph).withComputer(SparkGraphComputer.class); //distributed OLAP