December 2012 GraphDB Meetup
We-[:THANK {for:'meetup space!'}]->CustomInk
We-[:THANK {for:'pizza'}]->Ikanow
We-[:THANK {for:'beer'}]->NeoTechnology
Agenda
6:30 - Pizza/Beer + Networking, etc.
7:00 - Announcements
7:05 - Craig Vitter: Ikanow Infinit.e + Dev API Intro
7:30 - Short Break
7:40 - Wes Freeman: Quick Cypher Review,
Demo: Quick Scala app to import Infinit.e data to Neo4j,
Example Cypher queries
Next Meetup
January ~10th! (even though we're aiming for bimonthly)
Probably somewhere in NOVA again
Why Neo4j?
- Optimized for highly connected data
- If you're doing nested self joins in your SQL, you probably need Neo4j
- Find connections between records
- Real-time queries for recommendations (as opposed to batch processing Hadoop-style)
- Hierarchical (Tree, ACL, etc.) data
- Graph ... data? Yeah, it's good at that, too.
- Proven Lucene-based indexing as default for the pluggable index provider system, and full-text search features as a bonus
Neo4j: CYPHER
- Declarative query language (can also do updates)
- Easy to learn
- Mostly unique to Neo4j
- Still new, so not entirely optimized, but improving rapidly!
- Try milestone releases for best Cypher experience (usually!)
CYPHER QUERIES: BROAD STROKES
- START at starting points (often with index lookups)
- MATCH a graph pattern with a symbolic syntax
- Use WHERE to filter the resulting matches
- Use WITH to compute intermediate results for your next query part
- RETURN the [aggregated] data you want, with aliases
- LIMIT and SKIP the number of results you want
- ORDER BY works just like it does in SQL
"It all starts with the START" --Michael Hunger

CYPHER: INSERTING/UPDATING
- CREATE lets you create with the human-friendly Geoff format (and other formats):
CREATE (me {name:'Wes'}), me-[:is_friends_with {since:2012}]->(you {name:'Friend_01'});
- DELETE lets you delete nodes, relationships
- SET lets you update properties
- You can build queries and use predicates while updating and deleting, similar to SQL
CYPHER: THERE'S MORE
- Functions (filter, extract, reduce, math, strings, etc.)
- Aggregation (min, max, avg, collect, percentiles, etc.)
- Cypher tutorial: http://www.neo4j.org/learn/cypher
- General Cypher docs:
http://docs.neo4j.org/chunked/snapshot/cypher-query-lang.html
- Console! The gist/repl js-fiddle style webapp. Build your sample graph and share it, to show others exactly what you mean.
http://console.neo4j.org
- Webinars about Cypher--several available here:
http://watch.neo4j.org/
- Cheatsheet: http://neo4j.org/resources/cypher
Transferring Data from Infinit.e
- https://github.com/wfreeman/infinit.e-neo4j-demo
- Mostly written in a few hours at MoDevHack
- In Scala using AnormCypher (shameless plug)
http://anormcypher.org/
- Uses the Document Query from Infinit.e's REST API
- Entities are nodes, associations are relationships
- Uses the indexName of the entities as unique identifiers (the same entity found in multiple documents will have the same indexName, usually)
- Uses the verb and verb category together as the relationship type: "current_career", etc.
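A rough sketch of the mapping described above, written in Python rather than the demo's Scala so it runs on its own. Only indexName and the "current_career" style of relationship type come from the slides. The dict field names (entities, associations, from, to, verb, verb_category) and the underscore join are assumptions for illustration; Infinit.e's actual JSON and the demo's code may differ.

```python
import re


def relationship_type(verb, verb_category):
    """Join verb and verb category into one identifier-like type name.

    Assumption: the two parts are joined with an underscore.
    """
    raw = "{} {}".format(verb, verb_category).strip().lower()
    # Collapse every run of non-alphanumeric characters into one underscore.
    return re.sub(r"[^a-z0-9]+", "_", raw).strip("_")


def to_graph(documents):
    """Return (nodes, relationships) for a list of document dicts.

    nodes maps indexName -> properties, so an entity seen in several
    documents becomes a single node. relationships is a set of
    (from indexName, type, to indexName) tuples.
    """
    nodes = {}
    relationships = set()
    for doc in documents:
        # Hypothetical field names; adjust to the real API response.
        for entity in doc.get("entities", []):
            key = entity["indexName"]
            nodes.setdefault(key, {"indexName": key, "name": entity["name"]})
        for assoc in doc.get("associations", []):
            rel_type = relationship_type(assoc["verb"], assoc["verb_category"])
            relationships.add((assoc["from"], rel_type, assoc["to"]))
    return nodes, relationships
```

Keying nodes on indexName is what makes the same entity from two documents collapse into one node; the set does the same for repeated associations.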
Some minor annoyances
- AnormCypher exceptions aren't descriptive (I'll fix that!)
- Cypher doesn't allow parameterized relationship types; I had to concatenate the query string (ugly!)
- No easy way to use CREATE UNIQUE with index lookups; I usually just broke it up into two Cypher calls
- JSON parsing in Scala requires specifying a schema or dealing with weird Map conversions--haven't found a great way to do it, yet.
- Index support in Cypher needs improving--coming soon!
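One way to handle the two workarounds above, again as a Python sketch rather than the demo's Scala. Because the relationship type has to be concatenated into the query text, it is checked against an identifier pattern first; every other value stays a parameter. The queries use the legacy START-style Cypher seen in these slides. The run callable, its row format, and the exact two-call split are assumptions, not AnormCypher's API or the demo's actual code.

```python
import re

# Only plain identifiers may be spliced into the query text.
SAFE_TYPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Call 1: resolve both entities through the auto index, values as parameters.
LOOKUP = (
    "START a=node:node_auto_index(indexName = {fromName}), "
    "b=node:node_auto_index(indexName = {toName}) "
    "RETURN id(a) AS fromId, id(b) AS toId"
)


def create_unique_query(rel_type):
    """Build the call-2 query text; the type is concatenated, ids stay parameters."""
    if SAFE_TYPE.fullmatch(rel_type) is None:
        raise ValueError("unsafe relationship type: {!r}".format(rel_type))
    return (
        "START a=node({fromId}), b=node({toId}) "
        "CREATE UNIQUE a-[r:" + rel_type + "]->b "
        "RETURN r"
    )


def link(run, from_name, rel_type, to_name):
    """Link two indexed entities using two Cypher calls.

    run(query, params) is a hypothetical callable returning a list of row dicts.
    Returns False when the lookup finds nothing, True otherwise.
    """
    query = create_unique_query(rel_type)  # validate before touching the database
    rows = run(LOOKUP, {"fromName": from_name, "toName": to_name})
    if not rows:
        return False
    ids = {"fromId": rows[0]["fromId"], "toId": rows[0]["toId"]}
    run(query, ids)
    return True
```

Rejecting anything that is not a plain identifier keeps the concatenation from becoming an injection point, at the cost of refusing type names with spaces or punctuation.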
Finally, to the cool stuff: DATA
Cleaning Up...
// delete nodes only connected to documents
start n=node(*)
match doc-[r:references]->n
where length(n--()) = 1
delete n,r;
// delete unconnected nodes. who cares about them?
start n=node(*)
where not(n--())
delete n;
Some Example Queries
// get a feel for our data (what kind of verbs were extracted)
start entity=node(*)
match entity-[association]->entity2
where type(association) <> "references"
return entity.name, type(association), entity2.name;
// find the entities most referred to by documents
start entity=node:node_auto_index('indexName:*')
match doc-[:references]->entity
return count(doc) as docCount, entity
order by docCount desc
limit 10;
// find who endorsed which candidates...
start candidate_career=node:node_auto_index('name:candidate')
match candidate_career<-[:current_career]-candidate<-[:political_endorsement]-endorser
return distinct candidate, endorser;
More Example Queries
// find competitors of the competitors... if they exist...
start company=node:node_auto_index('name:*')
match company-[:company_competitor]->competitor-[:company_competitor]->competitors_of_competitor
with company, competitor.name as competitor, collect(competitors_of_competitor.name) as comps
return company.name, collect(competitor), collect(comps);
- Wes Freeman