Brool
brool (n.): a low roar; a deep murmur or humming

Tokyo Cabinet API for Clojure

Tags: tokyo, nonsql, coding, cabinet, clojure, kvstore

I’ve been playing with Tokyo Cabinet and Clojure for a bit, and while I will go on about both of them in another blog post (or not), I have to mention that Clojure is such a well-designed language that it’s a pleasure to work with. It has much of the same intrinsic power as Haskell, but in a form that may be more approachable for people coming from Python or Ruby.

At any rate, I wrote a small, thin layer around the Tokyo Cabinet API and put it on GitHub. Another thin wrapper can be found at this blog.

A copy of the README is below (the ultimate in lazy!).

Introduction

This is a simple interface to the Tokyo Cabinet libraries. Tokyo Cabinet is a very cool, very high-performance key-value store. This library also supports table mode, which essentially means that arbitrary hash maps can be stored in the cabinet.

Note that this is appropriate for local storage only. If you want to share a Tokyo Cabinet database across multiple computers, you actually want Tokyo Tyrant.

Basic Usage

The with-cabinet call opens a cabinet (creating it if the mode allows). It makes the various access routines available within its body and closes the cabinet when the body finishes. For example, here’s how to create a cabinet with three entries:

    (ns user (:use tokyo-cabinet)) ;; bring into our namespace

    (with-cabinet {:filename "test.tokyo" :mode (+ OWRITER OCREAT)}
      (doseq [[k v] [["1" "one"] ["2" "two"] ["3" "three"]]]
        (put-value k v)))

Since no :type is given, this creates a Tokyo Cabinet hash database, which stores one value per key. Now query an entry:

    (with-cabinet {:filename "test.tokyo" :mode OREADER}
      (get-value "1"))
    ;; => "one"
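The :mode values in these examples combine open flags with +. That only works when every flag is a distinct bit. Tokyo Cabinet's C API defines its open modes that way, and this sketch assumes the wrapper's constants do too. Below is a small model of the arithmetic in plain Clojure. The three constants are hypothetical local stand-ins with assumed bit positions, not the library's OWRITER/OCREAT vars:

```clojure
(ns mode-flags-sketch)

;; Hypothetical stand-ins for three open-mode flags, each assumed to be a
;; single distinct bit (reader = bit 0, writer = bit 1, create = bit 2).
(def reader-flag (bit-shift-left 1 0))
(def writer-flag (bit-shift-left 1 1))
(def creat-flag  (bit-shift-left 1 2))

;; With distinct bits, + and bit-or produce the same mode value.
(def mode-plus (+ writer-flag creat-flag))
(def mode-or   (bit-or writer-flag creat-flag))

;; Repeating a flag is where they differ: bit-or is idempotent, while +
;; carries into the next bit (here, the create bit).
(def doubled-plus (+ writer-flag writer-flag))
(def doubled-or   (bit-or writer-flag writer-flag))

(defn has-flag?
  "True when the given flag bit is set in mode."
  [mode flag]
  (not (zero? (bit-and mode flag))))

(println mode-plus mode-or)                   ; prints 6 6
(println doubled-plus doubled-or)             ; prints 4 2
(println (has-flag? doubled-plus creat-flag)) ; prints true
```

In this model, accidentally adding the writer flag twice with + drops the writer bit and sets the create bit instead. That is why bit-or is the safer way to combine flags.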

Tables

A table in Tokyo Cabinet can be used to store arbitrary hash maps. For example:

    (def params {:filename "test-table.tokyo" :mode (+ OWRITER OCREAT) :type :table})

    (with-cabinet params
      (put-value nil {:name "John Doe" :hobbies "rowing fishing skiing" :age 28 :gender "M"})
      (put-value nil {:name "Melissa Swift" :hobbies "soccer tennis books" :age 33 :gender "F"})
      (put-value nil {:name "Tom Swift" :hobbies "inventing exploring" :gender "M"})
      (put-value nil {:name "Harry Potter" :hobbies "magic quidditch flying" :gender "M" :age 9}))

Queries

Queries are run with with-query-results, and inside its body you can call (hint) to see how the query is being performed:

    ;; show a hint and all rows matching
    (defn showrows [query]
      (let [showhint (atom false)]
        (with-query-results row query
          ;; print the header only once, just before the first matching row
          (when (compare-and-set! showhint false true)
            (println "Query: " query)
            (println "Hint: " (hint))
            (println "Results:"))
          (println row)))
      (println))

    (with-cabinet params
      (showrows [[:age ">=" 30]])
      (showrows [[:hobbies "any-token" "soccer"]]))

This produces the following output:

    Query: [[:age >= 30]]
    Hint: scanning the whole table
    result set size: 1
    leaving the natural order
    Results:
    {:gender F, :hobbies soccer tennis books, :name Melissa Swift, :age 33}

    Query: [[:hobbies any-token soccer]]
    Hint: scanning the whole table
    result set size: 1
    leaving the natural order
    Results:
    {:gender F, :hobbies soccer tennis books, :name Melissa Swift, :age 33}
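Both queries report "scanning the whole table": without an index, every row is checked against the conditions. Here is a rough in-memory model of that full scan in plain Clojure, for intuition only. The helpers matches? and run-query are hypothetical. Conditions are assumed to be ANDed together, and splitting tokens on spaces and commas is an assumption of the sketch, not documented wrapper behavior:

```clojure
(ns query-sketch
  (:require [clojure.string :as str]))

;; The four rows from the Tables example, as plain in-memory maps.
(def rows
  [{:name "John Doe"      :hobbies "rowing fishing skiing"  :age 28 :gender "M"}
   {:name "Melissa Swift" :hobbies "soccer tennis books"    :age 33 :gender "F"}
   {:name "Tom Swift"     :hobbies "inventing exploring"    :gender "M"}
   {:name "Harry Potter"  :hobbies "magic quidditch flying" :gender "M" :age 9}])

(defn tokens
  "Split a string into tokens on whitespace and commas (assumed tokenization)."
  [s]
  (str/split s #"[\s,]+"))

(defn matches?
  "Hypothetical evaluator for one [column op value] condition."
  [row [column op value]]
  (let [cell (get row column)]
    (case op
      ">="        (and (number? cell) (>= cell value))
      "any-token" (and (string? cell)
                       (boolean (some (set (tokens value)) (tokens cell))))
      (throw (ex-info "operator not covered by this sketch" {:op op})))))

(defn run-query
  "Full scan: keep the rows that satisfy every condition in the query."
  [rows query]
  (filter (fn [row] (every? #(matches? row %) query)) rows))

(map :name (run-query rows [[:age ">=" 30]]))                  ; => ("Melissa Swift")
(map :name (run-query rows [[:hobbies "any-token" "soccer"]])) ; => ("Melissa Swift")
```

Tom Swift has no :age, so the numeric condition simply skips him. Both queries select only Melissa Swift, which agrees with the output above.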

Indexes

Indexes, which help optimize particular queries, can be added with create-index and removed with delete-index.

The different index types:

There are also some optional specifiers that can be added (or OR-ed) in:

Running the queries again, with indexes:

    ;; indexes are persistent
    (with-cabinet params
      (create-index :hobbies INDEX-TOKEN)
      (create-index :age INDEX-DECIMAL))

    ;; try the queries again with the indexes in place
    (with-cabinet params
      (showrows [[:age ">=" 30]])
      (showrows [[:hobbies "any-token" "soccer"]]))

Now the hints show the indexes in use:

    Query: [[:age >= 30]]
    Hint: using an index: ":age" asc (NUMGT/NUMGE)
    result set size: 1
    leaving the natural order
    Results:
    {:gender F, :hobbies soccer tennis books, :name Melissa Swift, :age 33}

    Query: [[:hobbies any-token soccer]]
    Hint: using an index: ":hobbies" inverted (STROR)
    token occurrence: "soccer" 1
    result set size: 1
    leaving the natural order
    Results:
    {:gender F, :hobbies soccer tennis books, :name Melissa Swift, :age 33}
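The indexed hint for :hobbies mentions an "inverted" index and a token occurrence count. Below is a rough sketch of the general idea behind an inverted token index, in plain Clojure. Each token maps to the set of primary keys whose column contains it, so an "any-token" lookup becomes a few map reads instead of a scan. This only illustrates the technique; it is not Tokyo Cabinet's on-disk index format. The helper names and the primary keys "1" to "4" are hypothetical:

```clojure
(ns token-index-sketch
  (:require [clojure.string :as str]
            [clojure.set :as set]))

;; Rows keyed by hypothetical primary keys "1" to "4".
(def rows
  {"1" {:name "John Doe"      :hobbies "rowing fishing skiing"}
   "2" {:name "Melissa Swift" :hobbies "soccer tennis books"}
   "3" {:name "Tom Swift"     :hobbies "inventing exploring"}
   "4" {:name "Harry Potter"  :hobbies "magic quidditch flying"}})

(defn tokens
  "Split on whitespace and commas (an assumption of this sketch)."
  [s]
  (str/split s #"[\s,]+"))

(defn build-token-index
  "Map each token in column to the set of primary keys whose value contains it."
  [rows column]
  (reduce-kv
    (fn [index pk row]
      (if-let [cell (get row column)]
        (reduce (fn [idx tok] (update idx tok (fnil conj #{}) pk))
                index
                (tokens cell))
        index))
    {}
    rows))

(defn lookup-any-token
  "Union of the key sets for each wanted token; no full scan needed."
  [index value]
  (apply set/union #{} (map #(get index % #{}) (tokens value))))

(def hobby-index (build-token-index rows :hobbies))

(lookup-any-token hobby-index "soccer")       ; => #{"2"}
(lookup-any-token hobby-index "magic rowing") ; => the set of "1" and "4"
```

The cost moves from query time to write time: every put has to update the index. That trade-off is why indexes are created explicitly per column.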

Optional Search Parameters

You can further control what’s fetched by using a number of optional specifiers in the query:

For example:

    (with-cabinet params
      (with-query-results row []
        (println (:name row))))
    ;; John Doe
    ;; Melissa Swift
    ;; Tom Swift
    ;; Harry Potter

    (with-cabinet params
      (with-query-results row [[:sort :name]]
        (println (:name row))))
    ;; Harry Potter
    ;; John Doe
    ;; Melissa Swift
    ;; Tom Swift

    (with-cabinet params
      (with-query-results row [[:sort :name] [:order SORT-TEXT-DESC]]
        (println (:name row))))
    ;; Tom Swift
    ;; Melissa Swift
    ;; John Doe
    ;; Harry Potter

    (with-cabinet params
      (with-query-results row [[:sort :name] [:order SORT-TEXT-DESC] [:limit 1]]
        (println (:name row))))
    ;; Tom Swift
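These four results follow a simple pipeline: optionally sort by a column, optionally flip the comparison, then truncate. Here is a plain Clojure model of that pipeline over in-memory rows, for intuition only. The function apply-options and its option keys are hypothetical and do not reflect how the library handles :sort, :order, or :limit:

```clojure
(ns options-sketch)

;; Only the :name column matters here; rows are in insertion order.
(def rows [{:name "John Doe"} {:name "Melissa Swift"}
           {:name "Tom Swift"} {:name "Harry Potter"}])

(defn apply-options
  "Hypothetical model of sort / order / limit specifiers over a seq of rows."
  [rows {:keys [sort-col descending? limit]}]
  (cond->> rows
    ;; sort by the column, reversing the comparator for descending order
    sort-col (sort-by sort-col (if descending? #(compare %2 %1) compare))
    ;; keep only the first `limit` rows
    limit    (take limit)))

(map :name (apply-options rows {}))
;; => ("John Doe" "Melissa Swift" "Tom Swift" "Harry Potter")
(map :name (apply-options rows {:sort-col :name}))
;; => ("Harry Potter" "John Doe" "Melissa Swift" "Tom Swift")
(map :name (apply-options rows {:sort-col :name :descending? true}))
;; => ("Tom Swift" "Melissa Swift" "John Doe" "Harry Potter")
(map :name (apply-options rows {:sort-col :name :descending? true :limit 1}))
;; => ("Tom Swift")
```

Tokyo Cabinet table columns are stored as strings, so text ordering is not numeric ordering. For example, (compare "28" "9") is negative, so "28" sorts before "9" as text. That is why a text sort order (such as SORT-TEXT-DESC here) is a separate choice from a numeric one.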

Lower Level

Depending on your application, it might not be convenient to bracket everything with with-cabinet, since each call opens and closes the cabinet. You can also use the lower-level open-cabinet and close-cabinet calls, along with the with form, which runs its body against an already-open cabinet. This is also more convenient at the REPL. For example:

    (def test-database
      (open-cabinet {:filename "test-open.tokyo" :mode (+ OWRITER OCREAT)}))

    (with test-database (put-value "1" "one"))
    (with test-database (get-value "1"))
    (with test-database (print (primary-keys)))

    (close-cabinet test-database)
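The two styles differ only in who owns the open and close. Below is a sketch of one common way to build such a pair of forms in Clojure, with an in-memory atom standing in for the cabinet. A dynamic var holds the current store. One macro only binds it (like with), and the other also opens and closes it inside try/finally (like with-cabinet). All names here are hypothetical; this is not the library's implementation:

```clojure
(ns resource-sketch)

;; Hypothetical stand-in for an open cabinet: an atom of data plus an open flag.
(defn open-store [] {:data (atom {}) :open? (atom true)})
(defn close-store [store] (reset! (:open? store) false))

;; The store that put-val and get-val operate on.
(def ^:dynamic *store* nil)

(defmacro using-store
  "Run body against an already-open store, like the with form; never closes it."
  [store & body]
  `(binding [*store* ~store]
     ~@body))

(defmacro with-new-store
  "Open a store, run body against it, and always close it, like with-cabinet."
  [& body]
  `(let [s# (open-store)]
     (try
       (using-store s# ~@body)
       (finally
         (close-store s#)))))

(defn put-val [k v]
  (assert @(:open? *store*) "store is closed")
  (swap! (:data *store*) assoc k v))

(defn get-val [k]
  (assert @(:open? *store*) "store is closed")
  (get @(:data *store*) k))

;; Long-lived style: open once, reuse across many calls, close explicitly.
(def db (open-store))
(using-store db (put-val "1" "one"))
(using-store db (get-val "1")) ; => "one"
(close-store db)
```

The try/finally in the scoped version guarantees the close even if the body throws. The long-lived style leaves that responsibility to the caller.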

Miscellaneous

Use (primary-keys) to return a lazy list of primary keys.

    (with-cabinet {:filename "test.tokyo" :mode (+ OWRITER OCREAT) :type :table}
      (print (primary-keys)))
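A lazy sequence of keys is typically built by wrapping an iterator. Here is a sketch of that pattern with lazy-seq, where a closure over an atom stands in for the cabinet's native key iterator. The names make-cursor and lazy-keys are hypothetical, and this is an assumption about how something like primary-keys might work, not its actual source:

```clojure
(ns lazy-keys-sketch)

(defn make-cursor
  "Hypothetical stand-in for a native key iterator: each call returns the
  next key, or nil once the keys are exhausted."
  [ks]
  (let [remaining (atom (seq ks))]
    (fn []
      (let [k (first @remaining)]
        (swap! remaining next)
        k))))

(defn lazy-keys
  "Wrap a next-key function in a lazy sequence that pulls one key at a time."
  [next-key!]
  (lazy-seq
    (when-let [k (next-key!)]
      (cons k (lazy-keys next-key!)))))

;; Count cursor calls to show that keys are fetched only on demand.
(def calls (atom 0))
(def cursor (make-cursor ["1" "2" "3" "4"]))
(def ks (lazy-keys (fn [] (swap! calls inc) (cursor))))

@calls     ; => 0, nothing fetched yet
(first ks) ; => "1"
@calls     ; => 1
(doall ks) ; => ("1" "2" "3" "4")
@calls     ; => 5, four keys plus the final nil
```

Because a sequence like this pulls from a live cursor as it is consumed, it should be realized before the underlying resource is closed. Presumably the same applies to primary-keys, which is why the example prints the keys inside with-cabinet.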

Discussion

Comments are moderated whenever I remember that I have a blog.

cmdrbatguano | 2009-08-30 23:51:05
Does this work with remote (tyrant) databases? Can you do full-text search with Dystopia using this library? Cool stuff, btw.
tim | 2009-09-01 21:23:54
@cmdrbatguano: Tokyo Tyrant is a different protocol. I've been playing around with a wrapper that maps Tokyo Tyrant to Clojure -- will post that on github sooner or later.
GoodJob | 2009-09-29 14:31:01
Good job. I recently integrated Tokyo Tyrant into a Clojure project. What worked for me: in-memory hash db (master) that replicates to multiple slaves on multiple physical devices. The master is used for writes, and the slaves have read requests load balanced between them. On top of this I have a clustered cache system that keeps in sync with writes and handles a good % of reads to offload the slaves even more.