Running RethinkDB with Python C++ Protobuf Drivers on OSX
I ran across RethinkDB a few days ago on O'Reilly Radar and decided to give it a shot, since I'm a big fan of MongoDB and Riak, and RethinkDB seems to take the best features of both (slick json data structures with easy administration) and mash them together.
I figured moving from Flask and Mongo to Flask and RethinkDB would be pretty easy, so I ran
brew install rethinkdb
pip install rethinkdb
rethinkdb
and was off to the races. I was having a great time with the repl and RethinkDB's awesome built-in admin interface and stuffing things into the DB left and right.
However, query performance in my python app kinda sucked. I wanted to fix that.
You want protoc, not Python
I did some reading and learned that RethinkDB's python driver uses protobuf. If you haven't heard of it before, protobuf or "protocol buffers" is a lower-level data transport library by Google written in C++. In simplistic terms you can think of it as a binary json stream, which is space efficient when sent over the network.
Riak also uses protobuf. When I started using Riak, Google's C/C++ version of protobuf didn't support Python 3, so performance on Python 3 was abysmal and you had to use Python 2.7 if you wanted the best performance. (Google's protobuf still doesn't support Python 3 afaik, but you can try one from OpenX if you're adventurous.)
In any case, RethinkDB performs best with the C/C++ version of the python protobuf module. By default -- and you'll notice this in your console spam when you run pip install rethinkdb if you pay close attention -- the C/C++ library is missing so RethinkDB uses a python implementation as a fallback.
If you're curious, you can find out which one RethinkDB is using via:
python
import rethinkdb as r
r.protobuf_implementation
This will output either python or cpp to indicate whether you're running native code (you want it to say cpp for the best performance). Mine said python. So of course, the next step was to fix that and install the C/C++ version of the protobuf module.
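If you'd rather script that check than poke at the repl, here's a minimal sketch. It only relies on the protobuf_implementation attribute shown above; the helper names (classify_backend, check_driver) are made up for illustration, and the sketch assumes the attribute may be missing on driver versions other than the one I used.

```python
import importlib


def classify_backend(impl):
    """Map the driver's reported implementation string to a short verdict."""
    if impl == "cpp":
        return "native"
    if impl == "python":
        return "pure-python fallback"
    return "unknown"


def check_driver(module_name="rethinkdb"):
    """Import the driver if it is installed and classify its protobuf backend."""
    try:
        driver = importlib.import_module(module_name)
    except ImportError:
        return "driver not installed"
    # protobuf_implementation is the attribute used in this post; other
    # driver releases may not expose it, hence the getattr default.
    return classify_backend(getattr(driver, "protobuf_implementation", None))
```

Calling print(check_driver()) gives you one line you can drop into a deploy script or a health check.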
Wait, what's this C/C++ thing mean?
Native Python extension modules are written in C. Protobuf is written in C++. So first you have to compile protobuf and then compile a C python module wrapper for the C++ library. All of this is included in the protobuf tarball and aside from wiring it all together there's not much involved.
Protobuf Version 2.4.1
At the time of writing, the latest protobuf library is 2.5.0. This is what gets installed via pip. The version of RethinkDB I installed via homebrew is 1.9.0 (1.10.0 actually came out two days ago but it's not in the brew repo yet).
After trying and failing I learned that protobuf 2.5.0 is ostensibly not compatible with RethinkDB 1.9.0 on OSX. From some of the github issues I was able to infer that RethinkDB is using 2.4.1, apparently dictated by what's in Ubuntu's repositories right now. I don't know if the build-from-source docs make note of this but the python drivers page certainly doesn't.
No problem. I can download 2.4.1 instead.
Now's the easy part. We'll compile and install protobuf 2.4.1 from source.
tar -xjf ~/Downloads/protobuf-2.4.1.tar.bz2
cd protobuf-2.4.1
./configure
make
make install
Normally I'd say run make test, but protobuf doesn't have one. I suspect the python module tests are used to test the library instead.
At this point you should be able to run protoc --version and get libprotoc 2.4.1 in response.
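If you want to automate the protoc version check, something like the following should do. It's a rough sketch: the function names and the EXPECTED constant are mine, and it assumes protoc prints a line of the form "libprotoc X.Y.Z" as it did for me.

```python
import re
import subprocess

EXPECTED = (2, 4, 1)  # the protobuf version this post builds against


def parse_protoc_version(output):
    """Extract a version tuple from text like 'libprotoc 2.4.1'."""
    match = re.search(r"libprotoc\s+(\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_protoc_version():
    """Run `protoc --version`; return None if protoc is not on the PATH."""
    try:
        result = subprocess.run(
            ["protoc", "--version"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return None
    return parse_protoc_version(result.stdout)
```

Comparing installed_protoc_version() against EXPECTED tells you right away whether a stray 2.5.0 build is still first on your PATH.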
Note: If you already ran make install on the wrong version of protobuf from source like I did, you can remove it with make uninstall (run from that version's source directory). At one point I was also getting segfaults from python, so I ran rm /usr/local/lib/libproto* and then ran make install again from the 2.4.1 directory.
Building the Native Python Module
Next, the python module. But first, we need to set an environment variable.
export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp
This indicates that we want to use the native C/C++ version of the protobuf python module, which is faster than the pure python version. If you don't have this environment variable set, setup.py will build the pure python version, which is slow and not what you want.
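To make the fallback behaviour concrete, here's a small sketch of the decision as I understand it. Treat the variable name PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION as an assumption and double-check it against python/setup.py in your tarball; the two helper functions are hypothetical and not part of protobuf.

```python
import os

# Assumed variable name; verify against python/setup.py in your tarball.
IMPL_VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"


def requested_implementation(environ=None):
    """Return 'cpp' only when the variable is set to exactly 'cpp'."""
    if environ is None:
        environ = os.environ
    # Unset (or any other value) means the pure python build, which is
    # the slow fallback described above.
    return "cpp" if environ.get(IMPL_VAR) == "cpp" else "python"


def build_environment(base=None):
    """Copy an environment mapping and request the native implementation."""
    env = dict(os.environ if base is None else base)
    env[IMPL_VAR] = "cpp"
    return env
```

If you drive the build from a script, you can pass build_environment() as the env argument of subprocess.run so the setting can't get lost between shell sessions.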
Since the RethinkDB python module had already installed a more recent version of the protobuf module, I took this opportunity to remove it.
pip uninstall protobuf -y
Next you should cd into protobuf-2.4.1/python. If you followed directly from the example above you can just drop into python from where you are.
cd python
python setup.py build
python setup.py test
python setup.py install --prefix /usr/local
I used --prefix /usr/local because I'm using python 2.7.5 from homebrew, not system python on OSX. If you try to install the module to the wrong place, setuptools will complain so you probably can't break this step accidentally. At this point, pip list | grep protobuf should show you protobuf at version 2.4.1.
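The same check can be done from python instead of grepping pip's output. This is only a sketch: it uses importlib.metadata, which needs Python 3.8 or later (newer than the 2.7.5 I was on), and the function names are invented for the example.

```python
from importlib import metadata


def installed_version(package="protobuf"):
    """Return the installed version string for a package, or None."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_expected(version, expected="2.4.1"):
    """True only when the installed version matches the one built here."""
    return version == expected
```

is_expected(installed_version()) returning False means either nothing is installed or pip pulled a different version back in.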
Reinstall the RethinkDB Module
Finally, we need to reinstall the RethinkDB python module so it can be made aware of the C/C++ python module that we've installed. I don't know exactly why this is necessary but it does some linking step when you pip install, so just having the proper protobuf module seems to be insufficient.
pip uninstall rethinkdb -y
pip install rethinkdb
You can tell it worked because you won't see a message indicating that it's using the pure python fallback. But to be more scientific we can use the same trick from earlier:
python
import rethinkdb as r
r.protobuf_implementation
It should now say 'cpp'. And you're done! Have fun adding RethinkDB to your python app.