Sphinx Search with PostgreSQL


Posted

in

,

While I don’t plan on moving away from Apache Solr for my searching needs any time soon, Jeremy Zawodny’s post on Sphinx at craigslist made me want to take a closer look. Sphinx works with MySQL, PostgreSQL, and XML input as data sources, but MySQL seems to be the best documented. I’m a PostgreSQL guy so I ran in to a few hiccups along the way. These instructions, based on instructions on the Sphinx wiki, got me up and running on Ubuntu Server 8.10.

Install build toolchain:

$ sudo aptitude install build-essential checkinstall

Install Postgres:

$ sudo aptitude install postgresql postgresql-client \\
postgresql-client-common postgresql-contrib \\
postgresql-server-dev-8.3

Get Sphinx source:

$ wget http://www.sphinxsearch.com/downloads/sphinx-0.9.8.1.tar.gz
$ tar xzvf sphinx-0.9.8.1.tar.gz
$ cd sphinx-0.9.8.1

Configure and make:

$ ./configure --without-mysql --with-pgsql \\
--with-pgsql-includes=/usr/include/postgresql/ \\
--with-pgsql-lib=/usr/lib/postgresql/8.3/lib/
$ make

Run checkinstall:

$ mkdir /usr/local/var
$ sudo checkinstall

Sphinx is now installed in /usr/local. Check out /usr/local/etc/ for configuration info.

Create something to index:

$ createdb -U postgres test
$ psql -U postgres test
test=# create table test (id integer primary key not null, text text);
test=# insert into test (text) values ('Hello, World!');
test=# insert into test (text) values ('This is a test.');
test=# insert into test (text) values ('I have another thing to test.');
test=# -- A user with a password is required.
test=# create user foo with password 'bar';
test=# alter table test owner to foo;
test=# \\q

Configure sphinx (replace nano with your editor of choice):

$ cd /usr/local/etc
$ sudo cp sphinx-min.conf.dist sphinx.conf
$ sudo nano sphinx.conf

These values worked for me. I left configuration for indexer and searchd unchanged:

source src1
{
  type = pgsql
  sql_host = localhost
  sql_user = foo
  sql_pass = bar
  sql_db = test
  sql_port = 5432
  sql_query = select id, text from test
  sql_query_info = SELECT * from test WHERE id=$id
}

index test1
{
  source = src1
  path = /var/data/test1
  docinfo = extern
  charset_type = utf-8
}

Reindex:

$ sudo mkdir /var/data
$ sudo indexer --all

Run searchd:

$ sudo searchd

Play:

$ search world

Sphinx 0.9.8.1-release (r1533)
Copyright (c) 2001-2008, Andrew Aksyonoff

using config file '/usr/local/etc/sphinx.conf'...
index 'test1': query 'world ': returned 1 matches of 1 total in 0.000 sec

displaying matches:
1. document=1, weight=1

words:
1. 'world': 1 documents, 1 hits

Use Python:

cd sphinx-0.9.8.1/api
python
>>> import sphinxapi, pprint
>>> c = sphinxapi.SphinxClient()
>>> q = c.Query('world')
>>> pprint.pprint(q)
{'attrs': [],
 'error': '',
 'fields': ['text'],
 'matches': [{'attrs': {}, 'id': 1, 'weight': 1}],
 'status': 0,
 'time': '0.000',
 'total': 1,
 'total_found': 1,
 'warning': '',
 'words': [{'docs': 1, 'hits': 1, 'word': 'world'}]}

If you add new data and want to reindex, make sure you use the --rotate flag:

sudo indexer --rotate --all

This is an extremely quick and dirty installation designed to give me a sandbox
to play with. For production use you would want to run as a non-privileged user
and would probably want to have an /etc/init.d script for searchd or run it
behind supervised. If you’re looking to experiment with Sphinx and MySQL,
there should be plenty of documentation out there to get you started.