A presentation at RubyConf Portugal 2014 in in Braga, Portugal by Caleb Hearth
Multi-table Full Text Search Postgres with a somewhat nonstandard approach to search Often search will be done with external service we’ll use the tool already responsible for storing and finding data
@calebthompson calebthompson.io * Tuesday Hug * Close your laptops, there’s a lot of code here.s
I don’t want to jump into my credentials the point of credentials is to get you to listen, and you’re already here. So I’ll have more about me at the end but I do have one claim-to-fame
My birthday, so you know I’m an expert.
I'm going to talk to you about
A real-life feature. iterate, explore options, and optimize
Full Text Search What is it, how can it help us
Multi-table Full Text Search Postgres with You probably could have guessed that from the title
Views database views
Performance performance implications of using this approach and ways to mitigate them
Materialize d Views a somewhat recent feature
Gems that can help us out
Other Options for search in your application
Let's search for Articles We're going to look at a classic example
Article.where( "body LIKE %?%" , query)
Exact substrings That'll get us articles that contain an exact string
but that's not very useful
Article.where("body ILIKE %?%", query) That's better, at least we ignore case
Search on title What about the title?
Article.where(
"body ILIKE %?% OR title ILIKE %?%" , query, query) I'm sure you've seen this
Search by Author’s name Okay, now we want to search by author name too
Scopes def self.search(query)
joins(:user) .where(<<-SQL, query, query, query) articles.body ILIKE %?% OR articles.title ILIKE %?% OR users.name ILIKE %?% SQL
end We could get crazy with scopes
Query object class Search def self.for(query) [
Article.where("title ILIKE %?%", query), Article.where("body ILIKE %?%", query), Article.joins(:user) .where("users.name ILIKE %?%", query),
].flatten.uniq end end We could try a query object
Poor results And the results really aren't all that great Only matches for case-insensitive substrings
Full Text Search Enter query for rows based on natural language searching
natural language searching
remove stop words and, the, also, they, would
eliminate casing “Factory” and “factory” should return the same results
synonyms hunger and hungrily
stemming “try” and “trying”, “tries”, and “tried” will be recorded in the index under the single concept word "tri."
SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Example of making that same query
SELECT DISTINCT(id) FROM ( SELECT id AS id,
title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text Grab the text we want to search on operator
FROM articles UNION SELECT articles.id AS id,
authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Grab the text we want to search on operator
SELECT DISTINCT(id) FROM ( SELECT
id AS id, title || ' ' || body AS the_text
FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors Also get the id for the article
title || ' ' || body AS the_text
FROM articles UNION SELECT
articles.id AS id, authors.name AS the_text
FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Also get the id for the article
SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT uniq as with query
That’s a lot of SQL. Where do we put it
We could throw that into our query object class Search def self.for(query) <<-SQL SQL end end
Scopes def self.search(query) where(<<-SQL, query) SQL end But SQL doesn’t belong in .rb files
Postgres has our answer
Views
Partial queries
Stored in the Database
Can be SELECT e d from
Return set of columns Compose the set of fields to be searched with views
Multi- source Can be pulled from multiple tables
Complete the query then perform a WHERE or other query on it when you need the results
CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; unique users, all rows, active_at column from activities active over the past week
CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; Syntax to create a view
CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; Unique users
CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id)
users.*,
activities.created_at AS active_at
FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; All rows from user
CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*,
activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; plus an active_at row from activities’ created_at
CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; only those active over the past week
Querying a view looks just like querying a table SELECT * FROM users_with_recent_activity WHERE id IN (1,2,3…) ORDER BY active_at DESC
ActiveRecord can use a view as its backend
So we can create a fairly vanilla model
class UserWithRecentActivity
< ActiveRecord::Base
def self.table_name
"users_with_recent_activity"
end
def readonly?
true
end
end
class
UserWithRecentActivity
< ActiveRecord::Base
def self.table_name "users_with_recent_activity" end def readonly?
class UserWithRecentActivity
< ActiveRecord::Base
def self.table_name
"users_with_recent_activity"
end
def readonly?
true
end
end
def self.table_name "users_with_recent_activity" end def readonly? true end end Not necessary; some views can be written/deleted
Will it work with full text search?
⚲ Textacular
Takes care of the Full Text Search portions of queries
Search over every text field on a record
Variant search options like basic_search, fuzzy search, and advanced search
Game .basic_search('Sonic') most generally useful for a search
Game .basic_search (
title: 'Mario', system: 'Nintendo' ) next simplest useful thing
So let's go back and look at the search we wrote
SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION
SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id;
Now our search result is really simple in the Rails side
class Search < ActiveRecord::Base
include Textacular belongs_to :article end
class Search < ActiveRecord::Base
include Textacular belongs_to :article end
Search. basic_search("Sandi") .map(&:article)
Search. basic_search("Sandi") . map(&:article)
class SearchResult
include Enumerable
def initialize( query )
@results = Search.basic_search(query)
end def each @results.each end end
CREATE Migration We create the view in a migration
class CreateUsersWithRecentActivity < ActiveRecord::Migration def up
ActiveRecord::Base.connection.execute(
"--The CREATE VIEW sql from before" )
end def down
ActiveRecord::Base.connection.execute( 'DROP VIEW users_with_recent_activity' )
end end
How resistant to change is it?
Let's find out - time for some feature creep!
Articles whose comments match the query
Searching on • Article (title, body) • Author (name) • Comments (body)
SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION
SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION
SELECT article_id, body AS the_text FROM comments
articles.id AS article_id,
authors.name AS the_text
FROM authors JOIN articles ON authors.id = articles.author_id
UNION
SELECT article_id, body AS the_text FROM comments
UPDATE Migration Look at an update view migration
class CreateUsersWithRecentActivity < ActiveRecord::Migration def up
ActiveRecord::Base.connection.execute(
"CREATE OR REPLACE VIEW searches AS --The sql from the new view" )
end def down
ActiveRecord::Base.connection.execute( "CREATE OR REPLACE VIEW searches AS --The sql from the old view" )
end end the down migration has to have the entire SQL of the previous view
UPDATE Migration Unfortunately, can’t (always) update can’t remove columns
class CreateUsersWithRecentActivity < ActiveRecord::Migration def up
ActiveRecord::Base.connection.execute(
"DROP VIEW searches; CREATE VIEW searches AS --The sql from the new view" )
end def down
ActiveRecord::Base.connection.execute( "DROP VIEW searches; CREATE VIEW searches AS --The sql from the old view" )
end end DROP first
Can’t be dumped to db/schema.rb
config.database_format = :structure db/structure.sql
Scenic Adds view methods to migrations allows views to be dumped to db/schema.rb attempts to make views easier to work with
[show how scenic gets migrations from view.sql definitions] show how scenic gets migrations from view.sql definitions syntax highlighting in .sql
[show that it has versions] show that it has versions
Model generator $ rails generate scenic:model search
create app/models/search.rb create db/views/searches_v01.sql create db/migrate/..._create_searches.rb Can be a lot to remember - generators
$ cat app/views/searches_v01.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id
View generator $ rails generate scenic:view search create db/views/searches_v02.sql create db/migrate/..._update_searches_to_version_2.rb re-running scenic:view creates new version
$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id And the new sql file includes the previous definition
$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION SELECT article_id, body AS the_text FROM comments Because most times you’ll have a small change to make and that’s easier.
Performance
This query is pretty slow
it has to search across three tables to get us all of the results we need
Full Text Search ~400 ms ILIKE ~40 ms Full Text Search vs ILIKE
Views can’t be indexed
Underlying tables can
Add
indices
There are several types of indices
btree Usually you'll have used btree
GIN & GIST The ones that apply to full text search are GIN and GIST Generalized Inverted iNdex Generalized Inverted Search Tree
GIN index lookups are about three times faster than GiST
GIN indexes take about three times longer to build than GiST
GIN indexes are moderately slower to update than GiST indexes in some cases can be much slower depending on database settings (fast-update)
GIN indexes are two-to-three times larger than GiST indexes
Use GIN • You don’t have hundreds of thousands of rows • You’re not concerned about longer writes blocking the db • You’re adding the index late in the game • You don’t care about disk space • You want fast lookups (read-heavy)
Use GiST • Very large tables • You have performance concerns • For some reason disk space is important • Your table is write-heavy
add_index :articles, :body, using: :gin add_index :articles, :title, using: :gin add_index :authors, :name, using: :gin add_index :comments, :body, using: :gin
Materialized Views
Pre-populate the results of the view by running the partial query the view provides and storing it in a temporary table
Query against result set Those results are queried against when you query the view, rather than performing the full query for each search
Full Text Search ~400 ms ILIKE ~40 ms Materialized ~ 5.394 ms
REFRESH MATERIALIZED VIEW searches Materialized views need to be updated whenever the underlying data changes
after_commit :refresh def refresh
class.connection.execute( "REFRESH MATERIALIZED VIEW #{table_name}" ) end This can be hooked into ActiveRecord callbacks
What about pre-built solutions? There are a lot of good options out there.
ElasticSearch via Tire or elasticsearch-rails or Chewy require 'elasticsearch/model' class Article < ActiveRecord::Base include Elasticsearch::Model include Elasticsearch::Model::Callbacks end
Solr via Sunspot class Post < ActiveRecord::Base searchable do text :title, :body text :comments do comments.map { |comment| comment.body } end boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end end end
text :title, :body text :comments do comments.map {|comment| comment.body} end
boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at
string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end
sphinx with ThinkingSphinx ThinkingSphinx::Index.define :article, :with => :active_record do indexes subject, :sortable => true indexes content indexes author.name, :as => :author, :sortable => true has author_id, created_at, updated_at end
These services excel at faceted search
These services excel at faceted search More di f ficult with full text search It can be done, but you’d need to work a bit harder
Run on your
development machine .
Run on your
production machine .
Needs to be faked in tests
Some of these have lots of
cru f t in models.
Remove a data
concern from your database
Arcane syntax
:( they make me sadface
By combining
materialized views
full text search
Rails magic ✨
we have a pretty cool search feature
that doesn't require any new
dependencies
:)
Thank you
Bibliography • http://shisaa.jp/postset/postgresql-full-text-search-part-1.html
• http://blog.lostpropertyhq.com/postgres-full-text-search-is-good- enough/
• http://www.postgresql.org/docs/9.3/static/textsearch.html
• http://linuxgazette.net/164/sephton.html
• http://www.postgresql.org/docs/current/static/sql-createview.html
• http://www.postgresql.org/docs/current/static/indexes-examine.html
• http://www.postgresql.org/docs/current/static/textsearch-indexes.html
• http://en.wikipedia.org/wiki/Gin_and_tonic
http://calebthompson.io/talks/search.html
Come say hi That’s why I do this
Easily searching across an application’s data is a pervasive need. If you are lucky, you can get away with simple sorting or searching on a single column, but it is more likely that you need full text search across multiple models, all from a single search <input>
.
Thanks to the power of Postgres’ full text search, rolling your own search isn't too difficult.
Following an actual feature evolution which I worked on for a client, we will start with a search feature that queries a single column with LIKE
and build up to finding results across multiple columns on multiple tables using database views, optimizing the query time with full text indices and representing the results as a polymorphic association on an ActiveRecord model in Rails.
The following resources were mentioned during the presentation or are useful additional information.
Here’s what was said about this presentation on social media.
Implementing Multi-Table Full Text Search with Postgres in Rails
— thoughtbot (@thoughtbot) August 16, 2013
http://t.co/VNqorNjzg7
With that hat @calebthompson is officially a Ruby Warrior #rubyconfpt cc: @_zzak pic.twitter.com/VDkrVMxjP0
— PJ Hagerty (@aspleenic) October 14, 2014
@calebthompson I really love the name scenic for something that deals with views :)
— Christian Weyer (@chrisweyer) October 14, 2014