Multi-table Full Text Search in Postgres

A presentation at RubyConf Portugal 2014 in October 2014 in Braga, Portugal by Caleb Hearth

Slide 1

Slide 1

Multi-table Full Text Search Postgres with a somewhat nonstandard approach to search Often search will be done with external service we’ll use the tool already responsible for storing and finding data

Slide 2

Slide 2

@calebthompson calebthompson.io * Tuesday Hug * Close your laptops, there’s a lot of code here.s

Slide 3

Slide 3

I don’t want to jump into my credentials the point of credentials is to get you to listen, and you’re already here. So I’ll have more about me at the end but I do have one claim-to-fame

Slide 4

Slide 4

My birthday, so you know I’m an expert.

Slide 5

Slide 5

I'm going to talk to you about

Slide 6

Slide 6

A real-life feature. iterate, explore options, and optimize

Slide 7

Slide 7

Full Text Search What is it, how can it help us

Slide 8

Slide 8

Multi-table Full Text Search Postgres with You probably could have guessed that from the title

Slide 9

Slide 9

Views database views

Slide 10

Slide 10

Performance performance implications of using this approach and ways to mitigate them

Slide 11

Slide 11

Materialize d Views a somewhat recent feature

Slide 12

Slide 12

Gems that can help us out

Slide 13

Slide 13

Other Options for search in your application

Slide 14

Slide 14

Let's search for Articles We're going to look at a classic example

Slide 15

Slide 15

Article.where( "body LIKE %?%" , query)

Slide 16

Slide 16

Exact substrings That'll get us articles that contain an exact string

Slide 17

Slide 17

but that's not very useful

Slide 18

Slide 18

Article.where("body ILIKE %?%", query) That's better, at least we ignore case

Slide 19

Slide 19

Search on title What about the title?

Slide 20

Slide 20

Article.where(

"body ILIKE %?% OR title ILIKE %?%" , query, query) I'm sure you've seen this

Slide 21

Slide 21

Search by Author’s name Okay, now we want to search by author name too

Slide 22

Slide 22

Scopes def self.search(query)

joins(:user) .where(<<-SQL, query, query, query) articles.body ILIKE %?% OR articles.title ILIKE %?% OR users.name ILIKE %?% SQL

end We could get crazy with scopes

Slide 23

Slide 23

Query object class Search def self.for(query) [

Article.where("title ILIKE %?%", query), Article.where("body ILIKE %?%", query), Article.joins(:user) .where("users.name ILIKE %?%", query),

].flatten.uniq end end We could try a query object

Slide 24

Slide 24

Poor results And the results really aren't all that great Only matches for case-insensitive substrings

Slide 25

Slide 25

Full Text Search Enter query for rows based on natural language searching

Slide 26

Slide 26

natural language searching

Slide 27

Slide 27

remove stop words and, the, also, they, would

Slide 28

Slide 28

eliminate casing “Factory” and “factory” should return the same results

Slide 29

Slide 29

synonyms hunger and hungrily

Slide 30

Slide 30

stemming “try” and “trying”, “tries”, and “tried” will be recorded in the index under the single concept word "tri."

Slide 31

Slide 31

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Example of making that same query

Slide 32

Slide 32

SELECT DISTINCT(id) FROM ( SELECT id AS id,

title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS id, authors.name AS the_text Grab the text we want to search on operator

Slide 33

Slide 33

FROM articles UNION SELECT articles.id AS id,

authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Grab the text we want to search on operator

Slide 34

Slide 34

SELECT DISTINCT(id) FROM ( SELECT

id AS id, title || ' ' || body AS the_text

FROM articles UNION SELECT articles.id AS id, authors.name AS the_text FROM authors Also get the id for the article

Slide 35

Slide 35

title || ' ' || body AS the_text 

FROM articles UNION SELECT

articles.id AS id, authors.name AS the_text

FROM authors JOIN articles ON authors.id = articles.author_id ) AS this_doesnt_matter WHERE to_tsvector(the_text) @@ to_tsquery('?'); Also get the id for the article

Slide 36

Slide 36

SELECT DISTINCT(id) FROM ( SELECT id AS id, title || ' ' || body AS the_text FROM articles UNION SELECT uniq as with query

Slide 37

Slide 37

That’s a lot of SQL. Where do we put it

Slide 38

Slide 38

We could throw that into our query object class Search def self.for(query) <<-SQL SQL end end

Slide 39

Slide 39

Scopes def self.search(query) where(<<-SQL, query) SQL end But SQL doesn’t belong in .rb files

Slide 40

Slide 40

Postgres has our answer

Slide 41

Slide 41

Views

Slide 42

Slide 42

Partial queries

Slide 43

Slide 43

Stored in the Database

Slide 44

Slide 44

Can be SELECT e d from

Slide 45

Slide 45

Return set of columns Compose the set of fields to be searched with views

Slide 46

Slide 46

Multi- source Can be pulled from multiple tables

Slide 47

Slide 47

Complete the query then perform a WHERE or other query on it when you need the results

Slide 48

Slide 48

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; unique users, all rows, active_at column from activities active over the past week

Slide 49

Slide 49

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; Syntax to create a view

Slide 50

Slide 50

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; Unique users

Slide 51

Slide 51

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id)

users.*,

   activities.created_at AS active_at 

FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; All rows from user

Slide 52

Slide 52

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*,

activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; plus an active_at row from activities’ created_at

Slide 53

Slide 53

CREATE VIEW users_with_recent_activity AS SELECT DISTINCT ON (users.id) users.*, activities.created_at AS active_at FROM users JOIN activities ON activities.user_id = users.id WHERE activities.created_at >= CURRENT_DATE - interval '7 days'; only those active over the past week

Slide 54

Slide 54

Querying a view looks just like querying a table SELECT * FROM users_with_recent_activity WHERE id IN (1,2,3…) ORDER BY active_at DESC

Slide 55

Slide 55

ActiveRecord can use a view as its backend

Slide 56

Slide 56

So we can create a fairly vanilla model

Slide 57

Slide 57

class UserWithRecentActivity
< ActiveRecord::Base def self.table_name "users_with_recent_activity" end def readonly? true end end

Slide 58

Slide 58

class UserWithRecentActivity
< ActiveRecord::Base

def self.table_name "users_with_recent_activity" end def readonly?

Slide 59

Slide 59

class UserWithRecentActivity
< ActiveRecord::Base def self.table_name "users_with_recent_activity" end def readonly? true end end

Slide 60

Slide 60

def self.table_name "users_with_recent_activity" end def readonly? true end end Not necessary; some views can be written/deleted

Slide 61

Slide 61

Will it work with full text search?

Slide 62

Slide 62

⚲ Textacular

Slide 63

Slide 63

Slide 64

Slide 64

Takes care of the Full Text Search portions of queries

Slide 65

Slide 65

Search over every text field on a record

Slide 66

Slide 66

Variant search options like basic_search, fuzzy search, and advanced search

Slide 67

Slide 67

Game .basic_search('Sonic') most generally useful for a search

Slide 68

Slide 68

Game .basic_search (

title: 'Mario', system: 'Nintendo' ) next simplest useful thing

Slide 69

Slide 69

So let's go back and look at the search we wrote

Slide 70

Slide 70

SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION

SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id;

Slide 71

Slide 71

Now our search result is really simple in the Rails side

Slide 72

Slide 72

class Search < ActiveRecord::Base

include Textacular belongs_to :article end

Slide 73

Slide 73

class Search < ActiveRecord::Base

include Textacular belongs_to :article end

Slide 74

Slide 74

Search. basic_search("Sandi") .map(&:article)

Slide 75

Slide 75

Search. basic_search("Sandi") . map(&:article)

Slide 76

Slide 76

class SearchResult

include Enumerable

def initialize( query )

@results = Search.basic_search(query)

end def each @results.each end end

Slide 77

Slide 77

CREATE Migration We create the view in a migration

Slide 78

Slide 78

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up

ActiveRecord::Base.connection.execute(

"--The CREATE VIEW sql from before" )

end def down

ActiveRecord::Base.connection.execute( 'DROP VIEW users_with_recent_activity' )

end end

Slide 79

Slide 79

How resistant to change is it?

Slide 80

Slide 80

Let's find out - time for some feature creep!

Slide 81

Slide 81

Articles whose comments match the query

Slide 82

Slide 82

Searching on • Article (title, body) • Author (name) • Comments (body)

Slide 83

Slide 83

SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION

SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION

SELECT article_id, body AS the_text FROM comments

Slide 84

Slide 84

articles.id AS article_id, 
authors.name AS the_text 

FROM authors JOIN articles ON authors.id = articles.author_id

UNION

SELECT article_id, body AS the_text FROM comments

Slide 85

Slide 85

UPDATE Migration Look at an update view migration

Slide 86

Slide 86

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up

ActiveRecord::Base.connection.execute(

"CREATE OR REPLACE VIEW searches AS --The sql from the new view" )

end def down

ActiveRecord::Base.connection.execute( "CREATE OR REPLACE VIEW searches AS --The sql from the old view" )

end end the down migration has to have the entire SQL of the previous view

Slide 87

Slide 87

UPDATE Migration Unfortunately, can’t (always) update can’t remove columns

Slide 88

Slide 88

class CreateUsersWithRecentActivity < ActiveRecord::Migration def up

ActiveRecord::Base.connection.execute(

"DROP VIEW searches; CREATE VIEW searches AS --The sql from the new view" )

end def down

ActiveRecord::Base.connection.execute( "DROP VIEW searches; CREATE VIEW searches AS --The sql from the old view" )

end end DROP first

Slide 89

Slide 89

Can’t be dumped to db/schema.rb

Slide 90

Slide 90

config.database_format = :structure db/structure.sql

Slide 91

Slide 91

Scenic Adds view methods to migrations allows views to be dumped to db/schema.rb attempts to make views easier to work with

Slide 92

Slide 92

Slide 93

Slide 93

[show how scenic gets migrations from view.sql definitions] show how scenic gets migrations from view.sql definitions syntax highlighting in .sql

Slide 94

Slide 94

[show that it has versions] show that it has versions

Slide 95

Slide 95

Model generator $ rails generate scenic:model search

create app/models/search.rb create db/views/searches_v01.sql create db/migrate/..._create_searches.rb Can be a lot to remember - generators

Slide 96

Slide 96

$ cat app/views/searches_v01.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id

Slide 97

Slide 97

View generator $ rails generate scenic:view search create db/views/searches_v02.sql create db/migrate/..._update_searches_to_version_2.rb re-running scenic:view creates new version

Slide 98

Slide 98

$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id And the new sql file includes the previous definition

Slide 99

Slide 99

$ cat app/views/searches_v02.sql SELECT id AS article_id, title || ' ' || body AS the_text FROM articles UNION SELECT articles.id AS article_id, authors.name AS the_text FROM authors JOIN articles ON authors.id = articles.author_id UNION SELECT article_id, body AS the_text FROM comments Because most times you’ll have a small change to make and that’s easier.

Slide 100

Slide 100

Performance

Slide 101

Slide 101

This query is pretty slow

Slide 102

Slide 102

it has to search across three tables to get us all of the results we need

Slide 103

Slide 103

Full Text Search ~400 ms ILIKE ~40 ms Full Text Search vs ILIKE

Slide 104

Slide 104

Views can’t be indexed

Slide 105

Slide 105

Underlying tables can

Slide 106

Slide 106

Add

indices

Slide 107

Slide 107

There are several types of indices

Slide 108

Slide 108

btree Usually you'll have used btree

Slide 109

Slide 109

GIN & GIST The ones that apply to full text search are GIN and GIST Generalized Inverted iNdex Generalized Inverted Search Tree

Slide 110

Slide 110

GIN index lookups are about three times faster than GiST

Slide 111

Slide 111

GIN indexes take about three times longer to build than GiST

Slide 112

Slide 112

GIN indexes are moderately slower to update than GiST indexes in some cases can be much slower depending on database settings (fast-update)

Slide 113

Slide 113

GIN indexes are two-to-three times larger than GiST indexes

Slide 114

Slide 114

Use GIN • You don’t have hundreds of thousands of rows • You’re not concerned about longer writes blocking the db • You’re adding the index late in the game • You don’t care about disk space • You want fast lookups (read-heavy)

Slide 115

Slide 115

Use GiST • Very large tables • You have performance concerns • For some reason disk space is important • Your table is write-heavy

Slide 116

Slide 116

add_index :articles, :body, using: :gin add_index :articles, :title, using: :gin add_index :authors, :name, using: :gin add_index :comments, :body, using: :gin

Slide 117

Slide 117

Materialized Views

Slide 118

Slide 118

Pre-populate the results of the view by running the partial query the view provides and storing it in a temporary table

Slide 119

Slide 119

Query against result set Those results are queried against when you query the view, rather than performing the full query for each search

Slide 120

Slide 120

Full Text Search ~400 ms ILIKE ~40 ms Materialized ~ 5.394 ms

Slide 121

Slide 121

REFRESH MATERIALIZED VIEW searches Materialized views need to be updated whenever the underlying data changes

Slide 122

Slide 122

after_commit :refresh def refresh

class.connection.execute( "REFRESH MATERIALIZED VIEW #{table_name}" ) end This can be hooked into ActiveRecord callbacks

Slide 123

Slide 123

What about pre-built solutions? There are a lot of good options out there.

Slide 124

Slide 124

ElasticSearch via Tire or elasticsearch-rails or Chewy require 'elasticsearch/model' class Article < ActiveRecord::Base include Elasticsearch::Model include Elasticsearch::Model::Callbacks end

Slide 125

Slide 125

Solr via Sunspot class Post < ActiveRecord::Base searchable do text :title, :body text :comments do comments.map { |comment| comment.body } end boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end end end

Slide 126

Slide 126

text :title, :body text :comments do comments.map {|comment| comment.body} end

Slide 127

Slide 127

boolean :featured integer :blog_id integer :author_id integer :category_ids, :multiple => true double :average_rating time :published_at time :expired_at

Slide 128

Slide 128

string :sort_title do title.downcase.gsub(/^(an?|the)/, '') end

Slide 129

Slide 129

sphinx with ThinkingSphinx ThinkingSphinx::Index.define :article, :with => :active_record do indexes subject, :sortable => true indexes content indexes author.name, :as => :author, :sortable => true has author_id, created_at, updated_at end

Slide 130

Slide 130

These services excel at faceted search

Slide 131

Slide 131

These services excel at faceted search More di f ficult with full text search It can be done, but you’d need to work a bit harder

Slide 132

Slide 132

Run on your

development machine .

Slide 133

Slide 133

Run on your

production machine .

Slide 134

Slide 134

Needs to be faked in tests

Slide 135

Slide 135

Some of these have lots of

cru f t in models.

Slide 136

Slide 136

Remove a data

concern from your database

Slide 137

Slide 137

Arcane syntax

Slide 138

Slide 138

:( they make me sadface

Slide 139

Slide 139

By combining

Slide 140

Slide 140

materialized views

Slide 141

Slide 141

full text search

Slide 142

Slide 142

Rails magic ✨

Slide 143

Slide 143

we have a pretty cool search feature

Slide 144

Slide 144

that doesn't require any new

dependencies

Slide 145

Slide 145

:)

Slide 146

Slide 146

Thank you

Slide 147

Slide 147

Bibliography • http://shisaa.jp/postset/postgresql-full-text-search-part-1.html

• http://blog.lostpropertyhq.com/postgres-full-text-search-is-good- enough/

• http://www.postgresql.org/docs/9.3/static/textsearch.html

• http://linuxgazette.net/164/sephton.html

• http://www.postgresql.org/docs/current/static/sql-createview.html

• http://www.postgresql.org/docs/current/static/indexes-examine.html

• http://www.postgresql.org/docs/current/static/textsearch-indexes.html

• http://en.wikipedia.org/wiki/Gin_and_tonic

Slide 148

Slide 148

Slide 149

Slide 149

Slide 150

Slide 150

http://calebthompson.io/talks/search.html

Slide 151

Slide 151

Come say hi That’s why I do this