Table of Contents
Sphinx for Rotation\Tube sites
Basically script uses mysql to search DB, but on large DB mysql too slow. + it doesnt use morphology.
Sphinx (sphinxsearch.com) - is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It faster then Mysql and deliver better quality of results.
Setup
- Ask admin to setup sphinx
- create folder scj/sphinx
- ask admin to add scj index into sphinx config. (See config example below )
- Run indexer (or ask admin to do it)
indexer --all --rotate
- In Rotation - Settings edit sphinx options.
That's it.
Note if you change DB (add\remove content) - please, run indexer every day to keep it's base up to date.
Add to cron something like (ask admin)
indexer --all --rotate
Sphinx configs TCMS
TCMS (version 2.X)
searchd { listen = your_port:mysql41 addition of mysql41 is the most important part. It's called SphinxQL source your_name_source { type = mysql sql_host = localhost sql_user = sql_pass = sql_db = sql_port = 3306 # optional, default is 3306 sql_query_pre = SET NAMES utf8 sql_query = SELECT gi.gallery_id, UNIX_TIMESTAMP(gi.activation_date) as date, alt, description, gi.duration, sponsor_id, gs.total_ctr, gi.content_type, \ (SELECT group_concat(tag_name) FROM rot_gal2tag g2t \ LEFT JOIN rot_tags as t on t.tag_id = g2t.tag_id \ WHERE g2t.gallery_id = gi.gallery_id) as tags, \ (SELECT group_concat(tag_id) FROM rot_gal2tag g2t \ WHERE g2t.gallery_id = gi.gallery_id) as tag_ids, \ (SELECT group_concat(gss.group_id) FROM rot_gallery_stats1 as gss \ WHERE gss.gallery_id = gi.gallery_id AND group_id != 0) as categories \ FROM rot_gallery_info AS gi \ JOIN rot_gallery_data1 AS gd ON gi.gallery_id = gd.gallery_id \ JOIN rot_gallery_stats1 AS gs ON gs.gallery_id = gi.gallery_id \ WHERE gallery_status = 'active' and gallery_type = 0 \ and gs.best_thumb = 'yes' and gs.group_id = 0 sql_attr_timestamp = date sql_attr_uint = duration sql_attr_uint = sponsor_id sql_attr_float = total_ctr sql_attr_uint = content_type sql_attr_multi = uint categories from field; sql_attr_multi = uint tag_ids from field; } index your_name_index { source = your_name_source path = /your_path docinfo = extern morphology = stem_en charset_type = utf-8 }
1.52
source scj { type = mysql sql_host = localhost sql_user = scj_mysql_login sql_pass = scj_mysql_password sql_db = scj_db sql_port = 3306 # optional, default is 3306 sql_query = SELECT id, date, tags, alt, description, duration, sponsor_id, rgroup FROM rot_galleries as g \ JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0 // for version 49 //sql_query = SELECT id, UNIX_TIMESTAMP(activation_date) as date, tags, alt, description, duration, sponsor_id, rgroup FROM rot_galleries as g \ // JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id JOIN rot_gallery_data AS gd ON gd.gallery_md5 = g.gallery_md5 WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0 sql_attr_timestamp = date sql_attr_uint = duration sql_attr_uint = sponsor_id sql_attr_uint = rgroup sql_query_info = SELECT * FROM rot_galleries WHERE id=$id } index scj_index { source = scj path = /path_to_data/scj/sphinx/scj_index docinfo = extern morphology = stem_en # or stem_ru for example }
1.51
source scj { type = mysql sql_host = ............ sql_user = ............. sql_pass = .............. sql_db = .............. sql_port = 3306 # optional, default is 3306 sql_query = SELECT id, crc32(g.gallery_md5) as gallery_md5_crc, UNIX_TIMESTAMP(activation_date) as date, tags, alt, description, duration, sponsor_id, rgroup, gs.total_ctr, \ g.content_type, (SELECT group_concat(group_id) FROM rot_gal2group as g2gr WHERE g2gr.gal_id = g.id) as categories FROM rot_galleries as g JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id \ JOIN rot_gallery_info AS gi ON g.gallery_md5 = gi.gallery_md5 \ JOIN rot_gallery_data AS gd ON g.gallery_md5 = gd.gallery_md5 \ WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0 and gs.group_id = 0 sql_attr_timestamp = date sql_attr_uint = duration sql_attr_uint = sponsor_id sql_attr_uint = rgroup sql_attr_float = total_ctr sql_attr_uint = content_type sql_attr_uint = gallery_md5_crc sql_attr_multi = uint categories from field; sql_query_info = SELECT * FROM rot_galleries WHERE id=$id } You have to edit here: scj_mysql_login scj_mysql_password scj_db path_to_data
Sphinx Delta Config
For Version 2.X
This part make sense only if you have a really big DB (1M+ , for example a demo version bigbase.smartcj.com has 12M+ galleries)
When Sphinx creates an index it takes a snapshot of a current state of a DB. So if something has been changed since then - it won't be reflected in index. And if you have a big DB it takes a lot of time to reindex it.
So the good idea is to just reindex those parts that have been changed. That's why you need so called delta index
To make use of it you need to add after
sql_query_pre = SET NAMES utf8
a new line
sql_query_post_index = UPDATE rot_settings SET value = (SELECT MAX(gallery_id) FROM rot_gallery_info) WHERE name = 'sphinx_max_gallery_id'
And add a new source
source delta : your_name_source { sql_query_pre = SET NAMES utf8 sql_query = SELECT gi.gallery_id, UNIX_TIMESTAMP(gi.activation_date) as date, alt, description, gi.duration, sponsor_id, gs.total_ctr, gi.content_type, \ (SELECT group_concat(tag_name) FROM rot_gal2tag g2t \ LEFT JOIN rot_tags as t on t.tag_id = g2t.tag_id \ WHERE g2t.gallery_id = gi.gallery_id) as tags, \ (SELECT group_concat(gss.group_id) FROM rot_gallery_stats1 as gss \ WHERE gss.gallery_id = gi.gallery_id AND group_id != 0) as categories \ FROM rot_gallery_info AS gi \ JOIN rot_gallery_data1 AS gd ON gi.gallery_id = gd.gallery_id \ JOIN rot_gallery_stats1 AS gs ON gs.gallery_id = gi.gallery_id \ WHERE gi.gallery_id > ( SELECT value FROM rot_settings WHERE name = 'sphinx_max_gallery_id' ) \ AND gallery_status = 'active' and gallery_type = 0 \ and gs.best_thumb = 'yes' and gs.group_id = 0 } index delta : your_name_index { source = delta path = /your_full_path/data/delta }
You can note that it has the same query with an addition of sphinx_max_gallery_id.
Now you have to add a new crontab task that will reindex new parts
indexer --rotate delta
Now you have to add 'delta' into Sphinx Delta Index (Rotation settings)
once a day you can marge main index and delta index
indexer --rotate --merge your_name_index delta
That's it.