User Tools

Site Tools


new_rotation_sphinx

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
new_rotation_sphinx [2017/04/07 16:24]
127.0.0.1 external edit
new_rotation_sphinx [2019/01/06 10:34] (current)
admin
Line 1: Line 1:
 +====== Sphinx for Rotation\Tube sites ======
  
 +Basically script uses mysql to search DB, but on large DB mysql too slow. + it doesnt use morphology.
 +
 +Sphinx (sphinxsearch.com) - is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It faster then Mysql and deliver better quality of results.
 +
 +
 +Setup
 +
 +  * Ask admin to setup sphinx 
 +  * create folder scj/sphinx
 +  * ask admin to add scj index into sphinx config. (See config example below )
 +  * Run indexer (or ask admin to do it) <code> indexer --all --rotate </code> 
 +  * In Rotation - Settings edit sphinx options.
 +
 +That's it.
 +
 +**Note** if you change DB (add\remove content) - please, run indexer every day to keep it's base up to date. 
 +
 +Add to cron something like (ask admin)
 +
 +<code>
 +indexer --all --rotate
 +</code>
 +
 +
 +====== Sphinx configs TCMS  ======
 +
 +===== TCMS (version 2.X) =====
 +
 +<code>
 +
 +
 +searchd
 +{
 + listen      = your_port:mysql41
 +
 +addition of mysql41 is the most important part. It's called SphinxQL
 +
 +
 +
 +source your_name_source
 +{
 + type = mysql
 +
 + sql_host = localhost
 + sql_user = 
 + sql_pass =
 + sql_db = 
 + sql_port = 3306 # optional, default is 3306
 +
 + sql_query_pre = SET NAMES utf8
 +
 + sql_query = SELECT gi.gallery_id, UNIX_TIMESTAMP(gi.activation_date) as date, alt, description, gi.duration, sponsor_id, gs.total_ctr, gi.content_type, \
 + (SELECT group_concat(tag_name) FROM rot_gal2tag g2t \
 + LEFT JOIN rot_tags as t on t.tag_id = g2t.tag_id \
 + WHERE g2t.gallery_id = gi.gallery_id) as tags, \
 + (SELECT group_concat(tag_id) FROM rot_gal2tag g2t \
 + WHERE g2t.gallery_id = gi.gallery_id) as tag_ids, \
 + (SELECT group_concat(gss.group_id) FROM rot_gallery_stats1 as gss \
 + WHERE gss.gallery_id = gi.gallery_id AND group_id != 0) as categories \
 + FROM rot_gallery_info AS gi \
 + JOIN rot_gallery_data1 AS gd ON gi.gallery_id = gd.gallery_id \
 + JOIN rot_gallery_stats1 AS gs ON gs.gallery_id = gi.gallery_id \
 + WHERE gallery_status = 'active' and gallery_type = 0 \
 + and gs.best_thumb = 'yes' and gs.group_id = 0
 +
 +
 + sql_attr_timestamp = date
 + sql_attr_uint     = duration
 + sql_attr_uint     = sponsor_id
 + sql_attr_float     = total_ctr
 + sql_attr_uint     = content_type
 +    sql_attr_multi = uint categories from field; 
 +    sql_attr_multi = uint tag_ids from field; 
 +
 +}
 +
 +
 +index your_name_index
 +{
 + source = your_name_source
 + path = /your_path
 + docinfo = extern
 + morphology              = stem_en
 + charset_type = utf-8
 +
 +}
 +
 +</code>
 +
 +===== 1.52 =====
 +
 +<code>
 +source scj
 +{
 + type = mysql
 +
 + sql_host = localhost
 + sql_user = scj_mysql_login
 + sql_pass = scj_mysql_password
 + sql_db = scj_db
 + sql_port = 3306 # optional, default is 3306
 +
 + sql_query = SELECT id, date, tags, alt, description, duration, sponsor_id, rgroup FROM rot_galleries as g \
 +   JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0
 +        // for version 49
 + //sql_query = SELECT id, UNIX_TIMESTAMP(activation_date) as date, tags, alt, description, duration, sponsor_id, rgroup FROM rot_galleries as g \
 + //   JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id JOIN rot_gallery_data AS gd ON gd.gallery_md5 = g.gallery_md5 WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0
 +
 + sql_attr_timestamp = date
 + sql_attr_uint     = duration
 + sql_attr_uint     = sponsor_id
 + sql_attr_uint     = rgroup
 +
 + sql_query_info         = SELECT * FROM rot_galleries WHERE id=$id
 +}
 +
 +
 +index scj_index
 +{
 + source = scj
 + path = /path_to_data/scj/sphinx/scj_index
 + docinfo = extern
 + morphology                              = stem_en # or stem_ru for example
 +}
 +
 +
 +</code>
 +===== 1.51 =====
 +
 +<code>
 +source scj
 +{
 + type = mysql
 +
 + sql_host =  ............
 + sql_user =  .............
 + sql_pass =  ..............
 + sql_db =  ..............
 + sql_port = 3306 # optional, default is 3306
 +
 + sql_query = SELECT id, crc32(g.gallery_md5) as gallery_md5_crc,  UNIX_TIMESTAMP(activation_date) as date, tags, alt, description, duration, sponsor_id, rgroup, gs.total_ctr, \
 +   g.content_type, (SELECT group_concat(group_id) FROM rot_gal2group as g2gr WHERE g2gr.gal_id = g.id) as categories FROM rot_galleries as g JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id \
 +   JOIN rot_gallery_info AS gi ON g.gallery_md5 = gi.gallery_md5 \
 +   JOIN rot_gallery_data AS gd ON g.gallery_md5 = gd.gallery_md5 \
 +   WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0 and gs.group_id = 0
 + sql_attr_timestamp = date
 + sql_attr_uint     = duration
 + sql_attr_uint     = sponsor_id
 + sql_attr_uint     = rgroup
 + sql_attr_float     = total_ctr
 + sql_attr_uint     = content_type
 + sql_attr_uint = gallery_md5_crc
 +    sql_attr_multi = uint categories from field; 
 +
 + sql_query_info = SELECT * FROM rot_galleries WHERE id=$id
 +}
 +
 +
 +
 +
 +
 +You have to edit here:
 +
 +scj_mysql_login
 +scj_mysql_password
 +scj_db
 +path_to_data
 +</code>
 +====== Sphinx Delta Config ======
 +
 +For Version 2.X
 +
 +This part make sense only if you have a really big DB (1M+ , for example a demo version bigbase.smartcj.com has 12M+ galleries)
 +
 +When Sphinx creates an index it takes a snapshot of a current state of a DB. So if something has been changed since then - it won't be reflected in index. And if you have a big DB it takes a lot of time to reindex it.
 +
 +So the good idea is to just reindex those parts that have been changed. That's why you need so called delta index
 +
 +To make use of it you need to add after 
 +
 +  sql_query_pre = SET NAMES utf8
 +
 +a new line
 +
 +  sql_query_post_index = UPDATE rot_settings SET value = (SELECT MAX(gallery_id) FROM rot_gallery_info) WHERE name = 'sphinx_max_gallery_id'
 +
 +
 +And add a new source 
 +
 +<code>
 +
 +source delta : your_name_source
 +{
 +    sql_query_pre = SET NAMES utf8
 +
 + sql_query = SELECT gi.gallery_id, UNIX_TIMESTAMP(gi.activation_date) as date, alt, description, gi.duration, sponsor_id, gs.total_ctr, gi.content_type, \
 + (SELECT group_concat(tag_name) FROM rot_gal2tag g2t \
 + LEFT JOIN rot_tags as t on t.tag_id = g2t.tag_id \
 + WHERE g2t.gallery_id = gi.gallery_id) as tags, \
 + (SELECT group_concat(gss.group_id) FROM rot_gallery_stats1 as gss \
 + WHERE gss.gallery_id = gi.gallery_id AND group_id != 0) as categories \
 + FROM rot_gallery_info AS gi \
 + JOIN rot_gallery_data1 AS gd ON gi.gallery_id = gd.gallery_id \
 + JOIN rot_gallery_stats1 AS gs ON gs.gallery_id = gi.gallery_id \
 + WHERE gi.gallery_id > ( SELECT value FROM rot_settings WHERE name = 'sphinx_max_gallery_id' ) \
 + AND gallery_status = 'active' and gallery_type = 0 \
 + and gs.best_thumb = 'yes' and gs.group_id = 0
 +    
 +}
 +
 +
 +index delta : your_name_index
 +{
 +    source = delta
 +    path = /your_full_path/data/delta
 +}
 +
 +
 +</code>
 +
 +
 +You can note that it has the same query with an addition of sphinx_max_gallery_id.
 +
 +Now you have to add a new crontab task that will reindex new parts
 +
 +  indexer --rotate delta
 +
 +Now you have to add 'delta' into  Sphinx Delta Index  (Rotation settings)
 +
 +once a day you can marge main index and delta index
 +
 +  indexer --rotate --merge your_name_index delta
 +
 +
 +That's it.
new_rotation_sphinx.txt ยท Last modified: 2019/01/06 10:34 by admin