User Tools

Site Tools


Translations of this page:



new_rotation_sphinx

Sphinx for Rotation\Tube sites

Basically script uses mysql to search DB, but on large DB mysql too slow. + it doesnt use morphology.

Sphinx (sphinxsearch.com) - is an open source full text search server, designed from the ground up with performance, relevance (aka search quality), and integration simplicity in mind. It faster then Mysql and deliver better quality of results.

Setup

  • Ask admin to setup sphinx
  • create folder scj/sphinx
  • ask admin to add scj index into sphinx config. (See config example below )
  • Run indexer (or ask admin to do it)
     indexer --all --rotate 
  • In Rotation - Settings edit sphinx options.

That's it.

Note if you change DB (add\remove content) - please, run indexer every day to keep it's base up to date.

Add to cron something like (ask admin)

indexer --all --rotate

Sphinx configs TCMS

TCMS (version 2.X)


searchd
{
	listen      			= your_port:mysql41

addition of mysql41 is the most important part. It's called SphinxQL



source your_name_source
{
	type					= mysql

	sql_host				= localhost
	sql_user				= 
	sql_pass				=
	sql_db					= 
	sql_port				= 3306	# optional, default is 3306

	sql_query_pre = SET NAMES utf8

	sql_query				= SELECT gi.gallery_id, UNIX_TIMESTAMP(gi.activation_date) as date, alt, description, gi.duration, sponsor_id, gs.total_ctr, gi.content_type, \
							(SELECT group_concat(tag_name) FROM rot_gal2tag g2t \
							LEFT JOIN rot_tags as t on t.tag_id = g2t.tag_id \
							WHERE g2t.gallery_id = gi.gallery_id) as tags, \
							(SELECT group_concat(tag_id) FROM rot_gal2tag g2t \
							WHERE g2t.gallery_id = gi.gallery_id) as tag_ids, \
							(SELECT group_concat(gss.group_id) FROM rot_gallery_stats1 as gss \
							WHERE gss.gallery_id = gi.gallery_id AND group_id != 0) as categories \
							FROM rot_gallery_info AS gi \
							JOIN rot_gallery_data1 AS gd ON gi.gallery_id = gd.gallery_id \
							JOIN rot_gallery_stats1 AS gs ON gs.gallery_id = gi.gallery_id \
							WHERE gallery_status = 'active' and gallery_type = 0 \
							and gs.best_thumb = 'yes' and gs.group_id = 0


	sql_attr_timestamp		= date
	sql_attr_uint		    = duration
	sql_attr_uint		    = sponsor_id
	sql_attr_float		    = total_ctr
	sql_attr_uint		    = content_type
    sql_attr_multi 			= uint categories from field; 
    sql_attr_multi 			= uint tag_ids from field; 

}


index your_name_index
{
	source					= your_name_source
	path					= /your_path
	docinfo					= extern
	morphology              = stem_en
	charset_type 			= utf-8

}

1.52

source scj
{
	type					= mysql

	sql_host				= localhost
	sql_user				= scj_mysql_login
	sql_pass				= scj_mysql_password
	sql_db					= scj_db
	sql_port				= 3306	# optional, default is 3306

	sql_query				= SELECT id, date, tags, alt, description, duration, sponsor_id, rgroup FROM rot_galleries as g \
							  JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0
        // for version 49
	//sql_query				= SELECT id, UNIX_TIMESTAMP(activation_date) as date, tags, alt, description, duration, sponsor_id, rgroup FROM rot_galleries as g \
	//						  JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id JOIN rot_gallery_data AS gd ON gd.gallery_md5 = g.gallery_md5 WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0

	sql_attr_timestamp		= date
	sql_attr_uint		    = duration
	sql_attr_uint		    = sponsor_id
	sql_attr_uint		    = rgroup

	sql_query_info			        = SELECT * FROM rot_galleries WHERE id=$id
}


index scj_index
{
	source					= scj
	path					= /path_to_data/scj/sphinx/scj_index
	docinfo					= extern
	morphology                              = stem_en # or stem_ru for example
}

1.51

source scj
{
	type					= mysql

	sql_host				=  ............
	sql_user				=  .............
	sql_pass				=  ..............
	sql_db					=  ..............
	sql_port				= 3306	# optional, default is 3306

	sql_query				= SELECT id, crc32(g.gallery_md5) as gallery_md5_crc,  UNIX_TIMESTAMP(activation_date) as date, tags, alt, description, duration, sponsor_id, rgroup, gs.total_ctr, \
							  g.content_type, (SELECT group_concat(group_id) FROM rot_gal2group as g2gr WHERE g2gr.gal_id = g.id) as categories FROM rot_galleries as g JOIN rot_gallery_stats AS gs ON g.id = gs.thumb_id \
							  JOIN rot_gallery_info AS gi ON g.gallery_md5 = gi.gallery_md5 \
							  JOIN rot_gallery_data AS gd ON g.gallery_md5 = gd.gallery_md5 \
							  WHERE status = 1 and gs.best_thumb = 'yes' and rgroup != 0 and gs.group_id = 0
	sql_attr_timestamp		= date
	sql_attr_uint		    = duration
	sql_attr_uint		    = sponsor_id
	sql_attr_uint		    = rgroup
	sql_attr_float		    = total_ctr
	sql_attr_uint		    = content_type
	sql_attr_uint			= gallery_md5_crc
    sql_attr_multi 			= uint categories from field; 

	sql_query_info			= SELECT * FROM rot_galleries WHERE id=$id
}





You have to edit here:

scj_mysql_login
scj_mysql_password
scj_db
path_to_data

Sphinx Delta Config

For Version 2.X

This part make sense only if you have a really big DB (1M+ , for example a demo version bigbase.smartcj.com has 12M+ galleries)

When Sphinx creates an index it takes a snapshot of a current state of a DB. So if something has been changed since then - it won't be reflected in index. And if you have a big DB it takes a lot of time to reindex it.

So the good idea is to just reindex those parts that have been changed. That's why you need so called delta index

To make use of it you need to add after

sql_query_pre = SET NAMES utf8

a new line

sql_query_post_index = UPDATE rot_settings SET value = (SELECT MAX(gallery_id) FROM rot_gallery_info) WHERE name = 'sphinx_max_gallery_id'

And add a new source

source delta : your_name_source
{
    sql_query_pre = SET NAMES utf8

	sql_query				= SELECT gi.gallery_id, UNIX_TIMESTAMP(gi.activation_date) as date, alt, description, gi.duration, sponsor_id, gs.total_ctr, gi.content_type, \
							(SELECT group_concat(tag_name) FROM rot_gal2tag g2t \
							LEFT JOIN rot_tags as t on t.tag_id = g2t.tag_id \
							WHERE g2t.gallery_id = gi.gallery_id) as tags, \
							(SELECT group_concat(gss.group_id) FROM rot_gallery_stats1 as gss \
							WHERE gss.gallery_id = gi.gallery_id AND group_id != 0) as categories \
							FROM rot_gallery_info AS gi \
							JOIN rot_gallery_data1 AS gd ON gi.gallery_id = gd.gallery_id \
							JOIN rot_gallery_stats1 AS gs ON gs.gallery_id = gi.gallery_id \
							WHERE gi.gallery_id > ( SELECT value FROM rot_settings WHERE name = 'sphinx_max_gallery_id' ) \
							AND gallery_status = 'active' and gallery_type = 0 \
							and gs.best_thumb = 'yes' and gs.group_id = 0
    
}


index delta : your_name_index
{
    source = delta
    path = /your_full_path/data/delta
}

You can note that it has the same query with an addition of sphinx_max_gallery_id.

Now you have to add a new crontab task that will reindex new parts

indexer --rotate delta

Now you have to add 'delta' into Sphinx Delta Index (Rotation settings)

once a day you can marge main index and delta index

indexer --rotate --merge your_name_index delta

That's it.

new_rotation_sphinx.txt · Last modified: 2019/01/06 10:34 by admin