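The site needed search with Chinese word segmentation, so I installed CoreSeek. As I understand it, CoreSeek is essentially Sphinx with a Chinese word segmentation method and a segmentation dictionary added.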
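1. Introduction:
CoreSeek is a Chinese full-text retrieval/search package released as open source under the GPLv2 license. It is developed on top of Sphinx and released independently, and it focuses on Chinese search and information processing. It suits industry/vertical search, forum/site search, database search, document/literature retrieval, information retrieval, data mining and similar scenarios. The author has since stopped maintaining it.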
2. Preparation before installing:

    yum install make gcc gcc-c++ libtool autoconf automake imake libxml2-devel expat-devel
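Before building, it can help to confirm that the toolchain actually ended up on the PATH. The following is a minimal sketch and not part of the original steps: the function name `missing_tools` is made up for illustration, and the command names in the usage comment are an assumption about which binaries the packages above provide.

```shell
#!/bin/sh
# Hypothetical helper: print every command from the argument list that is
# not found on the PATH. Returns non-zero if at least one is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return $rc
}

# Example (assumed command names for the packages installed above):
#   missing_tools make gcc g++ libtool autoconf automake
```

An empty output means every listed command was found; anything printed is a tool to install before moving on.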
3. Download and extract:

    cd /opt
    wget https://down.yundreams.com/uploads/soft/coreseek-4.1-beta.tar.gz
    tar -xzvf coreseek-4.1-beta.tar.gz
4. Install mmseg: (it can be installed elsewhere, as long as csft's --with-mmseg-includes and --with-mmseg-libs point to the matching paths)

    cd /opt/coreseek-4.1-beta/mmseg-3.2.14
    ./bootstrap
    ./configure --prefix=/opt/coreseek-4.1
    make
    make install
    make clean
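If mmseg goes under a different prefix, as the note above allows, the two csft configure flags have to follow it. A small sketch that checks a prefix and prints the matching flags; the function name `check_mmseg_prefix` is hypothetical, and the directory layout (`include/mmseg` and `lib` under the prefix) is taken from the flag values used in this article.

```shell
#!/bin/sh
# Hypothetical check: confirm the mmseg header and library directories exist
# under the given prefix, then print the csft configure flags pointing at them.
check_mmseg_prefix() {
    prefix=$1
    for dir in "$prefix/include/mmseg" "$prefix/lib"; do
        if [ ! -d "$dir" ]; then
            echo "missing: $dir" >&2
            return 1
        fi
    done
    echo "--with-mmseg-includes=$prefix/include/mmseg/ --with-mmseg-libs=$prefix/lib/"
}

# Example:
#   check_mmseg_prefix /opt/coreseek-4.1
```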
5. Install csft:
Before building, edit configure.ac, at line 13:

    cd /opt/coreseek-4.1-beta/csft-4.1
    vi configure.ac

Change

    AM_INIT_AUTOMAKE([-Wall -Werror foreign])

to

    AM_INIT_AUTOMAKE([-Wall foreign])

Then build and install:

    ./buildconf.sh
    ./configure --prefix=/opt/coreseek-4.1 --without-unixodbc --with-mmseg --with-mmseg-includes=/opt/coreseek-4.1/include/mmseg/ --with-mmseg-libs=/opt/coreseek-4.1/lib/ --with-mysql
    make
    make install
    make clean
If make fails with the error "sphinxexpr.cpp:1746:43: error: 'ExprEval' was not declared in this scope", apply the fix below and run make again. If you do not hit this error, skip this part.
Three places need to change, at lines 1746, 1777 and 1823.

    vi /opt/coreseek-4.1-beta/csft-4.1/src/sphinxexpr.cpp

In the file, replace

    T val = ExprEval ( this->m_pArg, tMatch ); // 'this' fixes gcc braindamage

with:

    T val = this->ExprEval ( this->m_pArg, tMatch ); // 'this' fixes gcc braindamage

To save effort, a single sed command does all three:

    sed -i 's/=\ ExprEval/=\ this\->ExprEval/g' /opt/coreseek-4.1-beta/csft-4.1/src/sphinxexpr.cpp
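For a more cautious run of the same substitution, the sketch below keeps a backup and reports how many lines ended up with the qualified call. The function name `patch_expreval` is made up for illustration; the sed expression is the one above written without the optional backslashes. Because the pattern `= ExprEval` no longer matches once `this->` has been inserted, running it a second time should change nothing.

```shell
#!/bin/sh
# Hypothetical wrapper: apply the ExprEval fix to the file given as $1,
# keep the original as <file>.bak, and print the number of lines that
# now contain "= this->ExprEval".
patch_expreval() {
    file=$1
    sed -i.bak 's/= ExprEval/= this->ExprEval/g' "$file" || return 1
    grep -c '= this->ExprEval' "$file"
}

# Example:
#   patch_expreval /opt/coreseek-4.1-beta/csft-4.1/src/sphinxexpr.cpp
```

Given the three locations listed above, a count of 3 would be the expected result on the unmodified source file.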
6. Configuration and usage:
(1) Configure the data source:

    cd /opt/coreseek-4.1/etc
    cp sphinx-min.conf.dist csft.conf
    vim /opt/coreseek-4.1/etc/csft.conf

The content is as follows (change the database and SQL to your own).
For a detailed description of the Sphinx configuration parameters, see the default file /opt/coreseek-4.1/etc/sphinx.conf.dist.

    #
    # Minimal Sphinx configuration sample (clean, simple, functional)
    #

    source src1
    {
        type                = mysql

        sql_host            = localhost
        sql_user            = test
        sql_pass            =
        sql_db              = test
        sql_port            = 3306  # optional, default is 3306

        sql_query           = \
            SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
            FROM documents

        sql_attr_uint       = group_id
        sql_attr_timestamp  = date_added

        sql_query_info      = SELECT * FROM documents WHERE id=$id
    }


    index test1
    {
        source              = src1
        path                = /opt/coreseek-4.1/var/data/test1
        docinfo             = extern
        charset_type        = sbcs
    }


    index testrt
    {
        type                = rt
        rt_mem_limit        = 32M

        path                = /opt/coreseek-4.1/var/data/testrt
        charset_type        = utf-8

        rt_field            = title
        rt_field            = content
        rt_attr_uint        = gid
    }


    indexer
    {
        mem_limit           = 32M
    }


    searchd
    {
        listen              = 9312
        listen              = 9306:mysql41
        log                 = /opt/coreseek-4.1/var/log/searchd.log
        query_log           = /opt/coreseek-4.1/var/log/query.log
        read_timeout        = 5
        max_children        = 30
        pid_file            = /opt/coreseek-4.1/var/log/searchd.pid
        max_matches         = 1000
        seamless_rotate     = 1
        preopen_indexes     = 1
        unlink_old          = 1
        workers             = threads # for RT to work
    }
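The sample config defines two indexes, test1 and testrt. When the file has been edited a few times it is easy to lose track of which index names exist, so here is a rough sketch that lists them. The helper name `list_indexes` is hypothetical, and it assumes each index block starts with the word `index` followed by the name on the same line, as in the sample.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every "index" block in a config file.
# A line counts only if its first word is exactly "index", so the "indexer"
# section is not reported.
list_indexes() {
    awk '$1 == "index" && NF >= 2 { print $2 }' "$1"
}

# Example:
#   list_indexes /opt/coreseek-4.1/etc/csft.conf
```

The names it prints are the ones that can be passed to indexer in place of --all when only one index needs rebuilding.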
(2) Start the searchd service and build the indexes:

    /opt/coreseek-4.1/bin/searchd -c /opt/coreseek-4.1/etc/csft.conf
    /opt/coreseek-4.1/bin/indexer -c /opt/coreseek-4.1/etc/csft.conf --all --rotate
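To confirm that searchd is actually up before rotating indexes, one option is to look at the pid file named in the config. This is a minimal sketch with a made-up function name, `searchd_running`; it only checks that the recorded process ID exists, not that the process is really searchd.

```shell
#!/bin/sh
# Hypothetical check: succeed if the pid file exists and the process ID in it
# refers to a live process. Note that kill -0 also fails when the process
# belongs to another user, so run this as the user that started searchd.
searchd_running() {
    pidfile=$1
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example:
#   searchd_running /opt/coreseek-4.1/var/log/searchd.pid && echo "searchd is up"
```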
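(3) Using it from PHP:

    <?php
    require_once('sphinxapi.php');               // PHP client from the api/ directory of the csft source
    $keyword = 'search terms';                   // placeholder: the user's query
    $page_size = 5;                              // page size, defined by yourself
    $start = 0;                                  // placeholder: result offset, e.g. ($page - 1) * $page_size
    $s = new SphinxClient();
    $s->SetServer('127.0.0.1', 9312);            // searchd host and TCP port (integer, matches "listen = 9312")
    $s->SetConnectTimeout(2);                    // connection timeout in seconds
    $s->SetMatchMode(SPH_MATCH_BOOLEAN);         // matching mode for the full-text query
    $s->SetLimits($start, $page_size);           // offset and number of results to return
    $s->SetSortMode(SPH_SORT_EXTENDED, "good_count DESC, @id DESC"); // sort order
    $s->SetArrayResult(true);                    // return matches as a plain array
    $res = $s->Query($keyword, '*');             // run the query against all indexes
    $res_list = isset($res['matches']) ? $res['matches'] : array(); // empty when the query failed or found nothing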
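Since the config also has `listen = 9306:mysql41`, a quick way to test a query without PHP is the ordinary mysql command-line client speaking SphinxQL to that port. The sketch below only builds the query string; the function name `sphinxql_match` is hypothetical, the index name and paging values are whatever you pass in, and it assumes the mysql client is installed when you go on to run the query.

```shell
#!/bin/sh
# Hypothetical helper: build a SphinxQL full-text query.
# Arguments: index name, keyword, offset (default 0), limit (default 5).
# Backslashes and single quotes in the keyword are backslash-escaped so the
# keyword stays inside the MATCH('...') string literal.
sphinxql_match() {
    index=$1
    keyword=$2
    offset=${3:-0}
    limit=${4:-5}
    escaped=$(printf '%s' "$keyword" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g")
    printf "SELECT * FROM %s WHERE MATCH('%s') LIMIT %s,%s\n" \
        "$index" "$escaped" "$offset" "$limit"
}

# Example (run against the searchd started above):
#   mysql -h 127.0.0.1 -P 9306 -e "$(sphinxql_match test1 'keyword')"
```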
7. Common commands:

    # Start
    /opt/coreseek-4.1/bin/searchd -c /opt/coreseek-4.1/etc/csft.conf
    # Stop
    /opt/coreseek-4.1/bin/searchd -c /opt/coreseek-4.1/etc/csft.conf --stop
    # Build indexes
    /opt/coreseek-4.1/bin/indexer -c /opt/coreseek-4.1/etc/csft.conf --all
    # Rebuild indexes (while searchd is running)
    /opt/coreseek-4.1/bin/indexer -c /opt/coreseek-4.1/etc/csft.conf --all --rotate
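The four commands above differ only in the binary and the trailing flags, so they fold naturally into one small helper. This is a sketch under assumptions: `csft_cmd`, `CSFT_PREFIX` and `CSFT_CONF` are names invented here, and the function prints the command line instead of running it so it can be inspected first.

```shell
#!/bin/sh
# Assumed defaults matching the install prefix used in this article;
# override them through the environment if your paths differ.
CSFT_PREFIX=${CSFT_PREFIX:-/opt/coreseek-4.1}
CSFT_CONF=${CSFT_CONF:-$CSFT_PREFIX/etc/csft.conf}

# Hypothetical helper: print the command line for one of the common actions.
csft_cmd() {
    case $1 in
        start)   echo "$CSFT_PREFIX/bin/searchd -c $CSFT_CONF" ;;
        stop)    echo "$CSFT_PREFIX/bin/searchd -c $CSFT_CONF --stop" ;;
        index)   echo "$CSFT_PREFIX/bin/indexer -c $CSFT_CONF --all" ;;
        reindex) echo "$CSFT_PREFIX/bin/indexer -c $CSFT_CONF --all --rotate" ;;
        *)       echo "usage: csft_cmd {start|stop|index|reindex}" >&2
                 return 2 ;;
    esac
}

# To run a command rather than print it:
#   eval "$(csft_cmd reindex)"
```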