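The site needed Chinese word-segmented search, so I installed coreseek. As I understand it, coreseek is simply Sphinx with a Chinese word-segmentation method and dictionary added.
1. Introduction:
CoreSeek is a Chinese full-text retrieval/search package, released as open source under the GPLv2 license. It is developed on top of Sphinx and distributed independently, focuses on Chinese search and information processing, and suits industry/vertical search, forum/site search, database search, document/literature retrieval, information retrieval, data mining and similar scenarios. The author has since stopped maintaining it.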
2. Before installing:
```shell
yum install make gcc gcc-c++ libtool autoconf automake imake libxml2-devel expat-devel
```
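Before building, it can help to confirm that the toolchain really is on the PATH. The following is a minimal sketch, not part of the original procedure: `missing_tools` is a hypothetical helper name, and the list of binaries is an assumption about what the yum packages install (`g++` comes from the `gcc-c++` package; the `-devel` packages ship no binary and are not checked).

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# ASSUMPTION: these binary names match what the yum packages install.
missing=$(missing_tools make gcc g++ libtool autoconf automake)
if [ -n "$missing" ]; then
    echo "missing build tools:" $missing
else
    echo "all build tools found"
fi
```

If anything is reported as missing, rerun the yum command and check its output for packages that could not be found.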
3. Download and extract:
```shell
cd /opt
wget https://down.yundreams.com/uploads/soft/coreseek-4.1-beta.tar.gz
tar -xzvf coreseek-4.1-beta.tar.gz
```
4. Install mmseg: (it can be installed somewhere else; in that case point csft's --with-mmseg-includes and --with-mmseg-libs at the corresponding paths)
```shell
cd /opt/coreseek-4.1-beta/mmseg-3.2.14
./bootstrap
./configure --prefix=/opt/coreseek-4.1
make
make install
make clean
```
5. Install csft:
Before building, edit configure.ac at line 13:
```shell
cd /opt/coreseek-4.1-beta/csft-4.1
vi configure.ac
# change
#   AM_INIT_AUTOMAKE([-Wall -Werror foreign])
# to
#   AM_INIT_AUTOMAKE([-Wall foreign])
```
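The same configure.ac edit can probably be made without opening an editor. This is a sketch assuming GNU sed (for `-i`); `strip_werror` is a hypothetical helper name, and the demonstration runs on a scratch file so nothing real is touched.

```shell
#!/bin/sh
# Drop -Werror from the AM_INIT_AUTOMAKE options in the given file.
strip_werror() {
    sed -i 's/-Wall -Werror foreign/-Wall foreign/' "$1"
}

# Demonstration on a scratch file containing the original line.
demo=$(mktemp)
echo 'AM_INIT_AUTOMAKE([-Wall -Werror foreign])' > "$demo"
strip_werror "$demo"
cat "$demo"   # prints: AM_INIT_AUTOMAKE([-Wall foreign])
```

Against the real source tree this would be `strip_werror /opt/coreseek-4.1-beta/csft-4.1/configure.ac`.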
```shell
./buildconf.sh
./configure --prefix=/opt/coreseek-4.1 --without-unixodbc --with-mmseg --with-mmseg-includes=/opt/coreseek-4.1/include/mmseg/ --with-mmseg-libs=/opt/coreseek-4.1/lib/ --with-mysql
make
make install
make clean
```
If make fails with `sphinxexpr.cpp:1746:43: error: 'ExprEval' was not declared in this scope`, apply the fix below; if you did not hit this error, skip it.
Three places need changing: lines 1746, 1777 and 1823.
```shell
vi /opt/coreseek-4.1-beta/csft-4.1/src/sphinxexpr.cpp
```
In the file, replace
```cpp
T val = ExprEval ( this->m_pArg, tMatch ); // 'this' fixes gcc braindamage
```
with:
```cpp
T val = this->ExprEval ( this->m_pArg, tMatch ); // 'this' fixes gcc braindamage
```
To save effort, a single sed command does all three:
```shell
sed -i 's/= ExprEval/= this->ExprEval/g' /opt/coreseek-4.1-beta/csft-4.1/src/sphinxexpr.cpp
```
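To check that the substitution caught every occurrence, one option is to count the lines that still call `ExprEval` without the `this->` qualifier. This is a sketch: `count_unpatched` is a hypothetical helper name, GNU sed is assumed, and the scratch file only imitates the relevant lines of sphinxexpr.cpp.

```shell
#!/bin/sh
# Count lines that still assign from a bare ExprEval call.
count_unpatched() {
    grep -c '= ExprEval' "$1" || true   # grep exits 1 when the count is 0
}

# Scratch file standing in for sphinxexpr.cpp.
demo=$(mktemp)
cat > "$demo" <<'EOF'
T val = ExprEval ( this->m_pArg, tMatch );
T val = ExprEval ( this->m_pArg, tMatch );
EOF

echo "before: $(count_unpatched "$demo")"   # prints: before: 2
sed -i 's/= ExprEval/= this->ExprEval/g' "$demo"
echo "after: $(count_unpatched "$demo")"    # prints: after: 0
```

Because a patched line no longer matches `= ExprEval`, running the sed command a second time changes nothing.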
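6. Configuration and use:
(1) Configure the data source:
```shell
cd /opt/coreseek-4.1/etc
cp sphinx-min.conf.dist csft.conf
vim /opt/coreseek-4.1/etc/csft.conf
```
The content is as follows (change the database and SQL to your own).
For a detailed description of every Sphinx configuration parameter, see the default file /opt/coreseek-4.1/etc/sphinx.conf.dist.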
```
#
# Minimal Sphinx configuration sample (clean, simple, functional)
#

source src1
{
        type                    = mysql

        sql_host                = localhost
        sql_user                = test
        sql_pass                =
        sql_db                  = test
        sql_port                = 3306  # optional, default is 3306

        sql_query               = \
                SELECT id, group_id, UNIX_TIMESTAMP(date_added) AS date_added, title, content \
                FROM documents

        sql_attr_uint           = group_id
        sql_attr_timestamp      = date_added

        sql_query_info          = SELECT * FROM documents WHERE id=$id
}


index test1
{
        source                  = src1
        path                    = /opt/coreseek-4.1/var/data/test1
        docinfo                 = extern
        charset_type            = sbcs
}


index testrt
{
        type                    = rt
        rt_mem_limit            = 32M

        path                    = /opt/coreseek-4.1/var/data/testrt
        charset_type            = utf-8

        rt_field                = title
        rt_field                = content
        rt_attr_uint            = gid
}


indexer
{
        mem_limit               = 32M
}


searchd
{
        listen                  = 9312
        listen                  = 9306:mysql41
        log                     = /opt/coreseek-4.1/var/log/searchd.log
        query_log               = /opt/coreseek-4.1/var/log/query.log
        read_timeout            = 5
        max_children            = 30
        pid_file                = /opt/coreseek-4.1/var/log/searchd.pid
        max_matches             = 1000
        seamless_rotate         = 1
        preopen_indexes         = 1
        unlink_old              = 1
        workers                 = threads # for RT to work
}
```
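The sample above is the stock Sphinx minimal configuration, so index test1 still uses `charset_type = sbcs`. For Chinese segmentation, coreseek's documentation describes two index settings, `charset_dictpath` and `charset_type = zh_cn.utf-8`. The sketch below prints them for a given dictionary directory. `zh_index_settings` is a hypothetical helper name, and the location of the mmseg dictionary `uni.lib` under /opt/coreseek-4.1/etc is an assumption that follows from the `--prefix` used when building mmseg.

```shell
#!/bin/sh
# Print the index settings for Chinese segmentation, given the directory
# that holds the mmseg dictionary (uni.lib).
zh_index_settings() {
    dict_dir=${1%/}
    # Warn, but still print, if the dictionary is not where we expect it.
    [ -f "$dict_dir/uni.lib" ] || echo "warning: $dict_dir/uni.lib not found" >&2
    printf '        charset_dictpath        = %s/\n' "$dict_dir"
    printf '        charset_type            = zh_cn.utf-8\n'
}

# ASSUMPTION: mmseg was installed with --prefix=/opt/coreseek-4.1.
zh_index_settings /opt/coreseek-4.1/etc
```

The two printed lines would replace the `charset_type = sbcs` line inside the `index test1` block.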
(2) Start the searchd service and build the indexes:
```shell
/opt/coreseek-4.1/bin/searchd -c /opt/coreseek-4.1/etc/csft.conf
/opt/coreseek-4.1/bin/indexer -c /opt/coreseek-4.1/etc/csft.conf --all --rotate
```
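A quick way to confirm that searchd actually came up is to test the process named in its pid file. This is a minimal sketch: `is_running` is a hypothetical helper name, and the pid file path is the `pid_file` value from the sample configuration. `kill -0` only succeeds for processes you are allowed to signal, so run it as the user that started searchd, or as root.

```shell
#!/bin/sh
# Succeed if the PID stored in the given pid file belongs to a live process.
is_running() {
    [ -f "$1" ] && kill -0 "$(cat "$1")" 2>/dev/null
}

if is_running /opt/coreseek-4.1/var/log/searchd.pid; then
    echo "searchd is running"
else
    echo "searchd is not running"
fi
```

If it reports that searchd is not running, the log configured as `log` in csft.conf is the first place to look.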
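(3) Using it from PHP:
```php
require_once('sphinxapi.php');
$s = new SphinxClient();
$s->SetServer('127.0.0.1', 9312); // searchd host and TCP port (matches "listen = 9312" in csft.conf)
$s->SetConnectTimeout(2); // connection timeout in seconds
$s->SetMatchMode(SPH_MATCH_BOOLEAN); // matching mode for the full-text query
$page_size = 5; // results per page, choose your own value
$start = 0; // offset of the first result, e.g. ($page - 1) * $page_size
$s->SetLimits($start, $page_size); // offset and size of the returned result set
$s->SetSortMode(SPH_SORT_EXTENDED, "good_count DESC, @id DESC"); // sort order; good_count must be an attribute of your own index
$s->SetArrayResult(true); // controls the format of the returned matches (plain array)
$res = $s->Query($keyword, '*'); // run the search on all indexes; $keyword is the user's search string
$res_list = ($res !== false && isset($res['matches'])) ? $res['matches'] : array(); // empty list on error or no matches
```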
7. Common commands:
```shell
# start
/opt/coreseek-4.1/bin/searchd -c /opt/coreseek-4.1/etc/csft.conf
# stop
/opt/coreseek-4.1/bin/searchd -c /opt/coreseek-4.1/etc/csft.conf --stop
# build the indexes
/opt/coreseek-4.1/bin/indexer -c /opt/coreseek-4.1/etc/csft.conf --all
# rebuild the indexes while searchd is running
/opt/coreseek-4.1/bin/indexer -c /opt/coreseek-4.1/etc/csft.conf --all --rotate
```
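Since all four commands share the same prefix and config path, they could be wrapped behind short action names. The sketch below only prints the command line, so it is safe to try. `csft_cmd`, `CSFT_HOME` and `CSFT_CONF` are hypothetical names introduced for illustration, not part of coreseek.

```shell
#!/bin/sh
CSFT_HOME=/opt/coreseek-4.1
CSFT_CONF=$CSFT_HOME/etc/csft.conf

# Print the full command line for a short action name.
csft_cmd() {
    case "$1" in
        start)   echo "$CSFT_HOME/bin/searchd -c $CSFT_CONF" ;;
        stop)    echo "$CSFT_HOME/bin/searchd -c $CSFT_CONF --stop" ;;
        index)   echo "$CSFT_HOME/bin/indexer -c $CSFT_CONF --all" ;;
        reindex) echo "$CSFT_HOME/bin/indexer -c $CSFT_CONF --all --rotate" ;;
        *)       echo "usage: csft_cmd start|stop|index|reindex" >&2; return 1 ;;
    esac
}

csft_cmd reindex
# prints: /opt/coreseek-4.1/bin/indexer -c /opt/coreseek-4.1/etc/csft.conf --all --rotate
```

To run an action instead of printing it, use command substitution, for example `$(csft_cmd reindex)`. This works here because none of the paths contain spaces.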