Django+Mysql性能优化小记
最近在对一个Django+Mysql的小网站进行性能优化,发现一些有趣的事,记录于此。
首先是中文全文搜索的问题。比较新的Mysql是可以创建全文索引并且进行全文搜索的,但对汉语的支持不好。英语天然地用空格分割单词,很容易实现全文搜索,但汉语却没有这样的天然优势。一个自然的想法便是若汉语的词与词之间也有空格分割,岂不是就可以直接用Mysql的全文索引和全文搜索了。为此,对需要进行全文搜索的字段,比如title和description,新建两个对应的字段用于存放带空格的中文句子:title_index和description_index。然后遍历整张表,读取title和description字段的值,对其进行汉语分词(比如使用jieba)后用空格分割分词结果,分别写回到title_index和description_index字段中去。可以用类似下面的Python代码去实现这一过程(注意下面的代码只是片段,是不完整的)。
#使用execute方法执行SQL语句
cursor.execute('''select id,title,description from tablename''')
for row in cursor.fetchall():#遍历整张表
#建立全文索引
if row[1]:
title_index = ' '.join(list(jieba.cut(row[1]))).replace("\"", "'")
else:
title_index = ''
if row[2]:
description_index = ' '.join(list(jieba.cut(row[2]))).replace("\"", "'")
else:
description_index = ''
sql = '''update tablename set `title_index`="{0}", `description_index`="{1}" \
where id={2};'''.format(title_index, description_index, row[0])
cursor.execute(sql)
db.commit()
之后建立对title_index和description_index的全文索引,搜索时也在这两个字段上搜索就可以了。但在试验时发现这样做只能搜索到较长的词组,比如“中华人民共和国”,但搜索不到较短的词组,比如“中国”。查阅资料显示,Mysql认为较短的单词(比如a、an这样的)对于全文搜索是有害的,所以会忽略这样的单词。这一长度由ft_min_word_len指定,其默认值是4。为了能较好的支持中文,将其改为2。Ubuntu中需要在配置文件/etc/mysql/my.cnf的[mysqld]段(section)中添加一行:
ft_min_word_len=2
加不对地方是没用的。我刚开始时将ft_min_word_len=2添加在配置文件末尾,再怎么重启Mysql,用SQL语句“SHOW VARIABLES LIKE ‘ft_min_word_len’;”查看其值,依旧是4,后来在其官网找到说明,才知必须添加在特定的段中才有效。改好这一配置并重启后再次尝试,就会发现Mysql可以对“中国”这样两个字的词进行有效的全文搜索了,但Mysql依旧会忽略“的”这样一个字的词,这正是我想要的。至于如何将Mysql全文搜索集成到Django中,其实这个问题不用考虑,较新的Django中集成了这样的API,以下部分复制于Django文档:
search¶
Deprecated since version 1.10: See the 1.10 release notes for how to replace it.
A boolean full-text search, taking advantage of full-text indexing. This is like contains but is significantly faster due to full-text indexing.
Example:
Entry.objects.filter(headline__search="+Django -jazz Python")
SQL equivalent:
SELECT ... WHERE MATCH(tablename, headline) AGAINST (+Django -jazz Python IN BOOLEAN MODE);
Note this is only available in MySQL and requires direct manipulation of the database to add the full-text index. By default Django uses BOOLEAN MODE for full text searches. See the MySQL documentation for additional details.
解决了全文搜索问题之后便要考虑效率了。我在维护的网站大约有10万数据,每搜索一次都需要几秒时间,即使添加了如上所述的全文搜索。数据很少,为何会这样呢?以为是字符匹配很慢,结果引入全文搜索依旧如此。一直到搞不清楚原因,直到偶然间发现:
mysql> SELECT count(*) FROM tablename WHERE MATCH (title_index,description_index) AGAINST ('中国' IN BOOLEAN MODE);
+----------+
| count(*) |
+----------+
| 8483 |
+----------+
1 row in set (3.68 sec)
mysql> SELECT count(*) FROM tablename WHERE MATCH (title_index) AGAINST ('中国' IN BOOLEAN MODE);
+----------+
| count(*) |
+----------+
| 3401 |
+----------+
1 row in set (0.02 sec)
mysql> SELECT count(*) FROM tablename WHERE MATCH (description_index) AGAINST ('中国' IN BOOLEAN MODE);
+----------+
| count(*) |
+----------+
| 8225 |
+----------+
1 row in set (0.05 sec)
mysql>
差距竟然如此之大!注意到这一巨大的差距,便去修改Django项目的代码,将:
if only_title:
conflist = Conf.objects.filter(title_index__search=s)
else:
conflist = Conf.objects.filter(Q(description_index__search=s)|Q(title_index__search=s))
修改为:
conflist = Conf.objects.filter(title_index__search=s)
if only_title:
pass
else:
conflist = conflist or Conf.objects.filter(description_index__search=s)
新的代码相当于分别从title_index和description_index中查询,再将查到的结果(是两个集合)并(or)起来。改好之后再试试,搜索果然变得很快很快了,由以前的好几秒变成了不到1秒。
补充说明,只有很新的Mysql的InnoDB和MyISAM都支持全文搜索,稍老一些版本的Mysql只有MyISAM支持全文搜索。而如果数据表是InnoDB,就需要转换为MyISAM了。下面展示的是我将InnoDB表转换为MyISAM表过程中做的备忘录:
# 设置和查看ft_min_word_len
nano /etc/mysql/my.cnf
Set the following values, which increase the maximum attachment size and make it possible to search for short words and terms:
Alter on Line 52: max_allowed_packet=100M
Add as new line 32, in the [mysqld] section: ft_min_word_len=2
SHOW VARIABLES LIKE 'ft_min_word_len';
# mysql innodb 转MyISAM
create table tt7_tmp like tablename;
insert into tt7_tmp select * from tablename;
alter table tt7_tmp engine=MyISAM;
SET FOREIGN_KEY_CHECKS = 0;
drop table tablename;
rename table tt7_tmp to tablename;
# 新增字段
alter table tablename add title_index varchar(2000);
alter table tablename add description_index longtext;
# 添加全文索引
ALTER TABLE tablename ADD FULLTEXT(title_index);
ALTER TABLE tablename ADD FULLTEXT(description_index);
Alter table `tablename` add fulltext(`title_index`);
Alter table `tablename` add fulltext(`description_index`);
repair table tablename;