MySQL

参考《高性能 MySQL》和官方文档。使用 MySQL 的 Sakila 示例数据库,使用 MySQL 版本 8.0.41 + InnoDB 存储引擎。各种术语的官方定义可以看 MySQL 术语表,锁相关的内容可以看 InnoDB Locking,更多数据库概念可以看 CUM 15-445 课程总结。任何描述都是简化的,不会涵盖所有情况,更多细节只能看文档。或者说没有必要去记细节,任何细节都取决于实现,而实现会随着版本更新而变化,需要学习的是整体的策略。

基本概念

快照(snapshot):数据在特定时间的表示。一致性读consistent read):根据快照显示查询结果,也被称为一致性非锁定读。在读已提交和可重复读隔离级别下执行 SELECT 语句的默认模式是一致性读,也就是说会使用多版本并发控制MVCC)读取数据。读已提交在每次执行一致性读时都会重置快照,而可重复读只在第一次一致性读时建立快照。锁定读locking read):使用 SELECT ... FOR SHARE 或者 SELECT ... FOR UPDATE 读取数据,会加读锁或者写锁,在事务提交或者回滚时释放(2PL)。

MVCC 是使用撤销日志Undo Log)和读取视图Read View)实现的。事务在修改记录之后会记录撤销日志,事务的回滚指针会指向该日志。读取视图包含一致性读不可见的事务 ID(事务 ID 不会在启动事务之后立即分配),读取记录时会比较记录的事务 ID 和读取视图,如果该记录对当前事务不可见,则执行撤销日志直到达到可见状态。删除操作被视为修改操作,通过修改删除标志位实现,只有当该版本记录对所有事务不可见时,才会被真正删除。撤销日志也会被记录到重做日志Redo Log)中,实现崩溃恢复。

关于幻读的问题:DML 语句可以破坏快照,从而在可重复读隔离级别会出现幻读。(参考一致性读文档)如果将快照读和当前读混用,也会出现幻读。

1
2
3
4
5
6
7
8
9
10
11
SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz';
-- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz';
-- Deletes several rows recently committed by other transaction.

SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc';
-- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc';
-- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba';
-- Returns 10: this txn can now see the rows it just updated.

常用命令

1
2
3
4
5
6
7
8
9
10
SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS;
SELECT * FROM performance_schema.data_locks\G
SHOW STATUS LIKE 'Last_query_cost';
SHOW [GLOBAL | SESSION] VARIABLES [LIKE 'pattern' | WHERE expr];
START TRANSACTION; BEGIN; COMMIT; ROLLBACK;
[CREATE | ALTER | DROP | OPTIMIZE | ANALYZE | CHECK | REPAIR ] TABLE tbl_name;
SHOW TABLE STATUS [LIKE 'pattern' | WHERE expr];
SHOW INDEX FROM tbl_name;
EXPLAIN [FORMAT=TREE | ANALYZE] select_statement; SHOW WARNINGS;

前缀索引

使用前缀索引可以减少索引的空间开销,内部节点的字符串比较会更高效,但是选择性会更低,那么匹配的主键会更多,从而需要更多次回表。所以需要选择合适的前缀长度,尽可能提高索引的选择性。索引的选择性是指不重复的索引值(基数,cardinality)和数据表的记录总数的比值。无法利用前缀索引执行分组、排序和覆盖索引扫描。

执行如下语句,city_demo 最终有 19200 条记录。

1
2
3
4
CREATE TABLE city_demo(city VARCHAR(50) NOT NULL);
INSERT INTO city_demo(city) SELECT city FROM city;
INSERT INTO city_demo(city) SELECT city FROM city_demo; -- 重复执行 5 次
UPDATE city_demo SET city = (SELECT city FROM city ORDER BY RAND() LIMIT 1); -- 随机化数据

查询出现次数最多的前 10 个城市,然后使用各种前缀测试索引的选择性。前缀长度为 7 比较合适,因为统计数量的偏差不算很大。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
SELECT COUNT(*) AS c, city
FROM city_demo GROUP BY city ORDER BY c DESC LIMIT 10;
+----+-----------------+
| c | city |
+----+-----------------+
| 54 | Ondo |
| 51 | London |
| 49 | Olomouc |
| 49 | Pontianak |
| 47 | Kurgan |
| 46 | Almirante Brown |
| 46 | Changzhou |
| 46 | Funafuti |
| 46 | Jodhpur |
| 45 | Plock |
+----+-----------------+
SELECT COUNT(*) AS c, LEFT(city, 3) AS pref
FROM city_demo GROUP BY pref ORDER BY c DESC LIMIT 10;
+-----+------+
| c | pref |
+-----+------+
| 477 | San |
| 197 | Cha |
| 171 | Tan |
| 157 | al- |
| 152 | Sou |
| 148 | Bat |
| 146 | Sal |
| 146 | Shi |
| 126 | Kam |
| 125 | Val |
+-----+------+
SELECT COUNT(*) AS c, LEFT(city, 7) AS pref
FROM city_demo GROUP BY pref ORDER BY c DESC LIMIT 10;
+----+---------+
| c | pref |
+----+---------+
| 76 | San Fel |
| 66 | Valle d |
| 63 | Santiag |
| 54 | Ondo |
| 51 | London |
| 49 | Pontian |
| 49 | Olomouc |
| 47 | Kurgan |
| 46 | Jodhpur |
| 46 | Almiran |
+----+---------+

不能只根据前缀索引选择性的值确定前缀长度,例如前缀长度为 5 的选择性看上去很接近完整列的选择性,但是如果查看出现次数最多的前 10 个城市,和完整列的结果相比,会发现数据分布很不均匀。如果查询以 South 前缀的某个城市,那么回表的次数会更多。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
SELECT COUNT(DISTINCT city) / COUNT(*) AS sel,
COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel3,
COUNT(DISTINCT LEFT(city, 4)) / COUNT(*) AS sel4,
COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel5,
COUNT(DISTINCT LEFT(city, 6)) / COUNT(*) AS sel6,
COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel7
FROM city_demo;
+--------+--------+--------+--------+--------+--------+
| sel | sel3 | sel4 | sel5 | sel6 | sel7 |
+--------+--------+--------+--------+--------+--------+
| 0.0312 | 0.0236 | 0.0293 | 0.0305 | 0.0309 | 0.0310 |
+--------+--------+--------+--------+--------+--------+
SELECT COUNT(*) AS c, LEFT(city, 5) AS pref
FROM city_demo GROUP BY pref ORDER BY c DESC LIMIT 10;
+-----+--------+
| c | pref |
+-----+--------+
| 118 | South |
| 104 | Santa |
| 78 | Chang |
| 76 | San F |
| 69 | Toulo |
| 68 | Xi´an |
| 66 | Valle |
| 64 | Saint |
| 64 | Shimo |
| 63 | Santi |
+-----+--------+

多列索引

在 film_actor 上有主键索引 PRIMARY KEY (actor_id,film_id),有普通索引 KEY idx_fk_film_id (film_id),以下查询计划表示合并两个索引的查询结果。索引合并策略有时效果不错,但更多时候表明表的索引建得很槽糕。如果优化器需要对多个索引做相交操作(由于多个 AND 条件),那么查看是否可以使用多列索引进行优化。

如果优化器需要对多个索引做联合操作(由于多个 OR 条件),且索引的选择性不高时,通常会在缓存、排序和合并上消耗大量 CPU 和内存资源。然而,优化器不会将这些操作计算到查询成本中,优化器只关心随机页面读取,所以有时索引合并的性能还不如全表扫描。这还会影响并发的查询,此时使用 UNION 改写,将单个查询拆分为多个查询,可以避免单个查询的执行时间过长,影响其他并发的查询(由于 2PL 协议,单个查询会在提交时才释放锁,当然不同隔离级别有细微差别)。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
EXPLAIN SELECT film_id, actor_id FROM film_actor
WHERE actor_id = 1 OR film_id = 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film_actor
partitions: NULL
type: index_merge
possible_keys: PRIMARY,idx_fk_film_id
key: PRIMARY,idx_fk_film_id
key_len: 2,2
ref: NULL
rows: 29
filtered: 100.00
Extra: Using union(PRIMARY,idx_fk_film_id); Using where

聚簇索引

聚簇索引的每个叶子节点都包含主键值、事务 ID、用于事务和 MVCC 的回滚指针,以及所有的剩余列。二级索引的叶子节点存储主键值而不是物理指针,所以在移动主键索引中的记录时不需要修改二级索引。如果按照主键顺序插入行,聚簇索引不会发生页分裂,从而插入性能更高(减少磁盘 I/O)也不会产生内部碎片。不应该使用随机的主键(例如 UUID),如果没有按照主键顺序插入数据,可以使用 OPTIMIZE TABLE 命令重新组织表来消除内部碎片。

对于高并发负载,按主键顺序插入会产生较多竞争,可以通过更改 innodb_autoinc_lock_mode 配置来提升性能。简单来说,有传统、连续和交错三种锁定模式,不同模式会根据语句的不同使用互斥锁(Lock)或者轻量级锁(CAS)。

查询优化

可以根据慢查询日志来优化查询,使用 EXPLAIN 查看执行计划。基本准则:只选择需要的列,而不是使用 SELECT *,从而允许覆盖索引优化、减少时间(I/O)和空间开销。不要重复执行相同的查询,可以在应用层使用缓存。对于不是很重要的查询,可以将大查询分解为小查询,将查询的时间分散到一个时间段中,减少查询对服务器性能的影响(减少单次查询持有锁的时间,以及避免事务日志堆积)。例如:DELETE 大量数据,可以使用 LIMIT 分解执行;当中间查询结果能缓存和重用时,可以将连接查询分解,然后在应用层做连接。

应用 WHERE 条件的方式:① 将条件从服务器下推到存储引擎,直接在索引中使用条件过滤记录(索引条件下推,ICP),从而减少服务器访问存储引擎的次数,以及存储引擎访问基表的次数。② 使用覆盖索引获取记录(不需要回表),然后在服务器使用条件过滤记录。③ 回表之后,在服务器层使用条件。

MySQL 的局限性:无法将 UNION 外层的条件下推到内层。某些时候,等值传递会有问题(详细看书)。针对单个语句而言,无法利用多核并行执行查询,无法同时对某个表进行查询和更新(特指相关子查询)。

UNION

由于 UNION 查询无法使用到外层条件,所以需要手动将条件下推到 UNION 的各个子查询中。最好使用 UNION ALL 而不是 UNION,这样可以避免对临时表去重(UNION 查询总是会创建临时表)。

COUNT

COUNT(*) 统计行数,COUNT(expr) 统计非 NULL 值的数量。性能 COUNT(*)=COUNT(1)>COUNT(主键列)>COUNT(普通列),如下所示 COUNT(*) 实际上是 COUNT(0)。根据条件统计数量,可以使用 SUM(IF(expr, 1, 0)) 或者 COUNT(expr OR NULL)。如果允许使用近似值代替精确值,则可以去掉 WHERE 和 DISTINCT 之类的条件,优化查询性能。利用索引覆盖扫描优化性能,因为索引会比基表更小,减少 I/O 次数。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
mysql> EXPLAIN SELECT COUNT(*) FROM film\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
partitions: NULL
type: index
possible_keys: NULL
key: idx_fk_language_id
key_len: 1
ref: NULL
rows: 1000
filtered: 100.00
Extra: Using index
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select count(0) AS `COUNT(*)` from `sakila`.`film`
1 row in set (0.00 sec)

IN & EXIST

常见的说法是,在子查询数据较少时使用 IN,而在子查询数据较大时使用 EXIST。因为使用 IN 是不相关子查询,会创建临时表,然后在临时表中查找匹配的数据。而使用 EXIST 是相关子查询,会直接在内表中查找匹配的数据(多次执行子查询)。但是,实际上优化器会做优化,使用 IN 并不意味着就会创建临时表。下面查询所有没有交易记录的顾客信息,执行计划显示该查询被转化为相关子查询,会使用覆盖索引查找匹配的数据。(推荐阅读 Optimizing Subqueries with the EXISTS Strategy

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
mysql> EXPLAIN SELECT * FROM customer WHERE customer_id IN (
-> SELECT customer_id FROM payment
-> )\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: customer
partitions: NULL
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 599
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: payment
partitions: NULL
type: ref
possible_keys: idx_fk_customer_id
key: idx_fk_customer_id
key_len: 2
ref: sakila.customer.customer_id
rows: 1
filtered: 100.00
Extra: Using index; FirstMatch(customer)
2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `sakila`.`customer`.`customer_id` AS `customer_id`,`sakila`.`customer`.`store_id` AS `store_id`,`sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name`,`sakila`.`customer`.`email` AS `email`,`sakila`.`customer`.`address_id` AS `address_id`,`sakila`.`customer`.`active` AS `active`,`sakila`.`customer`.`create_date` AS `create_date`,`sakila`.`customer`.`last_update` AS `last_update` from `sakila`.`customer` semi join (`sakila`.`payment`) where (`sakila`.`payment`.`customer_id` = `sakila`.`customer`.`customer_id`)
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM customer WHERE customer_id IN (
-> SELECT customer_id FROM payment
-> )\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop semijoin (cost=271 rows=599) (actual time=0.0621..2.97 rows=599 loops=1)
-> Table scan on customer (cost=61.2 rows=599) (actual time=0.048..0.401 rows=599 loops=1)
-> Covering index lookup on payment using idx_fk_customer_id (customer_id=customer.customer_id) (cost=0.25 rows=1) (actual time=0.00418..0.00418 rows=1 loops=599)

1 row in set (0.00 sec)

而使用 EXIST 也不意味着执行相关子查询,如果将 payment 中的索引 idx_fk_customer_id 删除,然后重新执行上述查询。执行计划显示,子查询会生成索引临时表,索引用于去重,然后外表使用该索引查找匹配的数据。如果临时表较小,则会使用 MEMORY 存储引擎创建内存临时表,否则使用 InnoDB 存储引擎创建磁盘临时表。(推荐阅读 Optimizing Subqueries with Materialization

1
ALTER TABLE payment DROP FOREIGN KEY fk_payment_customer, DROP INDEX idx_fk_customer_id;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
mysql> EXPLAIN SELECT * FROM customer WHERE EXISTS (
-> SELECT 1 FROM payment WHERE customer.customer_id = customer_id
-> )\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: customer
partitions: NULL
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 599
filtered: 100.00
Extra: NULL
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: <subquery2>
partitions: NULL
type: eq_ref
possible_keys: <auto_distinct_key>
key: <auto_distinct_key>
key_len: 2
ref: sakila.customer.customer_id
rows: 1
filtered: 100.00
Extra: NULL
*************************** 3. row ***************************
id: 2
select_type: MATERIALIZED
table: payment
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 16086
filtered: 100.00
Extra: NULL
3 rows in set, 2 warnings (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1276
Message: Field or reference 'sakila.customer.customer_id' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `sakila`.`customer`.`customer_id` AS `customer_id`,`sakila`.`customer`.`store_id` AS `store_id`,`sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name`,`sakila`.`customer`.`email` AS `email`,`sakila`.`customer`.`address_id` AS `address_id`,`sakila`.`customer`.`active` AS `active`,`sakila`.`customer`.`create_date` AS `create_date`,`sakila`.`customer`.`last_update` AS `last_update` from `sakila`.`customer` semi join (`sakila`.`payment`) where (`<subquery2>`.`customer_id` = `sakila`.`customer`.`customer_id`)
2 rows in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM customer WHERE EXISTS (
-> SELECT 1 FROM payment WHERE customer.customer_id = customer_id
-> )\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=963672 rows=9.64e+6) (actual time=6.27..6.92 rows=599 loops=1)
-> Table scan on customer (cost=61.2 rows=599) (actual time=0.0789..0.422 rows=599 loops=1)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (customer_id=customer.customer_id) (cost=3241..3241 rows=1) (actual time=0.0107..0.0107 rows=1 loops=599)
-> Materialize with deduplication (cost=3241..3241 rows=16086) (actual time=6.18..6.18 rows=599 loops=1)
-> Table scan on payment (cost=1633 rows=16086) (actual time=0.245..3.54 rows=16044 loops=1)

1 row in set, 1 warning (0.01 sec)

JOIN

确保 ON 或者 USING 的列上有索引。确保 GROUP BY 和 ORDER BY 只涉及一个表中的列,从而允许利用索引优化(松散/紧密索引扫描、利用索引排序)。如果需要对聚合的结果做超级聚合,可以使用 GROUP BY xxx WITH ROLLUP(注意查看查询计划,确定是否有性能问题),也可以使用其他等价语句或者在应用层聚合。

上面执行 SHOW WARNINGS\G 都会显示半连接(semi join),所谓半连接就是从左表中查询和右表匹配的行。反连接(anti join)正好相反,从左表中查询和右表不匹配的行。下面的查询和上面的等价,但是更慢,没有使用半连接,在覆盖索引中使用 LIMIT 1 去重,在最后使用临时表去重(实际上没有必要,LIMIT 1 已经去重)。(推荐阅读 Optimizing IN and EXISTS Subquery Predicates with Semijoin and Antijoin Transformations

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
mysql> EXPLAIN SELECT DISTINCT customer.* FROM customer INNER JOIN payment USING(customer_id)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: customer
partitions: NULL
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 599
filtered: 100.00
Extra: Using temporary
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: payment
partitions: NULL
type: ref
possible_keys: idx_fk_customer_id
key: idx_fk_customer_id
key_len: 2
ref: sakila.customer.customer_id
rows: 1
filtered: 100.00
Extra: Using index; Distinct
2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select distinct `sakila`.`customer`.`customer_id` AS `customer_id`,`sakila`.`customer`.`store_id` AS `store_id`,`sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name`,`sakila`.`customer`.`email` AS `email`,`sakila`.`customer`.`address_id` AS `address_id`,`sakila`.`customer`.`active` AS `active`,`sakila`.`customer`.`create_date` AS `create_date`,`sakila`.`customer`.`last_update` AS `last_update` from `sakila`.`customer` join `sakila`.`payment` where (`sakila`.`payment`.`customer_id` = `sakila`.`customer`.`customer_id`)
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT DISTINCT customer.* FROM customer INNER JOIN payment USING(customer_id)\G
*************************** 1. row ***************************
EXPLAIN: -> Table scan on <temporary> (cost=331..341 rows=599) (actual time=3.3..3.38 rows=599 loops=1)
-> Temporary table with deduplication (cost=331..331 rows=599) (actual time=3.3..3.3 rows=599 loops=1)
-> Nested loop inner join (cost=271 rows=599) (actual time=0.0668..2.13 rows=599 loops=1)
-> Table scan on customer (cost=61.2 rows=599) (actual time=0.0492..0.404 rows=599 loops=1)
-> Limit: 1 row(s) (cost=0.25 rows=1) (actual time=0.00271..0.00273 rows=1 loops=599)
-> Covering index lookup on payment using idx_fk_customer_id (customer_id=customer.customer_id) (cost=0.25 rows=1) (actual time=0.00262..0.00262 rows=1 loops=599)

1 row in set (0.00 sec)

LIMIT

如果偏移量很大,例如 LIMIT 10000, 20,那么会扫描 10020 条记录,而只有最后 20 条是有效的。如果行中有很多数据,那么 I/O 次数就会很多。MySQL 的 LIMIT OFFSET 似乎不会下推到索引,从而 10020 条记录都会回表查询。查询计划中确实没有下推,奇怪的是索引只会扫描 10020 条记录,说明索引是有 LIMIT OFFSET 信息的。(参考 Limit Offset 下推

优化方式:① 使用覆盖索引执行 LIMIT,索引中只包含必要的列(如 ORDER BY 的列,当然默认包含主键),那么 LIMIT 扫描的数据会减少很多,最后使用主键做连接得到完整数据。(不能使用 IN,因为 ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery')② 如果只允许顺序翻页的话,通过记录上个页面的边界值,下次查询就可以使用该值定位到目标位置,而不会做无效的扫描。③ 一次性获取多页数据,在应用层缓存起来(类似缓存 I/O)。④ 其他方法,使用预先计算的汇总表,或者使用冗余表。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
mysql> EXPLAIN ANALYZE SELECT * FROM payment ORDER BY customer_id LIMIT 10000, 20\G
*************************** 1. row ***************************
EXPLAIN: -> Limit/Offset: 20/10000 row(s) (cost=1633 rows=20) (actual time=11.6..11.6 rows=20 loops=1)
-> Sort: payment.customer_id, limit input to 10020 row(s) per chunk (cost=1633 rows=16086) (actual time=10.6..11.4 rows=10020 loops=1)
-> Table scan on payment (cost=1633 rows=16086) (actual time=0.434..5.5 rows=16044 loops=1)

1 row in set (0.01 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM payment FORCE INDEX (idx_fk_customer_id) ORDER BY customer_id LIMIT 10000, 20\G
*************************** 1. row ***************************
EXPLAIN: -> Limit/Offset: 20/10000 row(s) (cost=3129 rows=20) (actual time=11.1..11.1 rows=20 loops=1)
-> Index scan on payment using idx_fk_customer_id (cost=3129 rows=10020) (actual time=1.14..10.8 rows=10020 loops=1)

1 row in set (0.01 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM payment INNER JOIN (SELECT payment_id FROM payment ORDER BY customer_id LIMIT 10000, 20) AS lim USING(payment_id)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=3151 rows=20) (actual time=2.29..2.32 rows=20 loops=1)
-> Table scan on lim (cost=641..644 rows=20) (actual time=2.27..2.28 rows=20 loops=1)
-> Materialize (cost=641..641 rows=20) (actual time=2.27..2.27 rows=20 loops=1)
-> Limit/Offset: 20/10000 row(s) (cost=639 rows=20) (actual time=2.12..2.12 rows=20 loops=1)
-> Covering index scan on payment using idx_fk_customer_id (cost=639 rows=10020) (actual time=0.178..1.89 rows=10020 loops=1)
-> Single-row index lookup on payment using PRIMARY (payment_id=lim.payment_id) (cost=0.25 rows=1) (actual time=0.00204..0.00207 rows=1 loops=20)

1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM payment WHERE payment_id IN (SELECT payment_id FROM (SELECT payment_id FROM payment ORDER BY customer_id LIMIT 10000, 20) AS lim)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join (cost=35413 rows=321720) (actual time=11.3..15.9 rows=20 loops=1)
-> Table scan on payment (cost=1633 rows=16086) (actual time=0.382..6.93 rows=16044 loops=1)
-> Single-row index lookup on <subquery2> using <auto_distinct_key> (payment_id=payment.payment_id) (cost=646..646 rows=1) (actual time=460e-6..460e-6 rows=0.00125 loops=16044)
-> Materialize with deduplication (cost=646..646 rows=20) (actual time=2.07..2.07 rows=20 loops=1)
-> Table scan on lim (cost=641..644 rows=20) (actual time=2.06..2.06 rows=20 loops=1)
-> Materialize (cost=641..641 rows=20) (actual time=2.06..2.06 rows=20 loops=1)
-> Limit/Offset: 20/10000 row(s) (cost=639 rows=20) (actual time=2.05..2.06 rows=20 loops=1)
-> Covering index scan on payment using idx_fk_customer_id (cost=639 rows=10020) (actual time=0.095..1.82 rows=10020 loops=1)

1 row in set (0.02 sec)

NOT IN

从文本文件读取数据,num 列有 100 万个 1、100 万个 2 和 1 个 3,测试一下 NOT IN (1, 2) 是否会使用索引。EXPLAIN ANALYZE 的结果显示,查询被分为三个区间,这样就可以利用索引范围扫描。(推荐阅读 Optimizing INSERT StatementsMySQL EXPLAIN ANALYZEA must-know about NOT IN in SQL - more antijoin optimization)。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
SET @@GLOBAL.local_infile = 1;
CREATE TABLE test (num TINYINT NOT NULL, dummy TINYINT NOT NULL DEFAULT 0);
LOAD DATA LOCAL INFILE 'data.txt' INTO TABLE test (num);
ALTER TABLE test ADD INDEX idx_num (num);

mysql> EXPLAIN SELECT * FROM test WHERE num NOT IN (1, 2)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test
partitions: NULL
type: range
possible_keys: idx_num
key: idx_num
key_len: 1
ref: NULL
rows: 3
filtered: 100.00
Extra: Using index condition
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `sakila`.`test`.`num` AS `num`,`sakila`.`test`.`dummy` AS `dummy` from `sakila`.`test` where (`sakila`.`test`.`num` not in (1,2))
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM test WHERE num NOT IN (1, 2)\G
*************************** 1. row ***************************
EXPLAIN: -> Index range scan on test using idx_num over (num < 1) OR (1 < num < 2) OR (2 < num), with index condition: (test.num not in (1,2)) (cost=3.38 rows=3) (actual time=0.0355..0.0383 rows=1 loops=1)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM test WHERE num NOT IN (3)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: test
partitions: NULL
type: ALL
possible_keys: idx_num
key: NULL
key_len: NULL
ref: NULL
rows: 1996905
filtered: 50.00
Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: /* select#1 */ select `sakila`.`test`.`num` AS `num`,`sakila`.`test`.`dummy` AS `dummy` from `sakila`.`test` where (`sakila`.`test`.`num` <> 3)
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM test WHERE num NOT IN (3)\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (test.num <> 3) (cost=201305 rows=998453) (actual time=0.0239..1133 rows=2e+6 loops=1)
-> Table scan on test (cost=201305 rows=2e+6) (actual time=0.0222..976 rows=2e+6 loops=1)

1 row in set (1.26 sec)