Ligh0x74's Blog

2023-01-01发表2025-04-08更新数据库 / MySQL37 分钟读完 (大约5548个字)

参考《高性能 MySQL》和官方文档。使用 MySQL 的 Sakila 示例数据库，使用 MySQL 版本 8.0.41 + InnoDB 存储引擎。各种术语的官方定义可以看 MySQL 术语表，锁相关的内容可以看 InnoDB Locking，更多数据库概念可以看 CUM 15-445 课程总结。任何描述都是简化的，不会涵盖所有情况，更多细节只能看文档。或者说没有必要去记细节，任何细节都取决于实现，而实现会随着版本更新而变化，需要学习的是整体的策略。

基本概念

快照（snapshot）：数据在特定时间的表示。一致性读（consistent read）：根据快照显示查询结果，也被称为一致性非锁定读。在读已提交和可重复读隔离级别下执行 SELECT 语句的默认模式是一致性读，也就是说会使用多版本并发控制（MVCC）读取数据。读已提交在每次执行一致性读时都会重置快照，而可重复读只在第一次一致性读时建立快照。锁定读（locking read）：使用 SELECT ... FOR SHARE 或者 SELECT ... FOR UPDATE 读取数据，会加读锁或者写锁，在事务提交或者回滚时释放（2PL）。

MVCC 是使用撤销日志（Undo Log）和读取视图（Read View）实现的。事务在修改记录之后会记录撤销日志，事务的回滚指针会指向该日志。读取视图包含一致性读不可见的事务 ID（事务 ID 不会在启动事务之后立即分配），读取记录时会比较记录的事务 ID 和读取视图，如果该记录对当前事务不可见，则执行撤销日志直到达到可见状态。删除操作被视为修改操作，通过修改删除标志位实现，只有当该版本记录对所有事务不可见时，才会被真正删除。撤销日志也会被记录到重做日志（Redo Log）中，实现崩溃恢复。

关于幻读的问题：DML 语句可以破坏快照，从而在可重复读隔离级别会出现幻读。（参考一致性读文档）如果将快照读和当前读混用，也会出现幻读。

SELECT COUNT(c1) FROM t1 WHERE c1 = 'xyz';
-- Returns 0: no rows match.
DELETE FROM t1 WHERE c1 = 'xyz';
-- Deletes several rows recently committed by other transaction.

SELECT COUNT(c2) FROM t1 WHERE c2 = 'abc';
-- Returns 0: no rows match.
UPDATE t1 SET c2 = 'cba' WHERE c2 = 'abc';
-- Affects 10 rows: another txn just committed 10 rows with 'abc' values.
SELECT COUNT(c2) FROM t1 WHERE c2 = 'cba';
-- Returns 10: this txn can now see the rows it just updated.

常用命令

SHOW FULL PROCESSLIST;
SHOW ENGINE INNODB STATUS;
SELECT * FROM performance_schema.data_locks\G
SHOW STATUS LIKE 'Last_query_cost';
SHOW [GLOBAL | SESSION] VARIABLES [LIKE 'pattern' | WHERE expr];
START TRANSACTION; BEGIN; COMMIT; ROLLBACK;
[CREATE | ALTER | DROP | OPTIMIZE | ANALYZE | CHECK | REPAIR ] TABLE tbl_name;
SHOW TABLE STATUS [LIKE 'pattern' | WHERE expr];
SHOW INDEX FROM tbl_name;
EXPLAIN [FORMAT=TREE | ANALYZE] select_statement; SHOW WARNINGS;

前缀索引

使用前缀索引可以减少索引的空间开销，内部节点的字符串比较会更高效，但是选择性会更低，那么匹配的主键会更多，从而需要更多次回表。所以需要选择合适的前缀长度，尽可能提高索引的选择性。索引的选择性是指不重复的索引值（基数，cardinality）和数据表的记录总数的比值。无法利用前缀索引执行分组、排序和覆盖索引扫描。

执行如下语句，city_demo 最终有 19200 条记录。

CREATE TABLE city_demo(city VARCHAR(50) NOT NULL);
INSERT INTO city_demo(city) SELECT city FROM city;
INSERT INTO city_demo(city) SELECT city FROM city_demo; -- 重复执行 5 次
UPDATE city_demo SET city = (SELECT city FROM city ORDER BY RAND() LIMIT 1); -- 随机化数据

查询出现次数最多的前 10 个城市，然后使用各种前缀测试索引的选择性。前缀长度为 7 比较合适，因为统计数量的偏差不算很大。

SELECT COUNT(*) AS c, city
FROM city_demo GROUP BY city ORDER BY c DESC LIMIT 10;
+----+-----------------+
| c  | city            |
+----+-----------------+
| 54 | Ondo            |
| 51 | London          |
| 49 | Olomouc         |
| 49 | Pontianak       |
| 47 | Kurgan          |
| 46 | Almirante Brown |
| 46 | Changzhou       |
| 46 | Funafuti        |
| 46 | Jodhpur         |
| 45 | Plock           |
+----+-----------------+
SELECT COUNT(*) AS c, LEFT(city, 3) AS pref
FROM city_demo GROUP BY pref ORDER BY c DESC LIMIT 10;
+-----+------+
| c   | pref |
+-----+------+
| 477 | San  |
| 197 | Cha  |
| 171 | Tan  |
| 157 | al-  |
| 152 | Sou  |
| 148 | Bat  |
| 146 | Sal  |
| 146 | Shi  |
| 126 | Kam  |
| 125 | Val  |
+-----+------+
SELECT COUNT(*) AS c, LEFT(city, 7) AS pref
FROM city_demo GROUP BY pref ORDER BY c DESC LIMIT 10;
+----+---------+
| c  | pref    |
+----+---------+
| 76 | San Fel |
| 66 | Valle d |
| 63 | Santiag |
| 54 | Ondo    |
| 51 | London  |
| 49 | Pontian |
| 49 | Olomouc |
| 47 | Kurgan  |
| 46 | Jodhpur |
| 46 | Almiran |
+----+---------+

不能只根据前缀索引选择性的值确定前缀长度，例如前缀长度为 5 的选择性看上去很接近完整列的选择性，但是如果查看出现次数最多的前 10 个城市，和完整列的结果相比，会发现数据分布很不均匀。如果查询以 South 前缀的某个城市，那么回表的次数会更多。

SELECT COUNT(DISTINCT city) / COUNT(*) AS sel,
COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel3,
COUNT(DISTINCT LEFT(city, 4)) / COUNT(*) AS sel4,
COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel5,
COUNT(DISTINCT LEFT(city, 6)) / COUNT(*) AS sel6,
COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel7
FROM city_demo;
+--------+--------+--------+--------+--------+--------+
| sel    | sel3   | sel4   | sel5   | sel6   | sel7   |
+--------+--------+--------+--------+--------+--------+
| 0.0312 | 0.0236 | 0.0293 | 0.0305 | 0.0309 | 0.0310 |
+--------+--------+--------+--------+--------+--------+
SELECT COUNT(*) AS c, LEFT(city, 5) AS pref
FROM city_demo GROUP BY pref ORDER BY c DESC LIMIT 10;
+-----+--------+
| c   | pref   |
+-----+--------+
| 118 | South  |
| 104 | Santa  |
|  78 | Chang  |
|  76 | San F  |
|  69 | Toulo  |
|  68 | Xi´an  |
|  66 | Valle  |
|  64 | Saint  |
|  64 | Shimo  |
|  63 | Santi  |
+-----+--------+

多列索引

在 film_actor 上有主键索引 PRIMARY KEY (actor_id,film_id)，有普通索引 KEY idx_fk_film_id (film_id)，以下查询计划表示合并两个索引的查询结果。索引合并策略有时效果不错，但更多时候表明表的索引建得很槽糕。如果优化器需要对多个索引做相交操作（由于多个 AND 条件），那么查看是否可以使用多列索引进行优化。

如果优化器需要对多个索引做联合操作（由于多个 OR 条件），且索引的选择性不高时，通常会在缓存、排序和合并上消耗大量 CPU 和内存资源。然而，优化器不会将这些操作计算到查询成本中，优化器只关心随机页面读取，所以有时索引合并的性能还不如全表扫描。这还会影响并发的查询，此时使用 UNION 改写，将单个查询拆分为多个查询，可以避免单个查询的执行时间过长，影响其他并发的查询（由于 2PL 协议，单个查询会在提交时才释放锁，当然不同隔离级别有细微差别）。

EXPLAIN SELECT film_id, actor_id FROM film_actor
WHERE actor_id = 1 OR film_id = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film_actor
   partitions: NULL
         type: index_merge
possible_keys: PRIMARY,idx_fk_film_id
          key: PRIMARY,idx_fk_film_id
      key_len: 2,2
          ref: NULL
         rows: 29
     filtered: 100.00
        Extra: Using union(PRIMARY,idx_fk_film_id); Using where

聚簇索引

聚簇索引的每个叶子节点都包含主键值、事务 ID、用于事务和 MVCC 的回滚指针，以及所有的剩余列。二级索引的叶子节点存储主键值而不是物理指针，所以在移动主键索引中的记录时不需要修改二级索引。如果按照主键顺序插入行，聚簇索引不会发生页分裂，从而插入性能更高（减少磁盘 I/O）也不会产生内部碎片。不应该使用随机的主键（例如 UUID），如果没有按照主键顺序插入数据，可以使用 OPTIMIZE TABLE 命令重新组织表来消除内部碎片。

对于高并发负载，按主键顺序插入会产生较多竞争，可以通过更改 innodb_autoinc_lock_mode 配置来提升性能。简单来说，有传统、连续和交错三种锁定模式，不同模式会根据语句的不同使用互斥锁（Lock）或者轻量级锁（CAS）。

对非唯一二级索引更新操作的优化：当更新非唯一二级索引时，如果页面不在缓冲池中，则使用 Change Buffer 缓存更新操作。当其他读取操作将页面读到缓冲池中时，则会合并缓冲池中的更新操作，从而减少随机磁盘 I/O。

查询优化

可以根据慢查询日志来优化查询，使用 EXPLAIN 查看执行计划。基本准则：只选择需要的列，而不是使用 SELECT *，从而允许覆盖索引优化、减少时间（I/O）和空间开销。不要重复执行相同的查询，可以在应用层使用缓存。对于不是很重要的查询，可以将大查询分解为小查询，将查询的时间分散到一个时间段中，减少查询对服务器性能的影响（减少单次查询持有锁的时间，以及避免事务日志堆积）。例如：DELETE 大量数据，可以使用 LIMIT 分解执行；当中间查询结果能缓存和重用时，可以将连接查询分解，然后在应用层做连接。

应用 WHERE 条件的方式：① 将条件从服务器下推到存储引擎，直接在索引中使用条件过滤记录（索引条件下推，ICP），从而减少服务器访问存储引擎的次数，以及存储引擎访问基表的次数。② 使用覆盖索引获取记录（不需要回表），然后在服务器使用条件过滤记录。③ 回表之后，在服务器层使用条件。

MySQL 的局限性：无法将 UNION 外层的条件下推到内层。某些时候，等值传递会有问题（详细看书）。针对单个语句而言，无法利用多核并行执行查询，无法同时对某个表进行查询和更新（特指相关子查询）。

UNION

由于 UNION 查询无法使用到外层条件，所以需要手动将条件下推到 UNION 的各个子查询中。最好使用 UNION ALL 而不是 UNION，这样可以避免对临时表去重（UNION 查询总是会创建临时表）。

COUNT

COUNT(*) 统计行数，COUNT(expr) 统计非 NULL 值的数量。性能 COUNT(*)=COUNT(1)>COUNT(主键列)>COUNT(普通列)，如下所示 COUNT(*) 实际上是 COUNT(0)。根据条件统计数量，可以使用 SUM(IF(expr, 1, 0)) 或者 COUNT(expr OR NULL)。如果允许使用近似值代替精确值，则可以去掉 WHERE 和 DISTINCT 之类的条件，优化查询性能。利用索引覆盖扫描优化性能，因为索引会比基表更小，减少 I/O 次数。

mysql> EXPLAIN SELECT COUNT(*) FROM film\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
   partitions: NULL
         type: index
possible_keys: NULL
          key: idx_fk_language_id
      key_len: 1
          ref: NULL
         rows: 1000
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select count(0) AS `COUNT(*)` from `sakila`.`film`
1 row in set (0.00 sec)

IN & EXIST

常见的说法是，在子查询数据较少时使用 IN，而在子查询数据较大时使用 EXIST。因为使用 IN 是不相关子查询，会创建临时表，然后在临时表中查找匹配的数据。而使用 EXIST 是相关子查询，会直接在内表中查找匹配的数据（多次执行子查询）。但是，实际上优化器会做优化，使用 IN 并不意味着就会创建临时表。下面查询所有没有交易记录的顾客信息，执行计划显示该查询被转化为相关子查询，会使用覆盖索引查找匹配的数据。（推荐阅读 Optimizing Subqueries with the EXISTS Strategy）

mysql> EXPLAIN SELECT * FROM customer WHERE customer_id IN (
    ->   SELECT customer_id FROM payment
    -> )\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 599
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: payment
   partitions: NULL
         type: ref
possible_keys: idx_fk_customer_id
          key: idx_fk_customer_id
      key_len: 2
          ref: sakila.customer.customer_id
         rows: 1
     filtered: 100.00
        Extra: Using index; FirstMatch(customer)
2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `sakila`.`customer`.`customer_id` AS `customer_id`,`sakila`.`customer`.`store_id` AS `store_id`,`sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name`,`sakila`.`customer`.`email` AS `email`,`sakila`.`customer`.`address_id` AS `address_id`,`sakila`.`customer`.`active` AS `active`,`sakila`.`customer`.`create_date` AS `create_date`,`sakila`.`customer`.`last_update` AS `last_update` from `sakila`.`customer` semi join (`sakila`.`payment`) where (`sakila`.`payment`.`customer_id` = `sakila`.`customer`.`customer_id`)
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM customer WHERE customer_id IN (
    ->   SELECT customer_id FROM payment
    -> )\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop semijoin  (cost=271 rows=599) (actual time=0.0621..2.97 rows=599 loops=1)
    -> Table scan on customer  (cost=61.2 rows=599) (actual time=0.048..0.401 rows=599 loops=1)
    -> Covering index lookup on payment using idx_fk_customer_id (customer_id=customer.customer_id)  (cost=0.25 rows=1) (actual time=0.00418..0.00418 rows=1 loops=599)

1 row in set (0.00 sec)

而使用 EXIST 也不意味着执行相关子查询，如果将 payment 中的索引 idx_fk_customer_id 删除，然后重新执行上述查询。执行计划显示，子查询会生成索引临时表，索引用于去重，然后外表使用该索引查找匹配的数据。如果临时表较小，则会使用 MEMORY 存储引擎创建内存临时表，否则使用 InnoDB 存储引擎创建磁盘临时表。（推荐阅读 Optimizing Subqueries with Materialization）

1	ALTER TABLE payment DROP FOREIGN KEY fk_payment_customer, DROP INDEX idx_fk_customer_id;

mysql> EXPLAIN SELECT * FROM customer WHERE EXISTS (
    ->   SELECT 1 FROM payment WHERE customer.customer_id = customer_id
    -> )\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 599
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: <subquery2>
   partitions: NULL
         type: eq_ref
possible_keys: <auto_distinct_key>
          key: <auto_distinct_key>
      key_len: 2
          ref: sakila.customer.customer_id
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 3. row ***************************
           id: 2
  select_type: MATERIALIZED
        table: payment
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 16086
     filtered: 100.00
        Extra: NULL
3 rows in set, 2 warnings (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1276
Message: Field or reference 'sakila.customer.customer_id' of SELECT #2 was resolved in SELECT #1
*************************** 2. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `sakila`.`customer`.`customer_id` AS `customer_id`,`sakila`.`customer`.`store_id` AS `store_id`,`sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name`,`sakila`.`customer`.`email` AS `email`,`sakila`.`customer`.`address_id` AS `address_id`,`sakila`.`customer`.`active` AS `active`,`sakila`.`customer`.`create_date` AS `create_date`,`sakila`.`customer`.`last_update` AS `last_update` from `sakila`.`customer` semi join (`sakila`.`payment`) where (`<subquery2>`.`customer_id` = `sakila`.`customer`.`customer_id`)
2 rows in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM customer WHERE EXISTS (
    ->   SELECT 1 FROM payment WHERE customer.customer_id = customer_id
    -> )\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=963672 rows=9.64e+6) (actual time=6.27..6.92 rows=599 loops=1)
    -> Table scan on customer  (cost=61.2 rows=599) (actual time=0.0789..0.422 rows=599 loops=1)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (customer_id=customer.customer_id)  (cost=3241..3241 rows=1) (actual time=0.0107..0.0107 rows=1 loops=599)
        -> Materialize with deduplication  (cost=3241..3241 rows=16086) (actual time=6.18..6.18 rows=599 loops=1)
            -> Table scan on payment  (cost=1633 rows=16086) (actual time=0.245..3.54 rows=16044 loops=1)

1 row in set, 1 warning (0.01 sec)

JOIN

确保 ON 或者 USING 的列上有索引。确保 GROUP BY 和 ORDER BY 只涉及一个表中的列，从而允许利用索引优化（松散/紧密索引扫描、利用索引排序）。如果需要对聚合的结果做超级聚合，可以使用 GROUP BY xxx WITH ROLLUP（注意查看查询计划，确定是否有性能问题），也可以使用其他等价语句或者在应用层聚合。

上面执行 SHOW WARNINGS\G 都会显示半连接（semi join），所谓半连接就是从左表中查询和右表匹配的行。反连接（anti join）正好相反，从左表中查询和右表不匹配的行。下面的查询和上面的等价，但是更慢，没有使用半连接，在覆盖索引中使用 LIMIT 1 去重，在最后使用临时表去重（实际上没有必要，LIMIT 1 已经去重）。（推荐阅读 Optimizing IN and EXISTS Subquery Predicates with Semijoin and Antijoin Transformations）

mysql> EXPLAIN SELECT DISTINCT customer.* FROM customer INNER JOIN payment USING(customer_id)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: customer
   partitions: NULL
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 599
     filtered: 100.00
        Extra: Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: payment
   partitions: NULL
         type: ref
possible_keys: idx_fk_customer_id
          key: idx_fk_customer_id
      key_len: 2
          ref: sakila.customer.customer_id
         rows: 1
     filtered: 100.00
        Extra: Using index; Distinct
2 rows in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select distinct `sakila`.`customer`.`customer_id` AS `customer_id`,`sakila`.`customer`.`store_id` AS `store_id`,`sakila`.`customer`.`first_name` AS `first_name`,`sakila`.`customer`.`last_name` AS `last_name`,`sakila`.`customer`.`email` AS `email`,`sakila`.`customer`.`address_id` AS `address_id`,`sakila`.`customer`.`active` AS `active`,`sakila`.`customer`.`create_date` AS `create_date`,`sakila`.`customer`.`last_update` AS `last_update` from `sakila`.`customer` join `sakila`.`payment` where (`sakila`.`payment`.`customer_id` = `sakila`.`customer`.`customer_id`)
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT DISTINCT customer.* FROM customer INNER JOIN payment USING(customer_id)\G
*************************** 1. row ***************************
EXPLAIN: -> Table scan on <temporary>  (cost=331..341 rows=599) (actual time=3.3..3.38 rows=599 loops=1)
    -> Temporary table with deduplication  (cost=331..331 rows=599) (actual time=3.3..3.3 rows=599 loops=1)
        -> Nested loop inner join  (cost=271 rows=599) (actual time=0.0668..2.13 rows=599 loops=1)
            -> Table scan on customer  (cost=61.2 rows=599) (actual time=0.0492..0.404 rows=599 loops=1)
            -> Limit: 1 row(s)  (cost=0.25 rows=1) (actual time=0.00271..0.00273 rows=1 loops=599)
                -> Covering index lookup on payment using idx_fk_customer_id (customer_id=customer.customer_id)  (cost=0.25 rows=1) (actual time=0.00262..0.00262 rows=1 loops=599)

1 row in set (0.00 sec)

LIMIT

如果偏移量很大，例如 LIMIT 10000, 20，那么会扫描 10020 条记录，而只有最后 20 条是有效的。如果行中有很多数据，那么 I/O 次数就会很多。MySQL 的 LIMIT OFFSET 似乎不会下推到索引，从而 10020 条记录都会回表查询。查询计划中确实没有下推，奇怪的是索引只会扫描 10020 条记录，说明索引是有 LIMIT OFFSET 信息的。（参考 Limit Offset 下推）

优化方式：① 使用覆盖索引执行 LIMIT，索引中只包含必要的列（如 ORDER BY 的列，当然默认包含主键），那么 LIMIT 扫描的数据会减少很多，最后使用主键做连接得到完整数据。（不能使用 IN，因为 ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'）② 如果只允许顺序翻页的话，通过记录上个页面的边界值，下次查询就可以使用该值定位到目标位置，而不会做无效的扫描。③ 一次性获取多页数据，在应用层缓存起来（类似缓存 I/O）。④ 其他方法，使用预先计算的汇总表，或者使用冗余表。

mysql> EXPLAIN ANALYZE SELECT * FROM payment ORDER BY customer_id LIMIT 10000, 20\G
*************************** 1. row ***************************
EXPLAIN: -> Limit/Offset: 20/10000 row(s)  (cost=1633 rows=20) (actual time=11.6..11.6 rows=20 loops=1)
    -> Sort: payment.customer_id, limit input to 10020 row(s) per chunk  (cost=1633 rows=16086) (actual time=10.6..11.4 rows=10020 loops=1)
        -> Table scan on payment  (cost=1633 rows=16086) (actual time=0.434..5.5 rows=16044 loops=1)

1 row in set (0.01 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM payment FORCE INDEX (idx_fk_customer_id) ORDER BY customer_id LIMIT 10000, 20\G
*************************** 1. row ***************************
EXPLAIN: -> Limit/Offset: 20/10000 row(s)  (cost=3129 rows=20) (actual time=11.1..11.1 rows=20 loops=1)
    -> Index scan on payment using idx_fk_customer_id  (cost=3129 rows=10020) (actual time=1.14..10.8 rows=10020 loops=1)

1 row in set (0.01 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM payment INNER JOIN (SELECT payment_id FROM payment ORDER BY customer_id LIMIT 10000, 20) AS lim USING(payment_id)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=3151 rows=20) (actual time=2.29..2.32 rows=20 loops=1)
    -> Table scan on lim  (cost=641..644 rows=20) (actual time=2.27..2.28 rows=20 loops=1)
        -> Materialize  (cost=641..641 rows=20) (actual time=2.27..2.27 rows=20 loops=1)
            -> Limit/Offset: 20/10000 row(s)  (cost=639 rows=20) (actual time=2.12..2.12 rows=20 loops=1)
                -> Covering index scan on payment using idx_fk_customer_id  (cost=639 rows=10020) (actual time=0.178..1.89 rows=10020 loops=1)
    -> Single-row index lookup on payment using PRIMARY (payment_id=lim.payment_id)  (cost=0.25 rows=1) (actual time=0.00204..0.00207 rows=1 loops=20)

1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM payment WHERE payment_id IN (SELECT payment_id FROM (SELECT payment_id FROM payment ORDER BY customer_id LIMIT 10000, 20) AS lim)\G
*************************** 1. row ***************************
EXPLAIN: -> Nested loop inner join  (cost=35413 rows=321720) (actual time=11.3..15.9 rows=20 loops=1)
    -> Table scan on payment  (cost=1633 rows=16086) (actual time=0.382..6.93 rows=16044 loops=1)
    -> Single-row index lookup on <subquery2> using <auto_distinct_key> (payment_id=payment.payment_id)  (cost=646..646 rows=1) (actual time=460e-6..460e-6 rows=0.00125 loops=16044)
        -> Materialize with deduplication  (cost=646..646 rows=20) (actual time=2.07..2.07 rows=20 loops=1)
            -> Table scan on lim  (cost=641..644 rows=20) (actual time=2.06..2.06 rows=20 loops=1)
                -> Materialize  (cost=641..641 rows=20) (actual time=2.06..2.06 rows=20 loops=1)
                    -> Limit/Offset: 20/10000 row(s)  (cost=639 rows=20) (actual time=2.05..2.06 rows=20 loops=1)
                        -> Covering index scan on payment using idx_fk_customer_id  (cost=639 rows=10020) (actual time=0.095..1.82 rows=10020 loops=1)

1 row in set (0.02 sec)

NOT IN

从文本文件读取数据，num 列有 100 万个 1、100 万个 2 和 1 个 3，测试一下 NOT IN (1, 2) 是否会使用索引。EXPLAIN ANALYZE 的结果显示，查询被分为三个区间，这样就可以利用索引范围扫描。（推荐阅读 Optimizing INSERT Statements，MySQL EXPLAIN ANALYZE，A must-know about NOT IN in SQL - more antijoin optimization）。

SET @@GLOBAL.local_infile = 1;
CREATE TABLE test (num TINYINT NOT NULL, dummy TINYINT NOT NULL DEFAULT 0);
LOAD DATA LOCAL INFILE 'data.txt' INTO TABLE test (num);
ALTER TABLE test ADD INDEX idx_num (num);

mysql> EXPLAIN SELECT * FROM test WHERE num NOT IN (1, 2)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test
   partitions: NULL
         type: range
possible_keys: idx_num
          key: idx_num
      key_len: 1
          ref: NULL
         rows: 3
     filtered: 100.00
        Extra: Using index condition
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `sakila`.`test`.`num` AS `num`,`sakila`.`test`.`dummy` AS `dummy` from `sakila`.`test` where (`sakila`.`test`.`num` not in (1,2))
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM test WHERE num NOT IN (1, 2)\G
*************************** 1. row ***************************
EXPLAIN: -> Index range scan on test using idx_num over (num < 1) OR (1 < num < 2) OR (2 < num), with index condition: (test.num not in (1,2))  (cost=3.38 rows=3) (actual time=0.0355..0.0383 rows=1 loops=1)

1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM test WHERE num NOT IN (3)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test
   partitions: NULL
         type: ALL
possible_keys: idx_num
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1996905
     filtered: 50.00
        Extra: Using where
1 row in set, 1 warning (0.00 sec)

mysql> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `sakila`.`test`.`num` AS `num`,`sakila`.`test`.`dummy` AS `dummy` from `sakila`.`test` where (`sakila`.`test`.`num` <> 3)
1 row in set (0.00 sec)

mysql> EXPLAIN ANALYZE SELECT * FROM test WHERE num NOT IN (3)\G
*************************** 1. row ***************************
EXPLAIN: -> Filter: (test.num <> 3)  (cost=201305 rows=998453) (actual time=0.0239..1133 rows=2e+6 loops=1)
    -> Table scan on test  (cost=201305 rows=2e+6) (actual time=0.0222..976 rows=2e+6 loops=1)

1 row in set (1.26 sec)