一些基础概念。

myisam和innodb

innodb:支持事务, 有外键, b+treey叶子是data,聚簇索引要有主键,不存表行,记录存内存

myisam:不支持事务,无外键,叶子是point,非聚簇索引,存表行,记录存磁盘

数据库索引

  • 分类
    按物理:聚簇索引,非聚簇索引
    按数据结构:B+,hash,全文索引, R-tree
    按逻辑:主键索引,唯一索引,组合索引,全文索引,普通索引

innoDB为什么用B+

→ 二叉树:高度较高 & B+树比较次数少 ,I/O少,速度快。O(logN)

→ hash: hash乱序,不支持范围查询、模糊查询和索引排序;有hash碰撞所以不稳定

→ 红黑树:可能退化成链表。加数据时还是会增加高度,对大量数据不友好。需要高度增加不快的树

→ B树: (百万量也可能只三到五层)

① Btree非叶子节点都存数据,B+只在叶子节点存数据,单个节点数据量小,一次I/O可得到更多节点

② B+叶子节点之间双链表连接,可实现范围、顺序查找。

索引使用

  • 适用:唯一性限制的字段。常用于where、group by、order by的字
  • 不需要的:where group order无的 | 数据量少 | 重复多的 | 经常更新的
  • 失效情况:在B+上查询时不能确定进一步子叶以缩小范围的情况:
    • like以%开头索引无效
    • 计算、函数、比较,not,<>,!=索引无效
    • 字符串不加引号导致类型转换
    • 联合索引时没最左
    • 在索引列上使用 IS NULL 或 IS NOT NULL操作

索引优化

回表:查询了非主键列,会先检索辅助索引的B+tree,找到主键值,再通过主键index获取整行。

  • 建立前缀索引:对某字段文本的前几个字符建立索引。
    减少索引大小。order by无法用
  • 覆盖索引:建立要查的所有列的索引。不用查整行,不用回表,减少I/O
  • 联合索引:最左匹配。
    · 区分度大的放最前
    · index有序 where后的可无序但一定要有最左列
    · 第一个>的本人可命中索引,之后的不命中

聚簇索引 & 非聚簇索引

  • 聚簇索引: 顺序就是数据的物理存储顺序

    聚簇索引的叶节点就是数据节点

  • 非聚簇索引:索引顺序与数据物理排列顺序无关,逻辑上连续,物理存储并不连续

    非聚簇索引的叶节点是指向数据节点的地址

聚簇索引的唯一性:
   正式聚簇索引的顺序就是数据的物理存储顺序,所以一个表最多只能有一个聚簇索引,因为物理存储只能有一个顺序。正因为一个表最多只能有一个聚簇索引,所以它显得更为珍贵,一个表设置什么为聚簇索引对性能很关键。

数据库的并发控制,保持一致性问题

事务:一系列逻辑操作

特性:
Atomicity,Isolation,Consistency,Durability

实现:
Atomicity —— redolog
Isolation —— MVCC 锁
Consistency —— undolog
Durability —— redo log

并发一致性问题:

并发环境下,因为事务的隔离性很难保证,所以会出现很多并发问题

隔离级别:

  • 读未提交 发生下面所有
  • 读已提交 解决: dirty read: A还没提交写,B就读了
  • 可重复读 解决: unrepeateable read: 一个事务内A改了,B读了两次不一样B没提交
    = 串行化 解决: phantom read:范围之间多插了一条

实现:

  • 串行化:加读写锁
  • 读提交 & 可重复读:Read View。 like snapshot
  • 读提交:每个语句执行前生成snapshot
  • 可重复读:启动事务时生成一个 Read View。由MVCC实现

MVCC:

读:旧版本快照
写:新版本+version

Innodb 引擎为了解决「可重复读」隔离级别使用「当前读」而造成的幻读问题,就引出了 next-key 锁

next-key锁 = 记录锁+间隙锁

  • 记录锁,锁的是记录本身;
  • 间隙锁,锁的就是两个值之间的空隙,以防止其他事务在这个空隙间插入新的数据,从而避免幻读现象。

死锁,活锁和饥饿

There may be more than one process competing for a limited amount of resources in a multiprogramming environment. If a process requests a resource that is not currently available, the process will wait until the resource becomes available. In some cases, the waiting process may not gain access to the resource, resulting in deadlock, livelock, and starvation.
- Deadlock: Processes can become stuck in a deadlock when they can’t acquire resources and stop advancing as each of them waits to get the resource held by the other process.
- Livelock: In a livelock, the state of the processes involved changes constantly. This is a deadlock-like situation where processes block each other by changing their states repeatedly and yet do not progress.
- Starvation: Starvation occurs when a process does not have regular access to the resources needed to complete its task and so becomes unable to perform any work. All the low priority processes get blocked, while the high priority ones continue.

死锁不能自行解开,活锁和饥饿可以

死锁一直等待,活锁可能不断改变状态

回滚 | rollback

In SQL, a rollback can be defined as the process of restoring a database to its previous state if there has been an error during the execution of a transaction. A rollback can either be performed automatically by a database system or manually by an end-user. The rollback function brings a transaction back to the beginning or undo any changes since the last COMMIT. Additionally, it releases resources used by the transaction

Commit vs Rollback

COMMIT is applied after a successful transaction.
ROLLBACK statements in SQL are used to undo unfinished transactions in the database.
A COMMIT statement in SQL is used to permanently save the changes made during a transaction in a table/database.In case of an error during transaction execution, this command will restore the database to its previous state.