SQL题——连续问题

- 1.连续问题的统一思路
- 2.相关题目
- - 1285 找到连续区间的开始和结束数字
  - 2173 最多连胜的次数
  - 1454 活跃用户
  - 1225 报告系统状态的连续日期
  - 2292 连续两年有3个及以上订单的产品
  - 2752 在连续天数上进行了最多交易次数的顾客
  - 2474 购买量严格增加的客户
  - 3140 连续空余座位II

1.连续问题的统一思路

核心思想：连续变量−排序变量=常数C，根据常数C分组求和，即可判断连续次数

1.首先要有连续的参考列，如果没有，则需要自己构造
2.用row_number()对目标做排序，比如日期
3.排序后做diff，连续的diff一定相同，然后按diff取目标结果即可

2.相关题目

1285 找到连续区间的开始和结束数字

表：Logs

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| log_id        | int     |
+---------------+---------+

id 是上表具有唯一值的列。
上表的每一行包含日志表中的一个 ID。

编写解决方案，得到 Logs 表中的连续区间的开始数字和结束数字。

返回结果表按照 start_id 排序。

结果格式如下面的例子。

示例 1：

输入：
Logs 表：

+------------+
| log_id     |
+------------+
| 1          |
| 2          |
| 3          |
| 7          |
| 8          |
| 10         |
+------------+

输出：

+------------+--------------+
| start_id   | end_id       |
+------------+--------------+
| 1          | 3            |
| 7          | 8            |
| 10         | 10           |
+------------+--------------+

解释：
结果表应包含 Logs 表中的所有区间。
从 1 到 3 在表中。
从 4 到 6 不在表中。
从 7 到 8 在表中。
9 不在表中。
10 在表中。

答案：

sql">select   min(log_id) as start_id 
        ,max(log_id) as end_id
from 
    (select *
            ,log_id - row_number() over(order by log_id) as diff
    from logs ) t0
group by diff
order by start_id
;

2173 最多连胜的次数

表: Matches

+-------------+------+
| Column Name | Type |
+-------------+------+
| player_id   | int  |
| match_day   | date |
| result      | enum |
+-------------+------+

(player_id, match_day) 是该表的主键（具有唯一值的列的组合）。
每一行包括了：参赛选手 id、比赛时间、比赛结果。
比赛结果（result）的枚举类型为 (‘Win’, ‘Draw’, ‘Lose’)。

选手的连胜数是指连续获胜的次数，且没有被平局或输球中断。

编写解决方案来计算每个参赛选手最多的连胜数。

结果可以以任何顺序返回。

结果格式如下例所示：

示例 1:

输入:
Matches 表:

+-----------+------------+--------+
| player_id | match_day  | result |
+-----------+------------+--------+
| 1         | 2022-01-17 | Win    |
| 1         | 2022-01-18 | Win    |
| 1         | 2022-01-25 | Win    |
| 1         | 2022-01-31 | Draw   |
| 1         | 2022-02-08 | Win    |
| 2         | 2022-02-06 | Lose   |
| 2         | 2022-02-08 | Lose   |
| 3         | 2022-03-30 | Win    |
+-----------+------------+--------+

输出:

+-----------+----------------+
| player_id | longest_streak |
+-----------+----------------+
| 1         | 3              |
| 2         | 0              |
| 3         | 1              |
+-----------+----------------+

解释:
Player 1:
从 2022-01-17 到 2022-01-25, player 1连续赢了三场比赛。
2022-01-31, player 1 平局.
2022-02-08, player 1 赢了一场比赛。
最多连胜了三场比赛。

Player 2:
从 2022-02-06 到 2022-02-08, player 2 输了两场比赛。
最多连赢了0场比赛。

Player 3:
2022-03-30, player 3 赢了一场比赛。
最多连赢了一场比赛。

答案：

sql">select a.player_id
        ,IFNULL(b.longest_streak, 0) as longest_streak
from (
    select distinct player_id as player_id
    from matches
) a
left join 
(
     select player_id
    ,max(cnt) as longest_streak 
    from
        (select player_id
                ,diff
                ,count(1) as cnt
        from
            (select *
                    ,rn1 - rn2 as diff
            from
                (select *
                        ,row_number() over(partition by player_id order by match_day) as rn1
                        ,row_number() over(partition by player_id, result order by match_day) as rn2
                from matches) t
            )t1
        where  result = 'win'
        group by  player_id
                ,diff
        ) t2
    group by player_id
) b
on a.player_id = b.player_id
;

1454 活跃用户

表 Accounts:

sql">
+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| name          | varchar |
+---------------+---------+

id 是该表主键（具有唯一值的列）
该表包含账户 id 和账户的用户名.

表 Logins:


+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| id            | int     |
| login_date    | date    |
+---------------+---------+

该表可能包含重复项.
该表包含登录用户的账户 id 和登录日期. 用户也许一天内登录多次.

活跃用户是指那些至少连续 5 天登录账户的用户。

编写解决方案, 找到活跃用户的 id 和 name。

返回的结果表按照 id 排序。

结果表格式如下例所示。

示例 1：

输入：
Accounts 表:

+----+----------+
| id | name     |
+----+----------+
| 1  | Winston  |
| 7  | Jonathan |
+----+----------+

Logins 表:

+----+------------+
| id | login_date |
+----+------------+
| 7  | 2020-05-30 |
| 1  | 2020-05-30 |
| 7  | 2020-05-31 |
| 7  | 2020-06-01 |
| 7  | 2020-06-02 |
| 7  | 2020-06-02 |
| 7  | 2020-06-03 |
| 1  | 2020-06-07 |
| 7  | 2020-06-10 |
+----+------------+

输出：

+----+----------+
| id | name     |
+----+----------+
| 7  | Jonathan |
+----+----------+

解释：
id = 1 的用户 Winston 仅仅在不同的 2 天内登录了 2 次, 所以, Winston 不是活跃用户.
id = 7 的用户 Jonathon 在不同的 6 天内登录了 7 次, , 6 天中有 5 天是连续的, 所以, Jonathan 是活跃用户.

答案：

sql">
select  distinct t3.id
        ,t4.name
from
    (select id
        ,subdate(login_date, rn) as first_date
    from
        (select id
                ,login_date
                ,row_number() over(partition by id order by login_date) as rn
        from
            (select distinct id
                    ,login_date
            from logins ) t1
        ) t2
    group by id
            ,first_date
    having count(*) >= 5) t3
left join accounts t4
on t3.id = t4.id;

1225 报告系统状态的连续日期

表：Failed

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| fail_date    | date    |
+--------------+---------+

该表主键为 fail_date (具有唯一值的列)。
该表包含失败任务的天数.

表： Succeeded

+--------------+---------+
| Column Name  | Type    |
+--------------+---------+
| success_date | date    |
+--------------+---------+

该表主键为 success_date (具有唯一值的列)。
该表包含成功任务的天数.

系统每天运行一个任务。每个任务都独立于先前的任务。任务的状态可以是失败或是成功。

编写解决方案找出 2019-01-01 到 2019-12-31 期间任务连续同状态 period_state 的起止日期（start_date 和 end_date）。即如果任务失败了，就是失败状态的起止日期，如果任务成功了，就是成功状态的起止日期。

最后结果按照起始日期 start_date 排序

返回结果样例如下所示:

示例 1：

输入：
Failed table:

+-------------------+
| fail_date         |
+-------------------+
| 2018-12-28        |
| 2018-12-29        |
| 2019-01-04        |
| 2019-01-05        |
+-------------------+

Succeeded table:

+-------------------+
| success_date      |
+-------------------+
| 2018-12-30        |
| 2018-12-31        |
| 2019-01-01        |
| 2019-01-02        |
| 2019-01-03        |
| 2019-01-06        |
+-------------------+

输出：

+--------------+--------------+--------------+
| period_state | start_date   | end_date     |
+--------------+--------------+--------------+
| succeeded    | 2019-01-01   | 2019-01-03   |
| failed       | 2019-01-04   | 2019-01-05   |
| succeeded    | 2019-01-06   | 2019-01-06   |
+--------------+--------------+--------------+

解释：
结果忽略了 2018 年的记录，因为我们只关心从 2019-01-01 到 2019-12-31 的记录
从 2019-01-01 到 2019-01-03 所有任务成功，系统状态为 “succeeded”。
从 2019-01-04 到 2019-01-05 所有任务失败，系统状态为 “failed”。
从 2019-01-06 到 2019-01-06 所有任务成功，系统状态为 “succeeded”。

答案:

sql">select *
from
(
    select  'failed' as period_state 
            ,MIN(fail_date) AS start_date
            ,MAX(fail_date) AS end_date
    from
        (   select fail_date
                ,subdate(fail_date, rn) as frist_date
            from 
                (select fail_date
                        ,row_number() over(order by fail_date) as rn
                from failed 
                where fail_date between '2019-01-01' and '2019-12-31'
            ) t0
    ) t1
    group by frist_date
    union all
    select  'succeeded' as period_state 
            ,MIN(success_date) AS start_date
            ,MAX(success_date) AS end_date
    from
        (   select  success_date
                    ,subdate(success_date, rn) as frist_date
            from 
            (   select success_date
                        ,row_number() over(order by success_date) as rn
                from Succeeded  
                where success_date between '2019-01-01' and '2019-12-31'
            ) t0
    ) t1
    group by frist_date
) t
order by start_date

2292 连续两年有3个及以上订单的产品

表: Orders

sql">+---------------+------+
| Column Name   | Type |
+---------------+------+
| order_id      | int  |
| product_id    | int  |
| quantity      | int  |
| purchase_date | date |
+---------------+------+

order_id 包含唯一值。
该表中的每一行都包含订单 ID、购买的产品 ID、数量和购买日期。

编写解决方案，获取连续两年订购三次或三次以上的所有产品的 id。

以任意顺序返回结果表。

结果格式示例如下。

示例 1:

输入:
Orders 表:

sql">+----------+------------+----------+---------------+
| order_id | product_id | quantity | purchase_date |
+----------+------------+----------+---------------+
| 1        | 1          | 7        | 2020-03-16    |
| 2        | 1          | 4        | 2020-12-02    |
| 3        | 1          | 7        | 2020-05-10    |
| 4        | 1          | 6        | 2021-12-23    |
| 5        | 1          | 5        | 2021-05-21    |
| 6        | 1          | 6        | 2021-10-11    |
| 7        | 2          | 6        | 2022-10-11    |
+----------+------------+----------+---------------+

输出:

sql">+------------+
| product_id |
+------------+
| 1          |
+------------+

解释:
产品 1 在 2020 年和 2021 年都分别订购了三次。由于连续两年订购了三次，所以我们将其包含在答案中。
产品 2 在 2022 年订购了一次。我们不把它包括在答案中。

答案：

sql">select distinct product_id
from 
    (select product_id
            ,ord_date
            ,ord_date - row_number() over(partition by product_id order by ord_date) as first_year
    from
        (
            select date_format(purchase_date, '%Y') as ord_date
                ,product_id
                ,count(order_id) as ord_cnt
            from orders 
            group by product_id
                    ,date_format(purchase_date, '%Y')
            having count(order_id) >= 3
        ) t0
    ) t1
group by product_id 
        ,first_year
having count(*) > 1;

2752 在连续天数上进行了最多交易次数的顾客

表： Transactions


+------------------+------+
| 列名             | 类型 |
+------------------+------+
| transaction_id   | int  |
| customer_id      | int  |
| transaction_date | date |
| amount           | int  |
+------------------+------+

transaction_id 是这个表的具有唯一值的列。
每行包含有关交易的信息，包括唯一的（customer_id，transaction_date）以及相应的 customer_id 和 amount。
编写一个解决方案，找到连续天数上进行了最多交易的所有 customer_id 。

返回所有具有最大连续交易次数的 customer_id 。结果表按 customer_id 的升序排序。

结果的格式如下所示。

示例 1：

输入：
Transactions 表:

+----------------+-------------+------------------+--------+
| transaction_id | customer_id | transaction_date | amount |
+----------------+-------------+------------------+--------+
| 1              | 101         | 2023-05-01       | 100    |
| 2              | 101         | 2023-05-02       | 150    |
| 3              | 101         | 2023-05-03       | 200    |
| 4              | 102         | 2023-05-01       | 50     |
| 5              | 102         | 2023-05-03       | 100    |
| 6              | 102         | 2023-05-04       | 200    |
| 7              | 105         | 2023-05-01       | 100    |
| 8              | 105         | 2023-05-02       | 150    |
| 9              | 105         | 2023-05-03       | 200    |
+----------------+-------------+------------------+--------+

输出：

+-------------+
| customer_id | 
+-------------+
| 101         | 
| 105         | 
+-------------+

解释：

customer_id 为 101 共有 3 次交易，且全部是连续的。
customer_id 为 102 共有 3 次交易，但只有其中 2 次是连续的。
customer_id 为 105 共有 3 次交易，且全部是连续的。
总的来说，最大连续交易次数为 3，由 customer_id 为 101 和 105 的完成。customer_id 按升序排序。

答案：

sql">
select customer_id
from
    (select *
            ,rank() over(order by tra_cnt desc) as rk
    from
        (select  customer_id
                ,first_date
                ,count(transaction_id) as tra_cnt
        from
            (select  customer_id
                    ,transaction_id
                    ,transaction_date
                    ,rn
                    ,subdate(transaction_date, rn) as first_date
            from
            (select *
                    ,row_number() over(partition by customer_id order by transaction_date) as rn
            from transactions ) t0
        ) t1
        group by customer_id
                ,first_date
    ) t2
) t3
where rk = 1
order by customer_id;

2474 购买量严格增加的客户

表: Orders

+--------------+------+
| Column Name  | Type |
+--------------+------+
| order_id     | int  |
| customer_id  | int  |
| order_date   | date |
| price        | int  |
+--------------+------+

order_id 是该表的主键。
每行包含订单的 id、订购该订单的客户 id、订单日期和价格。

编写一个 SQL 查询，报告总购买量每年严格增加的客户 id。

客户在一年内的总购买量是该年订单价格的总和。如果某一年客户没有下任何订单，我们认为总购买量为 0。
对于每个客户，要考虑的第一个年是他们第一次下单的年份。
对于每个客户，要考虑的最后一年是他们最后一次下单的年份。
以任意顺序返回结果表。

查询结果格式如下所示。

示例 1:

输入:
Orders 表:

+----------+-------------+------------+-------+
| order_id | customer_id | order_date | price |
+----------+-------------+------------+-------+
| 1        | 1           | 2019-07-01 | 1100  |
| 2        | 1           | 2019-11-01 | 1200  |
| 3        | 1           | 2020-05-26 | 3000  |
| 4        | 1           | 2021-08-31 | 3100  |
| 5        | 1           | 2022-12-07 | 4700  |
| 6        | 2           | 2015-01-01 | 700   |
| 7        | 2           | 2017-11-07 | 1000  |
| 8        | 3           | 2017-01-01 | 900   |
| 9        | 3           | 2018-11-07 | 900   |
+----------+-------------+------------+-------+

输出:

+-------------+
| customer_id |
+-------------+
| 1           |
+-------------+

解释:
客户 1: 第一年是 2019 年，最后一年是 2022 年

2019: 1100 + 1200 = 2300
2020: 3000
2021: 3100
2022: 4700
我们可以看到总购买量每年都在严格增加，所以我们在答案中包含了客户 1。

客户 2: 第一年是2015年，最后一年是2017年

2015: 700
2016: 0
2017: 1000
我们没有把客户 2 包括在答案中，因为总的购买量并没有严格地增加。请注意，客户 2 在 2016 年没有购买任何物品。

客户 3: 第一年是 2017 年，最后一年是 2018 年

2017: 900
2018: 900

答案：

sql">select  customer_id
from
(select customer_id
        ,order_year
        ,price
        ,rank() over(partition by customer_id order by price) as rn
    from
    (
        select  customer_id
                ,date_format(order_date, '%Y') as order_year
                ,sum(price) as price
        from orders 
        group by customer_id
                ,date_format(order_date, '%Y')
    ) t0
) t1
group by customer_id
having count(distinct order_year - rn) = 1

3140 连续空余座位II

表：Cinema

+-------------+------+
| Column Name | Type |
+-------------+------+
| seat_id     | int  |
| free        | bool |
+-------------+------+

seat_id 是这张表中的自增列。
这张表的每一行表示第 i 个作为是否空余。1 表示空余，而 0 表示被占用。
编写一个解决方案来找到电影院中最长的空余座位的长度。

注意：

保证最多有一个最长连续序列。
如果有多个相同长度的连续序列，将它们全部输出。
返回结果表以 first_seat_id 升序排序。

结果表的格式如下所示。

示例：

输入：

Cinema 表：

+---------+------+
| seat_id | free |
+---------+------+
| 1       | 1    |
| 2       | 0    |
| 3       | 1    |
| 4       | 1    |
| 5       | 1    |
+---------+------+

输出：

+-----------------+----------------+-----------------------+
| first_seat_id   | last_seat_id   | consecutive_seats_len |
+-----------------+----------------+-----------------------+
| 3               | 5              | 3                     |
+-----------------+----------------+-----------------------+

解释：

可用座位的最长连续序列从座位 3 开始，到座位 5 结束，长度为 3。
输出表以 first_seat_id 升序排序。

答案：

sql">
select first_seat_id 
        ,last_seat_id
        ,consecutive_seats_len
from
(
    select *
            ,rank() over(order by consecutive_seats_len desc) as rk
    from 
    (select  group_id,
            MIN(seat_id) AS first_seat_id,
            MAX(seat_id) AS last_seat_id,
            COUNT(*) AS consecutive_seats_len 
    from
    (select *
            ,sum(free) over(order by seat_id) as val
            ,seat_id - sum(free) over(order by seat_id)  as group_id
    from Cinema 
    where free = 1) t0
    group by group_id
    ) t1
) t2
where rk = 1;