Skip to content

解决clickhouse服务器启动异常DB::Exception: Suspiciously many broken parts to remove.

产生原因

  • 服务重启与断电原因

最近遇到一个关于clickhouse服务启动失败问题,报错输出如下

Application: DB::Exception: Suspiciously many (32) broken parts to remove.****
Application: DB::Exception: Suspiciously many (32) broken parts to remove.****

这个是发生在机器断电场景下的报错,查找原因是说因为写入数据造成的元数据和数据不一致问题

clickhouse在重启服务的时候会重新加载MergeTree表引擎数据,数据可能存在损坏情况

  • clickhouse配置原因

配置参数当中包含一个参数max_suspicious_broken_parts,默认值是10,可选值范围是任意正整数,如果单个分区中的损坏部分数量超过max_suspicious_broken_parts 配置的值,则拒绝自动修复或者拒绝删除损坏部分的数据,并且服务启动时候直接报错退出

目前需要尽量避免该错误以免服务启动失败,推荐把该参数配置为1000或者更大的值

解决方案

单表配置方式

在创建MergeTree表的时候特别配置一下max_suspicious_broken_parts参数

CREATE TABLE foo
(
    `A` Int64
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS max_suspicious_broken_parts = 1000;
CREATE TABLE foo
(
    `A` Int64
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS max_suspicious_broken_parts = 1000;

命令行方式

使用ALTER TABLE ... MODIFY SETTING命令修改

ALTER TABLE foo
    MODIFY SETTING max_suspicious_broken_parts = 1000;

-- 恢复默认值
-- reset to default (use value from system.merge_tree_settings)
ALTER TABLE foo
    RESET SETTING max_suspicious_broken_parts;
ALTER TABLE foo
    MODIFY SETTING max_suspicious_broken_parts = 1000;

-- 恢复默认值
-- reset to default (use value from system.merge_tree_settings)
ALTER TABLE foo
    RESET SETTING max_suspicious_broken_parts;

配置文件方式

如果服务起不来了,就只能使用这个方式解决

新建文件max_suspicious_broken_parts.xml写入如下内容

<?xml version="1.0"?>
<yandex>
     <merge_tree>
         <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
     </merge_tree>
</yandex>
<?xml version="1.0"?>
<yandex>
     <merge_tree>
         <max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
     </merge_tree>
</yandex>

clickhouse的配置文件推荐放置在/etc/clickhouse-server/config.d/文件夹下生效

如果是在Ubuntu或者Centos上面以DEBRPM安装包的形式启动的,需要把该文件放到/etc/clickhouse-server/config.d/,最后重启clickhouse就可以了

如果是docker compose方式启动

修改compose.yaml配置如下,主要也是把对应文件挂载进入容器内部相应位置

services:
  clickhouse:
    image: clickhouse/clickhouse-server
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    restart: always
    container_name: demo-clickhouse
    environment:
    - CLICKHOUSE_USER=demo
    - CLICKHOUSE_PASSWORD=demo-pass
    - CLICKHOUSE_DB=demo
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - ./max_suspicious_broken_parts.xml:/etc/clickhouse-server/config.d/max_suspicious_broken_parts.xml
      - demo-clickhouse:/var/lib/clickhouse
    healthcheck:
      test: 'wget -O - http://127.0.0.1:8123 || exit 1'
     
volumes:
  demo-clickhouse: {}
services:
  clickhouse:
    image: clickhouse/clickhouse-server
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    restart: always
    container_name: demo-clickhouse
    environment:
    - CLICKHOUSE_USER=demo
    - CLICKHOUSE_PASSWORD=demo-pass
    - CLICKHOUSE_DB=demo
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - ./max_suspicious_broken_parts.xml:/etc/clickhouse-server/config.d/max_suspicious_broken_parts.xml
      - demo-clickhouse:/var/lib/clickhouse
    healthcheck:
      test: 'wget -O - http://127.0.0.1:8123 || exit 1'
     
volumes:
  demo-clickhouse: {}

最后运行如下命令,容器就可以正常启动了

$ docker compose down
$ docker compose up -d
$ docker compose down
$ docker compose up -d

验证配置是否生效

连接到clickhouse之后执行查询

SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
namevaluechangeddescriptiontype
max_suspicious_broken_parts10001Max broken parts, if more - deny automatic deletion.UInt64
max_suspicious_broken_parts_bytes10737418240Max size of all broken parts, if more - deny automatic deletion.UInt64

阅读参考

Clickhouse MergeTree官方配置文档

Altinity Knowledge Base关于Suspiciously many broken parts报错解决方案

Last updated:

Released under the MIT License.