解决clickhouse服务器启动异常DB::Exception: Suspiciously many broken parts to remove.
产生原因
服务重启与断电原因
最近遇到一个关于clickhouse
服务启动失败问题,报错输出如下
Application: DB::Exception: Suspiciously many (32) broken parts to remove.****
Application: DB::Exception: Suspiciously many (32) broken parts to remove.****
这个是发生在机器断电场景下的报错,查找原因是说因为写入数据造成的元数据和数据不一致问题
clickhouse
在重启服务的时候会重新加载MergeTree
表引擎数据,数据可能存在损坏情况
clickhouse
配置原因
配置参数当中包含一个参数max_suspicious_broken_parts
,默认值是10
,可选值范围是任意正整数,如果单个分区中的损坏部分数量超过max_suspicious_broken_parts
配置的值,则拒绝自动修复或者拒绝删除损坏部分的数据,并且服务启动时候直接报错退出
目前需要尽量避免该错误以免服务启动失败,推荐把该参数配置为1000
或者更大的值
解决方案
单表配置方式
在创建MergeTree
表的时候特别配置一下max_suspicious_broken_parts
参数
CREATE TABLE foo
(
`A` Int64
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS max_suspicious_broken_parts = 1000;
CREATE TABLE foo
(
`A` Int64
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS max_suspicious_broken_parts = 1000;
命令行方式
使用ALTER TABLE ... MODIFY SETTING
命令修改
ALTER TABLE foo
MODIFY SETTING max_suspicious_broken_parts = 1000;
-- 恢复默认值
-- reset to default (use value from system.merge_tree_settings)
ALTER TABLE foo
RESET SETTING max_suspicious_broken_parts;
ALTER TABLE foo
MODIFY SETTING max_suspicious_broken_parts = 1000;
-- 恢复默认值
-- reset to default (use value from system.merge_tree_settings)
ALTER TABLE foo
RESET SETTING max_suspicious_broken_parts;
配置文件方式
如果服务起不来了,就只能使用这个方式解决
新建文件max_suspicious_broken_parts.xml
写入如下内容
<?xml version="1.0"?>
<yandex>
<merge_tree>
<max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
</merge_tree>
</yandex>
<?xml version="1.0"?>
<yandex>
<merge_tree>
<max_suspicious_broken_parts>1000</max_suspicious_broken_parts>
</merge_tree>
</yandex>
clickhouse
的配置文件推荐放置在/etc/clickhouse-server/config.d/
文件夹下生效
如果是在Ubuntu
或者Centos
上面以DEB
或RPM
安装包的形式启动的,需要把该文件放到/etc/clickhouse-server/config.d/
,最后重启clickhouse
就可以了
如果是docker compose
方式启动
修改compose.yaml
配置如下,主要也是把对应文件挂载进入容器内部相应位置
services:
clickhouse:
image: clickhouse/clickhouse-server
ulimits:
nofile:
soft: 262144
hard: 262144
restart: always
container_name: demo-clickhouse
environment:
- CLICKHOUSE_USER=demo
- CLICKHOUSE_PASSWORD=demo-pass
- CLICKHOUSE_DB=demo
ports:
- "8123:8123"
- "9000:9000"
volumes:
- ./max_suspicious_broken_parts.xml:/etc/clickhouse-server/config.d/max_suspicious_broken_parts.xml
- demo-clickhouse:/var/lib/clickhouse
healthcheck:
test: 'wget -O - http://127.0.0.1:8123 || exit 1'
volumes:
demo-clickhouse: {}
services:
clickhouse:
image: clickhouse/clickhouse-server
ulimits:
nofile:
soft: 262144
hard: 262144
restart: always
container_name: demo-clickhouse
environment:
- CLICKHOUSE_USER=demo
- CLICKHOUSE_PASSWORD=demo-pass
- CLICKHOUSE_DB=demo
ports:
- "8123:8123"
- "9000:9000"
volumes:
- ./max_suspicious_broken_parts.xml:/etc/clickhouse-server/config.d/max_suspicious_broken_parts.xml
- demo-clickhouse:/var/lib/clickhouse
healthcheck:
test: 'wget -O - http://127.0.0.1:8123 || exit 1'
volumes:
demo-clickhouse: {}
最后运行如下命令,容器就可以正常启动了
$ docker compose down
$ docker compose up -d
$ docker compose down
$ docker compose up -d
验证配置是否生效
连接到clickhouse
之后执行查询
SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
SELECT *
FROM system.merge_tree_settings
WHERE name LIKE '%max_suspicious_broken_parts%'
name | value | changed | description | type |
---|---|---|---|---|
max_suspicious_broken_parts | 1000 | 1 | Max broken parts, if more - deny automatic deletion. | UInt64 |
max_suspicious_broken_parts_bytes | 1073741824 | 0 | Max size of all broken parts, if more - deny automatic deletion. | UInt64 |
阅读参考
Altinity Knowledge Base
关于Suspiciously many broken parts
报错解决方案