Overview: Flink Job Runtime Issues FAQ
Summary of causes of Flink job failures
Based on past experience troubleshooting failed Flink jobs, the usual causes are:
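- No network connectivity to third-party resources (databases, OSS, etc.), typically in the early stages of setting up an environment, or an unstable network
- Configuration errors (URL / username / password; letter case, spaces, and other special characters)
- Insufficient CU resources in the cluster/queue
- Insufficient JVM memory for the Flink job
- MySQL binlog expired or invalid in a Flink CDC job
- Checkpoint save failures
- The Flink program's business logic is too heavy or the data volume is too large: processing slows down and checkpoints time out
- Components the Flink program depends on (OSS, MySQL, Redis, OLAP databases, etc.) are unstable or crash, so checkpoints time out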
- ...
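The configuration errors in the list above (stray spaces and other special characters in a URL, username, or password) are easy to miss by eye. A minimal sketch of a pre-submit check follows. The function name `find_suspicious_values` and the sample keys are hypothetical and not part of Flink; it only looks for whitespace and non-printable characters, not for wrong letter case.

```python
def find_suspicious_values(config):
    """Return {key: [reasons]} for string values that look accidentally malformed.

    `config` is a plain dict of connection settings (hypothetical keys such as
    "url", "username", "password"); this helper is not part of any Flink API.
    """
    problems = {}
    for key, value in config.items():
        reasons = []
        # A copy-pasted value often carries a trailing space or newline.
        if value != value.strip():
            reasons.append("leading/trailing whitespace")
        # Zero-width spaces, tabs, and control characters are invisible in most editors.
        if any(not ch.isprintable() for ch in value):
            reasons.append("non-printable character")
        if reasons:
            problems[key] = reasons
    return problems


if __name__ == "__main__":
    sample = {
        "url": "jdbc:mysql://db:3306/app ",  # trailing space
        "username": "flink",
        "password": "p\u200bw",              # zero-width space inside
    }
    for key, reasons in find_suspicious_values(sample).items():
        print(key, "->", ", ".join(reasons))
```

Running a check like this before submitting the job can rule out one whole class of causes quickly.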
Flink CDC Runtime Issues FAQ
- [Flink] Flink CDC FAQ - Cnblogs (博客园) / 千千寰宇
Q: Flink fails at runtime with java.lang.IllegalStateException: Buffer pool is destroyed.
Problem description
- At runtime Flink reports java.lang.IllegalStateException: Buffer pool is destroyed., and in the log this error appears together with "Could not forward element to next operator".
java.lang.RuntimeException: Buffer pool is destroyed.
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at com.ucarinc.framework.flink.connectors.flexq.FlexQSource.run(FlexQSource.java:204) ~[flink-connector-flexq-1.8.500-20191206.054312-28.jar:1.8.500-SNAPSHOT]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711) [flink-runtime_2.11-1.8.1.jar:1.8.1]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_31]
Caused by: java.lang.IllegalStateException: Buffer pool is destroyed.
    at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.internalRequestMemorySegment(LocalBufferPool.java:264) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegment(LocalBufferPool.java:240) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:218) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.requestNewBufferBuilder(RecordWriter.java:264) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.getBufferBuilder(RecordWriter.java:257) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.copyFromSerializerToTargetChannel(RecordWriter.java:177) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:162) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128) ~[flink-runtime_2.11-1.8.1.jar:1.8.1]
    at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107) ~[flink-streaming-java_2.11-1.8.1.jar:1.8.1]
    ... 18 more
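To confirm that a log matches this FAQ entry, check that both messages occur in the same log. A minimal sketch of such a scan follows. The helper name `find_signature_lines` is hypothetical, and the two strings are simply the messages quoted in the problem description.

```python
BUFFER_POOL_MSG = "Buffer pool is destroyed"
FORWARD_MSG = "Could not forward element to next operator"


def find_signature_lines(log_text):
    """Map each of the two messages to the 1-based line numbers where it occurs."""
    hits = {BUFFER_POOL_MSG: [], FORWARD_MSG: []}
    for lineno, line in enumerate(log_text.splitlines(), start=1):
        for message in hits:
            if message in line:
                hits[message].append(lineno)
    return hits


def matches_this_faq(log_text):
    """True only when both messages are present, as described in this entry."""
    hits = find_signature_lines(log_text)
    return all(hits[message] for message in hits)


if __name__ == "__main__":
    sample_log = (
        "INFO  job started\n"
        "java.lang.RuntimeException: Buffer pool is destroyed.\n"
        "ERROR Could not forward element to next operator\n"
    )
    print(find_signature_lines(sample_log))
    print(matches_this_faq(sample_log))
```

The line numbers also show which message was logged first, which helps when looking for an earlier root-cause exception.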
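Problem analysis
- The usual cause is that the task does not have enough network buffers; try increasing the task's network buffer memory. The same exception can also appear as a follow-on error when the task is already failing or being cancelled, so check the log for an earlier exception as well.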
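Solutions
- Method 1: free up memory on the machine running the job, then resubmit it (tested and confirmed to work)
- Method 2: add the following to the advanced parameters:
  taskmanager.memory.network.fraction: 0.2 (the default is 0.1; adjust as needed)
  Note: this key name applies to Flink 1.10 and later. On older releases, such as the 1.8.1 shown in the stack trace above, the corresponding key is taskmanager.network.memory.fraction.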
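To see what raising the fraction from 0.1 to 0.2 buys, here is a rough sketch of how the network memory size is derived. It assumes the Flink 1.10+ memory model, where network memory is a fraction of total Flink memory clamped between a configured minimum and maximum. The function name and the 64 MB / 1024 MB bounds are illustrative assumptions; check the defaults for your Flink version.

```python
def network_memory_mb(total_flink_memory_mb, fraction, min_mb, max_mb):
    """Estimate network memory: fraction of total Flink memory, clamped to [min_mb, max_mb].

    A sketch of the derivation rule, not Flink's actual implementation.
    """
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    if min_mb > max_mb:
        raise ValueError("min_mb must not exceed max_mb")
    derived = total_flink_memory_mb * fraction
    # If the derived size falls outside the bounds, the bound wins.
    return max(min_mb, min(derived, max_mb))


if __name__ == "__main__":
    # Assumed bounds for illustration only.
    for fraction in (0.1, 0.2):
        size = network_memory_mb(4096, fraction, min_mb=64, max_mb=1024)
        print(f"fraction={fraction}: about {size:.1f} MB of network memory")
```

Note that once the derived size reaches the configured maximum, raising the fraction further has no effect unless the maximum is raised too.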
References
- FAQ - Buffer pool is destroyed. - NetEase Youshu Academy (网易-有数学堂) / EasyData Data Development and Governance Platform FAQ [recommended]
- Flink Troubleshooting - Buffer pool is destroyed. - CSDN
