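In Flink, DataSet programs implement transformations on data sets. A data set is usually created from a fixed (bounded) source, such as a readable file or a local collection.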
Ref
https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/batch/
Using the DataSet API requires a batch execution environment:
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
The data sources supported by DataSet fall into three categories: File-based, Collection-based, and Generic.
1. File-based
readTextFile(path) / TextInputFormat - Reads files line wise and returns them as Strings.
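A minimal sketch of the file-based source, assuming the Flink 1.12 `flink-java` and `flink-clients` artifacts are on the classpath. The class name `ReadTextFileExample`, the helper `readLines`, and the temporary file are hypothetical and exist only for illustration.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
public class ReadTextFileExample {

    // Reads the file at the given path or URI and returns its lines as a local list.
    static List<String> readLines(String path) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines = env.readTextFile(path);
        // collect() runs the job and brings the result back to the client.
        return lines.collect();
    }

    public static void main(String[] args) throws Exception {
        // Write a small temporary input file so the example is self-contained.
        Path tmp = Files.createTempFile("flink-lines", ".txt");
        Files.write(tmp, Arrays.asList("hello", "world"));

        for (String line : readLines(tmp.toUri().toString())) {
            System.out.println(line);
        }
    }
}
```

The order of the collected lines may vary when the job runs with a parallelism greater than 1, so the sketch does not rely on it.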
2. Collection-based
fromCollection(Collection) - Creates a data set from a java.util.Collection. All elements in the collection must be of the same type.
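The collection-based source could be used roughly as follows. This is a sketch under the same assumption (Flink 1.12 `flink-java` and `flink-clients` on the classpath); the class name `FromCollectionExample` and the helper `toDataSetAndBack` are hypothetical.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
public class FromCollectionExample {

    // Turns a local, non-empty list into a DataSet and collects it back to the client.
    static List<Integer> toDataSetAndBack(List<Integer> input) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // All elements share one type (Integer), as fromCollection requires.
        DataSet<Integer> numbers = env.fromCollection(input);
        return numbers.collect();
    }

    public static void main(String[] args) throws Exception {
        List<Integer> result = toDataSetAndBack(Arrays.asList(1, 2, 3));
        System.out.println("element count: " + result.size());
    }
}
```

This kind of source is mostly convenient for local testing, since the data has to fit in the client's memory.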
3. Generic
readFile(inputFormat, path) / FileInputFormat - Accepts a file input format.
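For the generic source, one possible sketch passes Flink's built-in `TextInputFormat` to `readFile`; any other `FileInputFormat` implementation could take its place. It assumes Flink 1.12 `flink-java` and `flink-clients` on the classpath, and the names `ReadFileExample` and `readWithFormat` are hypothetical.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;

import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for this illustration.
public class ReadFileExample {

    // Reads a file through an explicitly supplied FileInputFormat.
    static List<String> readWithFormat(String path) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // TextInputFormat is a FileInputFormat that yields one String per line.
        TextInputFormat format = new TextInputFormat(new Path(path));
        DataSet<String> lines = env.readFile(format, path);
        return lines.collect();
    }

    public static void main(String[] args) throws Exception {
        // Fully qualified java.nio names avoid a clash with Flink's Path class.
        java.nio.file.Path tmp = java.nio.file.Files.createTempFile("flink-generic", ".txt");
        java.nio.file.Files.write(tmp, Arrays.asList("a", "b", "c"));

        System.out.println("line count: " + readWithFormat(tmp.toUri().toString()).size());
    }
}
```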