
Spark Study Notes: Using PySpark

1. Start pyspark

 
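The launch command itself is not shown in the original notes. On a CDH cluster with the SPARK2 parcel (which the spark2-submit path later in this post suggests), the interactive shell is typically started as below; the `pyspark2` command name is an assumption based on that layout, and on a plain Spark install it is just `pyspark`.

```shell
# Start the interactive PySpark shell in local mode (assumed CDH/SPARK2 layout;
# on a vanilla Spark installation the command is `pyspark`).
pyspark2 --master local[*]
```

Once the shell is up, it predefines `spark` (a SparkSession) and `sc` (a SparkContext), so the session-building step shown below is only needed in standalone scripts.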

2. Read a file

>>> from pyspark.sql import SparkSession
>>>
>>> spark = SparkSession.builder.appName("myjob").getOrCreate()
>>> df = spark.read.text("hdfs:///user/lintong/logs/test")
>>> df.show()
+-----+
|value|
+-----+
|    1|
|    2|
|    3|
|    4|
+-----+

  

3. Exit pyspark with exit()

 

4. Submit the pyspark job pi.py with spark-submit

spark2-submit --master local[*] /opt/cloudera/parcels/SPARK2/lib/spark2/examples/src/main/python/pi.py