scala> val textFile = sc.textFile("hdfs://master:9000/user/hadoop/test.segmented")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://master:9000/user/hadoop/test.segmented MapPartitionsRDD[1] at textFile at <console>:24
hadoop@master:~$ hadoop fs -cat /user/hadoop/writeback/part-00000
aa bb aa bb aa aa cc bb ee dd ee cc
hadoop@master:~$ hadoop fs -cat /user/hadoop/writeback/part-00001
aa cc ee ff ff gg hh aa
Now we move on to the WordCount stage. Enter the Spark shell again:
val textFile = sc.textFile("hdfs://master:9000/user/hadoop/test.segmented")
val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
wordCount.collect()
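Since reduceByKey is just a per-key sum, the same pipeline can be sketched with plain Scala collections, no Spark required, using the sample data from the two part files above. Here groupBy plus a per-group sum stands in for reduceByKey (WordCountLocal and wordCount are hypothetical names for this sketch):

```scala
object WordCountLocal {
  // Mirrors the Spark pipeline: flatMap -> split lines into words,
  // map -> pair each word with 1, groupBy + sum -> reduceByKey
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split(" "))
         .map(word => (word, 1))
         .groupBy(_._1)
         .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // The contents of part-00000 and part-00001 shown above
    val lines = Seq("aa bb aa bb aa aa cc bb ee dd ee cc",
                    "aa cc ee ff ff gg hh aa")
    wordCount(lines).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Running this prints the same (word, count) pairs that `wordCount.collect()` returns in the shell, e.g. aa maps to 6 across both files.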
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

/**
 * Created by common on 17-4-3.
 */
object WordCount {
  def main(args: Array[String]) {
    val inputFile = "file:///home/common/coding/coding/Scala/word-count/test.segmented"
    // Create a SparkConf object to configure the application.
    // Cluster URL: tells Spark which cluster to connect to; "local" runs
    // single-threaded on one machine with no cluster connection.
    // App name: makes the application easy to find in the cluster manager's UI.
    val conf = new SparkConf().setAppName("WordCount").setMaster("local")
    // Create a SparkContext based on this SparkConf
    val sc = new SparkContext(conf)
    // Read the input data
    val textFile = sc.textFile(inputFile)
    // Split into words, map to (word, 1) pairs, and sum the counts per word
    val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
    wordCount.foreach(println)
  }
}
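To compile and run the standalone program above, a minimal sbt build definition along these lines could be used. The Scala and Spark versions here are assumptions; match them to your installation:

```scala
// build.sbt -- version numbers are examples, align with your cluster
name := "word-count"
version := "1.0"
scalaVersion := "2.11.8"
// When submitting with spark-submit, "provided" scope is typically used instead
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0"
```

With this in place, `sbt run` launches the application locally.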
Running this reports the following error:
Exception in thread "main" java.lang.Exception: When running with master 'yarn-client' either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment
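This error appears when Spark is launched with master 'yarn-client' but cannot locate the Hadoop configuration files. A typical fix is to export the configuration directory before launching; the path below is an example, substitute your actual Hadoop installation path:

```shell
# Point Spark at the Hadoop/YARN configuration files (example path)
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
# YARN_CONF_DIR also satisfies the check; either variable is sufficient
export YARN_CONF_DIR=$HADOOP_CONF_DIR
```

Adding these exports to the shell profile (or spark-env.sh) makes the setting persistent across sessions.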
mysql> SHOW VARIABLES LIKE 'datadir';
+---------------+----------------------+
| Variable_name | Value                |
+---------------+----------------------+
| datadir       | /var/lib/mysql/data/ |
+---------------+----------------------+
1 row in set (0.01 sec)
List all directories under the datadir directory:
sh-4.2$ ls -l | grep '^d'
drwxr-x--- 2 mysql mysql  4096 Aug 24 12:36 default
drwxr-x--- 2 mysql mysql  4096 Jan 31  2024 mysql
drwxr-x--- 2 mysql mysql  4096 Jan 31  2024 performance_schema
drwxr-x--- 2 mysql mysql 12288 Jan 31  2024 sys
sh-4.2$ ls -l | grep -v '^d'
total 41032
-rw-r----- 1 mysql mysql       56 Jan 31  2024 auto.cnf
-rw-r----- 1 mysql mysql        2 Aug 15 15:05 bc130f3f763a.pid
-rw------- 1 mysql mysql     1680 Jan 31  2024 ca-key.pem
-rw-r--r-- 1 mysql mysql     1112 Jan 31  2024 ca.pem
-rw-r--r-- 1 mysql mysql     1112 Jan 31  2024 client-cert.pem
-rw------- 1 mysql mysql     1680 Jan 31  2024 client-key.pem
-rw-r----- 1 mysql mysql      477 Aug 15 15:05 ib_buffer_pool
-rw-r----- 1 mysql mysql  8388608 Aug 24 15:25 ib_logfile0
-rw-r----- 1 mysql mysql  8388608 Aug 24 15:05 ib_logfile1
-rw-r----- 1 mysql mysql 12582912 Aug 24 15:25 ibdata1
-rw-r----- 1 mysql mysql 12582912 Aug 24 15:06 ibtmp1
-rw-r--r-- 1 mysql mysql        6 Jan 31  2024 mysql_upgrade_info
-rw------- 1 mysql mysql     1676 Jan 31  2024 private_key.pem
-rw-r--r-- 1 mysql mysql      452 Jan 31  2024 public_key.pem
-rw-r--r-- 1 mysql mysql     1112 Jan 31  2024 server-cert.pem
-rw------- 1 mysql mysql     1676 Jan 31  2024 server-key.pem
In MySQL's InnoDB storage engine, these files (ib_buffer_pool, ib_logfile0, ib_logfile1, ibdata1, ibtmp1) represent different runtime data structures and storage mechanisms. Each serves a distinct purpose in managing and storing InnoDB's data and logs.
The ib_buffer_pool file persists the state of the hot data pages held in the InnoDB buffer pool. When the MySQL server restarts, the buffer pool contents are restored from this file, reducing the time needed to re-warm the buffer pool after a restart.
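The dump-and-reload behaviour behind ib_buffer_pool is controlled by server variables; in MySQL 5.7 and later both are ON by default. A quick way to inspect and exercise them (a sketch; actual values depend on your server configuration):

```sql
-- Whether the buffer pool's page list is dumped to ib_buffer_pool at shutdown
SHOW VARIABLES LIKE 'innodb_buffer_pool_dump_at_shutdown';
-- Whether those pages are reloaded into the buffer pool at startup
SHOW VARIABLES LIKE 'innodb_buffer_pool_load_at_startup';
-- A dump can also be triggered manually at any time:
SET GLOBAL innodb_buffer_pool_dump_now = ON;
```

Note that the file records only page identifiers, not page contents, which is why it is so small (477 bytes in the listing above).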
mysql> SHOW VARIABLES LIKE 'innodb_file_per_table';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| innodb_file_per_table | ON    |
+-----------------------+-------+
1 row in set (0.01 sec)
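With innodb_file_per_table ON, each newly created InnoDB table is stored in its own .ibd tablespace file under the database's subdirectory of datadir, rather than inside the shared ibdata1. A quick way to observe this (the table name t1 is hypothetical):

```sql
-- Create an InnoDB table in the `default` database seen in the listing above
CREATE TABLE `default`.t1 (id INT PRIMARY KEY) ENGINE=InnoDB;
-- A new file default/t1.ibd should now exist under datadir
```

With the variable OFF, the table's data and indexes would instead be appended to the shared system tablespace (ibdata1), which never shrinks even after the table is dropped.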
mysql> SHOW VARIABLES LIKE 'innodb_default_row_format';
+---------------------------+---------+
| Variable_name             | Value   |
+---------------------------+---------+
| innodb_default_row_format | dynamic |
+---------------------------+---------+
1 row in set (0.00 sec)