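1. Deploying Scylla
Single-node deployment with Docker
ScyllaDB can be started from the official Docker image. The first docker run command in the block below starts a single node named scylla.
Cluster deployment with Docker
The same Docker image can also be used to deploy a ScyllaDB cluster. Each additional node is pointed at the first container through --seeds.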
    docker run --name scylla -p 9042:9042 -p 9160:9160 -p 10000:10000 -p 9180:9180 \
        -v /var/lib/scylla:/var/lib/scylla -d scylladb/scylla

    docker run --name scylla-node2 -p 8042:9042 -p 8160:9160 -p 1000:10000 -p 8180:9180 \
        -v /var/lib/scylladb2:/var/lib/scylla -d scylladb/scylla \
        --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla)"

    docker run --name scylla-node3 -p 10042:9042 -p 10160:9160 -p 1100:10000 -p 10180:9180 \
        -v /var/lib/scylladb3:/var/lib/scylla -d scylladb/scylla \
        --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla)"

    docker run --name scylla-node4 -p 11042:9042 -p 11160:9160 -p 1200:10000 -p 11180:9180 \
        -v /var/lib/scylladb4:/var/lib/scylla -d scylladb/scylla \
        --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla)"
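The node commands above differ only in container name, host ports, data directory and seed address, so they can be generated rather than typed by hand. The following is a minimal sketch, not an official tool: `docker_run_args` is a hypothetical helper, and the seed IP 172.17.0.2 is an assumption standing in for the address that `docker inspect` would return for the first container.

```python
import shlex

IMAGE = "scylladb/scylla"
# Container ports published by the commands above:
# 9042 CQL, 9160 Thrift, 10000 REST API, 9180 Prometheus metrics.


def docker_run_args(name, host_ports, data_dir, seed_ip=None):
    """Build the argv list for one `docker run` of a Scylla node.

    host_ports maps container port -> host port. This helper is
    hypothetical; it only assembles the flags used in this post.
    """
    args = ["docker", "run", "--name", name]
    for container_port, host_port in host_ports.items():
        args += ["-p", f"{host_port}:{container_port}"]
    args += ["-v", f"{data_dir}:/var/lib/scylla", "-d", IMAGE]
    if seed_ip is not None:
        # Only non-seed nodes need to be told where the seed lives.
        args.append(f"--seeds={seed_ip}")
    return args


node2 = docker_run_args(
    "scylla-node2",
    {9042: 8042, 9160: 8160, 10000: 1000, 9180: 8180},
    "/var/lib/scylladb2",
    seed_ip="172.17.0.2",  # assumed IP of the first container
)
print(shlex.join(node2))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting problems; `shlex.join` is used here only to print the command for inspection.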
2. Working with ScyllaDB in cqlsh
Inside cqlsh you can use CQL (the Cassandra Query Language) to perform basic operations on ScyllaDB.
    sh-4.2# cqlsh
    Connected to at 172.17.0.3:9042.
    [cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
    Use HELP for help.
    cqlsh>
Reference: CQLSh: the CQL shell
3. ScyllaDB operations
Scylla stores data in tables, and tables are grouped into keyspaces.
Creating a keyspace
The keyspace here is named test.
    cqlsh> CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION = {'class': 'SimpleStrategy','replication_factor':1};
    cqlsh> describe keyspaces;

    system_schema  system_auth  system  system_distributed  test  system_traces
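The REPLICATION parameter sets the replication strategy. Whenever REPLICATION is used, class must be specified; the available classes are SimpleStrategy and NetworkTopologyStrategy. Because this is a single-machine test, I set the replication factor to 1.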
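To make the difference between the two classes concrete, here is a small sketch that renders the statement from a replication map. `create_keyspace_cql` is a hypothetical helper, not part of any driver. SimpleStrategy takes a single replication_factor, while NetworkTopologyStrategy takes one factor per datacenter name; the datacenter name datacenter1 and the factor 3 in the second call are illustrative assumptions.

```python
def create_keyspace_cql(keyspace, replication):
    """Render a CREATE KEYSPACE statement from a replication option dict.

    Hypothetical helper for illustration; it only formats text and does
    not talk to a cluster.
    """
    if "class" not in replication:
        raise ValueError("REPLICATION requires a 'class' entry")
    # repr() gives 'SimpleStrategy' for strings and 1 for integers,
    # which matches the CQL map literal syntax for these simple values.
    opts = ", ".join(f"'{key}': {value!r}" for key, value in replication.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH REPLICATION = {{{opts}}};"


# Single-node test setup, as used in this post.
simple = create_keyspace_cql(
    "test", {"class": "SimpleStrategy", "replication_factor": 1}
)

# Per-datacenter replication; name and factor are assumptions.
multi_dc = create_keyspace_cql(
    "test", {"class": "NetworkTopologyStrategy", "datacenter1": 3}
)

print(simple)
print(multi_dc)
```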
Creating a table
    cqlsh> use test;
    cqlsh:test> CREATE TABLE demo (
                    user_id int,
                    str text,
                    mtime timestamp,
                    PRIMARY KEY (user_id, mtime)
                ) WITH CLUSTERING ORDER BY (mtime DESC);
    cqlsh:test> DESCRIBE TABLES

    demo

    cqlsh:test>
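PRIMARY KEY defines the primary key. Here user_id is the partition key and mtime is the clustering column, so rows are grouped by user_id and, within each user_id, ordered by mtime.
CLUSTERING ORDER BY makes that ordering on mtime descending.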
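The effect of this key layout can be imitated in plain Python. This is only a mental model, not how Scylla is implemented: rows are bucketed by partition key, then each bucket is sorted by the clustering column in descending order. The sample rows are illustrative, and the sketch does not model the order in which partitions themselves are returned, which depends on the token of the partition key rather than its numeric value.

```python
from collections import defaultdict
from datetime import datetime

# (user_id, mtime, str) tuples, deliberately inserted out of order.
rows = [
    (1, datetime(2021, 10, 1, 2), "Kay"),
    (2, datetime(2021, 10, 9, 3), "Snail"),
    (1, datetime(2021, 10, 9, 4), "Kay"),
    (2, datetime(2021, 10, 1, 2), "Snail"),
]


def cluster(rows):
    """Group rows by partition key, newest mtime first within each group."""
    partitions = defaultdict(list)
    for user_id, mtime, text in rows:
        partitions[user_id].append((mtime, text))
    for partition in partitions.values():
        # Mirrors CLUSTERING ORDER BY (mtime DESC).
        partition.sort(key=lambda row: row[0], reverse=True)
    return dict(partitions)


partitions = cluster(rows)
for user_id, partition in partitions.items():
    for mtime, text in partition:
        print(user_id, mtime.isoformat(), text)
```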
Inserting data into a Scylla table
    INSERT INTO demo (user_id, str, mtime) VALUES (6, 'test', '2021-10-09 07:00:00') USING TTL 86400;
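The TTL is given in seconds and counts from the moment of the write, not from the value stored in mtime, so 86400 keeps the row for one day. A tiny sketch of that arithmetic follows; `ttl_expiry` is a hypothetical helper and the write time is an assumed value.

```python
from datetime import datetime, timedelta, timezone


def ttl_expiry(write_time, ttl_seconds):
    """Return the moment a row written at write_time stops being visible."""
    return write_time + timedelta(seconds=ttl_seconds)


# Assumed write time; the mtime column value plays no role in expiry.
written = datetime(2021, 10, 9, 8, 30, tzinfo=timezone.utc)
expires = ttl_expiry(written, 86400)
print(expires.isoformat())
```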
In the query result, rows with the same user_id are stored together, and within each user_id the rows are sorted by mtime in descending order.
    cqlsh:test> select * from demo;

     user_id | mtime                           | str
    ---------+---------------------------------+-------
           5 | 2021-10-09 02:00:00.000000+0000 | Panda
           1 | 2021-10-09 04:00:00.000000+0000 |   Kay
           1 | 2021-10-01 02:00:00.000000+0000 |   Kay
           2 | 2021-10-09 03:00:00.000000+0000 | Snail
           2 | 2021-10-01 02:00:00.000000+0000 | Snail
           6 | 2021-10-09 07:00:00.000000+0000 |  test

    (6 rows)
Reference: Data Definition
Checking Scylla cluster status
    [root@6a30e1b8fc71 /]# nodetool status
    Using /etc/scylla/scylla.yaml as the config file
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address     Load     Tokens  Owns  Host ID                               Rack
    UN  172.17.0.2  1.49 MB  256     ?     eaee8765-450d-4d4e-a7b5-2ed4c6b20df3  rack1
    UN  172.17.0.5  1.04 MB  256     ?     17153bb7-f4f1-4436-bc49-f1eca3409040  rack1
    UN  172.17.0.4  1.03 MB  256     ?     25fd6224-7edc-4161-bf49-ba6fe51c9f73  rack1
    UN  172.17.0.6  1.04 MB  256     ?     3d5757d2-dc5f-4ba4-8d90-e02eb9d4c255  rack1

    Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
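The first column of each node line combines status (U for Up, D for Down) and state (N, L, J, M for Normal, Leaving, Joining, Moving), so UN means the node is up and normal. If you want to check this from a script, a rough sketch is to pick out the node lines with a regular expression. `parse_status` and `not_up_normal` are hypothetical helpers, and the parsing assumes the column layout shown above.

```python
import re

# A node line starts with a two-letter code: U/D followed by N/L/J/M.
NODE_LINE = re.compile(r"^([UD][NLJM])\s+(\S+)\s+")

SAMPLE = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load     Tokens  Owns  Host ID                               Rack
UN  172.17.0.2  1.49 MB  256     ?     eaee8765-450d-4d4e-a7b5-2ed4c6b20df3  rack1
UN  172.17.0.5  1.04 MB  256     ?     17153bb7-f4f1-4436-bc49-f1eca3409040  rack1
UN  172.17.0.4  1.03 MB  256     ?     25fd6224-7edc-4161-bf49-ba6fe51c9f73  rack1
UN  172.17.0.6  1.04 MB  256     ?     3d5757d2-dc5f-4ba4-8d90-e02eb9d4c255  rack1
"""


def parse_status(text):
    """Return (address, code) pairs for every node line in the output."""
    nodes = []
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes.append((match.group(2), match.group(1)))
    return nodes


def not_up_normal(nodes):
    """Addresses of nodes whose code is anything other than UN."""
    return [address for address, code in nodes if code != "UN"]


nodes = parse_status(SAMPLE)
print(len(nodes), "nodes,", len(not_up_normal(nodes)), "not in UN state")
```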
Checking table statistics
    nodetool tablestats my_ks.my_tb
Adding a column
    ALTER TABLE xx.xx ADD col_name col_type;
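Dropping a Scylla table
4. Integrating Spark with ScyllaDB
Reading and writing ScyllaDB with Spark
Because ScyllaDB is compatible with the Cassandra API, the Cassandra instructions for Spark apply as well. See: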
    https://docs.microsoft.com/en-us/azure/cosmos-db/cassandra/spark-create-operations
Alternatively, see my article: Spark Study Notes: Reading and Writing ScyllaDB
Other: sizing-your-scylla-cluster
Migrating Hundreds of Millions of MySQL Rows: A Cassandra Overview