tonglin0325's personal homepage

ScyllaDB Basic Usage

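1. Deploying Scylla

Single-node deployment with Docker

ScyllaDB can be started from its Docker image.

Cluster deployment with Docker

A ScyllaDB cluster can also be deployed from the Docker image. The first command below starts the initial node; each later command starts another node that joins the cluster by passing the first container's IP address through --seeds.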

docker run --name scylla -p 9042:9042 -p 9160:9160 -p 10000:10000 -p 9180:9180 -v /var/lib/scylla:/var/lib/scylla -d scylladb/scylla

docker run --name scylla-node2 -p 8042:9042 -p 8160:9160 -p 1000:10000 -p 8180:9180 -v /var/lib/scylladb2:/var/lib/scylla -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla)"

docker run --name scylla-node3 -p 10042:9042 -p 10160:9160 -p 1100:10000 -p 10180:9180 -v /var/lib/scylladb3:/var/lib/scylla -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla)"

docker run --name scylla-node4 -p 11042:9042 -p 11160:9160 -p 1200:10000 -p 11180:9180 -v /var/lib/scylladb4:/var/lib/scylla -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla)"
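
The three join commands above differ only in the container name, the host-side ports, and the host data directory. As a rough sketch, a small shell function can print the command for one more node so that the varying parts are explicit. The function name scylla_node_cmd and its argument order are made up for this illustration; the port meanings in the comments (9042 CQL, 9160 Thrift, 10000 REST API, 9180 Prometheus metrics) follow Scylla's usual defaults.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the docker command for one extra node.
# $1 container name, $2 host port for CQL (9042), $3 host port for Thrift (9160),
# $4 host port for the REST API (10000), $5 host port for Prometheus metrics (9180),
# $6 host directory mounted at /var/lib/scylla.
scylla_node_cmd() {
  cat <<EOF
docker run --name $1 -p $2:9042 -p $3:9160 -p $4:10000 -p $5:9180 -v $6:/var/lib/scylla -d scylladb/scylla --seeds="\$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' scylla)"
EOF
}

# Reproduces the scylla-node2 command shown above.
scylla_node_cmd scylla-node2 8042 8160 1000 8180 /var/lib/scylladb2
```

Piping the printed line to sh would run it; printing it first makes it easy to check the port mapping before the container is started.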

  

2. Working with ScyllaDB in cqlsh

In cqlsh, CQL (the Cassandra Query Language) can be used to perform basic operations on ScyllaDB.

sh-4.2# cqlsh
Connected to at 172.17.0.3:9042.
[cqlsh 5.0.1 | Cassandra 3.0.8 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh>

Reference: CQLSh: the CQL shell

  

3. ScyllaDB operations

Scylla stores data in tables, and tables are grouped into keyspaces.

Creating a keyspace

The keyspace is named test.

cqlsh> CREATE KEYSPACE IF NOT EXISTS test WITH REPLICATION = {'class': 'SimpleStrategy','replication_factor':1};
cqlsh> describe keyspaces;

system_schema system_auth system system_distributed test system_traces

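The REPLICATION parameter sets the replication strategy. Whenever REPLICATION is used, class must be specified; the available classes are SimpleStrategy and NetworkTopologyStrategy. Because this is a single-node test, I set the replication factor to 1.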
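
For the four-node Docker cluster from section 1, NetworkTopologyStrategy with a per-datacenter replica count would be the other option. A minimal sketch, assuming the datacenter name datacenter1 that nodetool status reports later in this post; the keyspace name test_nts and the replica count 3 are illustrative choices, not part of the original setup. The script runs the statement if cqlsh is on the PATH and otherwise only prints it.

```shell
#!/bin/sh
# Illustrative keyspace: the name test_nts and the replica count 3 are assumptions.
# 'datacenter1' is the datacenter name printed by nodetool status.
CREATE_KS="CREATE KEYSPACE IF NOT EXISTS test_nts WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3};"

if command -v cqlsh >/dev/null 2>&1; then
  # -e executes the given statement and exits.
  cqlsh -e "$CREATE_KS"
else
  # No cqlsh here (for example on the Docker host): just show the statement.
  printf '%s\n' "$CREATE_KS"
fi
```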

Creating a table

cqlsh> use test;
cqlsh:test>

CREATE TABLE demo (
    user_id int,
    str text,
    mtime timestamp,
    PRIMARY KEY (user_id, mtime)
) WITH CLUSTERING ORDER BY (mtime DESC);

cqlsh:test> DESCRIBE TABLES

demo

cqlsh:test>

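PRIMARY KEY defines the primary key: user_id is the partition key and mtime is the clustering column, so rows with the same user_id are stored together in one partition.

WITH CLUSTERING ORDER BY (mtime DESC) sorts the rows inside each partition by mtime in descending order.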
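
Because rows inside a partition are stored newest-first, reading the latest row for one user needs no explicit sort. A small sketch, assuming the test.demo table defined above; the user id 1 is just an example value, and the script only prints the statements when cqlsh is not available.

```shell
#!/bin/sh
# Newest row for one user: the partition is stored in mtime DESC order,
# so LIMIT 1 returns the most recent entry.
LATEST_ROW="SELECT * FROM test.demo WHERE user_id = 1 LIMIT 1;"
# Same partition read oldest-first by reversing the clustering order.
OLDEST_FIRST="SELECT * FROM test.demo WHERE user_id = 1 ORDER BY mtime ASC;"

for stmt in "$LATEST_ROW" "$OLDEST_FIRST"; do
  if command -v cqlsh >/dev/null 2>&1; then
    cqlsh -e "$stmt"
  else
    printf '%s\n' "$stmt"
  fi
done
```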

Inserting data into a Scylla table

INSERT INTO demo (user_id, str, mtime)
VALUES (6, 'test', '2021-10-09 07:00:00')
USING TTL 86400;
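
USING TTL 86400 makes the inserted values expire after 86400 seconds, which is 24 hours. As a hedged sketch, the remaining lifetime can be read back with the CQL TTL() function on a regular column such as str. The script below assumes the row with user_id 6 from the insert above and prints the statement when cqlsh is not available.

```shell
#!/bin/sh
# TTL(str) returns the seconds left before the str value expires.
TTL_QUERY="SELECT user_id, mtime, TTL(str) FROM test.demo WHERE user_id = 6;"

# 86400 seconds expressed in hours, to sanity-check the TTL used in the INSERT.
TTL_HOURS=$((86400 / 3600))
echo "TTL 86400 s = ${TTL_HOURS} h"

if command -v cqlsh >/dev/null 2>&1; then
  cqlsh -e "$TTL_QUERY"
else
  printf '%s\n' "$TTL_QUERY"
fi
```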

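In the query result below, rows with the same user_id are grouped together, and within each user_id the rows are sorted by mtime in descending order.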

cqlsh:test> select * from demo;

 user_id | mtime                           | str
---------+---------------------------------+-------
       5 | 2021-10-09 02:00:00.000000+0000 | Panda
       1 | 2021-10-09 04:00:00.000000+0000 |   Kay
       1 | 2021-10-01 02:00:00.000000+0000 |   Kay
       2 | 2021-10-09 03:00:00.000000+0000 | Snail
       2 | 2021-10-01 02:00:00.000000+0000 | Snail
       6 | 2021-10-09 07:00:00.000000+0000 |  test

(6 rows)

Reference: Data Definition

Checking Scylla cluster status

[root@6a30e1b8fc71 /]# nodetool status
Using /etc/scylla/scylla.yaml as the config file
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load     Tokens  Owns  Host ID                               Rack
UN  172.17.0.2  1.49 MB  256     ?     eaee8765-450d-4d4e-a7b5-2ed4c6b20df3  rack1
UN  172.17.0.5  1.04 MB  256     ?     17153bb7-f4f1-4436-bc49-f1eca3409040  rack1
UN  172.17.0.4  1.03 MB  256     ?     25fd6224-7edc-4161-bf49-ba6fe51c9f73  rack1
UN  172.17.0.6  1.04 MB  256     ?     3d5757d2-dc5f-4ba4-8d90-e02eb9d4c255  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless  

Checking table status

nodetool tablestats my_ks.my_tb

Adding a column

ALTER TABLE xx.xx ADD col_name col_type;

Dropping a Scylla table

 

drop table xx.xx;

 

4. Integrating Spark with ScyllaDB

Reading and writing ScyllaDB with Spark

Because ScyllaDB is compatible with the Cassandra API, the following guide applies:

https://docs.microsoft.com/en-us/azure/cosmos-db/cassandra/spark-create-operations

Alternatively, see my article: Spark Study Notes: Reading and Writing ScyllaDB

 

Other: sizing-your-scylla-cluster

MySQL Data Migration at the Hundred-Million-Row Scale: Cassandra Overview