tonglin0325's personal homepage

Solr Study Notes: Importing JSON Data

1. There are two ways to import JSON data: through the web admin interface, or with the curl command

curl 'http://localhost:8983/solr/baikeperson/update/json?commit=true' --data-binary @/home/XXX/下载/person/test1.json -H 'Content-type:text/json; charset=utf-8'
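The same upload can also be done from code. The following is a minimal sketch, not a definitive client: it assumes Java 11+ (for `java.net.http`), reuses the host and core name from the curl command, and sends `application/json` as the content type, which is an assumption that differs from the `text/json` used above. The names `SolrJsonPost`, `buildRequest` and `send` are made up for illustration.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object SolrJsonPost {

  // Build the POST request for the JSON update handler of a core.
  // baseUrl and core are illustrative parameters (hypothetical helper).
  def buildRequest(baseUrl: String, core: String, body: String): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$baseUrl/$core/update/json?commit=true"))
      .header("Content-Type", "application/json; charset=utf-8")
      .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
      .build()

  // Send the request and return the raw response body
  def send(request: HttpRequest): String =
    HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
      .body()

  def main(args: Array[String]): Unit = {
    // args(0): path of the JSON file to upload
    val body = Files.readString(Paths.get(args(0)), StandardCharsets.UTF_8)
    println(send(buildRequest("http://localhost:8983/solr", "baikeperson", body)))
  }
}
```

Building the request is kept separate from sending it, so the URL and headers can be checked without a running Solr instance.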

2. Pay attention to the data format when importing

Format that can be imported with curl

{
  "add": {
    "overwrite": true,
    "doc": {
      "id": 1,
      "name": "Some book",
      "author": ["John", "Marry"]
    }
  },
  "add": {
    "overwrite": true,
    "boost": 2.5,
    "doc": {
      "id": 2,
      "name": "Important Book",
      "author": ["Harry", "Jane"]
    }
  },
  "add": {
    "overwrite": true,
    "doc": {
      "id": 3,
      "name": "Some other book",
      "author": "Marry"
    }
  }
}

 

Format that can be imported in the web interface

{"title":"许宝江","url":"7254863","chineseName":"许宝江","sex":"男","occupation":" 滦县农业局局长","nationality":"中国"}

Format that cannot be imported (several JSON objects, one per line, with nothing wrapping them)

{"title":"鲍志成","url":"2074015","chineseName":"鲍志成","occupation":"医师","nationality":"中国","birthDate":"1901年","deathDate":"1973年","graduatedFrom":"香港大学"}
{"title":"许宝江","url":"7254863","chineseName":"许宝江","sex":"男","occupation":" 滦县农业局局长","nationality":"中国"}

Scala code for converting the format

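import java.io.{File, PrintWriter}
import scala.io.Source

/**
 * Created by common on 17-5-10.
 *
 * Converts a file with one JSON document per line into a single JSON object
 * made of repeated "add" commands, the format that can be imported with curl.
 */
object SplitJson {

  def main(args: Array[String]): Unit = {

    val inputPath = "/home/common/下载/person/part-r-00000-47c2fce6-87cb-4a33-af2c-309a621b070f.json"
    val outputPath = "/home/common/下载/person/split.json"

    // Read and write as UTF-8 explicitly, since the data contains Chinese text
    val source = Source.fromFile(new File(inputPath), "UTF-8")
    val pw = new PrintWriter(new File(outputPath), "UTF-8")

    try {
      // Wrap every non-empty line in its own "add" command
      val commands = source.getLines()
        .filter(_.trim.nonEmpty)
        .map(doc => "\"add\": {\"overwrite\": true,\"doc\": " + doc + "}")

      pw.write("{")
      // Write the commands one at a time so a large input is never held in memory
      var first = true
      commands.foreach { cmd =>
        if (!first) pw.write(",\n")
        pw.write(cmd)
        first = false
      }
      pw.write("}\n")
    } finally {
      // Close both files even if reading or writing fails
      pw.close()
      source.close()
    }
  }
}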
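As a possible alternative to the converter above, the line-per-document file can be joined into a single JSON array. This is only a sketch under an assumption: it relies on the Solr version in use accepting a top-level JSON array of documents on its JSON update handler, which is worth verifying against your installation. The object and function names are hypothetical.

```scala
import java.io.{File, PrintWriter}
import scala.io.Source

object LinesToJsonArray {

  // Join one-document-per-line input into a single JSON array: [doc1,\ndoc2,...]
  // Blank lines are skipped; each line is assumed to already be valid JSON.
  def toJsonArray(lines: Iterator[String]): String =
    lines.map(_.trim).filter(_.nonEmpty).mkString("[", ",\n", "]")

  def main(args: Array[String]): Unit = {
    // args(0): input file (one JSON object per line), args(1): output file
    val source = Source.fromFile(new File(args(0)), "UTF-8")
    val pw = new PrintWriter(new File(args(1)), "UTF-8")
    try pw.write(toJsonArray(source.getLines()))
    finally { pw.close(); source.close() }
  }
}
```

Note that `mkString` builds the whole array in memory, so for very large inputs the streaming approach of the converter above is the safer choice.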

A successful import returns the response below. After importing, it takes a while before the index is built.

{"responseHeader":{"status":0,"QTime":86}}
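A script that automates the import may want to check this response. The following is a rough sketch that treats `status` equal to 0 as success, as in the response above. It uses a regular expression rather than a real JSON parser, which is a simplification, and the names `SolrResponse`, `status` and `isSuccess` are hypothetical.

```scala
object SolrResponse {

  // Matches the first "status": <number> pair in the response text
  private val StatusPattern = "\"status\"\\s*:\\s*(-?\\d+)".r

  // Extract the status code, if the response contains one
  def status(response: String): Option[Int] =
    StatusPattern.findFirstMatchIn(response).map(_.group(1).toInt)

  // A status of 0 is treated as a successful import
  def isSuccess(response: String): Boolean = status(response).contains(0)
}
```

For example, `SolrResponse.isSuccess` could be applied to the text returned by the curl command before moving on to the next file.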

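Note that you may also need to add the request handler shown below to the configuration (solrconfig.xml) in the following directory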

/var/solr/data/baikeperson/conf

<requestHandler name="/update/json" class="solr.JsonUpdateRequestHandler" />

In total, 280,000 person encyclopedia entries were imported

Run a query for 岳云鹏 (Yue Yunpeng)
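Such a query can be issued from the web admin interface or by URL. As a minimal sketch, the code below builds a URL for the standard `/select` handler. It assumes the name is searched in the `title` field (taken from the sample documents, but not confirmed as the field actually queried here) and reuses the host and core from the curl command. `SolrQueryUrl` and `selectUrl` are hypothetical names.

```scala
import java.net.URLEncoder

object SolrQueryUrl {

  // Build a /select URL for a simple field:value query.
  // The query string is URL-encoded so that non-ASCII names are transmitted safely.
  def selectUrl(baseUrl: String, core: String, field: String, value: String): String = {
    val q = URLEncoder.encode(s"$field:$value", "UTF-8")
    s"$baseUrl/$core/select?q=$q"
  }

  def main(args: Array[String]): Unit =
    println(selectUrl("http://localhost:8983/solr", "baikeperson", "title", "岳云鹏"))
}
```

The printed URL can be opened in a browser or passed to curl to see the matching documents.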