Elasticsearch
Currently, Elasticsearch is used by the UI to search for datasets based on top-level metadata (sample, dataset and investigation name). More precisely, the user interface uses ReactiveSearch, a library of React components for displaying Elasticsearch results in a React application. So that ReactiveSearch can retrieve results, ICAT+ implements an endpoint that runs multiple searches in a single request:
/elasticsearch/{sessionId}/datasets/_msearch
Note: this endpoint is supposed to be used only by ReactiveSearch.
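For debugging what ReactiveSearch sends, it helps to know that `_msearch` expects a newline-delimited JSON (NDJSON) body: one header line followed by one query line per search, terminated by a newline. The Python sketch below builds such a body and posts it to the ICAT+ endpoint. The base URL, session id and example query are placeholders, and it is assumed (not confirmed here) that ICAT+ forwards an empty `{}` header unchanged so the search runs against the datasets index.

```python
import json
import urllib.request


def build_msearch_body(searches):
    """Serialize (header, query) pairs into the NDJSON format used by _msearch."""
    lines = []
    for header, query in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(query))
    # _msearch requires the body to end with a newline
    return "\n".join(lines) + "\n"


def post_msearch(base_url, session_id, body):
    """POST an NDJSON body to the ICAT+ proxy endpoint (base_url is a placeholder)."""
    url = f"{base_url}/elasticsearch/{session_id}/datasets/_msearch"
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# One search: empty header, then a match query on a field from the dataset_document
body = build_msearch_body([
    ({}, {"query": {"match": {"sampleName": "AFAMIN-revi-B5-1"}}, "size": 10}),
])
```

In normal use ReactiveSearch generates this body itself; building it by hand is only useful for reproducing a request outside the UI.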
Installation
The easiest way is to use a Docker container:
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.6.2
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.6.2
For development purposes, Kibana can also be installed in the same way:
docker pull docker.elastic.co/kibana/kibana:7.6.2
docker run --link YOUR_ELASTICSEARCH_CONTAINER_NAME_OR_ID:elasticsearch -p 5601:5601 docker.elastic.co/kibana/kibana:7.6.2
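Elasticsearch takes a little while to start after `docker run`, so scripts that index or query right away can fail. A minimal Python sketch of a readiness check, assuming the container is published on `http://localhost:9200` as in the command above and that security is not enabled (no credentials are sent); `wait_for_elasticsearch` is a hypothetical helper, not part of any library.

```python
import json
import time
import urllib.request


def wait_for_elasticsearch(url="http://localhost:9200", timeout=60.0, interval=2.0):
    """Poll the Elasticsearch root endpoint until it answers or the timeout expires.

    Returns the parsed cluster info (a dict) on success, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                return json.load(response)
        except OSError:
            # Connection refused, timeouts and HTTP errors (URLError is an
            # OSError subclass) all mean "not ready yet": wait and retry.
            time.sleep(interval)
    return None
```

Once the node is up, `wait_for_elasticsearch()["version"]["number"]` should report the version of the image that was started.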
Populating Index
Our approach is to create a single index called datasets. This means that only ICAT datasets are indexed, not datafiles or investigations; however, a dataset_document may include parameters from its investigation, e.g. investigationName.
Dataset Document
The following endpoint converts datasets from ICAT into dataset_documents:
/catalogue/{sessionId}/dataset/id/{datasetIds}/dataset_document
This produces a flattened data structure that makes searching simpler. A dataset document looks like this:
[
  {
    "id": 142795543,
    "name": "mesh-AFAMIN-revi-B5-1_1_1719731",
    "startDate": "2016-04-08T19:28:20.000+02:00",
    "endDate": "2018-11-06T16:48:51.432+01:00",
    "location": "/data/id30a1/inhouse/opid30a1/20160408/RAW_DATA/AFAMIN/AFAMIN-revi-B5-1/MXPressA_01",
    "sampleName": "AFAMIN-revi-B5-1",
    "MX_template": "mesh-AFAMIN-revi-B5-1_1_####.cbf",
    "MX_numberOfImages": "24",
    "InstrumentSource_mode": "16 bunch",
    "MX_oscillationOverlap": "0",
    "MX_scanType": "OSC",
    "MX_aperture": "50 um",
    "MX_detectorDistance": "234.925",
    "MX_beamSizeAtSampleY": "0.05",
    "InstrumentMonochromator_wavelength": 0.966,
    "MX_beamShape": "ellipse",
    "MX_motors_name": "y z sampx sampy phi kappa chi kappa_phi zoom focus phiz phiy",
    "MX_transmission": "100",
    "MX_oscillationStart": "149.5",
    "MX_resolution": "2",
    "MX_oscillationRange": "0.0416667",
    "MX_startImageNumber": "49",
    "MX_exposureTime": "0.1",
    "MX_dataCollectionId": "1719731",
    "MX_fluxEnd": "143000000000",
    "MX_flux": "142000000000",
    "MX_yBeam": "146.858",
    "MX_directory": "/data/id30a1/inhouse/opid30a1/20160408/RAW_DATA/AFAMIN/AFAMIN-revi-B5-1/MXPressA_01",
    "MX_xBeam": "129.056",
    "MX_beamSizeAtSampleX": "0.05",
    "MX_motors_value": "2.041 -0.596 -0.011 0.157 149.5 0.0 0.0 0.0 2.0 0.0 0.0 -0.605",
    "fileCount": "24",
    "volume": "59492064",
    "elapsedTime": "81382831",
    "ResourcesGallery": "5be1b7d89885c253675ceebc 5be1b7d89885c253675ceec0",
    "ResourcesGalleryFilePaths": "/data/pyarch/2016/id30a1/opid30a1/20160408/RAW_DATA/AFAMIN/AFAMIN-revi-B5-1/MXPressA_01/mesh-AFAMIN-revi-B5-1_1_0049.jpeg,/data/pyarch/2016/id30a1/opid30a1/20160408/RAW_DATA/AFAMIN/AFAMIN-revi-B5-1/MXPressA_01/AFAMIN-revi-B5-1_1_snapshot_before_mesh.png",
    "startTime": "2016-04-08 19:28:20",
    "datasetName": "mesh-AFAMIN-revi-B5-1_1_1719731",
    "dataArchived": "True",
    "parametersCount": 33,
    "MX_motors": [
      {
        "name": "y",
        "numericValue": 2.041,
        "stringValue": "2.041"
      },
      {
        "name": "z",
        "numericValue": -0.596,
        "stringValue": "-0.596"
      },
      {
        "name": "sampx",
        "numericValue": -0.011,
        "stringValue": "-0.011"
      },
      {
        "name": "sampy",
        "numericValue": 0.157,
        "stringValue": "0.157"
      },
      {
        "name": "phi",
        "numericValue": 149.5,
        "stringValue": "149.5"
      },
      {
        "name": "kappa",
        "numericValue": 0,
        "stringValue": "0.0"
      },
      {
        "name": "chi",
        "numericValue": 0,
        "stringValue": "0.0"
      },
      {
        "name": "kappa_phi",
        "numericValue": 0,
        "stringValue": "0.0"
      },
      {
        "name": "zoom",
        "numericValue": 2,
        "stringValue": "2.0"
      },
      {
        "name": "focus",
        "numericValue": 0,
        "stringValue": "0.0"
      },
      {
        "name": "phiz",
        "numericValue": 0,
        "stringValue": "0.0"
      },
      {
        "name": "phiy",
        "numericValue": -0.605,
        "stringValue": "-0.605"
      }
    ],
    "investigationId": 13175377,
    "investigationName": "OPID-1",
    "investigationTitle": "opid-1",
    "investigationVisitId": "id30a1",
    "escompactsearch": "id30a1 OPID-1 ",
    "estype": "dataset"
  }
]
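In this example the MX_motors array holds, in structured form, the same information as the space-separated MX_motors_name and MX_motors_value strings: the n-th name is paired with the n-th value, and each value is kept both as a number and as the original text. The Python sketch below reproduces that pairing for the document above. `build_motors` is a hypothetical helper that illustrates the flattened shape; it is not the actual ICAT+ implementation, and it assumes both strings always contain the same number of entries.

```python
def build_motors(names, values):
    """Pair space-separated motor names and values into MX_motors-style entries."""
    name_list = names.split()
    value_list = values.split()
    # Assumption: names and values are always aligned one-to-one
    if len(name_list) != len(value_list):
        raise ValueError("MX_motors_name and MX_motors_value differ in length")
    motors = []
    for name, value in zip(name_list, value_list):
        motors.append({
            "name": name,
            "numericValue": float(value),  # numeric copy, e.g. usable in range queries
            "stringValue": value,          # original text kept unchanged
        })
    return motors


motors = build_motors(
    "y z sampx sampy phi kappa chi kappa_phi zoom focus phiz phiy",
    "2.041 -0.596 -0.011 0.157 149.5 0.0 0.0 0.0 2.0 0.0 0.0 -0.605",
)
```

Keeping one object per motor (rather than two parallel strings) lets a query address a single motor's name and value together.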
Ingestion
To populate the index, each dataset_document has to be stored in Elasticsearch. We do this with the script described below. The index should be updated automatically whenever the data in ICAT changes, e.g. when a new dataset is stored in ICAT.
The script connects to ICAT to retrieve the investigation, then uses the ICAT+ endpoint to convert its datasets into documents, which are stored in the Elasticsearch index using the Bulk API.
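The conversion and bulk-indexing steps can be sketched in Python as follows. This is not the referenced script: the ICAT+ base URL, the index name `datasets`, and the assumption that `{datasetIds}` accepts a comma-separated list are all placeholders, and retrieving dataset ids from ICAT is left out. Indexing uses `helpers.bulk` from the `elasticsearch` client listed under Requirements; it is imported inside the function so the other helpers work without the client installed.

```python
import json
import urllib.request

INDEX_NAME = "datasets"  # assumed index name, matching the /datasets/_search example


def dataset_document_url(icat_plus_url, session_id, dataset_ids):
    """Build the dataset_document URL (comma-separated ids is an assumption)."""
    ids = ",".join(str(i) for i in dataset_ids)
    return f"{icat_plus_url}/catalogue/{session_id}/dataset/id/{ids}/dataset_document"


def fetch_dataset_documents(icat_plus_url, session_id, dataset_ids):
    """Ask ICAT+ to convert datasets into a list of dataset_documents."""
    url = dataset_document_url(icat_plus_url, session_id, dataset_ids)
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def to_bulk_actions(documents, index=INDEX_NAME):
    """Wrap each document in a bulk index action keyed by its ICAT dataset id."""
    for doc in documents:
        yield {"_index": index, "_id": doc["id"], "_source": doc}


def index_documents(es_url, documents):
    """Store the documents in Elasticsearch through the Bulk API."""
    from elasticsearch import Elasticsearch, helpers

    client = Elasticsearch(es_url)
    # helpers.bulk returns (number of successful actions, list of errors)
    return helpers.bulk(client, to_bulk_actions(documents))
```

Using the ICAT dataset id as the Elasticsearch `_id` means re-running the ingestion overwrites existing documents instead of creating duplicates, which suits keeping the index in sync with ICAT.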
Requirements
pip install elasticsearch
Script
The configuration properties should be changed and adapted to your environment. You can find the script here.
Searching with Kibana
Kibana provides a development console in which Elasticsearch queries can be tested.
Queries can be performed against /datasets/_search. For example:
GET /datasets/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            { "match": { "definition": "SXM" } },
            { "match": { "InstrumentMonochromatorCrystal_type": "Si" } }
          ],
          "must": { "match": { "Sample_name": "fe2streptor2" } }
        }
      }
    }
  }
}
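The same Kibana query can be sent from a script. The Python sketch below builds the query body from the example and posts it to the `_search` endpoint with the standard library only. The Elasticsearch URL is a placeholder, no authentication is assumed, and `build_query` and `search` are hypothetical helpers.

```python
import json
import urllib.request


def build_query(definition, crystal_type, sample_name):
    """Build the bool query from the Kibana example."""
    return {
        "query": {
            "bool": {
                "must": {
                    "bool": {
                        "should": [
                            {"match": {"definition": definition}},
                            {"match": {"InstrumentMonochromatorCrystal_type": crystal_type}},
                        ],
                        "must": {"match": {"Sample_name": sample_name}},
                    }
                }
            }
        }
    }


def search(es_url, query, index="datasets"):
    """POST a query to the index's _search endpoint and return the parsed response."""
    request = urllib.request.Request(
        f"{es_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = build_query("SXM", "Si", "fe2streptor2")
```

For example, `search("http://localhost:9200", query)["hits"]["hits"]` returns the matching documents. Because the inner bool query also has a `must` clause, its `should` clauses are optional by default: a document only needs to match the sample name, and matching `definition` or the crystal type just raises its score.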