Elasticsearch

Elasticsearch is used in order to provide the main search functionality within Gentics Mesh.

When enabled it is possible to search for:

Search queries can be executed via the dedicated search REST endpoints or GraphQL.

TL;DR

You can use Elasticsearch queries to search for data. Please note that the format of the documents which can be searched differs from the format which Gentics Mesh returns by the REST API. This difference will affect your queries.

Integration details

Data format

The JSON format of stored documents within the Elasticsearch differ from the JSON format that is returned via regular Gentics Mesh endpoints. Thus it is important to know the Elasticsearch document format when building an Elasticsearch query.

Permission handling

Internally Gentics Mesh will check which roles of the user match up with the needed roles of the documents and thus only return elements which are visible by the user. This is done by nesting the input query inside of an outer boolean query which includes the needed filter terms.

Limitations

It is not possible to search for specific individual versions. Instead only published and draft versions per project branch are stored in the search index.
The stored documents within the Elasticsearch indices do not contain all properties which are otherwise available via REST. Only directly accessible values which have minimal dependencies to other elements are stored in order to keep the update effort manageable.

Configuration

The Elasticsearch connection can be configured within the mesh.yml configuration file.

search:
  url: "http://localhost:9200"
  timeout: 8000
  startEmbedded: false
  embeddedArguments: "-Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75\
    \ -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -client -Xss1m -Djava.awt.headless=true\
    \ -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true\
    \ -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0\
    \ -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -XX:+HeapDumpOnOutOfMemoryError"
Configuration Type Default Description

search.url

String

http://localhost:9200

URL to the Elasticsearch server.

search.timeout

Number

3000

Timeout for interactions with the search server.

search.startEmbedded

Boolean

true

Flag that is used to enable or disable the automatic startup and handling of the embedded Elasticsearch server.

search.startupTimeout

Number

45

Timeout for the Elasticsearch start and connection processes.

search.embeddedArguments

String

See above

Set the JVM arguments for the embedded Elasticsearch server process.

search.bulkLimit

Number

2000

Upper size limit for bulk requests.

search.prefix

String

mesh-

Elasticsearch installation prefix. Multiple Gentics Mesh installations with different prefixes can utilize the same Elasticsearch server.

Dedicated Elasticsearch

It is recommended to use a dedicated Elasticsearch server. You can disable the embedded Elasticsearch server via by setting search.startEmbedded to false. Your search.url parameter must point to the Elasticsearch installation that you want to use.

You can also use environment variables to configure Gentics Mesh and change these settings.

Embedded Elasticsearch

Gentics Mesh will setup and start the Elasticsearch server if the search.startEmbedded property is set to true. The server installation will be placed in the current working directory and started. A watchdog will continuously check the process and restart it when a crash has been detected.

Support for embedded Elasticsearch will be dropped in the future.

It is also possible to completely turn off the search support by setting the search.startEmbedded property to false and the search.url property to null.

The Gentics Mesh UI currently requires the Elasticsearch support to function correctly.

Compatibility

We currently run and test against Elasticsearch version 6.1.2. Other versions have not yet been tested.

REST endpoints

Search requests are handled by the /api/v1/search or /api/v1/:projectName/search endpoints.

Examples / Queries

Users

Endpoint: /api/v1/search/users

{
  "query": {
      "simple_query_string" : {
          "query": "myusername*",
          "fields": ["username.raw"],
          "default_operator": "and"
      }
  }
}

Groups

Endpoint: /api/v1/search/groups

{
  "query": {
      "simple_query_string" : {
          "query": "testgroup*",
          "fields": ["name.raw^5"],
          "default_operator": "and"
      }
  }
}

Roles

Endpoint: /api/v1/search/roles

Nodes

Endpoint: /api/v1/search/nodes

Search nodes by schema name

Listed below is an example search query which can be posted to /api/v1/search/nodes in order to find all nodes across all projects which were created using the content schema. The found nodes will be sorted ascending by creator.

{
  "sort" : {
     "created" : { "order" : "asc" }
  },
  "query":{
    "bool" : {
      "must" : {
        "term" : { "schema.name" : "content" }
       }
    }
  }
}

Search nodes by micronode

Search nodes by micronode field values

Find all nodes which have a micronode list field (vcardlist) that contain at least one micronode which contains the two string fields (firstName, lastName) with the values ("Joe", "Doe"):

{
  "query": {
    "nested": {
      "path": "fields.vcardlist",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "fields.vcardlist.fields.firstName": "Joe"
              }
            },
            {
              "match": {
                "fields.vcardlist.fields.lastName": "Doe"
              }
            }
          ]
        }
      }
    }
  }
}

Search tagged nodes

Search nodes which are tagged 'Solar' and 'Blue'

The tags field is a nested field and thus a nested query must be used to match the two tags. Please note that you need to use match_phrase because you want to match the whole tag name. Using match would cause elasticsearch to match any of trigram found within the tag name value.

{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "tags.name": "Solar"
              }
            },
            {
              "match_phrase": {
                "tags.name": "Blue"
              }
            }
          ]
        }
      }
    }
  }
}

Search nodes by geolocation of images

Search images which were taken in specific areas

GPS information from images will automatically be extracted and added to the search index. It is possible to run a geo search to locate images within a specific area.

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "fields.binary.metadata.location" : {
                        "top_left" : {
                            "lat" : 50.0,
                            "lon" : 10.0
                        },
                        "bottom_right" : {
                            "lat" : -40.0,
                            "lon" : 19.0
                        }
                    }
                }
            }
        }
    }
}

Projects

Endpoint: /api/v1/search/projects

Tags

Endpoint: /api/v1/search/tags

{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": {
            "match_phrase": {
              "tags.name": "Twinjet"
            }
          }
        }
      }
    }
  }
}

Tag Families

Endpoint: /api/v1/search/tagFamilies

{
  "query": {
    "nested": {
      "path": "tagFamilies.colors.tags",
      "query": {
        "match": {
          "tagFamilies.colors.tags.name": "red"
        }
      }
    }
  }
}

Schemas

Endpoint: /api/v1/search/schemas

Microschemas

Endpoint: /api/v1/search/microschemas

Paging

The paging query parameters are perPage and page . It is important to note that page is 1-based and perPage can be set to 0 in order to just retrieve a count of elements.

Additionally it is also possible to use the /api/v1/rawSearch or /api/v1/:projectName/rawSearch endpoints.

These endpoints will accept the same query but return a Elasticsearch multi search response instead of the typical Gentics Mesh list response. This is useful if you want to use for example the Elasticsearch highlighing and aggregation features. The endpoint will automatically select the needed indices and modify the query in order to add needed permission checks.

Index Synchronization

The POST /api/v1/search/sync endpoint can be used to invoke a manual sync of the search index.

The index sync operation will automatically be invoked when Mesh is being started and a unclean shutdown has been detected.

You can also recreate all indices if needed via the POST /api/v1/search/clear endpoint.

This operation will remove all indices which have been created by Mesh and rebuild them one at a time.

Starting with Gentics Mesh 0.21.0 it is possible to add the contents of file uploads to the search index. The Elasticsearch Ingest Attachment Plugin will be utilized when installed to process text file uploads (PDF, DOC, DOCX).

This is especially useful if you also want to search within document file uploads.

Currently uploads which have one of these mimetypes will be processed:

  • application/pdf

  • application/msword

  • text/rtf

  • application/vnd.ms-powerpoint

  • application/vnd.oasis.opendocument.text

  • text/plain

  • application/rtf

Example binary field within document:

…
  "binaryField" : {
    "filename" : "mydoc.pdf",
    "sha512sum" : "16d3aeae9869d2915dda30866c2d7b77f50dc668daa3a49d2bc6eb6349cf6e895099349b7f8240174a788db967c87947b6a2fd41a353eec99a20358dfd4c9211",
    "mimeType" : "application/pdf",
    "filesize" : 200,
    "dominantColor" : null,
    "width" : null,
    "height" : null,
    "file" : {
      "language" : "ro",
      "content" : "Lorem ipsum dolor sit amet",
      "date" : "2018-05-04T21:28:30Z",
      "author" : "Joe Doe",
      "title" : "My Document Title"
    }
  }
…

Gentics Mesh will automatically create ingest piplines for each node index. The created pipelines will be utilized when storing documents. The name of these pipelines matches the index name. No longer needed pipelines will automatically be removed.

All documents need to be re-index if you choose to enable the Elasticsearch plugin.

GraphQL

It is possible to nest Elasticsearch queries within the GraphQL query in order to filter elements.

Document format

The following section contains document examples which are useful when creating queries. Gentics Mesh transforms elements into these documents which then can be stored within Elasticsearch.

Users

{
  "uuid" : "589319933be24ec79319933be24ec7fe",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "username" : "joe1",
  "emailaddress" : "joe1@nowhere.tld",
  "firstname" : "Joe",
  "lastname" : "Doe",
  "groups" : {
    "name" : [ "editors", "superEditors" ],
    "uuid" : [ "df81c23d9ff1450081c23d9ff195005e", "df81c23d9ff1450081c23d9ff195005e" ]
  },
  "_roleUuids" : [ ],
  "version" : "ab7d5eb7"
}

Groups

{
  "name" : "adminGroup",
  "uuid" : "df81c23d9ff1450081c23d9ff195005e",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "ffb6a2da"
}

Roles

{
  "name" : "adminRole",
  "uuid" : "ddcddceb33d648318ddceb33d618314f",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "27844082"
}

Nodes

{
  "uuid" : "adaf48da8c124049af48da8c12a0493e",
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : "1970-01-01T00:00:00Z",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : "1970-01-01T00:00:00Z",
  "project" : {
    "name" : "dummyProject",
    "uuid" : "d3b1840b01464b54b1840b0146cb54f5"
  },
  "tags" : {
    "name" : [ "green", "red" ],
    "uuid" : [ "6e6f1d9f055447d2af1d9f055417d289", "6e6f1d9f055447d2af1d9f055417d289" ]
  },
  "tagFamilies" : {
    "colors" : {
      "uuid" : "f36400b436d244c3a400b436d244c3f4",
      "tags" : [ {
        "name" : "green",
        "uuid" : "6e6f1d9f055447d2af1d9f055417d289"
      }, {
        "name" : "red",
        "uuid" : "6e6f1d9f055447d2af1d9f055417d289"
      } ]
    }
  },
  "_roleUuids" : [ ],
  "parentNode" : {
    "uuid" : "adaf48da8c124049af48da8c12a0493e"
  },
  "language" : "de",
  "schema" : {
    "name" : "content",
    "uuid" : "2aa83a2b3cba40a1a83a2b3cba90a1de",
    "version" : null
  },
  "fields" : {
    "date" : 1542746513,
    "string" : "The name value",
    "htmlList" : [ "somehtml", "somehtml", "somehtml" ],
    "nodeList" : [ "adaf48da8c124049af48da8c12a0493e", "adaf48da8c124049af48da8c12a0493e", "adaf48da8c124049af48da8c12a0493e" ],
    "number" : 0.146,
    "node" : "adaf48da8c124049af48da8c12a0493e",
    "boolean" : true,
    "stringList" : [ "The name value", "The name value", "The name value" ],
    "micronode" : {
      "microschema" : {
        "name" : null,
        "uuid" : null
      },
      "fields-null" : {
        "latitude" : 16.373063840833,
        "longitude" : 16.373063840833
      }
    },
    "html" : "somehtml",
    "numberList" : [ 0.146, 0.146, 0.146 ],
    "booleanList" : [ "true", "true", "true" ],
    "dateList" : [ 1542746513, 1542746513, 1542746513 ]
  },
  "displayField" : {
    "key" : "string",
    "value" : null
  },
  "version" : "cbfd7867"
}

Projects

{
  "name" : "dummyProject",
  "uuid" : "d3b1840b01464b54b1840b0146cb54f5",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "6714dafb"
}

Tags

{
  "name" : "red",
  "uuid" : "6e6f1d9f055447d2af1d9f055417d289",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "tagFamily" : {
    "name" : "colors",
    "uuid" : "f36400b436d244c3a400b436d244c3f4"
  },
  "project" : {
    "name" : "dummyProject",
    "uuid" : "d3b1840b01464b54b1840b0146cb54f5"
  },
  "version" : "7df128ce"
}

Tag Families

{
  "name" : "colors",
  "uuid" : "f36400b436d244c3a400b436d244c3f4",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "tags" : {
    "name" : [ "red", "green" ],
    "uuid" : [ "6e6f1d9f055447d2af1d9f055417d289", "6e6f1d9f055447d2af1d9f055417d289" ]
  },
  "project" : {
    "name" : "dummyProject",
    "uuid" : "d3b1840b01464b54b1840b0146cb54f5"
  },
  "_roleUuids" : [ ],
  "version" : "7f598ceb"
}

Microschemas

{
  "uuid" : "2715307deafc4ecc95307deafccecc10",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "name" : "geolocation",
  "_roleUuids" : [ ],
  "version" : "ffb6a2da"
}

Schemas

{
  "name" : "content",
  "description" : "Content schema",
  "uuid" : "2aa83a2b3cba40a1a83a2b3cba90a1de",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "4bc088df"
}

Custom mappings / index settings

The index settings for nodes can be configured within the schema json. Additionally it is also possible to add extra mappings to fields. This may be desired when if a field needs to be analyzed in a special way or a keyword field must be added.

An example for such Schema can be seen below. This schema contains additional tokenizer and analyzer which can be used to setup an index that is ready to be used for a full-text search which supports autocompletion and auto suggestion.

{
    "container": false,
    "name": "CustomSchema",
    "elasticsearch": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type": "stop",
                    "stopwords": "_english_"
                },
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "tokenizer": {
                "basicsearch": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    "token_chars": [
                        "letter"
                    ]
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "char_filter": [
                        "html_strip"
                    ],
                    "filter": [
                        "lowercase",
                        "my_stop",
                        "autocomplete_filter"
                    ]
                },
                "basicsearch": {
                    "tokenizer": "basicsearch",
                    "char_filter": [
                        "html_strip"
                    ],
                    "filter": [
                        "my_stop",
                        "lowercase"
                    ]
                },
                "basicsearch_search": {
                    "char_filter": [
                        "html_strip"
                    ],
                    "tokenizer": "lowercase"
                }
            }
        }
    },
    "fields": [
        {
            "name": "content",
            "required": false,
            "elasticsearch": {
                "basicsearch": {
                    "type": "text",
                    "analyzer": "basicsearch",
                    "search_analyzer": "basicsearch_search"
                },
                "suggest": {
                    "type": "text",
                    "analyzer": "simple"
                },
                "auto": {
                    "type": "text",
                    "analyzer": "autocomplete"
                }
            },
            "type": "string"
        }
    ]
}

Custom mappings can currently only be specified for the following types:

  • string fields

  • html fields

  • string list fields

  • html list fields

  • binary fields

Index settings for other elements (e.g: Users, Roles etc) can currently not be configured.

Binary Fields

You can add custom mappings to the mimeType and file.content field. The elasticsearch property needs to contain a custom mapping for each type.

Example:

{
  "displayField": "name",
  "segmentField": "binary",
  "container": false,
  "description": "Image schema",
  "name": "image",
  "fields": [
    {
      "name": "name",
      "label": "Name",
      "required": false,
      "type": "string"
    },
    {
      "name": "binary",
      "label": "Image",
      "required": false,
      "type": "binary",
      "elasticsearch": {
        "mimeType": {
          "raw": {
            "type": "keyword",
            "index": true
          }
        },
        "file.content": {
          "raw": {
            "type": "keyword",
            "index": true
          }
        }
      }
    }
  ]
}