Elasticsearch

Elasticsearch is used in order to provide the main search functionality within Gentics Mesh.

When enabled it is possible to search for:

Search queries can be executed via the dedicated search REST endpoints or GraphQL.

TL;DR

You can use Elasticsearch queries to search for data. Please note that the format of the documents which can be searched differs from the format which Gentics Mesh returns by the REST API. This difference will affect your queries.

Integration details

Data format

The JSON format of stored documents within the Elasticsearch differ from the JSON format that is returned via regular Gentics Mesh endpoints. Thus it is important to know the Elasticsearch document format when building an Elasticsearch query.

Permission handling

Internally Gentics Mesh will check which roles of the user match up with the needed roles of the documents and thus only return elements which are visible by the user. This is done by nesting the input query inside of an outer boolean query which includes the needed filter terms.

Limitations

It is not possible to search for specific individual versions. Instead only published and draft versions per project branch are stored in the search index.
The stored documents within the Elasticsearch indices do not contain all properties which are otherwise available via REST. Only directly accessible values which have minimal dependencies to other elements are stored in order to keep the update effort manageable.
Unlike Mesh, Elasticsearch does not allow an unbounded page size in the search result. The default value for the perPage parameter is 10.

Configuration

The Elasticsearch connection can be configured within the mesh.yml configuration file.

search:
  url: "http://localhost:9200"
  timeout: 8000
  startEmbedded: false
  embeddedArguments: "-Xms1g -Xmx1g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75\
    \ -XX:+UseCMSInitiatingOccupancyOnly -XX:+AlwaysPreTouch -client -Xss1m -Djava.awt.headless=true\
    \ -Dfile.encoding=UTF-8 -Djna.nosys=true -XX:-OmitStackTraceInFastThrow -Dio.netty.noUnsafe=true\
    \ -Dio.netty.noKeySetOptimization=true -Dio.netty.recycler.maxCapacityPerThread=0\
    \ -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -XX:+HeapDumpOnOutOfMemoryError"
Configuration Type Default Description

search.url

String

http://localhost:9200

URL to the Elasticsearch server.

search.username

String

-

Username for basic authentication.

search.password

String

-

Password for basic authentication.

search.certPath

String

-

Path to the trusted server certificate (PEM format).

search.caPath

String

-

Path to the trusted CA certificate (PEM format).

search.hostnameVerification

Boolean

true

Flag to control SSL hostname verification.

search.timeout

Number

60000

Timeout for interactions with the search server.

search.startEmbedded

Boolean

true

Flag that is used to enable or disable the automatic startup and handling of the embedded Elasticsearch server.

search.embeddedArguments

String

Default JVM Arguments

Set the JVM arguments for the embedded Elasticsearch server process.

search.prefix

String

mesh-

Elasticsearch installation prefix. Multiple Gentics Mesh installations with different prefixes can utilize the same Elasticsearch server.

search.bulkLimit

Number

100

Upper size limit for bulk requests.

search.bulkLengthLimit

Number

5000000

Upper limit for the total encoded string length of the bulk requests.

search.eventBufferSize

Number

1000

Upper limit for mesh events that are to be mapped to elastic search requests.

search.bulkDebounceTime

Number

2000

The maximum amount of time in milliseconds between two bulkable requests before they are sent.

search.idleDebounceTime

Number

100

The maximum amount of time in milliseconds between two successful requests before the idle event is emitted.

search.retryInterval

Number

5000

The time in milliseconds between retries of elastic search requests in case of a failure.

search.retryLimit

Number

3

The amount of retries on a single request before the request is discarded.

search.waitForIdle

Boolean

true

If true, search endpoints wait for elasticsearch to be idle before sending a response.

search.includeBinaryFields

Boolean

true

If true, the content and metadata of binary fields will be included in the search index.

search.mappingMode

String

DYNAMIC

This setting controls the mapping mode of fields for Elasticsearch. When set to STRICT only fields which have a custom mapping will be added to Elasticsearch. Mode DYNAMIC will automatically use the Gentics Mesh default mappings which can be supplemented with custom mappings.

Dedicated Elasticsearch

It is recommended to use a dedicated Elasticsearch server. You can disable the embedded Elasticsearch server via by setting search.startEmbedded to false. Your search.url parameter must point to the Elasticsearch installation that you want to use.

You can also use environment variables to configure Gentics Mesh and change these settings.

Embedded Elasticsearch

Gentics Mesh will setup and start the Elasticsearch server if the search.startEmbedded property is set to true. The server installation will be placed in the current working directory and started. A watchdog will continuously check the process and restart it when a crash has been detected.

Support for embedded Elasticsearch will be dropped in the future.

It is also possible to completely turn off the search support by setting the search.startEmbedded property to false and the search.url property to null.

The Gentics Mesh UI currently requires the Elasticsearch support to function correctly.

Compatibility

We currently run and test against Elasticsearch version 6.8.0. Other versions have not yet been tested.

Security

Gentics Mesh supports basic authentication and TLS in-transit encrypted connections for Elasticsearch communication.

Encrypting communications

The Elasticsearch server can be configured to use TLS/SSL encryption. Details on this topic can be found in the Elasticsearch documentation.

You can specify the certificate chain (server certificate and common authority certificate) of the Elasticsearch server in the Gentics Mesh configuration in order to only trust this connection.

example settings
search:
  url: "https://127.0.0.1:9200"
  username: "elastic"
  password: "iucee7dohjaedemiShoshie9eiz4af0Oiceish6a"
  certPath: "certs/elastic-certificates.crt.pem"
  caPath: "certs/elastic-stack-ca.crt.pem"
  hostnameVerification: true
  startEmbedded: false
The certPath and caPath setting only accept certificate files in PEM format.
Elasticsearch servers which already use a trusted certificate don’t need to specify this cert in the configuration. It can however be configured to only trust the given cert chain.

The settings can be also configured via environment variables:

  • MESH_ELASTICSEARCH_CERT_PATH - Override the cert path

  • MESH_ELASTICSEARCH_CA_PATH - Override the ca path

  • MESH_ELASTICSEARCH_HOSTNAME_VERIFICATION - Override the hostname verification

Basic Authentication

Connections to Elasticsearch can be authenticated using a user. Details on how to configure this in Elasticsearch can be found in the Elasticsearch documentation.

When xpack.security.enabled has been enabled in Elasticsearch you can use the elasticsearch-setup-passwords

The authentication details can be configured using the search.username and search.password settings in the mesh.yml.

The username and password will only be used when specified. By default no user will be used.

The settings can be also configured via environment variables:

  • MESH_ELASTICSEARCH_USERNAME - Override the configured username

  • MESH_ELASTICSEARCH_PASSWORD - Override the configured password

REST endpoints

Search requests are handled by the /api/v2/search or /api/v2/:projectName/search endpoints.

If you want the search endpoints to wait until Elasticsearch has processed all pending changes to the index, you can set the query parameter ?wait=true. The default value can be configured in the search options.

If no query parameter and no configuration was provided, the wait parameter defaults to true to not break any existing implementations. In the future this will change to false.

Examples / Queries

Users

Endpoint: /api/v2/search/users

{
  "query": {
      "simple_query_string" : {
          "query": "myusername*",
          "fields": ["username.raw"],
          "default_operator": "and"
      }
  }
}

Groups

Endpoint: /api/v2/search/groups

{
  "query": {
      "simple_query_string" : {
          "query": "testgroup*",
          "fields": ["name.raw^5"],
          "default_operator": "and"
      }
  }
}

Roles

Endpoint: /api/v2/search/roles

Nodes

Endpoint: /api/v2/search/nodes

Search nodes by schema name

Listed below is an example search query which can be posted to /api/v2/search/nodes in order to find all nodes across all projects which were created using the content schema. The found nodes will be sorted ascending by creator.

{
  "sort" : {
     "created" : { "order" : "asc" }
  },
  "query":{
    "bool" : {
      "must" : {
        "term" : { "schema.name" : "content" }
       }
    }
  }
}

Search nodes by micronode

Search nodes by micronode field values

Find all nodes which have a micronode list field (vcardlist) that contain at least one micronode which contains the two string fields (firstName, lastName) with the values ("Joe", "Doe"):

{
  "query": {
    "nested": {
      "path": "fields.vcardlist",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "fields.vcardlist.fields.firstName": "Joe"
              }
            },
            {
              "match": {
                "fields.vcardlist.fields.lastName": "Doe"
              }
            }
          ]
        }
      }
    }
  }
}

Search tagged nodes

Search nodes which are tagged 'Solar' and 'Blue'

The tags field is a nested field and thus a nested query must be used to match the two tags. Please note that you need to use match_phrase because you want to match the whole tag name. Using match would cause elasticsearch to match any of trigram found within the tag name value.

{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": [
            {
              "match_phrase": {
                "tags.name": "Solar"
              }
            },
            {
              "match_phrase": {
                "tags.name": "Blue"
              }
            }
          ]
        }
      }
    }
  }
}

Search nodes which have been tagged with 'Diesel' for tagFamily 'Fuels':

{
    "query": {
        "nested": {
            "ignore_unmapped": "true",
            "path": "tagFamilies.Fuels.tags",
            "query": {
                "match": {
                    "tagFamilies.Fuels.tags.name": "Diesel"
                }
            }
        }
    }
}
The "ignore_unmapped": "true" will suppress errors which may be returned due to missing paths. This is useful when invoking /rawSearch requests which will directly return the Elasticsearch response. This response would otherwise be cluttered with a lot of errors due to missing path. This can be omitted for these kinds of searches.

Search nodes by geolocation of images

Search images which were taken in specific areas

GPS information from images will automatically be extracted and added to the search index. It is possible to run a geo search to locate images within a specific area.

{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "fields.binary.metadata.location" : {
                        "top_left" : {
                            "lat" : 50.0,
                            "lon" : 10.0
                        },
                        "bottom_right" : {
                            "lat" : -40.0,
                            "lon" : 19.0
                        }
                    }
                }
            }
        }
    }
}

Projects

Endpoint: /api/v2/search/projects

Tags

Endpoint: /api/v2/search/tags

{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "bool": {
          "must": {
            "match_phrase": {
              "tags.name": "Twinjet"
            }
          }
        }
      }
    }
  }
}

Tag Families

Endpoint: /api/v2/search/tagFamilies

{
  "query": {
    "nested": {
      "path": "tagFamilies.colors.tags",
      "query": {
        "match": {
          "tagFamilies.colors.tags.name": "red"
        }
      }
    }
  }
}

Schemas

Endpoint: /api/v2/search/schemas

Microschemas

Endpoint: /api/v2/search/microschemas

Paging

The paging query parameters are perPage and page . It is important to note that page is 1-based and perPage can be set to 0 in order to just retrieve a count of elements.

Additionally it is also possible to use the /api/v2/rawSearch or /api/v2/:projectName/rawSearch endpoints.

These endpoints will accept the same query but return a Elasticsearch multi search response instead of the typical Gentics Mesh list response. This is useful if you want to use for example the Elasticsearch highlighing and aggregation features. The endpoint will automatically select the needed indices and modify the query in order to add needed permission checks.

Index Synchronization

The POST /api/v2/search/sync endpoint can be used to invoke a manual sync of the search index.

The index sync operation will automatically be invoked when Mesh is being started and a unclean shutdown has been detected.

You can also recreate all indices if needed via the POST /api/v2/search/clear endpoint.

This operation will remove all indices which have been created by Mesh and rebuild them one at a time.

Document uploads to Gentics Mesh will automatically be parsed and the containing text will be extracted. This information will also be added to the search index and thus it is possible to search for text within uploaded documents.

Currently uploads which have one of these mimetypes will be processed:

  • application/pdf

  • application/msword

  • text/rtf

  • application/vnd.ms-powerpoint

  • application/vnd.oasis.opendocument.text

  • text/plain

  • application/rtf

Example binary field within document:

…
  "binaryField" : {
    "filename" : "mydoc.pdf",
    "sha512sum" : "16d3aeae9869d2915dda30866c2d7b77f50dc668daa3a49d2bc6eb6349cf6e895099349b7f8240174a788db967c87947b6a2fd41a353eec99a20358dfd4c9211",
    "mimeType" : "application/pdf",
    "filesize" : 200,
    "file" : {
      "content" : "Lorem ipsum dolor sit amet"
    }
  }
…

GraphQL

It is possible to nest Elasticsearch queries within the GraphQL query in order to filter elements.

Document format

The following section contains document examples which are useful when creating queries. Gentics Mesh transforms elements into these documents which then can be stored within Elasticsearch.

Users

{
  "uuid" : "589319933be24ec79319933be24ec7fe",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "username" : "joe1",
  "emailaddress" : "joe1@nowhere.tld",
  "firstname" : "Joe",
  "lastname" : "Doe",
  "groups" : {
    "name" : [ "editors", "superEditors" ],
    "uuid" : [ "df81c23d9ff1450081c23d9ff195005e", "df81c23d9ff1450081c23d9ff195005e" ]
  },
  "_roleUuids" : [ ],
  "version" : "ab7d5eb7"
}

Groups

{
  "name" : "adminGroup",
  "uuid" : "df81c23d9ff1450081c23d9ff195005e",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "ffb6a2da"
}

Roles

{
  "name" : "adminRole",
  "uuid" : "ddcddceb33d648318ddceb33d618314f",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "27844082"
}

Nodes

{
  "uuid" : "adaf48da8c124049af48da8c12a0493e",
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : "1970-01-01T00:00:00Z",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : "1970-01-01T00:00:00Z",
  "project" : {
    "name" : "dummyProject",
    "uuid" : "d3b1840b01464b54b1840b0146cb54f5"
  },
  "tags" : {
    "name" : [ "green", "red" ],
    "uuid" : [ "6e6f1d9f055447d2af1d9f055417d289", "6e6f1d9f055447d2af1d9f055417d289" ]
  },
  "tagFamilies" : {
    "colors" : {
      "uuid" : "f36400b436d244c3a400b436d244c3f4",
      "tags" : [ {
        "name" : "green",
        "uuid" : "6e6f1d9f055447d2af1d9f055417d289"
      }, {
        "name" : "red",
        "uuid" : "6e6f1d9f055447d2af1d9f055417d289"
      } ]
    }
  },
  "_roleUuids" : [ ],
  "parentNode" : {
    "uuid" : "adaf48da8c124049af48da8c12a0493e"
  },
  "language" : "de",
  "schema" : {
    "name" : "content",
    "uuid" : "2aa83a2b3cba40a1a83a2b3cba90a1de",
    "version" : null
  },
  "fields" : {
    "date" : 1542746513,
    "string" : "The name value",
    "htmlList" : [ "somehtml", "somehtml", "somehtml" ],
    "nodeList" : [ "adaf48da8c124049af48da8c12a0493e", "adaf48da8c124049af48da8c12a0493e", "adaf48da8c124049af48da8c12a0493e" ],
    "number" : 0.146,
    "node" : "adaf48da8c124049af48da8c12a0493e",
    "boolean" : true,
    "stringList" : [ "The name value", "The name value", "The name value" ],
    "micronode" : {
      "microschema" : {
        "name" : null,
        "uuid" : null
      },
      "fields-null" : {
        "latitude" : 16.373063840833,
        "longitude" : 16.373063840833
      }
    },
    "html" : "somehtml",
    "numberList" : [ 0.146, 0.146, 0.146 ],
    "booleanList" : [ "true", "true", "true" ],
    "dateList" : [ 1542746513, 1542746513, 1542746513 ]
  },
  "displayField" : {
    "key" : "string",
    "value" : null
  },
  "version" : "cbfd7867"
}

Projects

{
  "name" : "dummyProject",
  "uuid" : "d3b1840b01464b54b1840b0146cb54f5",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "6714dafb"
}

Tags

{
  "name" : "red",
  "uuid" : "6e6f1d9f055447d2af1d9f055417d289",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "tagFamily" : {
    "name" : "colors",
    "uuid" : "f36400b436d244c3a400b436d244c3f4"
  },
  "project" : {
    "name" : "dummyProject",
    "uuid" : "d3b1840b01464b54b1840b0146cb54f5"
  },
  "version" : "7df128ce"
}

Tag Families

{
  "name" : "colors",
  "uuid" : "f36400b436d244c3a400b436d244c3f4",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "tags" : {
    "name" : [ "red", "green" ],
    "uuid" : [ "6e6f1d9f055447d2af1d9f055417d289", "6e6f1d9f055447d2af1d9f055417d289" ]
  },
  "project" : {
    "name" : "dummyProject",
    "uuid" : "d3b1840b01464b54b1840b0146cb54f5"
  },
  "_roleUuids" : [ ],
  "version" : "7f598ceb"
}

Microschemas

{
  "uuid" : "2715307deafc4ecc95307deafccecc10",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "name" : "geolocation",
  "_roleUuids" : [ ],
  "version" : "ffb6a2da"
}

Schemas

{
  "name" : "content",
  "description" : "Content schema",
  "uuid" : "2aa83a2b3cba40a1a83a2b3cba90a1de",
  "creator" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "created" : null,
  "editor" : {
    "uuid" : "589319933be24ec79319933be24ec7fe"
  },
  "edited" : null,
  "_roleUuids" : [ ],
  "version" : "4bc088df"
}

Custom mappings / index settings

The index settings for nodes can be configured within the schema json. Additionally it is also possible to add extra mappings to fields. This may be desired when if a field needs to be analyzed in a special way or a keyword field must be added.

An example for such Schema can be seen below. This schema contains additional tokenizer and analyzer which can be used to setup an index that is ready to be used for a full-text search which supports autocompletion and auto suggestion.

{
    "container": false,
    "name": "CustomSchema",
    "elasticsearch": {
        "analysis": {
            "filter": {
                "my_stop": {
                    "type": "stop",
                    "stopwords": "_english_"
                },
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "tokenizer": {
                "basicsearch": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 10,
                    "token_chars": [
                        "letter"
                    ]
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "char_filter": [
                        "html_strip"
                    ],
                    "filter": [
                        "lowercase",
                        "my_stop",
                        "autocomplete_filter"
                    ]
                },
                "basicsearch": {
                    "tokenizer": "basicsearch",
                    "char_filter": [
                        "html_strip"
                    ],
                    "filter": [
                        "my_stop",
                        "lowercase"
                    ]
                },
                "basicsearch_search": {
                    "char_filter": [
                        "html_strip"
                    ],
                    "tokenizer": "lowercase"
                }
            }
        }
    },
    "fields": [
        {
            "name": "content",
            "required": false,
            "elasticsearch": {
                "basicsearch": {
                    "type": "text",
                    "analyzer": "basicsearch",
                    "search_analyzer": "basicsearch_search"
                },
                "suggest": {
                    "type": "text",
                    "analyzer": "simple"
                },
                "auto": {
                    "type": "text",
                    "analyzer": "autocomplete"
                }
            },
            "type": "string"
        }
    ]
}

Custom mappings can currently only be specified for the following types:

  • string fields

  • html fields

  • string list fields

  • html list fields

  • binary fields

Index settings for other elements (e.g: Users, Roles etc) can currently not be configured.

Mapping Modes

Gentics Mesh currently supports two way of managing mappings for your content fields.

The mode can be controlled via the search.mappingMode option or the MESH_ELASTICSEARCH_MAPPING_MODE environment variable.

The mapping mode setting influences how elasticsearch mappings for fields will be generated.

DYNAMIC (default)

In this mode Gentics Mesh will automatically create default mappings for any content field. The custom mapping which can be added via the elasticsearch parameter will be used to supplement the default mapping.

STRICT

In the strict mode Gentics Mesh will not generate default mappings. Only field mappings which have been specified in the elasticsearch property will be added to the index mapping.

Please note that your custom mapping will directly replace the field mapping in this mode. If you switch modes you need to adapt your elasticsearch property value. The setting will no longer be added as a nested element.

This mode is useful if you want to have finer control of what contents should be added to Elasticsearch.

Fields which have no mapping will by-default also not added to the source document.

Example:

{
	"displayField": "name",
	"container": true,
	"name": "category",
	"fields": [
		{
			"name": "name",
			"label": "Name",
			"required": true,
			"type": "string"
		},
		{
			"name": "description",
			"label": "Description",
			"required": false,
			"type": "string",
			"elasticsearch": {
				"type": "keyword",
				"index": false,
				"fields": {
					"search": {
						"type": "text",
						"analyzer": "trigrams"
					}
				}
			}
		}
	]
}
You can use the POST /api/v2/utilities/validateSchema endpoint to validate your schema and check what index mapping is actually being generated.
If you just want to add your field to the source document you can create a mapping with "index": false.

Binary Fields

You can add custom mappings to the mimeType and file.content field. The elasticsearch property needs to contain a custom mapping for each type.

Example:

{
  "displayField": "name",
  "segmentField": "binary",
  "container": false,
  "description": "Image schema",
  "name": "image",
  "fields": [
    {
      "name": "name",
      "label": "Name",
      "required": false,
      "type": "string"
    },
    {
      "name": "binary",
      "label": "Image",
      "required": false,
      "type": "binary",
      "elasticsearch": {
        "mimeType": {
          "raw": {
            "type": "keyword",
            "index": true
          }
        },
        "file.content": {
          "raw": {
            "type": "keyword",
            "index": true
          }
        }
      }
    }
  ]
}