Monitoring

Gentics Mesh exposes monitoring data via an un-authenticated monitoring server. In addition to the default API server which runs on default port 8080 an additional monitoring server will be started which listens on localhost:8081 by default.

The monitoring server is un-authenticated and thus should not be exposed to the public. By default it will only bind to localhost:8081.

Configuration

The network settings for the monitoring server can be configured within the mesh.yml or via environment settings.

The montoring server can also be turned off via the enabled flag.

mesh.yml
…
monitoring:
  enabled: true
  port: 8081
  host: "127.0.0.1"
Container deployments which need to access the monitoring API need to change the monitoring server binding in order to expose the port.
This can be done via the environment http host setting: MESH_MONITORING_HTTP_HOST=0.0.0.0

Liveness Probe in Docker

The liveness of the Gentics Mesh instance running in a docker container can be checked by executing the command java -cp mesh.jar com.gentics.mesh.monitor.jmx.JMXProbe inside the container. This has some advantages over the liveness probe using the HTTP endpoint: Liveness checked with JMX does not depend on the http verticles and vert.x to be fully started, but only on the Java process to be running and having the JMX bean registered. This means that - especially when running Gentics Mesh in a cluster - the instances will become live very soon, whereas starting of vert.x and the http verticles may take a while, because this requires OrientDB cluster to be synchronized first.

Endpoints

Liveness Probe

The healthcheck liveness probe endpoint GET /api/v2/health/live indicates if server is working as expected when status code 200 is returned.

Readiness Probe

The readiness probe endpoint GET /api/v2/health/ready returns status code 200 if the server is accepting connections. This endpoint can be used to check when the server instance if ready to accept requests once it has been started. This is especially useful if you run rolling cluster upgrades.

Status

The current Gentics Mesh server status can be checked against the GET /api/v2/status.

Status Description

STARTING

Status which indicates that the server is starting up.

WAITING_FOR_CLUSTER

Status which indicates that the server is waiting/looking for a cluster to join.

READY

Status which indicates that the server is operating normally.

SHUTTING_DOWN

Status which indicates that the instance is shutting down.

BACKUP

Status which indicates that a blocking backup is currently running.

RESTORE

Status which indicates that a blocking restore operation is currently running.

Cluster Status

The GET /api/v2/cluster/status endpoint returns the status of all cluster nodes to which the queries instance has establishes a connection.

Version Info

The version information can be retrieved via the GET /api/v2/versions endpoint.

{
  "meshVersion" : "0.30.2",
  "meshNodeName" : "Singing Chandelure",
  "databaseVendor" : "orientdb",
  "databaseVersion" : "3.0.16 - Veloce (build cc3c25ebc7d48a0db1073a0469cee2dd2947f4df, branch 3.0.x)",
  "searchVendor" : "elasticsearch",
  "searchVersion" : "6.1.2",
  "vertxVersion" : "3.5.4",
  "databaseRevision" : "0be9a986"
}

Prometheus Metrics

Gentics Mesh server exposes Prometheus compatible data on the /api/v2/metrics endpoint.

The Prometheus server can scrape metric data from this endpoint. Using Grafana in combination with Prometheus is a typical usecase to display metric data.

Prometheus

Example Prometheus configuration:

prometheus.yml
…
scrape_configs:
  - job_name: 'mesh'
    scrape_interval: 30s
    metrics_path: '/api/v2/metrics'
    static_configs:
        - targets: ['mesh:8081']
…

Metrics

Gentics Mesh exposes the following metrics in addition to the default Vert.x metrics. More metrics will be added over time.

<cache> is one of permission, projectbranchname, projectname, webroot.

Key Description

mesh_tx_created

Meter which measures the rate of created transactions over time.

mesh_notx_created

Meter which measures the rate of created noTx transactions over time.

mesh_graph_element_reload

Meter which tracks the reload operations on used vertices.

mesh_tx_time

Timer which tracks transaction durations.

mesh_tx_retry

Amount of transaction retries which happen if a conflict has been encountered.

tx_interrupt

Amount of commit interrupts.

commit_time

Timer which tracks commit durations.

mesh_node_migration_pending

Pending contents which need to be processed by the node migration.

mesh_cache_<cache>_hit

Amount of cache hits.

mesh_cache_<cache>_miss

Amount of cache misses.

mesh_cache_<cache>_clear_all

Amount of invalidations of the whole cache.

mesh_cache_<cache>_clear_single

Amount of invalidations for a single entry in the cache.

mesh_write_lock_waiting_time

Tracks the time which is spent waiting on the write lock.

mesh_write_lock_timeout

Amount of timeouts of acquiring the write lock.

mesh_topology_lock_waiting_time

Tracks the time which is spent waiting on the write lock.

mesh_topology_lock_timeout

Amount of timeouts of acquiring the write lock.

graphql_time

Timer which tracks duration of graphql requests.

mesh_storage_disk_total

Total disk size in bytes for the storage.

mesh_storage_disk_usable

Usable (free) disk size in bytes for the storage.

Clients

The Monitoring Java Client can be used to interact with the endpoints using Java.