Troubleshooting
Index issues
Indices with bad default mappings
The status of the root cause fix for this issue can be found at: https://chef-software.ideas.aha.io/ideas/AUTO-I-91
An error in the journalctl -u chef-automate output caused by bad default index mappings may look like this:
ingest-service.default(O): time="2022-03-03T00:32:40Z" level=error msg="Failed initializing elasticsearch" error="Error creating index node-1-run-info with error: elastic: Error 400 (Bad Request): mapper [node_uuid] of different type, current_type [text], merged_type [keyword] [type=illegal_argument_exception]"
As a result, ingest-service can never properly start up, which also breaks automate-cs-nginx and automate-cs-oc-erchef processes, as they need to connect to port 10122, where ingest-service would be listening if it were not restarting continuously.
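To confirm that ingest-service is the failing component, you can check overall service health and watch its log output. This is a minimal sketch assuming a standard single-node install; service names in your logs may differ slightly.
# Show overall service health; ingest-service should report as failed or restarting
chef-automate status
# Follow the Automate journal and filter for ingest-service messages
journalctl -u chef-automate -f | grep ingest-service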
Any of the INDEX_NAME values listed here can be used in the command sequence below to rebuild whichever index has bad mappings:
- node-1-run-info
- converge-history-DATE-STAMP
- node-attribute
- node-state-7
First, stop traffic coming into the Automate system. For example, you can turn off your Chef Server, or, if you are running a combined Automate and Chef Server system, run chef-server-ctl maintenance on.
Choose whichever method you are comfortable with.
Then, perform the deletion with the following commands, remembering to substitute the desired INDEX_NAME from the list above.
chef-automate dev stop-converge
hab svc unload chef/ingest-service
curl -XDELETE localhost:10141/INDEX_NAME
chef-automate dev start-converge
Afterwards, the ingest-service will start back up. If you continue to see mapping errors, it may be best to contact Support to get a better idea of what is going on.
chef-automate CLI Errors
Error: Unable to make a request to the deployment-service
The chef-automate CLI emits this error when it is unable to communicate with a Chef Automate deployment. In particular, when Chef Automate 2 (as distinct from Chef Automate 1) is not deployed, running chef-automate CLI commands such as version or status causes this error.
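If you expect Chef Automate 2 to be deployed on this node, a quick sanity check (assuming the chef-automate systemd unit referenced elsewhere on this page) is to confirm the service is running and review its recent logs:
# Check whether the Chef Automate systemd unit is active
systemctl status chef-automate
# Review recent log output for deployment-service errors
journalctl -u chef-automate --since "15 minutes ago"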
File exists (os error 17)
It’s possible for the following error to be emitted when deploying Chef Automate:
DeploymentServiceCallError: A request to the deployment-service failed: Request to configure deployment failed: rpc error: code = Unknown desc = failed to binlink command "chef-automate" in pkg "chef/automate-cli/0.1.0/20181212085335" - hab output: >> Binlinking chef-automate from chef/automate-cli/0.1.0/20181212085335 into /bin
xxx
xxx File exists (os error 17)
xxx
: exit status 1
This problem can be fixed by removing the chef-automate binary from the /bin directory. Do not place the binary in the PATH manually; the deployment process will do it for you.
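As a sketch of the cleanup, assuming the stray binary lives at /bin/chef-automate and that config.toml stands in for your own deployment configuration file:
# Remove the manually placed binary so the deployment can binlink its own copy
rm /bin/chef-automate
# Re-run the deployment
chef-automate deploy config.toml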
Compliance Report Display
If the size of a Compliance Report is over 4 MB, the Reports page (Compliance > Reports) may not display as expected. Audit Cookbook 9.4.0 and later supports attribute options that, combined with the latest Chef Automate version, trim a report to its smallest size. Contact Chef Support to determine the best way to manage your Compliance Report size.
Low Disk Space
Chef Automate emits a warning when the available disk space on the system drops below 1 GB, for example:
es-sidecar-service.default(O): time="2018-05-16T00:07:16Z" level=error msg="Disk free below critical threshold" avail_bytes=43368448 host=127.0.0.1 mount="/ (overlay)" threshold_bytes=536870912 total_bytes=31361703936
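To see how much space remains before Automate blocks writes, check the filesystem that backs the Automate data directory (typically /hab on a standard install):
# Show free space for the filesystem backing /hab
df -h /hab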
Recovering from Low Disk Conditions
Chef Automate disables disk writes if available disk space drops below 250 MB and logs a message similar to:
ingest-service.default(O): time="2018-05-16T00:10:09Z" level=error msg="Message failure" error="rpc error: code = Internal desc = elastic: Error 403 (Forbidden): blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]; [type=cluster_block_exception] elastic: Error 403 (Forbidden): blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]; [type=cluster_block_exception]"
After freeing up disk space, you will need to remove the write block on the OpenSearch indices by running:
curl -X PUT "localhost:10141/_all/_settings" -H 'Content-Type: application/json' -d'
{
"index.blocks.read_only_allow_delete": null
}
'
To confirm that you’ve successfully removed the blocks, run:
curl 'localhost:10141/_all/_settings'
Verify that the output does not contain "blocks":{"read_only_allow_delete":"true"}.
Uninstalling Chef Automate
The following procedure will remove Chef Automate from your system, including all data. If you wish to preserve the data, make a backup before uninstalling.
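For example, you can create a backup with the chef-automate CLI (the backup is written to your configured backup location):
chef-automate backup create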
With the chef-automate CLI:
chef-automate uninstall
Resetting the Admin Password
Use the following command to completely reset a lost, forgotten, or compromised admin password:
chef-automate iam admin-access restore NEW_PASSWORD
This command causes Automate to inspect your A2 IAM resources and initiate a series of steps that apply the new password to the "admin" user, in effect reconstituting the admin, and connect it with full administrative permissions.
The process Automate follows for resetting the admin password may include: recreating the user record, recreating the “admins” local team, and recreating the default policy that grants access to all resources for the newly reconstituted local admin team.
To see exactly what will happen in your system, pass --dry-run:
chef-automate iam admin-access restore NEW_PASSWORD --dry-run
Issue: Increase in Data Collector API Failures
Details
An increase in data collector API failures can be caused by a change in how the system is being used.
This can happen for the following reasons:
- An increase in scan frequency
- A change in the number of controls
- An increase in the number of nodes
Possible fixes
- Changing the heap size (a configuration sketch follows this list); the heap size should not be more than 70% of the RAM
- Upgrading the machine to improve performance
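As a sketch of the heap-size change, assuming the bundled OpenSearch exposes its heap setting as heapsize under [opensearch.v1.sys.runtime] in your Automate version (verify the exact key with chef-automate config show before applying):
# Write a small patch file with an example heap size
cat > opensearch-heap.toml <<'EOF'
[opensearch.v1.sys.runtime]
  heapsize = "8g"
EOF
# Apply the patch
chef-automate config patch opensearch-heap.toml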
Issue: Maximum Shards Open
Details
The max shards setting is the limit on the number of shards that can be open in the OpenSearch cluster; it exists to avoid overloading nodes with shards and can be patched as shown below.
Fixes
This error occurs when data ingestion into OpenSearch would exceed the open-shard limit. For example, if 1025 shards are required but the default maximum is 1000, OpenSearch rejects the request with an error like the one below. This is a performance safeguard in OpenSearch and can be addressed by raising the value of max shards per node.
Validation Failed: 1: this action would add [10] total shards, but this cluster currently has [1997]/[2000] maximum shards open; [type=validation_exception]
To set the value of max shards per node, patch the following configuration in a .toml file:
[opensearch.v1.sys.cluster]
max_shards_per_node = 1000
Once done, run chef-automate config patch </path/to/your-file.toml> to deploy your change.
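After patching, you can compare the cluster's current shard usage against the new limit by querying the local OpenSearch port used elsewhere on this page:
# active_shards in the response shows how many shards are currently open
curl 'localhost:10141/_cluster/health?pretty'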