Data Collection
Audit Cookbook + Chef InSpec + Chef Automate 2 Versions Support Matrix
Refer to the following Supported Versions list to confirm a working combination of versions for your Chef Infra Client, Audit cookbook, Chef InSpec, and Chef Automate. If these versions do not match, ingestion problems can occur because the messages will not arrive in the expected format.
Node Run and Audit Data Collection
Nodes can send their run data to Chef Automate. Setting up data collection in Chef Automate takes two steps:
First, obtain an API token. You have two options:
- Create a new API token and add it to the Ingest policy, preferably at the time of creation.
- Use your existing data collector token, if you are migrating from Chef Automate 1.
Once you have an API token, do one of the following:
- Configure your Chef Infra Server to point to Chef Automate. If you use Chef Infra Server, this is the recommended method of sending data to Chef Automate.
- Configure Chef Infra Client to send the data directly to Chef Automate.
Set Up an Existing Chef Automate 1 Data Collector Token in Chef Automate 2
Porting the Existing Chef Automate 1 Data Collector Token to Chef Automate 2
If you are migrating from Chef Automate 1, you have probably already deployed a data collector token on your Chef Infra Servers or your Chef Infra Clients. To reuse the existing data collector token from your Chef Automate 1 installation, make the configuration change described below.
For this process, you need the existing token (referred to here as A1_DC_TOKEN) and access to the machine running the chef-automate CLI client.
Create a file (in this example, data-collector-token.toml) containing your existing token:
[auth_n.v1.sys.service]
a1_data_collector_token = "<A1_DC_TOKEN>"
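If you would rather not paste the token into the file by hand, the file can be generated by a small script. The following Ruby sketch is illustrative only: the helper name write_token_toml, and the practice of reading the token from an environment variable named A1_DC_TOKEN, are assumptions of this example and not part of the chef-automate tooling. It writes exactly the two lines shown above, escaping backslashes and double quotes so the value remains a valid TOML string.

```ruby
# Hypothetical helper: writes the TOML snippet that ports a Chef Automate 1
# data collector token into Chef Automate 2. Returns the written contents.
def write_token_toml(token, path = "data-collector-token.toml")
  raise ArgumentError, "token must not be empty" if token.to_s.strip.empty?

  # Escape backslashes first, then double quotes, so the value stays a valid
  # TOML basic string. Block form avoids backreference interpretation.
  escaped = token.gsub("\\") { "\\\\" }.gsub('"') { '\\"' }
  contents = <<~TOML
    [auth_n.v1.sys.service]
    a1_data_collector_token = "#{escaped}"
  TOML
  File.write(path, contents)
  contents
end
```

For example, write_token_toml(ENV.fetch("A1_DC_TOKEN")) writes data-collector-token.toml to the current directory, ready for the patch command below.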
Now apply that configuration to your Chef Automate 2 deployment:
# chef-automate config patch data-collector-token.toml
[...output omitted...]
Success: Configuration patched
Chef Automate picks up the configuration change after a short interval.
From that point on, Chef Automate accepts requests that carry the x-data-collector-token: <A1_DC_TOKEN> header.
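To illustrate what such a request looks like, the following Ruby sketch builds (but does not send) an HTTP POST carrying the ported token in the x-data-collector-token header. It uses only Ruby's standard net/http library; the endpoint is the example host used throughout this page, and the helper name build_data_collector_request is hypothetical. A real request would also need a valid data collector message body, which is not shown here.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds a POST request to the data collector endpoint
# with the ported Chef Automate 1 token in the x-data-collector-token header.
# The request is only constructed here; nothing is sent over the network.
def build_data_collector_request(token, url = "https://automate.example.com/data-collector/v0/")
  uri = URI(url)
  request = Net::HTTP::Post.new(uri)
  request["x-data-collector-token"] = token
  request["Content-Type"] = "application/json"
  request
end

# Sending it would look roughly like this (requires network access and a real
# message body, which this sketch does not provide):
#   uri = URI("https://automate.example.com/data-collector/v0/")
#   Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
#     http.request(build_data_collector_request(ENV.fetch("A1_DC_TOKEN")))
#   end
```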
When logged in with admin permissions, you will also find the added token at https://automate.example.com/admin/tokens, under the name "Legacy data collector token ported from A1".
Now that you have a valid API token, update your Chef Infra Server data collector configuration if you use a Chef Infra Server. Otherwise, configure your Chef Infra Clients to send data directly to Chef Automate.
Configure your Chef Infra Server to Send Data to Chef Automate
Note
In addition to forwarding Chef run data to Chef Automate, Chef Infra Server will send messages to Chef Automate whenever an action is taken on a Chef Infra Server object, such as when a cookbook is uploaded to the Chef Infra Server or when a user edits a role.
To have Chef Infra Server send run data from connected Chef Infra Clients, set the data collection proxy attribute (data_collector['proxy']) to true.
Setting Up Data Collection on Chef Infra Server Deployed with the Chef Automate Installer
This step sets up data collection from a standalone Chef Infra Server deployed with the Chef Automate Installer to a separate Chef Automate server.
Open the config.toml file and include the external Automate configuration settings:
[global.v1.external.automate]
enable = true
node = "https://<automate server url>"
[global.v1.external.automate.auth]
token = "<data-collector token>"
[global.v1.external.automate.ssl]
server_name = "<server name from the automate server ssl cert>"
root_cert = """<pem format root CA cert>
"""
[auth_n.v1.sys.service]
a1_data_collector_token = "<data-collector token>"
[erchef.v1.sys.data_collector]
enabled = true
Then, run chef-automate config patch config.toml.
Setting Up Data Collection on Chef Infra Server Versions 12.14 and Higher
Instead of setting the token directly in /etc/opscode/chef-server.rb, as was done in older versions of the Chef Infra Server, use the set-secret command so that your API token does not live in plaintext in a file:
sudo chef-server-ctl set-secret data_collector token '<API_TOKEN_FROM_STEP_1>'
sudo chef-server-ctl restart nginx
sudo chef-server-ctl restart opscode-erchef
Next, configure the Chef Infra Server for data collection forwarding by adding the following settings to /etc/opscode/chef-server.rb:
data_collector['root_url'] = 'https://automate.example.com/data-collector/v0/'
# Add for Chef Infra Client run forwarding
data_collector['proxy'] = true
# Add for compliance scanning
profiles['root_url'] = 'https://automate.example.com'
# Save and close the file
To apply the changes, run:
sudo chef-server-ctl reconfigure
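Both URLs in chef-server.rb are derived from the same Chef Automate address: data_collector['root_url'] is that address followed by /data-collector/v0/, and profiles['root_url'] is the bare address. The following Ruby sketch shows that relationship, which can help avoid a missing path segment or trailing slash when editing the file. The helper name automate_urls and its https-only check are assumptions of this example.

```ruby
require "uri"

# Hypothetical helper: derives the chef-server.rb data collector and profiles
# URLs from a single Chef Automate base URL such as "https://automate.example.com".
def automate_urls(base)
  uri = URI(base)
  # Assumption of this sketch: only https endpoints are accepted.
  raise ArgumentError, "expected an https URL, got #{base}" unless uri.is_a?(URI::HTTPS)

  # Keep a non-default port and drop any path, so the suffix is appended exactly once.
  port = uri.port == uri.default_port ? "" : ":#{uri.port}"
  root = "https://#{uri.host}#{port}"
  {
    data_collector_root_url: "#{root}/data-collector/v0/",
    profiles_root_url: root
  }
end
```

For example, automate_urls("https://automate.example.com/") yields "https://automate.example.com/data-collector/v0/" for the data collector and "https://automate.example.com" for profiles, matching the values above.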
Setting Up Chef Infra Client to Send Compliance Scan Data Through the Chef Infra Server to Chef Automate
Now that the Chef Infra Server is configured for data collection, you can also enable Compliance Scanning on your Chef Infra Clients via the Audit Cookbook.
- Set the following attributes for the audit cookbook:
default['audit']['reporter'] = 'chef-server-automate'
default['audit']['fetcher'] = 'chef-server'
default['audit']['profiles'].push(
  'name': 'cis-centos7-level2',
  'compliance': 'user-name/cis-centos7-level2' # in the Chef Automate UI, this value is the identifier for the profile
)
default['audit']['interval'] = {
  'enabled': true,
  'time': 1440 # once a day, the default value
}
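The interval 'time' value is expressed in minutes, so 1440 corresponds to 24 hours. The Ruby sketch below models how such an interval gates scans. It is a hypothetical illustration, not the audit cookbook's code; the method name scan_due? is invented, and it assumes that a disabled interval means a scan runs on every Chef Infra Client run.

```ruby
# Hypothetical model of the audit cookbook's interval attribute.
# time_minutes mirrors default['audit']['interval']['time'] (1440 = once a day).
def scan_due?(last_scan_at, now, enabled:, time_minutes: 1440)
  # Assumption: with the interval disabled, every Chef Infra Client run scans.
  return true unless enabled
  # No previous scan recorded, so scan now.
  return true if last_scan_at.nil?

  # Time - Time yields seconds; convert the interval from minutes to seconds.
  (now - last_scan_at) >= time_minutes * 60
end
```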
Now, any node with audit::default in its run-list will fetch profiles from, and report data to, Chef Automate via the Chef Infra Server. See the audit cookbook documentation for an exhaustive list of configuration options.
Additional Chef Infra Server Data Collection Configuration Options
Option | Description | Default |
---|---|---|
data_collector['proxy'] | If set to true, Chef Infra Server proxies all requests sent to /data-collector to the configured Chef Automate data_collector['root_url']. This route does not check the request signature or add the data_collector token; it proxies requests to the Chef Automate endpoint as-is. | nil |
data_collector['timeout'] | Timeout, in milliseconds, after which Chef Infra Server aborts an attempt to send a message to the Chef Automate server. | 30000 |
data_collector['http_init_count'] | Number of Chef Automate HTTP workers Chef Infra Server should start. | 25 |
data_collector['http_max_count'] | Maximum number of Chef Automate HTTP workers Chef Infra Server should allow to exist at any time. | 100 |
data_collector['http_max_age'] | Maximum age a Chef Automate HTTP worker is allowed to live, specified as an Erlang tuple. | {70, sec} |
data_collector['http_cull_interval'] | How often Chef Infra Server culls aged-out Chef Automate HTTP workers that have exceeded their http_max_age, specified as an Erlang tuple. | {1, min} |
data_collector['http_max_connection_duration'] | Maximum duration an HTTP connection is allowed to exist before it is terminated, specified as an Erlang tuple. | {70, sec} |
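Several defaults above are Erlang {Value, Unit} tuples, while data_collector['timeout'] is a plain number of milliseconds. To compare them, the following hypothetical Ruby helper converts a tuple string into milliseconds. It only understands the sec and min units that appear in this table; that restriction is a choice of this sketch, not a statement about which units Chef Infra Server accepts.

```ruby
# Hypothetical helper: converts an Erlang-style duration tuple string such as
# "{70, sec}" into milliseconds. Only the units used in the table are handled.
MS_PER_UNIT = { "sec" => 1_000, "min" => 60_000 }.freeze

def erlang_duration_ms(tuple)
  match = tuple.strip.match(/\A\{\s*(\d+)\s*,\s*([a-z]+)\s*\}\z/)
  raise ArgumentError, "not a {Value, Unit} tuple: #{tuple}" unless match

  value = match[1].to_i
  ms_per_unit = MS_PER_UNIT.fetch(match[2]) do
    raise ArgumentError, "unsupported unit: #{match[2]}"
  end
  value * ms_per_unit
end
```

For example, {70, sec} converts to 70000 ms and {1, min} to 60000 ms, both longer than the 30000 ms default timeout.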
Configure your Chef Infra Client to Send Data to Chef Automate without Chef Infra Server
If you do not use a Chef Infra Server in your environment (if you only use chef-solo, for example), you can configure your Chef Infra Clients to send their run data directly to Chef Automate by performing the following steps:
- Add the Chef Automate SSL certificate to the trusted_certs directory.
- Configure Chef Infra Client to use the Data Collector endpoint and API token in Chef Automate.
Add Chef Automate certificate to trusted_certs directory
Note: This step only applies to self-signed SSL certificates. If you are using an SSL certificate signed by a valid certificate authority, you may skip this step.
Chef requires that the self-signed Chef Automate SSL certificate (HOSTNAME.crt) is located in the /etc/chef/trusted_certs directory on any node that wants to send data to Chef Automate. This directory is the location into which SSL certificates are placed when a node has been bootstrapped with chef-client.
To fetch the certificate onto your workstation, use knife ssl fetch and pass in the URL of the Chef Automate server. You can then use utilities such as scp or rsync to copy the downloaded certificate files from your .chef/trusted_certs directory to the /etc/chef/trusted_certs directory on the nodes in your infrastructure that will be sending data directly to the Chef Automate server.
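Because this step applies only to self-signed certificates, it can help to check a fetched certificate before copying it to your nodes. The following Ruby sketch uses the standard openssl library. The helper name self_signed_cert? is hypothetical, and it treats a certificate as self-signed when its issuer equals its subject and its signature verifies against its own public key.

```ruby
require "openssl"

# Hypothetical helper: returns true when the PEM certificate at `path` is
# self-signed, meaning its issuer matches its subject and its signature
# verifies with its own public key.
def self_signed_cert?(path)
  cert = OpenSSL::X509::Certificate.new(File.read(path))
  same_name = cert.issuer.to_s == cert.subject.to_s
  # verify returns false (rather than raising) when the key does not match.
  same_name && cert.verify(cert.public_key)
end
```

For example, self_signed_cert?(".chef/trusted_certs/HOSTNAME.crt") returns true for a self-signed Chef Automate certificate, which is the case this step covers.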
Configure Chef Infra Client to Use the Data Collector Endpoint in Chef Automate
Warning
The data collector functionality is used by Chef Infra Client to send node and converge data to Chef Automate. This feature works for Chef Infra Client, as well as both the default and legacy modes of chef-solo.
To send node, converge, and compliance data to Chef Automate, add the following configuration to your Chef config (that is, client.rb or solo.rb), or to an additional config file in an appropriate directory, such as client.d:
data_collector.server_url "https://automate.example.com/data-collector/v0/"
data_collector.token '<API_TOKEN_FROM_STEP_1>'
Setting Up Chef Infra Client to Send Compliance Scan Data Directly to Chef Automate
Now that the Chef Infra Client is configured for data collection, you can also enable Compliance Scanning via the Audit Cookbook.
- Set the following attributes for the audit cookbook:
default['audit']['reporter'] = 'chef-automate'
default['audit']['fetcher'] = 'chef-automate'
default['audit']['token'] = '<API_TOKEN_FROM_STEP_1>'
default['audit']['profiles'].push(
  'name': 'cis-centos7-level2',
  'compliance': 'user-name/cis-centos7-level2' # in the Chef Automate UI, this value is the identifier for the profile
)
default['audit']['interval'] = {
  'enabled': true,
  'time': 1440 # once a day, the default value
}
Now, any node with audit::default in its run-list will fetch profiles from, and report data to, Chef Automate directly. See the audit cookbook documentation for an exhaustive list of configuration options.
Additional Chef Infra Client Data Collection Configuration Options
Configuration | Description | Options | Default |
---|---|---|---|
data_collector.mode | The mode in which the data collector is allowed to operate. This can be used to run the data collector only when running chef-solo but not when running Chef Infra Client. | :solo, :client, or :both | :both |
data_collector.raise_on_failure | When the data collector cannot send the “starting a run” message to the data collector server, the data collector is disabled for that run. In some situations, such as highly regulated environments, it may be more reasonable to stop the Chef run instead of continuing without data collection. In these situations, setting this value to true causes the Chef run to raise an exception before starting any converge activities. | true, false | false |
data_collector.organization | A user-supplied organization string that can be sent in payloads generated by the data collector when Chef is run in Solo mode. This allows users to associate their Solo nodes with faux organizations without the nodes being connected to an actual Chef Infra Server. | string | none |
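The mode and raise_on_failure settings can be restated as a small decision model. The Ruby class below is a hypothetical illustration of the behavior described in the table, not Chef Infra Client's internal implementation; the class and method names are invented for this sketch.

```ruby
# Hypothetical model of data_collector.mode and data_collector.raise_on_failure.
class DataCollectorModel
  MODES = %i[solo client both].freeze

  def initialize(mode: :both, raise_on_failure: false)
    raise ArgumentError, "mode must be one of #{MODES.inspect}" unless MODES.include?(mode)

    @mode = mode
    @raise_on_failure = raise_on_failure
  end

  # Whether the data collector runs at all; solo_run is true under chef-solo.
  def enabled_for?(solo_run)
    @mode == :both || (@mode == :solo) == solo_run
  end

  # Called when the "starting a run" message could not be sent: either disable
  # collection for this run, or raise before any converge activity starts.
  def on_start_message_failure
    raise "data collector unavailable; aborting before converge" if @raise_on_failure

    :disabled_for_run
  end
end
```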
Performance Testing of Compliance Data Ingestion
The following performance numbers were benchmarked on a machine with:
- 4 vCPUs
- 16 GB of RAM
Compliance Report Size | Concurrency | Max CPU Utilisation | Max Memory Utilisation |
---|---|---|---|
3MB | 100 | 79% | 76% |
If you need higher concurrency, deploy Chef Automate in HA mode. Refer to Automate HA.
Troubleshooting
My Data Does Not Show Up in the User Interface
Organizations without associated nodes will not show up on the Chef Automate Nodes page. A node is not associated with Chef Automate until a Chef Infra Client run has completed. The same applies to roles, cookbooks, recipes, attributes, resources, node names, and environments, which are not highlighted in the UI until they are associated with a completed run. This is designed to keep the UI focused on the nodes in your cluster.