Configuring agents for lineage metadata import
Configure Manta agents in the same location or network segment as the external system to extract lineage metadata from these systems and visualize this data on a lineage graph.
Overview
You can access many data sources directly from Cloud Pak for Data as a Service, but direct access is not always possible or optimal. In such cases, use Manta agents, which you install in the same location or network segment as the external system from which you want to extract metadata for lineage analysis. The most common use cases are:
- It is not possible to connect to an on-premises data source.
- You connect to a data source that requires specific third-party tools or libraries and you can't or don't want to install these tools or libraries on Cloud Pak for Data as a Service.
- Your data centers are distributed in many geographical locations and you want to avoid delays of data transfer (network latency).
The following list summarizes the steps that are required to import lineage metadata by using the Manta agents:
- Download the Manta agent executable files and save them in the target location. These files are compressed in a .zip file. Extract the file.
- Register a new agent instance in Manta Data Lineage, and save the configuration file.
- Copy the agent instance configuration file to the target location and start the agent.
- When you create a metadata import, select the agent from the list.
Each instance of a data source might require an individual agent instance, depending on the access settings. For example, if you have three instances of IBM Cognos Analytics, you might need to register three agent instances, and configure them independently on each Cognos Analytics instance. Give the agent instances meaningful names so that you can tell which data source instance each agent is connected to.
Supported data sources
You can use the agents with the following data sources:
- IBM Cognos Analytics. When you create a metadata import, using agents is the only way to connect to Cognos Analytics. You select agents in the Connection mode option when you create a metadata import.
- Microsoft Azure Databricks
Agent status
The agent can have the following statuses:
- Online: The agent is configured and connected. It is ready to be used.
- Offline: The agent is configured but is not connected at the moment.
- Registered: The agent is registered but needs to be configured on the external system. For more information, see Configuring the agent on the external system.
Prerequisites
On the external system, create a dedicated operating system user account to run the agent. The agent executable files and the agent configuration file are stored under this user account. The agent requires Java Runtime Environment (JRE) version 21 or higher.
A dedicated user account on the external system helps keep the data secure. The agent configuration file contains confidential information, including a username and an API key, so it must always be protected and accessible only to authorized users on the external system. Running the agent under a dedicated account keeps this confidential data isolated, and if the data is compromised, the impact is limited to a single agent instance.
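As a minimal sketch of these prerequisites on Linux, the following shell functions check a Java version string against the required version and restrict a configuration file to its owner. The function names are hypothetical and are not part of the agent; the version parsing assumes the usual output format of java -version.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not shipped with the agent.

# Print the major version found in `java -version` output.
# Handles both "21.0.2" and the legacy "1.8.0_392" numbering.
java_major_version() {
  version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    1.*) printf '%s\n' "$version" | cut -d. -f2 ;;
    *)   printf '%s\n' "$version" | cut -d. -f1 ;;
  esac
}

# Succeed only if the given major version is 21 or higher.
java_is_supported() {
  [ -n "$1" ] && [ "$1" -ge 21 ]
}

# Make the configuration file readable and writable by its owner only.
protect_config() {
  chmod 600 "$1"
}

# Possible usage, run as the dedicated agent user:
#   java_is_supported "$(java_major_version "$(java -version 2>&1)")" || echo "JRE 21 or higher is required"
#   protect_config /path/to/agent/config.json
```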
Downloading Manta Agent executable files
Download the Manta Agent executable files from the Fix Central website.
Extract the .zip file in a location where executable files are allowed, for example /usr/local/bin/manta-agent on the Linux operating system or C:/manta-agent on the Windows operating system.
Make sure that you install the latest agent version. For information about how to upgrade the current agent installation, see Updating agent version.
Registering an agent in Manta Data Lineage
To register a new agent, complete these steps in Manta Data Lineage:
- Go to Administration > Configurations and settings > Data lineage setup.
- On the Manage agents tab, click New agent.
- If you already have the Manta agent file on your external system, go to the next step. If not, download it and extract it on the external system.
- Define the name for the agent instance. It cannot contain spaces.
- Click Register.
- Download the configuration file. You will use it to finish configuring the agent on the external system.
At this point, the agent status is Registered.
Configuring the agent on the external system
To finish the configuration of the agent on the external system, complete these steps:
- Copy the agent configuration file to the same location where you extracted the agent executable files.
- Run the starting script, which is run.sh or run.bat, depending on your operating system. The script is in the bin folder.
At this point, the agent status is Online. It is ready to be used in the metadata import. For more information, see Creating metadata imports.
When the agent is run for the first time, the data folder is created in the location where you extracted the .zip file. The data folder contains log files for the agent, where you can find the agent's status updates and information about ongoing extraction jobs.
In the bin folder, you can find the README.md file with useful information about the agent.
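The two configuration steps above can be sketched for Linux as follows. The function name stage_agent_config is hypothetical, and the sketch assumes that the configuration file belongs in the top-level folder of the extracted agent files; the README.md file in the bin folder is the authoritative reference.

```shell
#!/bin/sh
# Hypothetical helper: copy the downloaded configuration file next to the
# extracted agent files and confirm that bin/run.sh is present.
stage_agent_config() {
  config_file=$1
  agent_dir=$2
  if [ ! -f "$agent_dir/bin/run.sh" ]; then
    echo "run.sh not found in $agent_dir/bin" >&2
    return 1
  fi
  # Keep the copied file private because it contains an API key.
  cp "$config_file" "$agent_dir/" && chmod 600 "$agent_dir/$(basename "$config_file")"
}

# Possible usage (paths are examples only):
#   stage_agent_config "$HOME/Downloads/config.json" /usr/local/bin/manta-agent
#   /usr/local/bin/manta-agent/bin/run.sh
```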
Updating agent version
From time to time, you must update the agent to the latest version. When the current agent version is outdated, the agent does not start and the log files contain an error message stating that you must install the latest version.
To update the agent, complete these steps:
- Download the latest agent version from the Fix Central website.
- Save the agent files in a different location from the previous agent version and extract the new agent files.
- Stop the previous agent version by running the shutdown.sh or shutdown.bat script, depending on your operating system.
- Create a backup copy of the previous agent's config.json configuration file, and save it in the new agent folder. Do not move the data folder to the new location.
- Delete the entire folder with the previous agent files.
- Start the new agent by running the run.sh or run.bat script, depending on your operating system.
- Go to Data > Data lineage > Data lineage setup > Manage agents and verify that the status of the new agent is Online.
The agent is updated. You do not need to modify the API key.
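The file-handling part of the update can be sketched for Linux as shown here. The function name carry_over_config is hypothetical. The sketch assumes that config.json sits in the top-level agent folder and that shutdown.sh is in the bin folder next to run.sh, neither of which is confirmed above.

```shell
#!/bin/sh
# Hypothetical helper: copy config.json from the previous agent folder into
# the new one. The data folder is deliberately left behind.
carry_over_config() {
  old_dir=$1
  new_dir=$2
  if [ "$old_dir" = "$new_dir" ]; then
    echo "extract the new agent to a different folder" >&2
    return 1
  fi
  [ -f "$old_dir/config.json" ] || { echo "no config.json in $old_dir" >&2; return 1; }
  [ -d "$new_dir/bin" ] || { echo "no bin folder in $new_dir" >&2; return 1; }
  cp -p "$old_dir/config.json" "$new_dir/config.json"
}

# Full sequence, left as comments because it stops and starts a real agent:
#   "$OLD/bin/shutdown.sh"            # location of shutdown.sh is an assumption
#   carry_over_config "$OLD" "$NEW"
#   rm -rf "$OLD"
#   "$NEW/bin/run.sh"
```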
Regenerating API key
In some cases, for example when the agent configuration file is lost, you might need to regenerate the API key for an agent. The API key of the associated Service ID must then be regenerated and a new configuration file created.
To regenerate the API key, complete these steps:
- Go to Data > Data lineage > Data lineage setup.
- On the Manage agents tab, find the agent that you want to update and click it to display the details panel.
- Click Regenerate API key.
- Download the new configuration file.
- On the external system, replace the old configuration file with the new one.
- Restart the agent by running the shutdown.sh or shutdown.bat script, and then the run.sh or run.bat script, depending on your operating system.
The old API key is automatically removed.
Removing an agent
To remove an agent, complete these steps, in any order:
- On the Manage agents tab in Cloud Pak for Data as a Service, find the agent, open the details panel, and click Delete agent.
- On the external system, stop the agent by using the shutdown.sh or shutdown.bat script, and delete the files that you extracted from the .zip file and the configuration file for the agent.
Configuring agent settings in the setenv scripts
You can configure the following settings for each agent installation:
Memory settings
The AGENT_JVM_OPTS property controls the Java virtual machine settings for the agent, primarily memory allocation.
Example values:
- Linux or macOS operating systems:
export AGENT_JVM_OPTS="-Xms1g -Xmx4g -XX:+UseG1GC"
- Windows operating system:
set "AGENT_JVM_OPTS=-Xms1g -Xmx4g -XX:+UseG1GC"
You can adjust the following parameters for the AGENT_JVM_OPTS property:
- -Xms: This parameter sets the initial Java heap size. For example, you can set it to 1g, which means 1 gigabyte.
- -Xmx: This parameter sets the maximum Java heap size. If the agent processes large data sources or out of memory errors occur when the agent is run, you might increase the value for this parameter, for example to -Xmx8g or -Xmx16g. Monitor the agent's memory consumption to find an optimal value.
- -XX:+UseG1GC: This parameter selects the G1 (Garbage-First) garbage collector, which can provide better performance for applications with larger heap sizes.
Agent extractor memory
The LINEAGE_AGENT_EXTRACTOR_MEMORY property specifies the maximum memory (in megabytes) that the extractor part of the agent can use.
Example values:
- Linux or macOS operating systems:
export LINEAGE_AGENT_EXTRACTOR_MEMORY=4096
- Windows operating system:
set "LINEAGE_AGENT_EXTRACTOR_MEMORY=4096"
If this property is not set, the value might be derived from the system memory or a pre-configured internal default. If the agent extracts large or complex data sources and out of memory errors occur, you might increase the value to 8192 for 8 GB, or 16384 for 16 GB. When you adjust the value, check how much memory is allocated to the main agent through the AGENT_JVM_OPTS property, and do not set a value that is higher than the total system memory.
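A quick way to sanity-check a planned combination of values is sketched below. It assumes that the main agent heap (-Xmx) and the extractor memory add up, which the text above does not state explicitly, so treat the result as a conservative estimate. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for checking planned memory values.

# Convert a heap size such as "4g" or "512m" to megabytes.
xmx_to_mb() {
  case "$1" in
    *g|*G) echo $(( ${1%[gG]} * 1024 )) ;;
    *m|*M) echo "${1%[mM]}" ;;
    *) return 1 ;;
  esac
}

# Succeed if heap size ($1, for example 4g) plus extractor memory ($2, in MB)
# does not exceed the total system memory ($3, in MB).
fits_in_memory() {
  heap_mb=$(xmx_to_mb "$1") || return 1
  [ $(( heap_mb + $2 )) -le "$3" ]
}

# Possible usage: -Xmx4g with LINEAGE_AGENT_EXTRACTOR_MEMORY=8192 on a 16 GB host
#   fits_in_memory 4g 8192 16384 && echo "fits" || echo "too large"
```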
Agent dictionary batch size
The LINEAGE_AGENT_DICTIONARY_BATCH_SIZE property specifies how many dictionary entries are sent to the central service in a single batch.
Example values:
- Linux or macOS operating systems:
export LINEAGE_AGENT_DICTIONARY_BATCH_SIZE=1000
- Windows operating system:
set "LINEAGE_AGENT_DICTIONARY_BATCH_SIZE=1000"
The default value is around 500 or 1000. You might increase the value to 2000 or 5000 when you populate large dictionaries and when there is network latency between the agent and the server. If the memory consumption is too high, you might set a value that is lower than the default.
Logging level
The LOGGING_LEVEL_COM_IBM_WDP_DATALINEAGE property adjusts the verbosity of the agent's logs, specifically for lineage-related components.
Example values:
- Linux or macOS operating systems:
export LOGGING_LEVEL_COM_IBM_WDP_DATALINEAGE=DEBUG
- Windows operating system:
set "LOGGING_LEVEL_COM_IBM_WDP_DATALINEAGE=DEBUG"
You can set this property to one of these values: INFO (default), DEBUG, WARN, ERROR. When you investigate issues or work on issues with IBM support, set this property to DEBUG.
In most cases, the default value INFO is sufficient.
Procedure
To modify these settings, complete these steps:
- In the agent installation folder, go to the bin folder, and open the setenv script for editing. Depending on your operating system, the script is setenv.sh or setenv.bat.
- Uncomment the property that you want to modify, and provide your custom values.
- Save your changes.
- Start the agent by running the run.sh or run.bat script, depending on your operating system. If the agent is already running, stop it first with the shutdown.sh or shutdown.bat script.
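On Linux, the uncomment-and-edit step could be scripted roughly as follows. The sketch assumes that setenv.sh contains commented lines of the form "# export NAME=value"; inspect the actual file first because its layout is not described here. The function name set_agent_property is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace a commented-out or active "export NAME=..."
# line in a setenv.sh file with a new value.
# The value must not contain "|", "&", or backslashes because it is passed to sed.
set_agent_property() {
  file=$1
  name=$2
  value=$3
  tmp=$(mktemp) || return 1
  # Write through a temporary file so that the original permissions are kept.
  sed "s|^[# ]*export $name=.*|export $name=$value|" "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Possible usage:
#   set_agent_property bin/setenv.sh LINEAGE_AGENT_EXTRACTOR_MEMORY 8192
#   set_agent_property bin/setenv.sh AGENT_JVM_OPTS '"-Xms1g -Xmx8g -XX:+UseG1GC"'
```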