Glossary
Last updated: Dec 05, 2024
This glossary provides terms and definitions for watsonx.ai and watsonx.governance.

The following cross-references are used in this glossary:

  • See refers you from a nonpreferred term to the preferred term or from an abbreviation to the spelled-out form.
  • See also refers you to a related or contrasting term.

A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | R | S | T | U | V | W | Z

A

accelerator

In high-performance computing, a specialized circuit that is used to take some of the computational load from the CPU, increasing the efficiency of the system. For example, in deep learning, GPU-accelerated computing is often employed to offload part of the compute workload to a GPU while the main application runs off the CPU. See also graphics processing unit.

accountability

The expectation that organizations or individuals will ensure the proper functioning, throughout their lifecycle, of the AI systems that they design, develop, operate or deploy, in accordance with their roles and applicable regulatory frameworks. This includes determining who is responsible for an AI mistake, which may require legal experts to determine liability on a case-by-case basis.

activation function

A function defining a neural unit's output given a set of incoming activations from other neurons.
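
As a minimal sketch of this definition, the following Python computes one unit's output from a few incoming activations. ReLU and sigmoid are two widely used activation functions; the input values, weights, and bias are made-up illustrative numbers, not taken from any product.

```python
import math

def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)

def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, bias, activation):
    """Weighted sum of incoming activations, passed through the activation."""
    pre_activation = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(pre_activation)

incoming = [0.5, -1.0, 2.0]   # activations from three upstream neurons (illustrative)
weights = [0.4, 0.3, -0.2]    # illustrative connection weights
print(unit_output(incoming, weights, 0.1, relu))
print(unit_output(incoming, weights, 0.1, sigmoid))
```

The same weighted sum yields different outputs depending on the activation function that is chosen.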

active learning

A model for machine learning in which the system requests more labeled data only when it needs it.

active metadata

Metadata that is automatically updated based on analysis by machine learning processes. For example, profiling and data quality analysis automatically update metadata for data assets.

active runtime

An instance of an environment that is running to provide compute resources to assets that run code.

agent

An algorithm or a program that interacts with an environment to learn optimal actions or decisions, typically using reinforcement learning, to achieve a specific goal.

agentic AI

A generative AI flow that can decompose a prompt into multiple tasks, assign tasks to appropriate gen AI agents, and synthesize an answer without human intervention.

AI

See artificial intelligence.

AI accelerator

Specialized silicon hardware designed to efficiently execute AI-related tasks like deep learning, machine learning, and neural networks for faster, energy-efficient computing. It can be a dedicated unit in a core, a separate chiplet on a multi-module chip or a separate card.

AI ethics

A multidisciplinary field that studies how to optimize AI's beneficial impact while reducing risks and adverse outcomes. Examples of AI ethics issues are data responsibility and privacy, fairness, explainability, robustness, transparency, environmental sustainability, inclusion, moral agency, value alignment, accountability, trust, and technology misuse.

AI governance

An organization's act of governing, through its corporate instructions, staff, processes and systems to direct, evaluate, monitor, and take corrective action throughout the AI lifecycle, to provide assurance that the AI system is operating as the organization intends, as its stakeholders expect, and as required by relevant regulation.

AI safety

The field of research aiming to ensure artificial intelligence systems operate in a manner that is beneficial to humanity and don't inadvertently cause harm, addressing issues like reliability, fairness, transparency, and alignment of AI systems with human values.

AI service

A deployable unit of code that contains the logic of a generative AI use case and provides an endpoint for inferencing from an application.

AI system

See artificial intelligence system.

algorithm

A formula applied to data to determine optimal ways to solve analytical problems.

analytics

The science of studying data in order to find meaningful patterns in the data and draw conclusions based on those patterns.

appropriate trust

In an AI system, an amount of trust that is calibrated to its accuracy, reliability, and credibility.

artificial intelligence (AI)

The capability to acquire, process, create and apply knowledge in the form of a model to make predictions, recommendations or decisions.

artificial intelligence system (AI system)

A system that can make predictions, recommendations or decisions that influence physical or virtual environments, and whose outputs or behaviors are not necessarily pre-determined by its developer or user. AI systems are typically trained with large quantities of structured or unstructured data, and might be designed to operate with varying levels of autonomy or none, to achieve human-defined objectives.

asset

An item that contains information about data, other valuable information, or code that works with data. See also data asset.

attention mechanism

A mechanism in deep learning models that determines which parts of the input a model focuses on when producing output.
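
One common form of attention is scaled dot-product attention, sketched below in plain Python under the assumption of a single query and small hand-written vectors. The function names are illustrative and do not correspond to any library API.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)  # how strongly each input position is attended to
    dim = len(values[0])
    output = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, output

weights, output = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
print(weights, output)
```

The first key matches the query more closely, so the first value receives the larger weight in the output.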

AutoAI experiment

An automated training process that considers a series of training definitions and parameters to create a set of ranked pipelines as model candidates.

B

batch deployment

A method to deploy models that processes input data from a file, data connection, or connected data in a storage bucket, then writes the output to a selected destination.

bias

Systematic error in an AI system that has been designed, intentionally or not, in a way that may generate unfair decisions. Bias can be present both in the AI system and in the data used to train and test it. AI bias can emerge in an AI system as a result of cultural expectations; technical limitations; or unanticipated deployment contexts. See also fairness.

bias detection

The process of calculating fairness metrics to detect when AI models are delivering unfair outcomes based on certain attributes.
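
As an illustration, the sketch below computes the disparate impact ratio, one commonly used fairness metric, between a monitored group and a reference group. The outcome lists are invented, and the 0.8 threshold is an assumed rule of thumb rather than a value stated in this glossary.

```python
def favorable_rate(outcomes):
    """Fraction of outcomes that are favorable (1 = favorable, 0 = unfavorable)."""
    return sum(outcomes) / len(outcomes)

def disparate_impact(monitored, reference):
    """Ratio of the monitored group's favorable rate to the reference group's."""
    return favorable_rate(monitored) / favorable_rate(reference)

monitored_outcomes = [1, 0, 0, 1, 0]   # illustrative predictions for the monitored group
reference_outcomes = [1, 1, 0, 1, 1]   # illustrative predictions for the reference group

ratio = disparate_impact(monitored_outcomes, reference_outcomes)
ASSUMED_THRESHOLD = 0.8  # assumption: a frequently cited rule-of-thumb cutoff
print(ratio, "possible bias" if ratio < ASSUMED_THRESHOLD else "within threshold")
```

A ratio well below 1 suggests that the monitored group receives favorable outcomes less often than the reference group.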

bias mitigation

Reducing biases in AI models by curating training data and applying fairness techniques.

binary classification

A classification model with two classes. Predictions are a binary choice of one of the two classes.

C

classification model

A predictive model that predicts data in distinct categories. Classifications can be binary, with two classes of data, or multiclass, when there are more than two categories.

cleanse

To ensure that all values in a data set are consistent and correctly recorded.

CNN

See convolutional neural network.

cognitive forcing function

An intervention that is applied at a decision-making moment to disrupt heuristic reasoning and cause a person to engage in analytical thinking; examples include a checklist, a diagnostic time-out, or asking a person to rule out an alternative.

computational linguistics

Interdisciplinary field that explores approaches for computationally modeling natural languages.

compute resource

The hardware and software resources that are defined by an environment template to run assets in tools.

confusion matrix

A performance measurement that determines the accuracy between a model's positive and negative predicted outcomes compared to positive and negative actual outcomes.
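
The four cells of a binary confusion matrix can be counted directly, as in this minimal sketch. The labels are invented, and the function name is illustrative rather than a library call.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

actual    = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative ground-truth labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0]   # illustrative model predictions

matrix = confusion_matrix(actual, predicted)
accuracy = (matrix["TP"] + matrix["TN"]) / len(actual)
print(matrix, accuracy)
```

Accuracy is only one summary of the matrix; the individual cells show which kind of error the model makes.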

connected data asset

A pointer to data that is accessed through a connection to an external data source.

connected folder asset

A pointer to a folder in IBM Cloud Object Storage.

connection

The information required to connect to a database. The actual information that is required varies according to the DBMS and connection method.

connection asset

An asset that contains information that enables connecting to a data source.

constraint

  • In databases, a relationship between tables.
  • In Decision Optimization, a condition that must be satisfied by the solution of a problem.

continuous learning

Automating the tasks of monitoring model performance, retraining with new data, and redeploying to ensure prediction quality.

convolutional neural network (CNN)

A class of neural network commonly used in computer vision tasks that uses convolutional layers to process image data.

Core ML deployment

The process of downloading a deployment in Core ML format for use in iOS apps.

corpus

A collection of source documents that are used to train a machine learning model.

CPLEX model

A Decision Optimization model that is formulated to be solved by the CPLEX engine.

CPO model

A constraint programming model that is formulated to be solved by the Decision Optimization CP Optimizer (CPO) engine.

cross-validation

A technique for testing how well a model generalizes in the absence of a hold-out test sample. Cross-validation divides the training data into a number of subsets, and then builds the same number of models, with each subset held out in turn. Each of those models is tested on the holdout sample, and the average accuracy of the models on those holdout samples is used to estimate the accuracy of the model when applied to new data.
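
The procedure described above can be sketched in a few lines. To keep the example self-contained, the "model" is a trivial majority-class predictor and the folds are contiguous slices; both are simplifying assumptions, and the labels are invented.

```python
from collections import Counter

def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(labels, k):
    """Average held-out accuracy of a majority-class predictor over k folds."""
    accuracies = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        train = [labels[i] for i in range(len(labels)) if i not in held]
        majority = Counter(train).most_common(1)[0][0]   # "train" the model
        correct = sum(1 for i in held_out if labels[i] == majority)
        accuracies.append(correct / len(held_out))       # test on the held-out fold
    return sum(accuracies) / len(accuracies)

labels = ["spam", "ham", "ham", "ham", "spam", "ham", "ham", "spam", "ham"]
estimate = cross_validate(labels, k=3)
print(estimate)
```

In practice the data is usually shuffled before it is split, and a real model replaces the majority-class predictor.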

curate

To select, collect, preserve, and maintain content relevant to a specific topic. Curation establishes, maintains, and adds value to data; it transforms data into trusted information and knowledge.

D

data asset

An asset that points to data, for example, to an uploaded file. Connections and connected data assets are also considered data assets. See also asset.

data imputation

The substitution of missing values in a data set with estimated or explicit values.
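
A minimal sketch of one simple imputation strategy, mean imputation, follows. It assumes that missing values are represented as None and that at least one value is observed; the data is invented.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

ages = [34, None, 29, 41, None]   # illustrative column with two missing values
print(impute_mean(ages))
```

Other strategies, such as substituting the median or an explicit constant, follow the same pattern.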

data lake

A large-scale data storage repository that stores raw data in any format in a flat architecture. Data lakes hold structured and unstructured data as well as binary data for the purpose of processing and analysis.

data lakehouse

A unified data storage and processing architecture that combines the flexibility of a data lake with the structured querying and performance optimizations of a data warehouse, enabling scalable and efficient data analysis for AI and analytics applications.

data mining

The process of collecting critical business information from a data source, correlating the information, and uncovering associations, patterns, and trends. See also predictive analytics.

Data Refinery flow

A set of steps that cleanse and shape data to produce a new data asset.

data science

The analysis and visualization of structured and unstructured data to discover insights and knowledge.

data set

A collection of data, usually in the form of rows (records) and columns (fields) and contained in a file or database table.

data source

A repository, queue, or feed for reading data, such as a database.

data table

A collection of data, usually in the form of rows (records) and columns (fields) and contained in a table.

data warehouse

A large, centralized repository of data collected from various sources that is used for reporting and data analysis. It primarily stores structured and semi-structured data, enabling businesses to make informed decisions.

DDL

See distributed deep learning.

decision boundary

A division of data points in a space into distinct groups or classifications.

decoder-only model

A model that generates output text word by word by inference from the input sequence. Decoder-only models are used for tasks such as generating text and answering questions.

deep learning

A computational model that uses multiple hierarchical layers of interconnected nodes to transform input data (first layer) through a series of computations to produce an output (final layer). Deep learning is inspired by the structure and function of the human brain. See also distributed deep learning.

deep neural network

A neural network with multiple hidden layers, allowing for more complex representations of the data.

deep reasoning

A class of machine learning in which systems generate insights from data to support cognitive tasks beyond perception and classification, such as common sense, changing situations, planning, and decision making.

deployment

A model or application package that is available for use.

deployment space

A workspace where models are deployed and deployments are managed.

deterministic

Describes a characteristic of computing systems when their outputs are completely determined by their inputs.

discriminative AI

A class of algorithm that focuses on finding a boundary that separates different classes in the data.

distributed deep learning (DDL)

An approach to deep learning training that leverages the methods of distributed computing. In a DDL environment, compute workload is distributed between the central processing unit and graphics processing unit. See also deep learning.

DOcplex

A Python API for modeling and solving Decision Optimization problems.

E

embedding

A numerical representation of a unit of information, such as a word or a sentence, as a vector of real-valued numbers. Embeddings are learned, low-dimensional representations of higher-dimensional data. See also encoding, representation.
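
Because embeddings are vectors, related items can be compared numerically. The sketch below uses cosine similarity on three hand-written toy vectors; real embeddings are learned by a model and typically have hundreds of dimensions, so these values are purely illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, invented for this example.
embeddings = {
    "cat":    [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "car":    [0.1, 0.0, 0.9],
}

print(cosine_similarity(embeddings["cat"], embeddings["kitten"]))
print(cosine_similarity(embeddings["cat"], embeddings["car"]))
```

The first pair scores higher than the second, reflecting that the toy vectors for "cat" and "kitten" point in nearly the same direction.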

emergence

A property of foundation models in which the model exhibits behaviors that were not explicitly trained.

emergent behavior

A behavior exhibited by a foundation model that was not explicitly constructed.

encoder-decoder model

A model for both understanding input text and for generating output text based on the input text. Encoder-decoder models are used for tasks such as summarization or translation.

encoder-only model

A model that understands input text at the sentence level by transforming input sequences into representational vectors called embeddings. Encoder-only models are used for tasks such as classifying customer feedback and extracting information from large documents.

encoding

The representation of a unit of information, such as a character or a word, as a set of numbers. See also embedding, positional encoding.

endpoint URL

A network destination address that identifies resources, such as services and objects. For example, an endpoint URL is used to identify the location of a model or function deployment when a user sends payload data to the deployment.

environment

The compute resources for running jobs.

environment runtime

An instantiation of the environment template to run assets.

environment template

A definition that specifies hardware and software resources to instantiate environment runtimes.

exogenous feature

A feature that can influence the predictive model but cannot be influenced in return. For example, temperatures can affect predicted ice cream sales, but ice cream sales cannot influence temperatures.

experiment

A model training process that considers a series of training definitions and parameters to determine the most accurate model configuration.

explainability

  • The ability of human users to trace, audit, and understand predictions that are made in applications that use AI systems.
  • The ability of an AI system to provide insights that humans can use to understand the causes of the system's predictions.

F

fairness

In an AI system, the equitable treatment of individuals or groups of individuals. The choice of a specific notion of equity for an AI system depends on the context in which it is used. See also bias.

feature

A property or characteristic of an item within a data set, for example, a column in a spreadsheet. In some cases, features are engineered as combinations of other features in the data set.

feature engineering

The process of selecting, transforming, and creating new features from raw data to improve the performance and predictive power of machine learning models.

feature group

A set of columns of a particular data asset along with the metadata that is used for machine learning.

feature selection

Identifying the columns of data that best support an accurate prediction or score in a machine learning model.

feature store

A centralized repository or system that manages and organizes features, providing a scalable and efficient way to store, retrieve, and share feature data across machine learning pipelines and applications.

feature transformation

In AutoAI, a phase of pipeline creation that applies algorithms to transform and optimize the training data to achieve the best outcome for the model type.

federated learning

The training of a common machine learning model that uses multiple data sources that are not moved, joined, or shared. The result is a better-trained model without compromising data security.

few-shot prompting

A prompting technique in which a small number of examples are provided to the model to demonstrate how to complete the task.
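
A few-shot prompt is often just a string that places worked examples before the new input. The following sketch assembles one; the "Input:" and "Output:" layout, the instruction text, and the examples are assumptions chosen for illustration, not a format required by any model.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines += [f"Input: {query}", "Output:"]   # the model completes this last line
    return "\n".join(lines)

examples = [
    ("The battery died after a day.", "negative"),
    ("Setup took two minutes.", "positive"),
]
prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    examples,
    "The screen is bright and sharp.",
)
print(prompt)
```

Passing a single example to the same function produces a one-shot prompt.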

fine tuning

The process of adapting a pre-trained model to perform a specific task by conducting additional training. Fine tuning may involve (1) updating the model’s existing parameters, known as full fine tuning, or (2) updating a subset of the model’s existing parameters or adding new parameters to the model and training them while freezing the model’s existing parameters, known as parameter-efficient fine tuning.

flow

A collection of nodes that define a set of steps for processing data or training a model.

foundation model

An AI model that can be adapted to a wide range of downstream tasks. Foundation models are typically large-scale generative models that are trained on unlabeled data using self-supervision. As large-scale models, foundation models can include billions of parameters.

G

Gantt chart

A graphical representation of a project timeline and duration in which schedule data is displayed as horizontal bars along a time scale.

gen AI

See generative AI.

generative AI (gen AI)

A class of AI algorithms that can produce various types of content including text, source code, imagery, audio, and synthetic data.

generative variability

The characteristic of generative models to produce varied outputs, even when the input to the model is held constant. See also probabilistic.

GPU

See graphics processing unit.

graphical builder

A tool for creating flow assets by visually coding. A canvas is an area on which to place objects or nodes that can be connected to create a flow.

graphics processing unit (GPU)

A specialized processor designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display. GPUs are heavily utilized in machine learning due to their parallel processing capabilities. See also accelerator.

grounding

Providing a large language model with information to improve the accuracy of results.

H

hallucination

A response from a foundation model that includes off-topic, repetitive, incorrect, or fabricated content. Hallucinations involving fabricating details can happen when a model is prompted to generate text, but the model doesn't have enough related text to draw upon to generate a result that contains the correct details.

HAP detection

The ability to detect and filter hate, abuse, and profanity (HAP) both in prompts submitted by users and in responses generated by an AI model.

HAP detector

A sentence classifier that removes potentially harmful content, such as hate speech, abuse, and profanity, from foundation model output and input.

hold-out set

A set of labeled data that is intentionally withheld from both the training and validation sets, serving as an unbiased assessment of the final model's performance on unseen data.
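
One way to carve out a hold-out set is a seeded three-way split, sketched below. The 20% fractions and the function name are illustrative assumptions.

```python
import random

def three_way_split(items, seed, validation_fraction=0.2, holdout_fraction=0.2):
    """Shuffle once, then slice into training, validation, and hold-out sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_holdout = int(len(shuffled) * holdout_fraction)
    n_validation = int(len(shuffled) * validation_fraction)
    holdout = shuffled[:n_holdout]
    validation = shuffled[n_holdout:n_holdout + n_validation]
    train = shuffled[n_holdout + n_validation:]
    return train, validation, holdout

train, validation, holdout = three_way_split(range(10), seed=7)
print(len(train), len(validation), len(holdout))
```

The hold-out slice is set aside until the final evaluation, so that it plays no part in training or model selection.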

homogenization

The trend in machine learning research in which a small number of deep neural net architectures, such as the transformer, are achieving state-of-the-art results across a wide variety of tasks.

HPO

See hyperparameter optimization.

human oversight

Human involvement in reviewing decisions rendered by an AI system, enabling human autonomy and accountability for decisions.

hyperparameter

In machine learning, a parameter whose value is set before training as a way to increase model accuracy.

hyperparameter optimization (HPO)

The process for setting hyperparameter values to the settings that provide the most accurate model.

I

image

A software package that contains a set of libraries.

incremental learning

The process of training a model using data that is continually updated without forgetting data obtained from the preceding tasks. This technique is used to train a model with batches of data from a large training data source.

inferencing

The process of running live data through a trained AI model to make a prediction or solve a task.

ingest

  • To continuously add a high volume of real-time data to a database.
  • To feed data into a system for the purpose of creating a base of knowledge.

insight

An accurate or deep understanding of something. Insights are derived using cognitive analytics to provide current snapshots and predictions of customer behaviors and attitudes.

intelligent AI

Artificial intelligence systems that can understand, learn, adapt, and implement knowledge, demonstrating abilities like decision-making, problem-solving, and understanding complex concepts, much like human intelligence.

intent

A purpose or goal expressed by customer input to a chatbot, such as answering a question or processing a bill payment.

J

job

A separately executable unit of work.

K

knowledge base

See corpus.

L

label

A class or category assigned to a data point in supervised learning. Labels can be derived from data but are often applied by human labelers or annotators.

labeled data

Raw data that is assigned labels to add context or meaning so that it can be used to train machine learning models. For example, numeric values might be labeled as zip codes or ages to provide context for model inputs and outputs.

large language model (LLM)

A language model with a large number of parameters, trained on a large quantity of text.

latent space

An n-dimensional mathematical space in which data instances are embedded. A two-dimensional latent space embeds data as points within a 2D plane. See also representational space.

LLM

See large language model.

M

machine learning (ML)

A branch of artificial intelligence (AI) and computer science that focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving the accuracy of AI models.

machine learning framework

The libraries and runtime for training and deploying a model.

machine learning model

An AI model that is trained on a set of data to develop algorithms that it can use to analyze and learn from new data.

mental model

An individual’s understanding of how a system works and how their actions affect system outcomes. When these expectations do not match the actual capabilities of a system, it can lead to frustration, abandonment, or misuse.

misalignment

A discrepancy between the goals or behaviors that an AI system is optimized to achieve and the true, often complex, objectives of its human users or designers.

ML

See machine learning.

MLOps

  • A methodology that takes a machine learning model from development to production.
  • The practice of collaboration between data scientists and operations professionals to help manage the production machine learning (or deep learning) lifecycle. MLOps looks to increase automation and improve the quality of production ML while also focusing on business and regulatory requirements. It involves model development, training, validation, deployment, monitoring, and management and uses methods like CI/CD.

model

  • In a machine learning context, a set of functions and algorithms that have been trained and tested on a data set to provide predictions or decisions.
  • In Decision Optimization, a mathematical formulation of a problem that can be solved with CPLEX optimization engines using different data sets.

ModelOps

A methodology for managing the full lifecycle of an AI model, including training, deployment, scoring, evaluation, retraining, and updating.

monitored group

A class of data that is monitored to determine if the results from a predictive model differ significantly from the results of the reference group. Groups are commonly monitored based on characteristics that include race, gender, or age.

multiclass classification model

A classification model with more than two classes. For example, where a binary classification model predicts yes or no values, a multiclass model predicts yes, no, maybe, or not applicable.

multimodal model

A generative AI model that can process multiple types of data, such as text, images, and audio, and convert between them. For example, a multimodal model can take text input and generate image output.

multivariate time series

A time series experiment that contains two or more changing variables. For example, a time series model forecasting the electricity usage of three clients.

N

natural language processing (NLP)

A field of artificial intelligence and linguistics that studies the problems inherent in the processing and manipulation of natural language, with an aim to increase the ability of computers to understand human languages.

natural language processing library

A library that provides basic natural language processing functions for syntax analysis and out-of-the-box pre-trained models for a wide variety of text processing tasks.

neural network

A mathematical model for predicting or classifying cases by using a complex mathematical scheme that simulates an abstract version of brain cells. A neural network is trained by presenting it with a large number of observed cases, one at a time, and allowing it to update itself repeatedly until it learns the task.

NLP

See natural language processing.

node

In an SPSS Modeler flow, the graphical representation of a data operation.

notebook

An interactive document that contains executable code, descriptive text for that code, and the results of any code that is run.

notebook kernel

The part of the notebook editor that executes code and returns the computational results.

O

object storage

A method of storing data, typically used in the cloud, in which data is stored as discrete units, or objects, in a storage pool or repository that does not use a file hierarchy but that stores all objects at the same level.

one-shot learning

A model for deep learning that is based on the premise that most human learning takes place upon receiving just one or two examples. This model is similar to unsupervised learning.

one-shot prompting

A prompting technique in which a single example is provided to the model to demonstrate how to complete the task.

online deployment

Method of accessing a model or Python code deployment through an API endpoint as a web service to generate predictions online, in real time.

ontology

An explicit formal specification of the representation of the objects, concepts, and other entities that can exist in some area of interest and the relationships among them.

operational asset

An asset that runs code in a tool or a job.

optimization

The process of finding the most appropriate solution to a precisely defined problem while respecting the imposed constraints and limitations. For example, determining how to allocate resources or how to find the best elements or combinations from a large set of alternatives.

Optimization Programming Language

A modeling language for expressing model formulations of optimization problems in a format that can be solved by CPLEX optimization engines such as IBM CPLEX.

optimized metric

A metric used to measure the performance of the model. For example, accuracy is the typical metric used to measure the performance of a binary classification model.

orchestration

The process of creating an end-to-end flow that can train, run, deploy, test, and evaluate a machine learning model, and uses automation to coordinate the system, often using microservices.

overreliance

A user's acceptance of an incorrect recommendation made by an AI model. See also reliance, underreliance.

P

parameter

  • A configurable part of the model that is internal to a model and whose values are estimated or learned from data. Parameters are aspects of the model that are adjusted during the training process to help the model accurately predict the output. The model's performance and predictive power largely depend on the values of these parameters.
  • A real-valued weight indicating the strength of connection between two neurons in a neural network.

party

In Federated Learning, an entity that contributes data for training a common model. The data is not moved or combined but each party gets the benefit of the federated training.

payload

The data that is passed to a deployment to get back a score, prediction, or solution.

payload logging

The capture of payload data and deployment output to monitor ongoing health of AI in business applications.

pipeline

  • In Watson Pipelines, an end-to-end flow of assets from creation through deployment.
  • In AutoAI, a candidate model.

pipeline leaderboard

In AutoAI, a table that shows the list of automatically generated candidate models, as pipelines, ranked according to the specified criteria.

policy

A strategy or rule that an agent follows to determine the next action based on the current state.

positional encoding

An encoding of an ordered sequence of data that includes positional information, such as encoding of words in a sentence that includes each word's position within the sentence. See also encoding.
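
One well-known scheme is the sinusoidal positional encoding used in transformer models, sketched below. The choice of this particular scheme, the base of 10000, and the tiny dimension are assumptions for illustration; other positional encodings exist.

```python
import math

def sinusoidal_position(position, dim):
    """Encode a sequence position as alternating sine and cosine values."""
    encoding = []
    for i in range(dim):
        # Each pair of dimensions (2k, 2k+1) shares one wavelength.
        angle = position / (10000 ** (2 * (i // 2) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

sentence = ["the", "model", "reads", "text"]
for position, word in enumerate(sentence):
    print(word, [round(x, 3) for x in sinusoidal_position(position, dim=4)])
```

Each position receives a distinct vector, which can be combined with the word's own encoding so that word order is preserved.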

predictive analytics

A business process and a set of related technologies that are concerned with the prediction of future possibilities and trends. Predictive analytics applies such diverse disciplines as probability, statistics, machine learning, and artificial intelligence to business problems to find the best action for a specific situation. See also data mining.

pretrained model

An AI model that was previously trained on a large data set to accomplish a specific task. Pretrained models are used instead of building a model from scratch.

pretraining

The process of training a machine learning model on a large dataset before fine-tuning it for a specific task.

privacy

Assurance that information about an individual is protected from unauthorized access and inappropriate use.

probabilistic

The characteristic of being subject to randomness; non-deterministic. Probabilistic models do not produce the same outputs given the same inputs. See also generative variability.

project

A collaborative workspace for working with data and other assets.

prompt

  • Data, such as text or an image, that prepares, instructs, or conditions a foundation model's output.
  • A component of an action that indicates that user input is required for a field before making a transition to an output screen.

prompt engineering

The process of designing natural language prompts for a language model to perform a specific task.

prompting

The process of providing input to a foundation model to induce it to produce output.

prompt tuning

An efficient, low-cost way of adapting a pre-trained model to new tasks without retraining the model or updating its weights. Prompt tuning involves learning a small number of new parameters that are appended to a model’s prompt, while freezing the model’s existing parameters.

pruning

The process of simplifying, shrinking, or trimming a decision tree or neural network. This is done by removing less important nodes or layers, reducing complexity to prevent overfitting and improve model generalization while maintaining its predictive power.

Python

A programming language that is used in data science and AI.

Python function

A function that contains Python code to support a model in production.

Q

quantization

A method of compressing foundation model weights to speed up inferencing and reduce GPU memory needs.
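
To make the idea concrete, the sketch below applies symmetric 8-bit quantization with a single shared scale factor to a short list of weights. This is one simple scheme among many; the weights and function names are illustrative, and production methods are considerably more elaborate.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.82, -1.27, 0.03, 0.5]        # illustrative float weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized, restored)
```

Each weight is stored as a small integer instead of a 32-bit float, at the cost of a rounding error of at most half the scale factor.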

R

R

An extensible scripting language that is used in data science and AI that offers a wide variety of analytic, statistical, and graphical functions and techniques.

RAG

See retrieval augmented generation.

random seed

A number used to initialize a pseudorandom number generator. Random seeds enable reproducibility for processes that rely on random number generation.
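
The reproducibility that a seed provides can be demonstrated with Python's standard random module. The seed value and the helper name below are arbitrary choices for illustration.

```python
import random

def sample_with_seed(seed, n=5):
    """Draw n pseudorandom integers from a generator initialized with seed."""
    rng = random.Random(seed)   # independent generator, does not touch global state
    return [rng.randint(0, 99) for _ in range(n)]

first_run = sample_with_seed(42)
second_run = sample_with_seed(42)
print(first_run == second_run)   # same seed, same sequence
```

Recording the seed alongside an experiment makes it possible to repeat steps such as data shuffling or weight initialization exactly.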

reference group

A group that is identified as most likely to receive a positive result in a predictive model. The results can be compared to a monitored group to look for potential bias in outcomes.

refine

To cleanse and shape data.

regression model

A model that relates a dependent variable to one or more independent variables.

reinforcement learning

A machine learning technique in which an agent learns to make sequential decisions in an environment to maximize a reward signal. Inspired by trial and error learning, agents interact with the environment, receive feedback, and adjust their actions to achieve optimal policies.

reinforcement learning on human feedback (RLHF)

A method of aligning a large language model's responses to the instructions given in a prompt. RLHF requires that human annotators rank multiple outputs from the model. These rankings are then used to train a reward model. The reward model is then used to fine-tune the large language model through reinforcement learning.

reliance

In AI systems, a user’s acceptance of a recommendation made by, or the output generated by, an AI model. See also overreliance, underreliance.

representation

An encoding of a unit of information, often as a vector of real-valued numbers. See also embedding.

representational space

An n-dimensional mathematical space in which data instances are embedded. For example, a two-dimensional representational space embeds data as points within a 2D plane. See also latent space.

reranking

A generative AI process for ranking a set of document passages from most-to-least likely to answer a specified query.

retrieval augmented generation (RAG)

A technique in which a large language model is augmented with knowledge from external sources to generate text. In the retrieval step, relevant documents from an external source are identified from the user’s query. In the generation step, portions of those documents are included in the LLM prompt to generate a response grounded in the retrieved documents.
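
The two steps can be sketched as follows. This is a toy example under stated assumptions: retrieval is approximated by counting shared words rather than by comparing embeddings, the documents and prompt wording are made up, and no model is called. The example stops at building the prompt that would be sent to the LLM.

```python
def words(text):
    """Lowercase a text and split it into a set of whitespace-separated words."""
    return set(text.lower().split())


def retrieve(query, documents, top_k=1):
    """Retrieval step: rank documents by how many words they share with the query."""
    query_words = words(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & words(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages):
    """Generation step input: place the retrieved passages in the prompt."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


documents = [
    "A vector store is a repository that stores vectorized embeddings of documents.",
    "Temperature controls the amount of variation in generated output.",
    "A project is a collaborative workspace for working with data and other assets.",
]
query = "What does a vector store hold?"
passages = retrieve(query, documents, top_k=1)
prompt = build_prompt(query, passages)
```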

reward

A signal used to guide an agent, typically a reinforcement learning agent, that provides feedback on the goodness of a decision.

RLHF

See reinforcement learning on human feedback.

runtime environment

The predefined or custom hardware and software configuration that is used to run tools or jobs, such as notebooks.

S

scoring

  • In machine learning, the process of measuring the confidence of a predicted outcome.
  • The process of computing how closely the attributes for an incoming identity match the attributes of an existing entity.

script

A file that contains Python or R scripts to support a model in production.

self-attention

An attention mechanism that uses information from the input data itself to determine what parts of the input to focus on when generating output.
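
A simplified sketch of scaled dot-product self-attention, in which each input vector is compared with every input vector (including itself) to decide how much weight each one receives. It assumes that the queries, keys, and values are the raw input vectors; real transformer layers first apply learned projections, which are omitted here.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(inputs):
    """Return (outputs, attention_weights) for a list of equal-length vectors."""
    dim = len(inputs[0])
    outputs, attention = [], []
    for query in inputs:
        # Score the current vector against every vector in the same input.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in inputs
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all input vectors.
        mixed = [
            sum(w * value[i] for w, value in zip(weights, inputs))
            for i in range(dim)
        ]
        attention.append(weights)
        outputs.append(mixed)
    return outputs, attention


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up token vectors
outputs, attention = self_attention(tokens)
```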

self-supervised learning

A machine learning training method in which a model learns from unlabeled data by masking tokens in an input sequence and then trying to predict them. An example is "I like ________ sprouts".

sentiment analysis

Examination of the sentiment or emotion expressed in text, such as determining if a movie review is positive or negative.

shape

To customize data by filtering, sorting, removing columns, joining tables, and performing operations that include calculations, data groupings, hierarchies, and more.

small data

Data that is accessible and comprehensible by humans. See also structured data.

SQL pushback

In SPSS Modeler, the process of performing many data preparation and mining operations directly in the database through SQL code.

structured data

Data that resides in fixed fields within a record or file. Relational databases and spreadsheets are examples of structured data. See also unstructured data, small data.

structured information

Items stored in structured resources, such as search engine indices, databases, or knowledge bases.

supervised learning

A machine learning training method in which a model is trained on a labeled dataset to make predictions on new data.

T

temperature

A parameter in a generative model that specifies the amount of variation in the generation process. Higher temperatures result in greater variability in the model's output.
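
One common way that temperature is applied, shown here as a sketch rather than as the behavior of any particular model, is to divide the model's raw token scores (logits) by the temperature before converting them to probabilities. The logits below are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
low = softmax_with_temperature(logits, 0.5)
high = softmax_with_temperature(logits, 2.0)
```

With the lower temperature, more of the probability is concentrated on the top-scoring token. With the higher temperature, the probabilities are closer together, so sampling from them produces more varied output.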

text classification

A model that automatically identifies and classifies text into specified categories.

text extraction

A generative AI method of converting highly structured information into a simpler textual format for use as input to large language models.

time series

A set of values of a variable at periodic points in time.

time series model

A model that tracks and predicts data over time.

token

A discrete unit of meaning or analysis in a text, such as a word or subword.

tokenization

The process used in natural language processing to split a string of text into smaller units, such as words or subwords.
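
A minimal word-level sketch that uses a regular expression to separate words from punctuation. Foundation models typically rely on learned subword tokenizers instead, so the splitting rule here is only an illustrative assumption.

```python
import re


def tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize("I like Brussels sprouts.")
print(tokens)  # ['I', 'like', 'Brussels', 'sprouts', '.']
```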

trained model

A model that is trained with actual data and is ready to be deployed to predict outcomes when presented with new data.

training

The initial stage of model building, involving a subset of the source data. The model learns by example from the known data. The model can then be tested against a further, different subset for which the outcome is already known.

training data

A collection of data that is used to train machine learning models.

training set

A set of labeled data that is used to train a machine learning model by exposing it to examples and their corresponding labels, enabling the model to learn patterns and make predictions.

transfer learning

A machine learning strategy in which a trained model is applied to a completely new problem.

transformer

A neural network architecture that uses positional encodings and the self-attention mechanism to predict the next token in a sequence of tokens.

transparency

Sharing appropriate information with stakeholders on how an AI system has been designed and developed. Examples of this information are what data is collected, how it will be used and stored, and who has access to it; and test results for accuracy, robustness and bias.

trust calibration

The process of evaluating and adjusting one’s trust in an AI system based on factors such as its accuracy, reliability, and credibility.

Turing test

Proposed by Alan Turing in 1950, a test of a machine's ability to exhibit intelligent behavior equivalent to, or indistinguishable from, that of a human.

U

underreliance

A user's rejection of a correct recommendation made by an AI model. See also overreliance, reliance.

univariate time series

A time series experiment that contains only one changing variable. For example, a time series model forecasting the temperature has a single prediction column of the temperature.

unstructured data

Any data that is stored in an unstructured format rather than in fixed fields. Data in a word processing document is an example of unstructured data. See also structured data.

unstructured information

Data that is not contained in a fixed location, such as a natural language text document.

unsupervised learning

  • A model for deep learning that allows raw, unlabeled data to be used to train a system with little to no human effort.
  • A machine learning training method in which a model is not provided with labeled data and must find patterns or structure in the data on its own.

V

validation set

A separate set of labeled data that is used to evaluate the performance and generalization ability of a machine learning model during the training process, assisting in hyperparameter tuning and model selection.

vector

A one-dimensional, ordered list of numbers, such as [1, 2, 5] or [0.7, 0.2, -1.0].

vector database

See vector store.

vector index

An index that retrieves the vectorized embeddings of documents from a vector store.

vector store

A repository that stores vectorized embeddings of documents.
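
To make the relationship between a vector store and its index concrete, the following toy sketch keeps embeddings in memory and retrieves the closest ones by cosine similarity. The class name, document IDs, and two-dimensional embeddings are hypothetical. Real vector stores use specialized index structures to avoid comparing the query against every stored vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToyVectorStore:
    """In-memory mapping from document ID to embedding (illustration only)."""

    def __init__(self):
        self._vectors = {}

    def add(self, doc_id, embedding):
        self._vectors[doc_id] = embedding

    def search(self, query, top_k=1):
        """Return the IDs of the `top_k` embeddings most similar to `query`."""
        ranked = sorted(
            self._vectors,
            key=lambda doc_id: cosine_similarity(query, self._vectors[doc_id]),
            reverse=True,
        )
        return ranked[:top_k]


store = ToyVectorStore()
store.add("cat-doc", [1.0, 0.0])
store.add("dog-doc", [0.9, 0.1])
store.add("car-doc", [0.0, 1.0])
nearest = store.search([1.0, 0.0], top_k=2)
```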

verbalizer

In generative AI, a template to format the data during tuning and inferencing.

virtual agent

A pretrained chat bot that can process natural language to respond and complete simple business transactions, or route more complicated requests to a human with subject matter expertise.

visualization

A graph, chart, plot, table, map, or any other visual representation of data.

W

weight

A coefficient for a node that transforms input data within the network's layer. Weight is a parameter that an AI model learns through training, adjusting its value to reduce errors in the model's predictions.

Z

zero-shot prompt

A prompting technique in which the model completes a task without being given a specific example of how.
