Overview of Cloud Pak for Data as a Service

Cloud Pak for Data as a Service is a cloud-native, modular service platform for all your data governance, data engineering, data analysis, and AI modeling tasks. It includes an integrated data fabric with which you can logically collect and organize all of your data so that your data consumers have instant and secure access to trusted information. Supported by the data fabric, the platform also includes a suite of data science and AI tools so that your data consumers can analyze your data and infuse your applications with AI for better business outcomes.

Cloud Pak for Data as a Service is a fully managed cloud service platform with the following benefits:

  • No installation, management, or updating of software or hardware
  • Simple to scale up or down
  • Secure and compliant
  • Composable services architecture
  • A subscription with a single monthly bill

The Cloud Pak for Data as a Service data fabric

A data fabric is an architectural pattern for managing highly distributed and disparate data. Because it is designed for hybrid and multi-cloud data environments, a data fabric supports the decoupling of data storage, data processing, and data use. With the intelligent knowledge catalog capabilities, you can elevate data into enterprise assets that are governed globally regardless of where the data is stored, processed, or used. Catalog assets are automatically assigned metadata that describes logical connections between data sources and enriches them with semantics so that you can provide business-ready data for your applications, services, and users.

The data fabric architecture that is provided by Cloud Pak for Data as a Service enables your organization to accelerate data analysis for better, faster insights.

Watch this video to see data fabric in action.


With the capabilities of the Cloud Pak for Data as a Service data fabric architecture, you can:

  • Simplify and automate access to data, across multi-cloud and on-premises data sources, without moving data.
  • Universally safeguard the use of all data, regardless of source.
  • Provide business users with a self-service experience for finding and using data.
  • Use AI-powered capabilities to automate and orchestrate the data lifecycle.

The following diagram shows the five main capabilities of the data fabric and the connectivity between the platform and existing data sources.

Figure: Data fabric

Metadata-based knowledge core
Data stewards enrich data with metadata that describes the data and informs the semantic search for data. They curate data into catalogs by using automated discovery and classification. They can further enrich data assets by creating and assigning custom governance artifacts, such as business vocabulary. They can also import ready-to-use collections of metadata from industry-specific Knowledge Accelerators.

Components: Watson Knowledge Catalog service, Knowledge Accelerators

Data self-service in catalogs
Data scientists and other business users can find the data that they need in data catalogs that contain data from across the enterprise. They can use AI-powered semantic search and recommendations that consider asset metadata, browse for data, or view their peers’ highly rated assets. They copy data assets from a catalog into a project, where they collaborate to prepare, analyze, and model the data.

Components: Watson Knowledge Catalog service

Automated data integration
Data engineers and other users prepare your data for consumption. They can provide access to data in your existing data architecture and automate data preparation. They can integrate and virtualize data for faster, simpler querying. They can automate the bulk ingestion, cleansing, and complex transformations of data to regularly publish updated data assets. They can push down the processing of the data to the location of the data.

Components: Cloud Pak for Data as a Service platform, Data Refinery tool, Data Virtualization service, DataStage service, Satellite integration

Unified data governance, security, and compliance
Data stewards can create data protection rules to automatically enforce uniform data privacy across the platform. Data masking de-identifies sensitive data, which provides data security while preserving data utility and avoiding the need for multiple copies of the data. Data stewards can import ready-to-use compliance metadata from Knowledge Accelerators.

Components: Watson Knowledge Catalog service, Knowledge Accelerators
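
To make the idea of masking that preserves data utility more concrete, the following is a minimal sketch in plain Python. It is not the Watson Knowledge Catalog implementation or API. The function names, the key, and the sample rows are all hypothetical. The sketch assumes one possible approach, keyed deterministic tokenization, in which the same input always yields the same token, so masked values can still be grouped or joined without exposing the original value.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; a real key would be managed securely.
MASKING_KEY = b"example-key-not-for-production"


def mask_value(value: str, key: bytes = MASKING_KEY) -> str:
    """Replace a sensitive string with a keyed, deterministic token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "masked_" + digest[:12]


def mask_rows(rows, sensitive_columns):
    """Return new rows in which the sensitive columns are tokenized."""
    return [
        {
            col: mask_value(val) if col in sensitive_columns else val
            for col, val in row.items()
        }
        for row in rows
    ]


# Hypothetical sample data: the first and third rows share an email address.
customers = [
    {"email": "ana@example.com", "region": "EU", "spend": 120},
    {"email": "bo@example.com", "region": "US", "spend": 80},
    {"email": "ana@example.com", "region": "EU", "spend": 45},
]

masked = mask_rows(customers, sensitive_columns={"email"})
for row in masked:
    print(row)
```

In this sketch, the two rows that belong to the same customer receive the same token, so an analyst could still total spend per customer from the masked rows, while the original list is left unchanged.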

Unified lifecycle
Users can design, build, test, orchestrate, deploy to production, and monitor different types of data pipelines in a unified way. Users can create or find data assets, search for them across the platform, and move them across workspaces. Users can orchestrate data transformations and other actions by scheduling jobs that run automatically.

Components: Cloud Pak for Data as a Service platform

To further explore the benefits of the data fabric, read the white paper "Data fabric architecture delivers three instant benefits".

For more information on the concept of assets in Cloud Pak for Data as a Service, see Asset types and properties.

Cloud Pak for Data as a Service data science and AI tools

The data science and AI tools on Cloud Pak for Data as a Service enable everyone in your organization to participate in finding and sharing insights. The AI tools cover the complete AI lifecycle of preparing and training models, deploying models in your applications, and then evaluating models for bias, performance, and quality.

Comprehensive tool set
Data scientists, business analysts, and machine learning engineers can collaborate while choosing the tools that fit their individual preferences and skill levels. Users can write Python or R code, visually code by creating a flow of steps on a graphical canvas, or automatically build a ranked list of model candidates.

Components: Watson Studio, Cognos Dashboard Embedded

Easy deployment
Data scientists or machine learning engineers promote trained models to deployment spaces, deploy and score the models, review prediction scores and insights, and monitor deployment jobs in a dashboard.

Components: Watson Machine Learning

Trusted outcomes
Machine learning engineers evaluate deployments for bias or drift. To maintain quality goals, they update data and retrain deployed models. Business users can easily understand and explain the models, and the models are auditable in business transactions.

Components: Watson OpenScale

The Cloud Pak for Data as a Service services architecture

Cloud Pak for Data as a Service is composed of a set of core services, related services, and a sample gallery.

Figure: Cloud Pak for Data as a Service

With Cloud Pak for Data as a Service, you can provision these types of services from the Cloud Pak for Data as a Service services catalog:

  • Core services to govern data, analyze data, run models, deploy models, and evaluate models.
  • Services that supplement the core services by adding tools, workspaces, or compute power.
  • IBM Cloud database services to store data that you can use in the platform.
  • Watson Assistant and other Watson services that have their own UIs or provide APIs for analyzing data.

The sample gallery provides data assets, notebooks, and projects. Sample data assets and notebooks provide examples of data science and machine learning code. Sample projects, including industry accelerators, contain a set of assets and detailed instructions on how to solve a particular business problem.

Functionality in the core services and the common platform

This illustration shows the functionality included in the common platform and the core services.

Figure: Services and common platform functionality

The following functionality is provided by the platform:

  • Administration at the account level, including user management and billing
  • Storage for projects, catalogs, and deployment spaces in IBM Cloud Object Storage
  • Global search for assets and artifacts across the platform
  • The Platform assets catalog for sharing connections across the platform
  • Role-based user management within collaborative workspaces across the platform
  • Common infrastructure for assets, projects, catalogs, and deployment spaces
  • A services catalog for provisioning more service instances

Watson Studio provides the following types of functionality in projects:

  • Tools to prepare data, analyze and visualize data, and build models
  • Environment definitions to provide compute resources

Watson Machine Learning provides the following functionality:

  • Tools to build models in projects
  • Tools to deploy models and manage deployed models in deployment spaces
  • Environment definitions to provide compute resources

Watson OpenScale provides the following functionality in a separate user interface:

  • Tools to evaluate models for fairness, drift, quality, and performance
  • Tools to create custom visualizations of model predictions and inputs
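
As an illustration of what evaluating a model for fairness can involve, the following sketch computes a disparate impact ratio in plain Python. This is one widely used fairness metric and is shown only as an example. It does not use the Watson OpenScale API, and the function names, group labels, and sample predictions are hypothetical.

```python
def favorable_rate(predictions, groups, group):
    """Share of favorable (1) predictions among the members of one group."""
    outcomes = [p for p, g in zip(predictions, groups) if g == group]
    if not outcomes:
        raise ValueError(f"no records for group {group!r}")
    return sum(outcomes) / len(outcomes)


def disparate_impact(predictions, groups, monitored, reference):
    """Favorable rate of the monitored group divided by that of the reference group."""
    reference_rate = favorable_rate(predictions, groups, reference)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(predictions, groups, monitored) / reference_rate


# Hypothetical model output: 1 is a favorable prediction, 0 is unfavorable.
predictions = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

ratio = disparate_impact(predictions, groups, monitored="b", reference="a")
print(f"disparate impact: {ratio:.2f}")
```

A ratio close to 1 means that both groups receive favorable predictions at similar rates. A commonly cited rule of thumb treats values below 0.8 as a signal to investigate, although the right threshold depends on the use case.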

Watson Knowledge Catalog provides the following functionality:

  • Catalogs to share assets
  • Governance artifacts to control and enrich catalog assets
  • Categories to organize governance artifacts
  • Tools to import metadata and prepare data in projects

Services that supplement the core services

These services add project tools, compute resources, and additional workspaces to Cloud Pak for Data as a Service.

Cognos Dashboard Embedded provides the following functionality:

  • A tool in projects to build interactive dashboards to tell a story with data

DataStage provides the following functionality:

  • A tool in projects to build DataStage flows using a large collection of powerful stages and connectors
  • Environment definitions to run jobs generated from flows to extract, transform, and load data

Data Virtualization provides the following functionality:

  • A workspace to create virtual tables that segment or combine data from one or more tables
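
The idea of a virtual table that segments or combines data from other tables can be sketched with an ordinary SQL view. The following example uses Python's built-in sqlite3 module purely as an analogy. It is not the Data Virtualization service or its API, both source tables live in a single in-memory database rather than in separate remote sources, and the table and view names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);

    INSERT INTO customers VALUES (1, 'Ana', 'EU'), (2, 'Bo', 'US');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 45.0), (12, 2, 80.0);

    -- The view combines two tables (join) and segments the result (filter).
    -- It stores no data of its own; rows are produced when it is queried.
    CREATE VIEW eu_customer_orders AS
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.region = 'EU';
    """
)

rows = conn.execute(
    "SELECT name, total FROM eu_customer_orders ORDER BY total"
).fetchall()
print(rows)
```

Consumers query the view like any other table, while the underlying data stays where it is, which mirrors the goal of accessing data without moving or copying it.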

IBM Match 360 with Watson provides the following functionality:

  • A workspace for master data configuration and master data exploration
