Overview of Cloud Pak for Data as a Service
Cloud Pak for Data as a Service is a cloud service platform for all your data governance, data engineering, data analysis, and AI lifecycle tasks. Cloud Pak for Data as a Service implements a data fabric solution so that you can provide instant and secure access to trusted data to your organization, automate processes and compliance, and deliver trustworthy AI in your applications.
Cloud Pak for Data as a Service is a fully managed cloud service platform with the following benefits:
- No installation, management, or updating of software or hardware
- Simple to scale up or down
- Secure and compliant
- Composable services architecture
- Subscription or consumption-based monthly billing
Watch this video to see an overview of Cloud Pak for Data as a Service
This video provides a visual method to learn the concepts and tasks in this documentation.
The Cloud Pak for Data as a Service data fabric solution
A data fabric architecture enables your enterprise to unlock the value of your data in a hybrid multicloud data landscape. Moving to a data fabric architecture transforms the way that your enterprise integrates, governs, and uses data for analytics, data science, customer master data, and compliance.
With a data fabric, you can have a secure and consistent way to access data from disparate sources. You can eliminate inefficient, repetitive, and manual data access and integration processes. A data fabric architecture bridges the gap between the sources and provides business-ready data to support your company's needs. You can work with data from various types of sources across a hybrid and multi-cloud landscape, while you keep that data secure and trusted with the full breadth of integrated data management capabilities.
Your data engineers need tools to prepare, transform, and virtualize data. Your data quality analysts need tools to measure the quality of the data. Your governance team needs tools to control, protect, and enrich your data. Your data consumers, such as business analysts and data scientists, need tools to collaboratively develop insights and models. With the Cloud Pak for Data platform of integrated tools, your organization can efficiently work together to use your data to improve your business.
A data fabric architecture implements active metadata management, which uses machine learning to automate metadata processing. The outcomes of the metadata analysis facilitate automated data discovery, improve confidence in data, and enable data protection and data governance at scale.
For more information on the data fabric solution, see Use cases. To experience implementing the data fabric, take the data fabric tutorials.
Services and platform architecture
You add features and tools to the Cloud Pak for Data as a Service platform by provisioning services. A set of core services is integrated into the common platform. Other associated services work with the platform but run outside of it. Depending on how you sign up for Cloud Pak for Data as a Service, you might start with a subset of the core services that represent a single data fabric solution use case.
You can provision these types of services from the Cloud Pak for Data as a Service services catalog:
-
Core services
Core services are seamlessly integrated and add tools, workspaces, or compute power to the platform UI:- Watson Studio for analyzing data
- Watson Machine Learning for building and deploying models
- Watson OpenScale for evaluating models
- IBM Knowledge Catalog for governing and cataloging data and other assets
- DataStage for integrating data
- Watson Query for virtualizing and querying data
- Match 360 for creating master data
- Data Replication for replicating data
- Cognos Dashboard Embedded for visualizing data
-
Associated services
IBM Cloud database services that you can use to access data from within the platform but store and manage the data outside the platform.Watson services that have their own UIs or provide APIs for analyzing data.
Workspaces and assets
Cloud Pak for Data as a Service is organized as a set of collaborative workspaces where you can work with your team or organization. Each workspace has a set of members with roles that provide permissions to perform actions. Most users work with assets, which are the items that users add to the platform. Data assets contain metadata that represents data, while assets that you create in tools, such as data pipelines and models, run code to work with data. The following diagram shows the main workspaces, their purposes, and how assets and other items move around the platform.
Projects
Projects are where your data science, data engineering, or data curation teams work with data to create assets, such as, notebooks, dashboards, models, data pipelines, or enriched data assets. Project tools are provided by most of the core services:
- Watson Studio provides the Data Refinery, Jupyter notebooks editor, SPSS Modeler, Decision Optimization, Pipelines, and RStudio tools
- Watson Machine Learning provides AutoAI and Federated Learning tools
- IBM Knowledge Catalog provides the Data Refinery, Metadata import, Metadata enrichment, and masking flows tools
- DataStage provides the DataStage data pipelines editor
- Cognos Dashboard Embedded provides the dashboard editor
- Data Replication provides the Data Replication tool
- Match 360 provides the Master data configuration tool
The following image shows what the Overview page of a project might look like.
Catalogs
Catalogs are where your organization finds and stores high-quality, trusted data, and other assets, such as model factsheets. You can find data assets in a catalog and move them into a project to work with the data. Or you can curate data in projects and publish the high-quality data assets to a catalog for others to use. Catalogs require the IBM Knowledge Catalog service.
The following image shows what the Assets page of a catalog might look like.
Deployment spaces
Deployment spaces are where your ModelOps team deploys models and other deployable assets to production and then tests and manages deployments in production. After you build models and deployable assets in projects, you promote them to deployment spaces.
The following image shows what the Overview page of a deployment space might look like.
Categories
Categories are where your governance team creates and manages governance artifacts that enrich data assets in catalogs. Categories require the IBM Knowledge Catalog service.
The following image shows what a category might look like.
Other workspaces
You can create specialized data assets in other workspaces and move them to projects and catalogs:
- The Watson Query service provides a workspace to virtualize data assets over many data sources.
- The Match360 service provides a workspace to configure and explore a 360-degree view of customer data.
The Resource hub
The platform includes an integrated Resource hub that provides sample data assets, notebooks, and projects. Sample notebooks provide examples of data science and machine learning code. Sample projects, including industry accelerators, contain sets of data, models, other assets, and detailed instructions on how to solve a particular business problem. The Resource hub also provides Knowledge Accelerators, which contain sets of governance artifacts that you can import to provide business vocabularies for specific industries.
The following image shows what the Resource hub looks like.
Watch this video to see a tour of the Resource hub.
This video provides a visual method to learn the concepts and tasks in this documentation.
Learn more
- About the Cloud Pak for Data as a Service product
- Services in Cloud Pak for Data as a Service
- Use cases
- Supported connections
- Relationships between the core services and Cloud Pak for Data as a Service
- Feature differences between Cloud Pak for Data deployments
- Asset types and properties
- Object storage for workspaces
Parent topic: Cloud Pak for Data as a Service