class Aws::SageMaker::Types::DatasetDefinition

Configuration for Dataset Definition inputs. A Dataset Definition input must specify exactly one of `AthenaDatasetDefinition` or `RedshiftDatasetDefinition`.
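
The "exactly one" rule can be checked on the client side before a request is sent. The following is a minimal sketch that works on a plain hash; `SOURCE_KEYS` and `validate_dataset_source!` are hypothetical names introduced for illustration and are not part of the AWS SDK.

```ruby
# Hypothetical helper, not part of the AWS SDK: checks that a
# DatasetDefinition-shaped hash sets exactly one of the two sources.
SOURCE_KEYS = %i[athena_dataset_definition redshift_dataset_definition].freeze

def validate_dataset_source!(definition)
  # Collect the source keys that are present with a non-nil value.
  present = SOURCE_KEYS.select { |key| definition[key] }
  return present.first if present.size == 1

  raise ArgumentError,
        "expected exactly one of #{SOURCE_KEYS.join(', ')}, got #{present.size}"
end

# Usage: a hash with only the Athena source passes the check.
athena_only = {
  athena_dataset_definition: { catalog: "AthenaCatalog" },
  local_path: "/opt/ml/processing/input" # illustrative path, an assumption
}
validate_dataset_source!(athena_only) # => :athena_dataset_definition
```

A hash that sets both sources, or neither, raises `ArgumentError` in this sketch; the service performs its own validation regardless.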

@note When making an API call, you may pass DatasetDefinition data as a hash:

    {
      athena_dataset_definition: {
        catalog: "AthenaCatalog", # required
        database: "AthenaDatabase", # required
        query_string: "AthenaQueryString", # required
        work_group: "AthenaWorkGroup",
        output_s3_uri: "S3Uri", # required
        kms_key_id: "KmsKeyId",
        output_format: "PARQUET", # required, accepts PARQUET, ORC, AVRO, JSON, TEXTFILE
        output_compression: "GZIP", # accepts GZIP, SNAPPY, ZLIB
      },
      redshift_dataset_definition: {
        cluster_id: "RedshiftClusterId", # required
        database: "RedshiftDatabase", # required
        db_user: "RedshiftUserName", # required
        query_string: "RedshiftQueryString", # required
        cluster_role_arn: "RoleArn", # required
        output_s3_uri: "S3Uri", # required
        kms_key_id: "KmsKeyId",
        output_format: "PARQUET", # required, accepts PARQUET, CSV
        output_compression: "None", # accepts None, GZIP, BZIP2, ZSTD, SNAPPY
      },
      local_path: "ProcessingLocalPath",
      data_distribution_type: "FullyReplicated", # accepts FullyReplicated, ShardedByS3Key
      input_mode: "Pipe", # accepts Pipe, File
    }

@!attribute [rw] athena_dataset_definition

Configuration for Athena Dataset Definition input.
@return [Types::AthenaDatasetDefinition]

@!attribute [rw] redshift_dataset_definition

Configuration for Redshift Dataset Definition input.
@return [Types::RedshiftDatasetDefinition]

@!attribute [rw] local_path

The local path where you want Amazon SageMaker to download the
Dataset Definition inputs to run a processing job. `LocalPath` is an
absolute path to the input data. This is a required parameter when
`AppManaged` is `False` (default).
@return [String]

@!attribute [rw] data_distribution_type

Whether the generated dataset is `FullyReplicated` or
`ShardedByS3Key` (default).
@return [String]

@!attribute [rw] input_mode

Whether to use `File` or `Pipe` input mode. In `File` (default)
mode, Amazon SageMaker copies the data from the input source onto
the local Amazon Elastic Block Store (Amazon EBS) volumes before
starting your training algorithm. This is the most commonly used
input mode. In `Pipe` mode, Amazon SageMaker streams input data from
the source directly to your algorithm without using the EBS volume.
@return [String]
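
Both `data_distribution_type` and `input_mode` accept only the values listed above, so a typo can be caught locally. The following is a rough sketch operating on a plain hash; `ALLOWED_VALUES` and `invalid_enum_fields` are hypothetical names, not SDK APIs.

```ruby
# Hypothetical helper, not part of the AWS SDK: reports enum-valued fields
# whose value is not one of the documented options.
ALLOWED_VALUES = {
  data_distribution_type: %w[FullyReplicated ShardedByS3Key],
  input_mode: %w[Pipe File]
}.freeze

def invalid_enum_fields(definition)
  ALLOWED_VALUES.each_with_object({}) do |(key, allowed), errors|
    value = definition[key]
    # An omitted field is fine: the service applies its documented default.
    next if value.nil? || allowed.include?(value)

    errors[key] = value
  end
end

# Usage: an empty result means every supplied enum value is recognized.
invalid_enum_fields({ input_mode: "File", data_distribution_type: "FullyReplicated" })
```

Returning a hash of offending fields rather than raising lets the caller report all problems at once; this is a design choice of the sketch, not SDK behavior.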

@see docs.aws.amazon.com/goto/WebAPI/sagemaker-2017-07-24/DatasetDefinition AWS API Documentation

Constants

SENSITIVE