tensorfx.data

DataSet and DataSource Implementations

class tensorfx.data.DataSet(datasources, schema, metadata, features)

A class representing data to be used within a job.

A DataSet contains one or more DataSource instances, each associated with a name.

features

Retrieves the features defined with the DataSet.

metadata

Retrieves the metadata associated with the DataSet.

parse_instances(instances, prediction=False)

Parses input instances according to the associated schema, metadata and features.

Parameters:
  • instances – The tensor containing input strings.
  • prediction – Whether the instances are being parsed for producing predictions or not.
Returns:

A dictionary of tensors keyed by feature names.

schema

Retrieves the schema associated with the DataSet.

sources

Retrieves the names of the contained DataSource instances.

class tensorfx.data.DataSource

A base class representing data that can be read for use in a job.

read(batch=128, shuffle=False, shuffle_buffer=1000, epochs=0, threads=1)

Reads the data represented by this DataSource using a TensorFlow reader.

Parameters:
  • batch – The number of records to read at a time.
  • shuffle – Whether to shuffle the list of files.
  • shuffle_buffer – When shuffling, the number of extra items to keep in the queue for randomness.
  • epochs – The number of epochs or passes over the data to perform.
  • threads – The number of threads to use to read from the queue.
Returns:

A tensor containing a list of instances read.
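
How batch, shuffle_buffer and epochs interact can be illustrated with a minimal pure-Python sketch. It does not use TensorFlow and is not the tensorfx implementation: the helper name read_batches, the seed argument, and the choice to shuffle individual records (rather than files) are assumptions made for illustration, and the sketch does not model the default epochs=0.

```python
import random


def read_batches(records, batch=128, shuffle=False, shuffle_buffer=1000,
                 epochs=1, seed=0):
    """Yield lists of up to `batch` records, optionally shuffled via a buffer.

    Hypothetical helper that mimics the parameters of read() in plain Python.
    """
    rng = random.Random(seed)
    buffer = []
    out = []
    for _ in range(epochs):
        for record in records:
            if shuffle:
                buffer.append(record)
                # Keep `shuffle_buffer` extra items queued before emitting any.
                if len(buffer) <= shuffle_buffer:
                    continue
                record = buffer.pop(rng.randrange(len(buffer)))
            out.append(record)
            if len(out) == batch:
                yield out
                out = []
    # Drain whatever is still buffered once the input is exhausted.
    rng.shuffle(buffer)
    for record in buffer:
        out.append(record)
        if len(out) == batch:
            yield out
            out = []
    if out:
        yield out


batches = list(read_batches(range(10), batch=4, shuffle=True,
                            shuffle_buffer=3, epochs=2))
```

A larger shuffle_buffer gives each record more candidates to be swapped with, at the cost of holding more records in memory before the first batch is produced.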

read_instances(count, shuffle, epochs)

Reads the data represented by this DataSource using a TensorFlow reader.

Parameters:
  • count – The maximum number of instances to read.
  • shuffle – Whether to shuffle the input queue of files.
  • epochs – The number of epochs or passes over the data to perform.
Returns:

A tensor containing instances that are read.

class tensorfx.data.CsvDataSet(schema, metadata=None, features=None, **kwargs)

A DataSet representing data in CSV format.

parse_instances(instances, prediction=False)

Parses input instances according to the associated schema.

Parameters:
  • instances – The tensor containing input strings.
  • prediction – Whether the instances are being parsed for producing predictions or not.
Returns:

A dictionary of tensors keyed by field names.
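
The shape of the result, a dictionary keyed by field names, can be sketched without TensorFlow using Python's csv module. This is an illustration only: plain lists stand in for tensors, and the helper name parse_csv_instances, the (name, kind) pairs, and the sample data are all hypothetical.

```python
import csv


def parse_csv_instances(lines, fields, delimiter=","):
    """Parse CSV lines into a dict of columns keyed by field name.

    `fields` is an ordered list of (name, kind) pairs; kind "numeric" is
    converted to float, anything else is kept as a string.
    """
    columns = {name: [] for name, _ in fields}
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) != len(fields):
            raise ValueError("expected %d values, got %d" % (len(fields), len(row)))
        for (name, kind), value in zip(fields, row):
            columns[name].append(float(value) if kind == "numeric" else value)
    return columns


# Hypothetical schema and data, for illustration only.
fields = [("species", "discrete"), ("petal_length", "numeric")]
parsed = parse_csv_instances(["setosa,1.4", "virginica,5.1"], fields)
```

The order of the fields matters here because CSV values are matched to fields by position.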

class tensorfx.data.CsvDataSource(path, delimiter=', ')

A DataSource representing one or more CSV files.

path

Retrieves the path represented by the DataSource.

read_instances(count, shuffle, epochs)

Reads the data represented by this DataSource using a TensorFlow reader.

Parameters:
  • epochs – The number of epochs or passes over the data to perform.
Returns:

A tensor containing instances that are read.

class tensorfx.data.DataFrameDataSet(features=None, **kwargs)

A DataSet representing data loaded as Pandas DataFrame instances.

parse_instances(instances, prediction=False)

Parses input instances according to the associated schema.

Parameters:
  • instances – The tensor containing input strings.
  • prediction – Whether the instances are being parsed for producing predictions or not.
Returns:

A dictionary of tensors keyed by feature names.

class tensorfx.data.DataFrameDataSource(df)

A DataSource representing a Pandas DataFrame.

This class is useful for working with local/in-memory data.

dataframe

Retrieves the DataFrame represented by this DataSource.

read_instances(count, shuffle, epochs)

Reads the data represented by this DataSource using a TensorFlow reader.

Parameters:
  • epochs – The number of epochs or passes over the data to perform.
Returns:

A tensor containing instances that are read.

Schema and Metadata

class tensorfx.data.Schema(fields)

Defines the schema of a DataSet.

The schema represents the structure of the source data before it is transformed into features.

static create(*args)

Creates a Schema from a set of fields.

Parameters:
  • args – A list or sequence of ordered fields defining the schema.
Returns:

A Schema instance.

fields

Retrieves the names of the fields in the schema.

format()

Formats a Schema instance into its YAML specification.

Returns:

A string containing the YAML specification.

static parse(spec)

Parses a Schema from a YAML specification.

Parameters:
  • spec – The schema specification to parse.
Returns:

A Schema instance.

class tensorfx.data.SchemaField(name, type)

Defines a named and typed field within a Schema.

classmethod binary(name)

Creates a field representing a binary byte buffer.

Parameters:
  • name – The name of the field.

classmethod discrete(name)

Creates a field representing a discrete value.

Parameters:
  • name – The name of the field.

name

Retrieves the name of the field.

classmethod numeric(name)

Creates a field representing a number.

Parameters:
  • name – The name of the field.

classmethod text(name)

Creates a field representing a text string.

Parameters:
  • name – The name of the field.

type

Retrieves the type of the field.

class tensorfx.data.SchemaFieldType

Defines the types of SchemaField instances.

class tensorfx.data.Metadata(md)

This class encapsulates metadata for individual fields within a dataset.

Metadata is keyed by field name and is represented as key/value pairs that are specific to the type of the field and to the analysis performed to generate the metadata.

static parse(metadata)

Parses a Metadata instance from a JSON specification.

Parameters:
  • metadata – The metadata to parse.
Returns:

A Metadata instance.
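
To make the "keyed by field name" structure concrete, here is a small sketch that loads a JSON document with Python's json module. The document itself is hypothetical: the field names and the keys inside each entry (min, max, mean, entries) are assumptions chosen for illustration, not the format tensorfx is confirmed to read, and the lookup helper is not part of the library.

```python
import json

# Hypothetical metadata document: the field names and the keys inside each
# entry are assumptions, not the format tensorfx actually reads.
metadata_json = """
{
  "petal_length": {"min": 1.0, "max": 6.9, "mean": 3.76},
  "species": {"entries": ["setosa", "versicolor", "virginica"]}
}
"""

metadata = json.loads(metadata_json)


def lookup(metadata, field, key):
    """Return one metadata value for a field, failing loudly if it is absent."""
    try:
        return metadata[field][key]
    except KeyError:
        raise KeyError("no metadata %r for field %r" % (key, field))


petal_max = lookup(metadata, "petal_length", "max")
```

The key/value pairs differ by field type: a numeric field might carry summary statistics, while a discrete field might carry its list of observed values.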

Features

class tensorfx.data.FeatureSet(features)

Represents the set of features consumed by a model during training and prediction.

A FeatureSet contains a set of named features. Features are derived from input fields specified in a schema and constructed using a transformation.

static create(*args)

Creates a FeatureSet from a set of features.

Parameters:
  • args – A list or sequence of features defining the FeatureSet.
Returns:

A FeatureSet instance.

static parse(spec)

Parses a FeatureSet from a YAML specification.

Parameters:
  • spec – The feature specification to parse.
Returns:

A FeatureSet instance.

class tensorfx.data.Feature(name, type, fields=None, features=None, transform=None)

Defines a named feature within a FeatureSet.

classmethod bucketize(name, field, boundaries)

Creates a feature representing a bucketized version of a numeric field.

The value returned is the index of the bucket that the input value falls into, in one-hot representation.

Parameters:
  • name – The name of the feature.
  • field – The name of the field to create the feature from.
  • boundaries – The list of bucket boundaries.
Returns:

An instance of a Feature.
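
The bucketing described above can be sketched in plain Python with the bisect module. This is an illustration of the idea rather than the tensorfx implementation; the standalone function name and the treatment of values that sit exactly on a boundary are assumptions.

```python
import bisect


def bucketize(value, boundaries):
    """Return a one-hot list marking the bucket that `value` falls into.

    With n sorted boundaries there are n + 1 buckets; a value equal to a
    boundary is placed in the bucket above it (an assumption of this sketch).
    """
    index = bisect.bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot


encoded = bucketize(35, [18, 30, 50])
```

Note that three boundaries produce four buckets, because values below the first boundary and above the last each get a bucket of their own.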

classmethod concatenate(name, *args)

Creates a composite feature that is a concatenation of multiple features.

Parameters:
  • name – The name of the feature.
  • args – The sequence of features to concatenate.
Returns:

An instance of a Feature.

features

Retrieves the features making up a composite feature.

field

Retrieves the field making up the feature if the feature is based on a single field.

fields

Retrieves the fields making up the feature.

format()

Retrieves the raw serializable representation of the feature.

classmethod identity(name, field)

Creates a feature representing an un-transformed schema field.

Parameters:
  • name – The name of the feature.
  • field – The name of the field.
Returns:

An instance of a Feature.

classmethod log(name, field)

Creates a feature representing a log value of a numeric field.

Parameters:
  • name – The name of the feature.
  • field – The name of the field to create the feature from.
Returns:

An instance of a Feature.

name

Retrieves the name of the feature.

classmethod one_hot(name, field)

Creates a feature representing a one-hot representation of a discrete field.

Parameters:
  • name – The name of the feature.
  • field – The name of the field to create the feature from.
Returns:

An instance of a Feature.
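
A one-hot representation can be sketched in plain Python as follows. The function and the vocabulary are hypothetical, and raising an error for an unseen value is a choice made for this sketch; how tensorfx treats values outside the known set is not stated here.

```python
def one_hot(value, vocabulary):
    """Return a one-hot list for `value` given an ordered vocabulary."""
    encoded = [0] * len(vocabulary)
    encoded[vocabulary.index(value)] = 1  # raises ValueError if unknown
    return encoded


# Hypothetical vocabulary, e.g. gathered by analyzing the training data.
species = ["setosa", "versicolor", "virginica"]
encoded = one_hot("versicolor", species)
```

The length of the encoded vector equals the number of distinct values, so the vocabulary has to be known before the feature can be produced.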

static parse(data)

Parses a feature from its serialized data representation.

Parameters:
  • data – A dictionary holding the serialized representation.
Returns:

The parsed Feature instance.

classmethod scale(name, field, range=(0, 1))

Creates a feature representing a scaled version of a numeric field.

To perform scaling, the metadata for the field is looked up to retrieve its min, max and mean values.

Parameters:
  • name – The name of the feature.
  • field – The name of the field to create the feature from.
  • range – The target range of the feature.
Returns:

An instance of a Feature.
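
One common way to map a numeric field onto a target range is linear min-max scaling, sketched below in plain Python. This is an assumption about the general technique, not a statement of how tensorfx computes the feature; in particular the sketch uses only the min and max values and ignores the mean, and the function and argument names are hypothetical.

```python
def scale(value, minimum, maximum, target_range=(0, 1)):
    """Linearly map `value` from [minimum, maximum] onto `target_range`."""
    low, high = target_range
    if maximum == minimum:
        return float(low)  # degenerate field: every value is the same
    fraction = (value - minimum) / (maximum - minimum)
    return low + fraction * (high - low)


# Hypothetical metadata for one numeric field.
scaled = scale(5.0, minimum=0.0, maximum=10.0)
centered = scale(5.0, minimum=0.0, maximum=10.0, target_range=(-1, 1))
```

Because the min and max come from previously computed metadata, values seen later that fall outside that range will map outside the target range in this sketch.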

classmethod target(name, field)

Creates a feature representing the target value.

Parameters:
  • name – The name of the feature.
  • field – The name of the field.
Returns:

An instance of a Feature.

transform

Retrieves the transform configuration to produce the feature.

type

Retrieves the type of the feature.

class tensorfx.data.FeatureType

Defines the type of Feature instances.