tensorfx.data
DataSet and DataSource Implementations
-
class tensorfx.data.DataSet(datasources, schema, metadata, features)
A class representing data to be used within a job.
A DataSet contains one or more DataSource instances, each associated with a name.
-
features
Retrieves the features defined with the DataSet.
-
metadata
Retrieves the metadata associated with the DataSet.
-
parse_instances(instances, prediction=False)
Parses input instances according to the associated schema, metadata and features.
Parameters:
- instances – The tensor containing input strings.
- prediction – Whether the instances are being parsed for producing predictions or not.
Returns: A dictionary of tensors keyed by feature names.
-
schema
Retrieves the schema associated with the DataSet.
-
sources
Retrieves the names of the contained DataSource instances.
-
-
class tensorfx.data.DataSource
A base class representing data that can be read for use in a job.
-
read(batch=128, shuffle=False, shuffle_buffer=1000, epochs=0, threads=1)
Reads the data represented by this DataSource using a TensorFlow reader.
Parameters:
- batch – The number of records to read at a time.
- shuffle – Whether to shuffle the list of files.
- shuffle_buffer – When shuffling, the number of extra items to keep in the queue for randomness.
- epochs – The number of epochs or passes over the data to perform.
- threads – The number of threads to use to read from the queue.
Returns: A tensor containing a list of instances read.
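How batch, shuffle_buffer and epochs interact can be illustrated without TensorFlow. The following is a minimal pure-Python sketch, not the tensorfx implementation. The names read_batches and shuffled_stream are hypothetical, the sketch shuffles records rather than files, and it treats epochs as an exact number of passes (how read interprets epochs=0 is not stated above).

```python
import random


def shuffled_stream(records, shuffle_buffer, rng):
    """Yield records in a locally randomized order using a bounded buffer."""
    buffer = []
    for record in records:
        buffer.append(record)
        if len(buffer) > shuffle_buffer:
            # Buffer is over capacity: emit one randomly chosen element.
            yield buffer.pop(rng.randrange(len(buffer)))
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    for record in buffer:
        yield record


def read_batches(records, batch=128, shuffle=False, shuffle_buffer=1000,
                 epochs=1, seed=None):
    """Yield lists of at most `batch` records over `epochs` passes.

    `records` must be re-iterable (for example a list), because it is
    traversed once per epoch.
    """
    rng = random.Random(seed)
    stream = (record for _ in range(epochs) for record in records)
    if shuffle:
        stream = shuffled_stream(stream, shuffle_buffer, rng)

    current = []
    for record in stream:
        current.append(record)
        if len(current) == batch:
            yield current
            current = []
    if current:
        yield current  # final, possibly short, batch
```

A larger shuffle_buffer gives a more thorough shuffle at the cost of memory. With a buffer of zero the input order is preserved.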
-
read_instances(count, shuffle, epochs)
Reads the data represented by this DataSource using a TensorFlow reader.
Parameters:
- count – The number of instances to read in at most.
- shuffle – Whether to shuffle the input queue of files.
- epochs – The number of epochs or passes over the data to perform.
Returns: A tensor containing instances that are read.
-
-
class tensorfx.data.CsvDataSet(schema, metadata=None, features=None, **kwargs)
A DataSet representing data in csv format.
-
parse_instances(instances, prediction=False)
Parses input instances according to the associated schema.
Parameters:
- instances – The tensor containing input strings.
- prediction – Whether the instances are being parsed for producing predictions or not.
Returns: A dictionary of tensors keyed by field names.
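The shape of the result, one column per schema field, can be illustrated with the standard csv module. This is a sketch in plain Python rather than tensorfx code: SCHEMA, parse_csv_instances and the target argument are hypothetical, and it assumes that the target column is absent from the input when parsing for prediction.

```python
import csv

# Hypothetical schema: ordered (field name, field type) pairs.
SCHEMA = [("age", "numeric"), ("city", "discrete"), ("label", "discrete")]


def parse_csv_instances(lines, schema, prediction=False, target="label"):
    """Parse CSV strings into a dict of columns keyed by field name.

    Assumption: at prediction time the target column is not present in
    the input, so it is dropped from the expected fields.
    """
    fields = [(name, kind) for name, kind in schema
              if not (prediction and name == target)]
    columns = {name: [] for name, _ in fields}
    for row in csv.reader(lines):
        if len(row) != len(fields):
            raise ValueError(
                "expected %d values, got %d" % (len(fields), len(row)))
        for (name, kind), value in zip(fields, row):
            # Numeric fields become floats; everything else stays a string.
            columns[name].append(float(value) if kind == "numeric" else value)
    return columns
```

For example, parse_csv_instances(["34,Seattle,yes"], SCHEMA) produces one list per field, each holding the values of that column.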
-
-
class tensorfx.data.CsvDataSource(path, delimiter=', ')
A DataSource representing one or more csv files.
-
path
Retrieves the path represented by the DataSource.
-
read_instances(count, shuffle, epochs)
Reads the data represented by this DataSource using a TensorFlow reader.
Parameters: epochs – The number of epochs or passes over the data to perform.
Returns: A tensor containing instances that are read.
-
-
class tensorfx.data.DataFrameDataSet(features=None, **kwargs)
A DataSet representing data loaded as Pandas DataFrame instances.
-
parse_instances(instances, prediction=False)
Parses input instances according to the associated schema.
Parameters:
- instances – The tensor containing input strings.
- prediction – Whether the instances are being parsed for producing predictions or not.
Returns: A dictionary of tensors keyed by feature names.
-
-
class tensorfx.data.DataFrameDataSource(df)
A DataSource representing a Pandas DataFrame.
This class is useful for working with local/in-memory data.
-
dataframe
Retrieves the DataFrame represented by this DataSource.
-
read_instances(count, shuffle, epochs)
Reads the data represented by this DataSource using a TensorFlow reader.
Parameters: epochs – The number of epochs or passes over the data to perform.
Returns: A tensor containing instances that are read.
-
Schema and Metadata
-
class tensorfx.data.Schema(fields)
Defines the schema of a DataSet.
The schema represents the structure of the source data before it is transformed into features.
-
static create(*args)
Creates a Schema from a set of fields.
Parameters: args – a list or sequence of ordered fields defining the schema.
Returns: A Schema instance.
-
fields
Retrieves the names of the fields in the schema.
-
format()
Formats a Schema instance into its YAML specification.
Returns: A string containing the YAML specification.
-
static parse(spec)
Parses a Schema from a YAML specification.
Parameters: spec – The schema specification to parse.
Returns: A Schema instance.
-
-
class tensorfx.data.SchemaField(name, type)
Defines a named and typed field within a Schema.
-
classmethod binary(name)
Creates a field representing a binary byte buffer.
Parameters: name – the name of the field.
-
classmethod discrete(name)
Creates a field representing a discrete value.
Parameters: name – the name of the field.
-
name
Retrieves the name of the field.
-
classmethod numeric(name)
Creates a field representing a number.
Parameters: name – the name of the field.
-
classmethod text(name)
Creates a field representing a text string.
Parameters: name – the name of the field.
-
type
Retrieves the type of the field.
-
-
class tensorfx.data.SchemaFieldType
Defines the types of SchemaField instances.
-
class tensorfx.data.Metadata(md)
This class encapsulates metadata for individual fields within a dataset.
Metadata is keyed by field name and is represented as key/value pairs that are specific to the type of the field and to the analysis performed to generate the metadata.
-
static parse(metadata)
Parses a Metadata instance from a JSON specification.
Parameters: metadata – The metadata to parse.
Returns: A Metadata instance.
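The idea of metadata keyed by field name, with per-field key/value pairs, can be shown with a plain JSON document. The key names below (min, max, mean, vocab) and the helper field_stat are hypothetical. The actual JSON layout that Metadata.parse expects is not given in this reference, so this is only an illustration of the structure described.

```python
import json

# Hypothetical metadata document: one entry per field, with values that
# depend on the field type (statistics for numbers, a vocabulary for
# discrete values).
METADATA_JSON = """
{
  "age":  {"min": 18, "max": 90, "mean": 41.5},
  "city": {"vocab": ["Austin", "Seattle"]}
}
"""

metadata = json.loads(METADATA_JSON)


def field_stat(metadata, field, key):
    """Look up one metadata value for a field, failing with a clear message."""
    try:
        return metadata[field][key]
    except KeyError:
        raise KeyError("no %r metadata for field %r" % (key, field))
```

Under these assumptions a numeric transform would read field_stat(metadata, "age", "min"), while a discrete transform would read the vocab entry of its field.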
-
Features
-
class tensorfx.data.FeatureSet(features)
Represents the set of features consumed by a model during training and prediction.
A FeatureSet contains a set of named features. Features are derived from input fields specified in a schema and constructed using a transformation.
-
static create(*args)
Creates a FeatureSet from a set of features.
Parameters: args – a list or sequence of features defining the FeatureSet.
Returns: A FeatureSet instance.
-
static parse(spec)
Parses a FeatureSet from a YAML specification.
Parameters: spec – The feature specification to parse.
Returns: A FeatureSet instance.
-
-
class tensorfx.data.Feature(name, type, fields=None, features=None, transform=None)
Defines a named feature within a FeatureSet.
-
classmethod bucketize(name, field, boundaries)
Creates a feature representing a bucketized version of a numeric field.
The value returned is the index of the bucket that the input falls into, in one-hot representation.
Parameters:
- name – The name of the feature.
- field – The name of the field to create the feature from.
- boundaries – The list of bucket boundaries.
Returns: An instance of a Feature.
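The bucketizing step itself can be sketched in plain Python with the bisect module. This is not the tensorfx implementation (which operates on tensors). The function below is a hypothetical stand-in that assumes sorted boundaries and assumes that a value equal to a boundary falls into the bucket on its right.

```python
import bisect


def bucketize(value, boundaries):
    """Return a one-hot list marking the bucket that `value` falls into.

    With n sorted boundaries there are n + 1 buckets: one below the first
    boundary, one between each adjacent pair, and one above the last.
    """
    index = bisect.bisect_right(boundaries, value)
    one_hot = [0] * (len(boundaries) + 1)
    one_hot[index] = 1
    return one_hot
```

With boundaries [18, 30, 50], the value 25 lands in the second of four buckets, so bucketize(25, [18, 30, 50]) returns [0, 1, 0, 0].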
-
classmethod concatenate(name, *args)
Creates a composite feature that is a concatenation of multiple features.
Parameters:
- name – the name of the feature.
- args – the sequence of features to concatenate.
Returns: An instance of a Feature.
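For a single instance, concatenation amounts to joining the individual feature vectors end to end. The sketch below works on plain Python values rather than tensors. The function is a hypothetical illustration, and treating scalars as one-element vectors is an assumption.

```python
def concatenate(*features):
    """Join per-instance feature values into one flat vector."""
    result = []
    for feature in features:
        if isinstance(feature, (list, tuple)):
            result.extend(feature)
        else:
            # Treat a scalar feature as a vector of length one.
            result.append(feature)
    return result
```

The width of the composite feature is the sum of the widths of its parts, so a scalar, a three-element one-hot vector and a two-element one-hot vector yield six values.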
-
features
Retrieves the features making up a composite feature.
-
field
Retrieves the field making up the feature if the feature is based on a single field.
-
fields
Retrieves the fields making up the feature.
-
format()
Retrieves the raw serializable representation of the features.
-
classmethod identity(name, field)
Creates a feature representing an un-transformed schema field.
Parameters:
- name – the name of the feature.
- field – the name of the field.
Returns: An instance of a Feature.
-
classmethod log(name, field)
Creates a feature representing a log value of a numeric field.
Parameters:
- name – The name of the feature.
- field – The name of the field to create the feature from.
Returns: An instance of a Feature.
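The reference does not say which logarithm is used or whether an offset is applied to avoid taking the log of zero. The sketch below assumes the natural logarithm and exposes a hypothetical offset argument. It illustrates the transform on a single value and is not the library's code.

```python
import math


def log_feature(value, offset=0.0):
    """Return the natural log of `value + offset`.

    `offset` is a hypothetical knob (for example 1.0 for counts that may
    be zero); the default leaves the value unchanged.
    """
    shifted = value + offset
    if shifted <= 0:
        raise ValueError("log is undefined for non-positive input: %r" % shifted)
    return math.log(shifted)
```

A log transform compresses large values, which is useful for heavily skewed numeric fields such as counts or prices.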
-
name
Retrieves the name of the feature.
-
classmethod one_hot(name, field)
Creates a feature representing a one-hot representation of a discrete field.
Parameters:
- name – The name of the feature.
- field – The name of the field to create the feature from.
Returns: An instance of a Feature.
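One-hot encoding maps a discrete value to a vector with a single 1 at the position of that value in a known vocabulary. The sketch below is plain Python, not tensorfx code. Where the vocabulary comes from is not stated in this reference, and encoding unknown values as all zeros is an assumption.

```python
def one_hot(value, vocabulary):
    """Encode `value` as a one-hot list over `vocabulary`.

    Values that are not in the vocabulary produce an all-zero vector
    (an assumption; other schemes reserve an extra slot for them).
    """
    return [1 if value == entry else 0 for entry in vocabulary]
```

For the vocabulary ["Austin", "Seattle", "Boston"], the value "Seattle" encodes to [0, 1, 0].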
-
static parse(data)
Parses a feature from its serialized data representation.
Parameters: data – A dictionary holding the serialized representation.
Returns: The parsed Feature instance.
-
classmethod scale(name, field, range=(0, 1))
Creates a feature representing a scaled version of a numeric field.
To perform scaling, the metadata for the field is looked up to retrieve its min, max and mean values.
Parameters:
- name – The name of the feature.
- field – The name of the field to create the feature from.
- range – The target range of the feature.
Returns: An instance of a Feature.
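A common way to scale a value into a target range is min-max scaling, sketched below for a single value. This is an assumption about the general approach, not a transcription of the library: the function and its argument names are hypothetical, and the sketch uses only the min and max statistics because the reference does not say how the mean is used.

```python
def scale(value, minimum, maximum, target_range=(0, 1)):
    """Linearly map `value` from [minimum, maximum] onto `target_range`."""
    low, high = target_range
    if maximum == minimum:
        # Constant field: avoid division by zero and pin to the low end.
        return float(low)
    fraction = (value - minimum) / float(maximum - minimum)
    return low + fraction * (high - low)
```

With a field whose metadata gives a min of 18 and a max of 90, scale(54, 18, 90) returns 0.5, and passing target_range=(-1, 1) maps the same endpoints to -1 and 1.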
-
classmethod target(name, field)
Creates a feature representing the target value.
Parameters:
- name – the name of the feature.
- field – the name of the field.
Returns: An instance of a Feature.
-
transform
Retrieves the transform configuration to produce the feature.
-
type
Retrieves the type of the feature.
-
-
class tensorfx.data.FeatureType
Defines the type of Feature instances.