tensorfx.data¶
DataSet and DataSource Implementations¶
-
class
tensorfx.data.DataSet(datasources, schema, metadata, features)¶ A class representing data to be used within a job.
A DataSet contains one or more DataSource instances, each associated with a name.
-
features¶ Retrieves the features defined with the DataSet.
-
metadata¶ Retrieves the metadata associated with the DataSet.
-
parse_instances(instances, prediction=False)¶ Parses input instances according to the associated schema, metadata and features.
Parameters: - instances – The tensor containing input strings.
- prediction – Whether the instances are being parsed for producing predictions or not.
Returns: A dictionary of tensors keyed by feature names.
-
schema¶ Retrieves the schema associated with the DataSet.
-
sources¶ Retrieves the names of the contained DataSource instances.
-
-
class
tensorfx.data.DataSource¶ A base class representing data that can be read for use in a job.
-
read(batch=128, shuffle=False, shuffle_buffer=1000, epochs=0, threads=1)¶ Reads the data represented by this DataSource using a TensorFlow reader.
Parameters: - batch – The number of records to read at a time.
- shuffle – Whether to shuffle the list of files.
- shuffle_buffer – When shuffling, the number of extra items to keep in the queue for randomness.
- epochs – The number of epochs or passes over the data to perform.
- threads – The number of threads to use to read from the queue.
Returns: A tensor containing a list of instances read.
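The interplay of batch, shuffle_buffer and epochs can be sketched in plain Python. This is only an illustration under assumptions, not the TensorFlow queue-based reader the class uses: the helper `read_batches` and its `seed` parameter are hypothetical, it shuffles individual records rather than files, and it requires an explicit positive number of epochs because the meaning of the default `epochs=0` is not stated here.

```python
import random

def read_batches(records, batch=128, shuffle=False, shuffle_buffer=1000,
                 epochs=1, seed=None):
    """Yield lists of up to `batch` records over `epochs` passes.

    When shuffling, each record is drawn at random from a buffer that
    holds up to `batch + shuffle_buffer` pending records (assumed sizing).
    """
    rng = random.Random(seed)
    capacity = batch + shuffle_buffer if shuffle else 1
    buffer, out = [], []

    def take():
        # Pick a random pending record when shuffling, else the oldest one.
        index = rng.randrange(len(buffer)) if shuffle else 0
        return buffer.pop(index)

    for _ in range(epochs):
        for record in records:
            buffer.append(record)
            if len(buffer) >= capacity:
                out.append(take())
                if len(out) == batch:
                    yield out
                    out = []
    # Drain whatever is still pending once the input is exhausted.
    while buffer:
        out.append(take())
        if len(out) == batch:
            yield out
            out = []
    if out:
        yield out

ordered = list(read_batches(list(range(5)), batch=2, epochs=1))
shuffled = list(read_batches(list(range(10)), batch=4, shuffle=True,
                             shuffle_buffer=3, epochs=2, seed=0))
```

A larger shuffle_buffer mixes records more thoroughly at the cost of holding more of them in memory before the first batch is produced.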
-
read_instances(count, shuffle, epochs)¶ Reads the data represented by this DataSource using a TensorFlow reader.
Parameters: - count – The number of instances to read in at most.
- shuffle – Whether to shuffle the input queue of files.
- epochs – The number of epochs or passes over the data to perform.
Returns: A tensor containing instances that are read.
-
-
class
tensorfx.data.CsvDataSet(schema, metadata=None, features=None, **kwargs)¶ A DataSet representing data in csv format.
-
parse_instances(instances, prediction=False)¶ Parses input instances according to the associated schema.
Parameters: - instances – The tensor containing input strings.
- prediction – Whether the instances are being parsed for producing predictions or not.
Returns: A dictionary of tensors keyed by field names.
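As a rough illustration of what schema-driven parsing produces, the sketch below turns CSV strings into a dictionary keyed by field names. It uses plain Python lists in place of tensors, and the helper `parse_csv_instances` and its `(name, kind)` field pairs are hypothetical stand-ins, not part of tensorfx.

```python
import csv

def parse_csv_instances(instances, fields, delimiter=','):
    """Parse CSV lines into a dict of columns keyed by field name.

    `fields` is an ordered list of (name, kind) pairs. Values of kind
    'numeric' are converted to float; all others are kept as strings.
    """
    columns = {name: [] for name, _ in fields}
    for row in csv.reader(instances, delimiter=delimiter):
        if len(row) != len(fields):
            raise ValueError('expected %d values, got %d'
                             % (len(fields), len(row)))
        for (name, kind), value in zip(fields, row):
            columns[name].append(float(value) if kind == 'numeric' else value)
    return columns

fields = [('age', 'numeric'), ('city', 'discrete')]
parsed = parse_csv_instances(['34,Seattle', '27,Austin'], fields)
```

The order of `fields` matters: each value is matched to a field by its position in the line.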
-
-
class
tensorfx.data.CsvDataSource(path, delimiter=', ')¶ A DataSource representing one or more csv files.
-
path¶ Retrieves the path represented by the DataSource.
-
read_instances(count, shuffle, epochs)¶ Reads the data represented by this DataSource using a TensorFlow reader.
Parameters: epochs – The number of epochs or passes over the data to perform.
Returns: A tensor containing instances that are read.
-
-
class
tensorfx.data.DataFrameDataSet(features=None, **kwargs)¶ A DataSet representing data loaded as Pandas DataFrame instances.
-
parse_instances(instances, prediction=False)¶ Parses input instances according to the associated schema.
Parameters: - instances – The tensor containing input strings.
- prediction – Whether the instances are being parsed for producing predictions or not.
Returns: A dictionary of tensors keyed by feature names.
-
-
class
tensorfx.data.DataFrameDataSource(df)¶ A DataSource representing a Pandas DataFrame.
This class is useful for working with local/in-memory data.
-
dataframe¶ Retrieves the DataFrame represented by this DataSource.
-
read_instances(count, shuffle, epochs)¶ Reads the data represented by this DataSource using a TensorFlow reader.
Parameters: epochs – The number of epochs or passes over the data to perform.
Returns: A tensor containing instances that are read.
-
Schema and Metadata¶
-
class
tensorfx.data.Schema(fields)¶ Defines the schema of a DataSet.
The schema represents the structure of the source data before it is transformed into features.
-
static
create(*args)¶ Creates a Schema from a set of fields.
Parameters: args – A list or sequence of ordered fields defining the schema.
Returns: A Schema instance.
-
fields¶ Retrieves the names of the fields in the schema.
-
format()¶ Formats a Schema instance into its YAML specification.
Returns: A string containing the YAML specification.
-
static
parse(spec)¶ Parses a Schema from a YAML specification.
Parameters: spec – The schema specification to parse.
Returns: A Schema instance.
-
class
tensorfx.data.SchemaField(name, type)¶ Defines a named and typed field within a Schema.
-
classmethod
binary(name)¶ Creates a field representing a binary byte buffer.
Parameters: name – The name of the field.
-
classmethod
discrete(name)¶ Creates a field representing a discrete value.
Parameters: name – The name of the field.
-
name¶ Retrieves the name of the field.
-
classmethod
numeric(name)¶ Creates a field representing a number.
Parameters: name – The name of the field.
-
classmethod
text(name)¶ Creates a field representing a text string.
Parameters: name – The name of the field.
-
type¶ Retrieves the type of the field.
-
class
tensorfx.data.SchemaFieldType¶ Defines the types of SchemaField instances.
-
class
tensorfx.data.Metadata(md)¶ This class encapsulates metadata for individual fields within a dataset.
Metadata is keyed by individual field names and is represented as key/value pairs that are specific to the type of the field and to the analysis performed to generate the metadata.
-
static
parse(metadata)¶ Parses a Metadata instance from a JSON specification.
Parameters: metadata – The metadata to parse.
Returns: A Metadata instance.
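To make the shape of field-keyed metadata concrete, here is a minimal sketch using only the standard `json` module. The exact JSON layout accepted by `Metadata.parse` is not specified in this reference, so the document below is a hypothetical example; the `min`, `max` and `mean` keys are chosen because the scale feature is described as looking up those values.

```python
import json

# Hypothetical metadata document: each field name maps to key/value pairs
# produced by some earlier analysis of the data.
spec = '''
{
  "age":  {"min": 18.0, "max": 90.0, "mean": 41.5},
  "fare": {"min": 0.0, "max": 512.0, "mean": 32.2}
}
'''

metadata = json.loads(spec)

# Look up the statistics for a single field by its name.
age_span = metadata['age']['max'] - metadata['age']['min']
```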
-
Features¶
-
class
tensorfx.data.FeatureSet(features)¶ Represents the set of features consumed by a model during training and prediction.
A FeatureSet contains a set of named features. Features are derived from input fields specified in a schema and constructed using a transformation.
-
static
create(*args)¶ Creates a FeatureSet from a set of features.
Parameters: args – A list or sequence of features defining the FeatureSet.
Returns: A FeatureSet instance.
-
static
parse(spec)¶ Parses a FeatureSet from a YAML specification.
Parameters: spec – The feature specification to parse.
Returns: A FeatureSet instance.
-
class
tensorfx.data.Feature(name, type, fields=None, features=None, transform=None)¶ Defines a named feature within a FeatureSet.
-
classmethod
bucketize(name, field, boundaries)¶ Creates a feature representing a bucketized version of a numeric field.
The value returned is the index of the bucket that the field value falls into, in one-hot representation.
Parameters: - name – The name of the feature.
- field – The name of the field to create the feature from.
- boundaries – The list of bucket boundaries.
Returns: An instance of a Feature.
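The bucketizing transform described above can be sketched in plain Python. This is an illustration rather than the library's implementation: the helper `bucketize_one_hot` is hypothetical, and placing a value that equals a boundary into the bucket above it is an assumed convention.

```python
import bisect

def bucketize_one_hot(value, boundaries):
    """Return a one-hot list marking the bucket that `value` falls into.

    With n sorted boundaries there are n + 1 buckets: one below the first
    boundary, one between each adjacent pair, and one above the last.
    """
    index = bisect.bisect_right(boundaries, value)
    encoded = [0] * (len(boundaries) + 1)
    encoded[index] = 1
    return encoded

boundaries = [18, 35, 65]
below = bucketize_one_hot(12, boundaries)
on_boundary = bucketize_one_hot(35, boundaries)
above = bucketize_one_hot(70, boundaries)
```

Three boundaries yield four buckets, so every encoded value here has length four.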
-
classmethod
concatenate(name, *args)¶ Creates a composite feature that is a concatenation of multiple features.
Parameters: - name – The name of the feature.
- args – The sequence of features to concatenate.
Returns: An instance of a Feature.
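Conceptually, a composite feature joins the values of its component features end to end. The sketch below shows this with plain lists; the helper `concatenate` is a hypothetical stand-in and does not reflect how tensorfx combines tensors internally.

```python
def concatenate(*vectors):
    """Join per-feature vectors end to end into one flat list."""
    combined = []
    for vector in vectors:
        combined.extend(vector)
    return combined

age_bucket = [0, 1, 0]   # e.g. a one-hot bucket index
fare_scaled = [0.25]     # e.g. a scaled numeric value
composite = concatenate(age_bucket, fare_scaled)
```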
-
features¶ Retrieves the features making up a composite feature.
-
field¶ Retrieves the field making up the feature if the feature is based on a single field.
-
fields¶ Retrieves the fields making up the feature.
-
format()¶ Retrieves the raw serializable representation of the features.
-
classmethod
identity(name, field)¶ Creates a feature representing an un-transformed schema field.
Parameters: - name – The name of the feature.
- field – The name of the field.
Returns: An instance of a Feature.
-
classmethod
log(name, field)¶ Creates a feature representing a log value of a numeric field.
Parameters: - name – The name of the feature.
- field – The name of the field to create the feature from.
Returns: An instance of a Feature.
-
name¶ Retrieves the name of the feature.
-
classmethod
one_hot(name, field)¶ Creates a feature representing a one-hot representation of a discrete field.
Parameters: - name – The name of the feature.
- field – The name of the field to create the feature from.
Returns: An instance of a Feature.
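A one-hot representation maps each distinct value of a discrete field to a vector with a single 1. The sketch below assumes the list of known values (the vocabulary) is available, for example from metadata; the helper `one_hot` and the vocabulary are hypothetical.

```python
def one_hot(value, vocabulary):
    """Encode `value` as a one-hot list over an ordered vocabulary.

    Raises ValueError if the value is not in the vocabulary.
    """
    index = vocabulary.index(value)
    return [1 if i == index else 0 for i in range(len(vocabulary))]

cities = ['Austin', 'Boston', 'Seattle']
encoded = one_hot('Boston', cities)
```

How values missing from the vocabulary should be handled is a design choice; this sketch simply raises an error.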
-
static
parse(data)¶ Parses a feature from its serialized data representation.
Parameters: data – A dictionary holding the serialized representation.
Returns: The parsed Feature instance.
-
classmethod
scale(name, field, range=(0, 1))¶ Creates a feature representing a scaled version of a numeric field.
In order to perform scaling, the metadata will be looked up for the field, to retrieve min, max and mean values.
Parameters: - name – The name of the feature.
- field – The name of the field to create the feature from.
- range – The target range of the feature.
Returns: An instance of a Feature.
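One common way to scale a numeric value into a target range is linear min-max scaling, sketched below. Whether tensorfx uses exactly this formula is an assumption, and the sketch uses only the min and max values; how the mean is used is not described here. The helper `scale_value` is hypothetical.

```python
def scale_value(value, minimum, maximum, target=(0, 1)):
    """Linearly map `value` from [minimum, maximum] onto the target range."""
    low, high = target
    if maximum == minimum:
        # A constant field carries no spread; map it to the low end.
        return float(low)
    fraction = (value - minimum) / float(maximum - minimum)
    return low + fraction * (high - low)

midpoint = scale_value(54, 18, 90)
lowest = scale_value(18, 18, 90, target=(-1, 1))
highest = scale_value(90, 18, 90, target=(-1, 1))
```

Values outside the recorded min and max map outside the target range, so clipping may be needed if the metadata was computed on a different sample.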
-
classmethod
target(name, field)¶ Creates a feature representing the target value.
Parameters: - name – The name of the feature.
- field – The name of the field.
Returns: An instance of a Feature.
-
transform¶ Retrieves the transform configuration to produce the feature.
-
type¶ Retrieves the type of the feature.
-
class
tensorfx.data.FeatureType¶ Defines the type of Feature instances.