tensorfx.training

Model Builder

class tensorfx.training.ModelArguments(**kwargs)
classmethod build_parser()

Builds the argument parser.

Returns: An argument parser with arguments added.
classmethod default()

Creates an instance of the arguments with default values.

Returns: The model arguments with default values.
classmethod parse(args=None, parse_job=False)

Parses training arguments.

Parameters:
  • args – the arguments to parse. If unspecified, the process arguments are used.
  • parse_job – whether to parse the job related standard arguments.
Returns:

The model arguments, and optionally the training output path as well.
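The build_parser / default / parse pattern described above can be illustrated with the standard argparse module. This is a minimal sketch, not the tensorfx implementation: the class name SketchArguments and the --batch-size and --max-steps flags are hypothetical, and the parse_job behavior is not modeled.

```python
import argparse


class SketchArguments(argparse.Namespace):
    """Hypothetical stand-in for an arguments class; this is not tensorfx code."""

    @classmethod
    def build_parser(cls):
        # A subclass could override this to add model-specific flags.
        parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument('--batch-size', dest='batch_size', type=int, default=128)
        parser.add_argument('--max-steps', dest='max_steps', type=int, default=1000)
        return parser

    @classmethod
    def default(cls):
        # Parsing an empty argument list yields every declared default.
        return cls.build_parser().parse_args([], namespace=cls())

    @classmethod
    def parse(cls, args=None):
        # With args=None, argparse falls back to the process arguments.
        return cls.build_parser().parse_args(args, namespace=cls())


defaults = SketchArguments.default()
custom = SketchArguments.parse(['--batch-size', '32'])
```

Here defaults carries the declared default values, while custom overrides only the batch size and keeps the default for the flag that was not passed.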

process()

Processes the parsed arguments to produce any additional objects.

class tensorfx.training.ModelBuilder(args, dataset)

Builds model graphs for different phases: training, evaluation and prediction.

A model graph is an interface that encapsulates a TensorFlow graph, and references to tensors and ops within that graph.

A ModelBuilder serves as a base class for various models. Each specific model adds its specific logic to build the required TensorFlow graph.

args

Retrieves the set of arguments specified for training.

build_evaluation(inputs, outputs)

Builds the evaluation graph.

Parameters:
  • inputs – the dictionary of tensors corresponding to the input.
  • outputs – the dictionary containing output tensors.
Returns:

The eval metric tensor and the eval op.

build_evaluation_graph()

Builds the graph to use for evaluating a model during training.

Returns: The set of tensor and op references required for evaluation.
build_graph_interfaces(config)

Builds graph interfaces for training and evaluating a model, and for predicting using it.

A graph interface is an object containing a TensorFlow graph member, as well as members corresponding to various tensors and ops within the graph.

Parameters: config – the training Configuration object.
Returns: A tuple consisting of the training, evaluation and prediction interfaces.
build_inference(inputs, training)

Builds the inference sub-graph.

Parameters:
  • inputs – the dictionary of tensors corresponding to the input.
  • training – whether the inference sub-graph is being built for the training graph.
Returns:

The inference values.

build_init()

Builds the initialization sub-graph.

The default implementation creates two ops: one that initializes all variables, including locals, for initialization, and another that initializes all non-trainable variables and tables for local initialization.

Initialization is run when the graph is first created, before training. Local initialization is performed after a previously trained model is loaded.

Returns: A tuple containing the init op and the local init op to use to initialize the graph.
build_input(source, batch, epochs, shuffle)

Builds the input sub-graph.

Parameters:
  • source – the name of the data source to use for input (for training and evaluation).
  • batch – the number of instances to read per batch.
  • epochs – the number of passes over the data.
  • shuffle – whether to shuffle the data.
Returns:

A dictionary of tensors keyed by feature name.

build_output(inferences)

Builds the output sub-graph.

Parameters: inferences – the inference values.
Returns: A dictionary consisting of the output prediction tensors.
build_prediction_graph()

Builds the graph to use for predictions with the trained model.

Returns: The set of tensor and op references required for prediction.
build_training(global_steps, inputs, inferences)

Builds the training sub-graph.

Parameters:
  • global_steps – the global steps variable to use.
  • inputs – the dictionary of tensors corresponding to the input.
  • inferences – the inference values.
Returns:

The loss tensor, and the training op.

build_training_graph()

Builds the graph to use for training a model.

This operates on the current default graph.

Returns: The set of tensor and op references required for training.
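The way a specific model supplies the sub-graph builders, while a base class assembles them into a training graph, can be sketched without TensorFlow. Everything here is an assumption made for illustration: SketchBuilder and MeanModel are hypothetical names, plain Python floats stand in for tensors, a string stands in for the training op, and the input, inference, training ordering is one plausible composition rather than the confirmed tensorfx behavior.

```python
class SketchBuilder(object):
    """Hypothetical stand-in for a builder base class; this is not tensorfx code."""

    def build_training_graph(self):
        # Assumed ordering: input -> inference -> training.
        inputs = self.build_input('train', batch=2, epochs=1, shuffle=False)
        inferences = self.build_inference(inputs, training=True)
        loss, train_op = self.build_training(0, inputs, inferences)
        return {'loss': loss, 'train_op': train_op}

    def build_input(self, source, batch, epochs, shuffle):
        raise NotImplementedError

    def build_inference(self, inputs, training):
        raise NotImplementedError

    def build_training(self, global_steps, inputs, inferences):
        raise NotImplementedError


class MeanModel(SketchBuilder):
    """Toy model that predicts the mean of x for every instance."""

    def build_input(self, source, batch, epochs, shuffle):
        # A dictionary keyed by feature name; lists stand in for tensors.
        return {'x': [1.0, 3.0], 'y': [1.0, 5.0]}

    def build_inference(self, inputs, training):
        mean = sum(inputs['x']) / len(inputs['x'])
        return [mean for _ in inputs['x']]

    def build_training(self, global_steps, inputs, inferences):
        # Mean squared error between the inferences and the targets.
        errors = [(p - y) ** 2 for p, y in zip(inferences, inputs['y'])]
        loss = sum(errors) / len(errors)
        return loss, 'train_op'  # the string is a placeholder for a real op


result = MeanModel().build_training_graph()
```

The base class fixes the order in which the pieces are assembled, and the subclass only fills in the model-specific steps.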
dataset

Retrieves the DataSet being used for training and evaluation data.

Training Jobs

class tensorfx.training.Configuration(task, cluster, job, env)

Contains configuration information for the training process.

cluster

Retrieves the cluster definition containing the current node.

This is None if the current node is part of a single node training job.

create_device_setter(args)

Creates the device setter, which assigns variables and ops to devices in distributed mode.

Parameters: args – the arguments associated with the current job.
create_server()

Creates the TensorFlow server, which is required for distributed training.

device

Retrieves the device associated with the current node.

distributed

Determines whether the training being performed is distributed or single node training.

Returns: True if the configuration represents distributed training; False otherwise.
classmethod environment()

Creates a Configuration object for single node and distributed training.

This relies on looking up configuration from the TF_CONFIG environment variable, which allows a hosting environment to configure the training process. The variable is expected to hold a JSON-formatted dictionary containing configuration about the current task, cluster and job.

Returns: A Configuration instance matching the current environment.
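Reading such a variable might look like the sketch below. It is an illustration under stated assumptions rather than the tensorfx implementation: read_tf_config is a hypothetical helper, the top-level keys 'task', 'cluster' and 'job' are assumed from the description above, and the host names and job contents are made up.

```python
import json
import os


def read_tf_config(environ=None):
    """Returns (task, cluster, job) from TF_CONFIG; missing entries are None."""
    environ = os.environ if environ is None else environ
    # An unset variable is treated as an empty configuration.
    config = json.loads(environ.get('TF_CONFIG', '{}'))
    return config.get('task'), config.get('cluster'), config.get('job')


# A hosting environment would set the variable; a plain dict stands in here.
example = {
    'TF_CONFIG': json.dumps({
        'task': {'type': 'worker', 'index': 0},
        'cluster': {'ps': ['ps0:2222'], 'worker': ['worker0:2222']},
        'job': {'local': False},
    })
}
task, cluster, job = read_tf_config(example)

# Assumption: a cluster definition is what marks the job as distributed.
distributed = cluster is not None
```

When the variable is absent, all three values come back as None, which corresponds to the single node case.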
job

Retrieves the job definition of the current training job.

classmethod local()

Creates a Configuration object representing single node training in a process.

Returns: A default Configuration instance with simple configuration.
master

Retrieves whether the current task is a master task.

param_server

Retrieves whether the current task is a parameter server task.

task

Retrieves the task definition associated with the current node.

If no job information is provided, this is None.

worker

Retrieves whether the current task is a worker task.
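The master, param_server and worker flags can be thought of as simple checks on the type of the current task. The sketch below assumes the task definition is a dictionary with a 'type' entry taking the values 'master', 'ps' or 'worker', and that a missing task definition is treated as a master task; neither assumption is confirmed by this reference, and task_roles is a hypothetical helper.

```python
def task_roles(task):
    """Returns (master, param_server, worker) flags for a task definition."""
    # Assumption: no task definition means single node training,
    # which this sketch treats as a master task.
    task_type = task.get('type', 'master') if task else 'master'
    return (task_type == 'master', task_type == 'ps', task_type == 'worker')


roles = task_roles({'type': 'ps', 'index': 0})
```

Exactly one of the three flags is set for any recognized task type.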

class tensorfx.training.ModelTrainer(config=None)

Provides the functionality to train a model during a training job.

config

Retrieves the training configuration.

train(model_builder, job_args)

Runs the training process to train a model.

Parameters:
  • model_builder – the ModelBuilder to use to build graphs during training.
  • job_args – the arguments for the training job.
Returns:

The trained Model. The resulting value is only relevant for master nodes.