TensorFlow Seq2Seq Model Notes
0. TensorFlow kept running without the GPU...
Embarrassing: the GPU sat idle while the CPU was maxed out. It turned out I had installed the wrong package; it should be tensorflow-gpu.
How to test from code whether TensorFlow is using the GPU: https://stackoverflow.com/questions/38009682/how-to-tell-if-tensorflow-is-using-gpu-acceleration-from-inside-python-shell
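A minimal check along the lines of that answer (assumes TensorFlow 1.x):

import tensorflow as tf
from tensorflow.python.client import device_lib

# GPUs show up with device_type "GPU"; an empty list means CPU-only.
print([d.name for d in device_lib.list_local_devices() if d.device_type == "GPU"])

# Alternatively, log where every op gets placed:
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))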
1. Confusion about tf.app.run()
http://stackoverflow.com/questions/33703624/how-does-tf-app-run-work
tf.app plays a role similar to Python's argparse module.
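A minimal sketch of the pattern (the data_dir flag is made up for illustration): tf.app.run() parses the command-line flags into FLAGS and then calls main().

import tensorflow as tf

tf.app.flags.DEFINE_string("data_dir", "/tmp/data", "Training data directory.")
FLAGS = tf.app.flags.FLAGS

def main(_):
    # By the time main() runs, FLAGS has been populated from sys.argv.
    print(FLAGS.data_dir)

if __name__ == "__main__":
    tf.app.run()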
2. variable scope vs. name scope
Variable Scope mechanism: https://www.tensorflow.org/programmers_guide/variable_scope
http://stackoverflow.com/questions/35919020/whats-the-difference-of-name-scope-and-a-variable-scope-in-tensorflow
Key point: Name scopes can be opened in addition to a variable scope, and then they will only affect the names of the ops, but not of variables.
with tf.variable_scope("foo"):
    with tf.name_scope("bar"):
        v = tf.get_variable("v", [1])
        x = 1.0 + v
assert v.name == "foo/v:0"
assert x.op.name == "foo/bar/add"
The difference between scope.original_name_scope and scope.name:
http://stackoverflow.com/questions/41756054/tensorflow-variablescope-original-name-scope-vs-name
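A small sketch of the distinction (TF 1.x): re-entering a variable scope with the same string keeps scope.name but opens a fresh, uniquified name scope, which is what original_name_scope records.

import tensorflow as tf

with tf.variable_scope("foo") as scope1:
    pass
with tf.variable_scope("foo") as scope2:
    pass

assert scope1.name == scope2.name == "foo"     # variable names share the "foo/" prefix
assert scope1.original_name_scope == "foo/"    # ops from the first entry
assert scope2.original_name_scope == "foo_1/"  # ops from the second entry are uniquified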
3. Differences between Python 2 and Python 3
While modifying data_utils.py (https://github.com/tensorflow/models/blob/master/tutorials/rnn/translate/data_utils.py):
I overlooked the version differences: Python 3 drops the print statement in favor of the print() function.
Also, when running it under Python 2:
with gfile.GFile(data_path, mode="rb") as f:
    counter = 0
    for line in f:
Each line here still carries its trailing "\n". After splitting, the "\n" stays glued to the last word that goes into the list, so the file written from the list and len(list) disagree. For example, writing li = ["a", "b\n"] to a file: the "\n" forces an extra line break, and the file ends up with 3 lines. Reading it back line by line then counts the empty string ('') as a new word.
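A minimal sketch of the fix (Python 2, assuming whitespace-separated tokens): strip the newline before splitting.

with gfile.GFile(data_path, mode="rb") as f:
    for line in f:
        # strip() drops the trailing "\n" so it cannot cling to the last token.
        words = line.strip().split()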
4. The TensorFlow Saver class https://www.tensorflow.org/api_docs/python/tf/train/Saver
http://blog.csdn.net/u011500062/article/details/51728830
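A minimal save/restore sketch (TF 1.x; the variable and paths are made up):

import tensorflow as tf

v = tf.get_variable("v", shape=[2], initializer=tf.zeros_initializer())
saver = tf.train.Saver()  # by default covers all saveable variables, keyed by their .name

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Produces translate.ckpt-0.data-00000-of-00001, .index, .meta plus a "checkpoint" file.
    saver.save(sess, "/tmp/translate.ckpt", global_step=0)
    saver.restore(sess, "/tmp/translate.ckpt-0")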
5. Saving the Seq2Seq model
The files saved for the model are:
translate.ckpt-16.data-00000-of-00001
translate.ckpt-16.index
translate.ckpt-16.meta
For an explanation of these files, see:
https://groups.google.com/a/tensorflow.org/forum/#!topic/discuss/Y4mzbDAUSec
http://stackoverflow.com/questions/36195454/what-is-the-tensorflow-checkpoint-meta-file
6. RNN examples
https://uqer.io/community/share/58a9332bf1973300597ae209
http://r2rt.com/recurrent-neural-networks-in-tensorflow-ii.html
7. Converting a list of tensors to a tensor
http://stackoverflow.com/questions/35730161/how-to-convert-a-list-of-tensors-of-dim-n-to-a-tensor-of-dim-n1
http://blog.csdn.net/sherry_up/article/details/52169318
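The standard answer is tf.stack (tf.pack before TF 1.0), which packs a length-N list of rank-R tensors into one rank-(R+1) tensor:

import tensorflow as tf

tensors = [tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0])]  # two tensors of shape [2]
stacked = tf.stack(tensors, axis=0)                           # one tensor of shape [2, 2]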
8. The batch_matmul problem
What I wanted to do (quoting the GitHub issue): "suppose I have a T x n x k and want to multiply it by a k x k2, and then to a max pool over T and then a mean pool over n. To do this now, I think you need to reshape, do the matmul() and then undo the reshape and then do the pooling."
https://github.com/tensorflow/tensorflow/issues/216
https://www.tensorflow.org/versions/r0.10/api_docs/python/math_ops/matrix_math_functions#batch_matmul
Using it raised: AttributeError: 'module' object has no attribute 'batch_matmul'. It turns out batch_matmul is gone in TensorFlow 1.0; tf.matmul now covers the batched case through its arguments.
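A sketch of the replacement in TF 1.x (shapes are made up; tf.matmul multiplies the last two dimensions of rank-3 tensors, so the k x k2 matrix is tiled up to the batch dimension first):

import tensorflow as tf

T, n, k, k2 = 5, 3, 4, 6
a = tf.random_normal([T, n, k])
b = tf.random_normal([k, k2])

prod = tf.matmul(a, tf.tile(tf.expand_dims(b, 0), [T, 1, 1]))  # [T, n, k2]
pooled = tf.reduce_mean(tf.reduce_max(prod, axis=0), axis=0)   # max over T, then mean over n -> [k2]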
9. Cannot feed value of shape (XX) for Tensor u'target/Y:0', which has shape '(YY)'?
First time hitting this one. Google says it means the shape of a fed value does not match its placeholder. But the failure hit the fifth variable (mask5), which convinced me it could not be a feed problem (if it were, why would it only surface at the 5th one?). After a lot of flailing it was the input data after all... In hindsight the position means nothing: each feed is checked against its placeholder independently, so the error simply fires on the first mismatched pair.
10. Loading an existing model
Loading a saved model from a given directory had always worked, until one day it did not. Hours later I noticed the directory used to contain a checkpoint file that records two paths for the model:
model_checkpoint_path: "translate.ckpt-101000"
all_model_checkpoint_paths: "translate.ckpt-101000"
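That "checkpoint" file is what tf.train.get_checkpoint_state() parses; without it, loading by directory fails. A minimal sketch (train_dir, saver and sess are assumed to exist):

ckpt = tf.train.get_checkpoint_state(train_dir)  # reads the "checkpoint" file
if ckpt and ckpt.model_checkpoint_path:
    saver.restore(sess, ckpt.model_checkpoint_path)
else:
    print("No checkpoint found in %s" % train_dir)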
11. Reading parameter values inside a model
https://www.tensorflow.org/programmers_guide/variables#checkpoint_files:
When you create a Saver object, you can optionally choose names for the variables in the checkpoint files. By default, it uses the value of the tf.Variable.name property for each variable.
To understand what variables are in a checkpoint, you can use the inspect_checkpoint library, and in particular, the print_tensors_in_checkpoint_file function.
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/inspect_checkpoint.py
Usage:
python inspect_checkpoint.py --file_name=./alpha_easy_nmt/valid_model/translate.ckpt-625000
Output:
Decoder/trg_lookup_table/embedding (DT_FLOAT) [16000,620]
Decoder/trg_lookup_table/embedding/Adadelta (DT_FLOAT) [16000,620]
Decoder/trg_lookup_table/embedding/Adadelta_1 (DT_FLOAT) [16000,620]
Usage:
python inspect_checkpoint.py --file_name=./alpha_easy_nmt/valid_model/translate.ckpt-625000 --tensor_name=Decoder/W_sf
Output:
tensor_name: Decoder/W_sf
[[ -4.55709170e-07 -9.10816539e-07 4.44753543e-02 ..., -2.58049741e-02
4.26506670e-03 -3.64431571e-07]
[ 7.86067460e-07 7.86348721e-07 1.29140466e-02 ..., 7.92008177e-06
5.49392325e-07 6.99410566e-06]
[ -5.86683996e-07 5.51591484e-08 9.70983803e-02 ..., 2.75615434e-07
-4.86231060e-04 1.23817983e-07]
...,
[ -1.40239194e-06 -1.00237912e-06 -1.44313052e-01 ..., -1.33047411e-06
-1.17946070e-06 -2.41477892e-07]
[ 1.19242941e-06 -9.48488719e-08 -2.48298571e-02 ..., 1.00101170e-03
-3.03782895e-03 1.45507602e-06]
[ -1.27071712e-06 -1.27975386e-06 -2.31240150e-02 ..., -7.33333752e-02
2.30671745e-03 -5.72958811e-07]]
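The same information is available from Python through the checkpoint reader (stable across TF 1.x):

import tensorflow as tf

reader = tf.train.NewCheckpointReader("./alpha_easy_nmt/valid_model/translate.ckpt-625000")
for name, shape in reader.get_variable_to_shape_map().items():
    print(name, shape)                    # e.g. Decoder/trg_lookup_table/embedding [16000, 620]
print(reader.get_tensor("Decoder/W_sf"))  # the raw numpy array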
12. The default initializer of tf.get_variable
https://www.tensorflow.org/api_docs/python/tf/get_variable:
If initializer is None (the default), the default initializer passed in the variable scope will be used. If that one is None too, a glorot_uniform_initializer will be used. The initializer can also be a Tensor, in which case the variable is initialized to this value and shape.
Strangely, glorot_uniform_initializer does not seem to be documented anywhere; someone asked about it on GitHub but got no answer: https://github.com/tensorflow/tensorflow/issues/7791. (It is the Xavier/Glorot uniform initializer: samples from Uniform(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)), the same scheme as tf.contrib.layers.xavier_initializer.)
13. Notes on my own system
13.1 The system's BLEU on NIST06 used to stall around 20 (both the Theano version and my junior colleague's version reach 34). Dumping every beam during beam search showed heavy over-translation. Tracking it down, the system had a small bug: in beam search I forgot to apply the mask to the source annotations.
Vocab = 16k, batch = 50; for both optimizers, BLEU is measured every 3000 batches.
Adam:
72000 BLEU score = 0.2412 BEST BLEU is 0
75000 BLEU score = 0.2377 BEST BLEU is 0.2412
78000 BLEU score = 0.2380 BEST BLEU is 0.2412
81000 BLEU score = 0.2513 BEST BLEU is 0.2412
84000 BLEU score = 0.2231 BEST BLEU is 0.2513
87000 BLEU score = 0.2527 BEST BLEU is 0.2513
90000 BLEU score = 0.2314 BEST BLEU is 0.2527
93000 BLEU score = 0.2498 BEST BLEU is 0.2527
96000 BLEU score = 0.2445 BEST BLEU is 0.2527
99000 BLEU score = 0.2487 BEST BLEU is 0.2527
102000 BLEU score = 0.2497 BEST BLEU is 0.2527
105000 BLEU score = 0.2523 BEST BLEU is 0.2527
108000 BLEU score = 0.2375 BEST BLEU is 0.2527
111000 BLEU score = 0.2380 BEST BLEU is 0.2527
114000 BLEU score = 0.2457 BEST BLEU is 0.2527
117000 BLEU score = 0.2525 BEST BLEU is 0.2527
120000 BLEU score = 0.2519 BEST BLEU is 0.2527
123000 BLEU score = 0.2491 BEST BLEU is 0.2527
126000 BLEU score = 0.2391 BEST BLEU is 0.2527
129000 BLEU score = 0.2304 BEST BLEU is 0.2527
132000 BLEU score = 0.2618 BEST BLEU is 0.2527
135000 BLEU score = 0.2489 BEST BLEU is 0.2618
138000 BLEU score = 0.2458 BEST BLEU is 0.2618
141000 BLEU score = 0.2505 BEST BLEU is 0.2618
144000 BLEU score = 0.2558 BEST BLEU is 0.2618
147000 BLEU score = 0.2492 BEST BLEU is 0.2618
150000 BLEU score = 0.2463 BEST BLEU is 0.2618
153000 BLEU score = 0.2586 BEST BLEU is 0.2618
156000 BLEU score = 0.2495 BEST BLEU is 0.2618
159000 BLEU score = 0.2568 BEST BLEU is 0.2618
162000 BLEU score = 0.2571 BEST BLEU is 0.2618
165000 BLEU score = 0.2611 BEST BLEU is 0.2618
168000 BLEU score = 0.2508 BEST BLEU is 0.2618
171000 BLEU score = 0.2450 BEST BLEU is 0.2618
174000 BLEU score = 0.2459 BEST BLEU is 0.2618
177000 BLEU score = 0.2579 BEST BLEU is 0.2618
180000 BLEU score = 0.2580 BEST BLEU is 0.2618
183000 BLEU score = 0.2520 BEST BLEU is 0.2618
186000 BLEU score = 0.2730 BEST BLEU is 0.2618
189000 BLEU score = 0.2430 BEST BLEU is 0.273
192000 BLEU score = 0.2571 BEST BLEU is 0.273
195000 BLEU score = 0.2541 BEST BLEU is 0.273
198000 BLEU score = 0.2471 BEST BLEU is 0.273
201000 BLEU score = 0.2491 BEST BLEU is 0.273
204000 BLEU score = 0.2589 BEST BLEU is 0.273
207000 BLEU score = 0.2523 BEST BLEU is 0.273
210000 BLEU score = 0.2536 BEST BLEU is 0.273
213000 BLEU score = 0.2557 BEST BLEU is 0.273
216000 BLEU score = 0.2457 BEST BLEU is 0.273
219000 BLEU score = 0.2661 BEST BLEU is 0.273
222000 BLEU score = 0.2515 BEST BLEU is 0.273
225000 BLEU score = 0.2644 BEST BLEU is 0.273
228000 BLEU score = 0.2616 BEST BLEU is 0.273
231000 BLEU score = 0.2554 BEST BLEU is 0.273
234000 BLEU score = 0.2621 BEST BLEU is 0.273
237000 BLEU score = 0.2519 BEST BLEU is 0.273
240000 BLEU score = 0.2440 BEST BLEU is 0.273
243000 BLEU score = 0.2572 BEST BLEU is 0.273
246000 BLEU score = 0.2488 BEST BLEU is 0.273
249000 BLEU score = 0.2631 BEST BLEU is 0.273
252000 BLEU score = 0.2584 BEST BLEU is 0.273
255000 BLEU score = 0.2570 BEST BLEU is 0.273
258000 BLEU score = 0.2581 BEST BLEU is 0.273
261000 BLEU score = 0.2510 BEST BLEU is 0.273
264000 BLEU score = 0.2476 BEST BLEU is 0.273
267000 BLEU score = 0.2667 BEST BLEU is 0.273
270000 BLEU score = 0.2689 BEST BLEU is 0.273
273000 BLEU score = 0.2596 BEST BLEU is 0.273
276000 BLEU score = 0.2592 BEST BLEU is 0.273
279000 BLEU score = 0.2617 BEST BLEU is 0.273
282000 BLEU score = 0.2652 BEST BLEU is 0.273
285000 BLEU score = 0.2651 BEST BLEU is 0.273
288000 BLEU score = 0.2732 BEST BLEU is 0.273
291000 BLEU score = 0.2505 BEST BLEU is 0.2732
294000 BLEU score = 0.2545 BEST BLEU is 0.2732
297000 BLEU score = 0.2737 BEST BLEU is 0.2732
300000 BLEU score = 0.2662 BEST BLEU is 0.2737
Adadelta:
72000 BLEU score = 0.1732 BEST BLEU is 0
75000 BLEU score = 0.1752 BEST BLEU is 0.1732
78000 BLEU score = 0.1888 BEST BLEU is 0.1752
81000 BLEU score = 0.1771 BEST BLEU is 0.1888
84000 BLEU score = 0.1876 BEST BLEU is 0.1888
87000 BLEU score = 0.1968 BEST BLEU is 0.1888
90000 BLEU score = 0.1664 BEST BLEU is 0.1968
93000 BLEU score = 0.2059 BEST BLEU is 0.1968
96000 BLEU score = 0.1816 BEST BLEU is 0.2059
99000 BLEU score = 0.2098 BEST BLEU is 0.2059
102000 BLEU score = 0.2086 BEST BLEU is 0.2098
105000 BLEU score = 0.2029 BEST BLEU is 0.2098
108000 BLEU score = 0.2222 BEST BLEU is 0.2098
111000 BLEU score = 0.1929 BEST BLEU is 0.2222
114000 BLEU score = 0.1951 BEST BLEU is 0.2222
117000 BLEU score = 0.2212 BEST BLEU is 0.2222
120000 BLEU score = 0.2111 BEST BLEU is 0.2222
123000 BLEU score = 0.1981 BEST BLEU is 0.2222
126000 BLEU score = 0.2054 BEST BLEU is 0.2222
129000 BLEU score = 0.2228 BEST BLEU is 0.2222
132000 BLEU score = 0.2250 BEST BLEU is 0.2228
135000 BLEU score = 0.2061 BEST BLEU is 0.225
138000 BLEU score = 0.2333 BEST BLEU is 0.225
141000 BLEU score = 0.2236 BEST BLEU is 0.2333
144000 BLEU score = 0.2123 BEST BLEU is 0.2333
147000 BLEU score = 0.2242 BEST BLEU is 0.2333
150000 BLEU score = 0.2120 BEST BLEU is 0.2333
153000 BLEU score = 0.2404 BEST BLEU is 0.2333
156000 BLEU score = 0.2348 BEST BLEU is 0.2404
159000 BLEU score = 0.2195 BEST BLEU is 0.2404
162000 BLEU score = 0.2383 BEST BLEU is 0.2404
165000 BLEU score = 0.2192 BEST BLEU is 0.2404
168000 BLEU score = 0.2240 BEST BLEU is 0.2404
171000 BLEU score = 0.2265 BEST BLEU is 0.2404
174000 BLEU score = 0.2211 BEST BLEU is 0.2404
177000 BLEU score = 0.2302 BEST BLEU is 0.2404
180000 BLEU score = 0.2360 BEST BLEU is 0.2404
183000 BLEU score = 0.2161 BEST BLEU is 0.2404
186000 BLEU score = 0.2316 BEST BLEU is 0.2404
189000 BLEU score = 0.2298 BEST BLEU is 0.2404
192000 BLEU score = 0.2316 BEST BLEU is 0.2404
195000 BLEU score = 0.2166 BEST BLEU is 0.2404
198000 BLEU score = 0.2350 BEST BLEU is 0.2404
201000 BLEU score = 0.2295 BEST BLEU is 0.2404
204000 BLEU score = 0.2456 BEST BLEU is 0.2404
207000 BLEU score = 0.2392 BEST BLEU is 0.2456
210000 BLEU score = 0.2311 BEST BLEU is 0.2456
213000 BLEU score = 0.2113 BEST BLEU is 0.2456
216000 BLEU score = 0.2223 BEST BLEU is 0.2456
219000 BLEU score = 0.2258 BEST BLEU is 0.2456
222000 BLEU score = 0.2304 BEST BLEU is 0.2456
225000 BLEU score = 0.2165 BEST BLEU is 0.2456
228000 BLEU score = 0.2336 BEST BLEU is 0.2456
231000 BLEU score = 0.2345 BEST BLEU is 0.2456
234000 BLEU score = 0.2444 BEST BLEU is 0.2456
237000 BLEU score = 0.2310 BEST BLEU is 0.2456
240000 BLEU score = 0.2406 BEST BLEU is 0.2456
243000 BLEU score = 0.2294 BEST BLEU is 0.2456
246000 BLEU score = 0.2469 BEST BLEU is 0.2456
249000 BLEU score = 0.2479 BEST BLEU is 0.2469
252000 BLEU score = 0.2464 BEST BLEU is 0.2479
255000 BLEU score = 0.2490 BEST BLEU is 0.2479
258000 BLEU score = 0.2406 BEST BLEU is 0.249
261000 BLEU score = 0.2477 BEST BLEU is 0.249
264000 BLEU score = 0.2392 BEST BLEU is 0.249
267000 BLEU score = 0.2516 BEST BLEU is 0.249
270000 BLEU score = 0.2521 BEST BLEU is 0.2516
273000 BLEU score = 0.2370 BEST BLEU is 0.2521
276000 BLEU score = 0.2431 BEST BLEU is 0.2521
279000 BLEU score = 0.2548 BEST BLEU is 0.2521
282000 BLEU score = 0.2605 BEST BLEU is 0.2548
285000 BLEU score = 0.2421 BEST BLEU is 0.2605
288000 BLEU score = 0.2446 BEST BLEU is 0.2605
291000 BLEU score = 0.2521 BEST BLEU is 0.2605
294000 BLEU score = 0.2529 BEST BLEU is 0.2605
297000 BLEU score = 0.2453 BEST BLEU is 0.2605
300000 BLEU score = 0.2361 BEST BLEU is 0.2605
13.2 Performance was still below my junior colleague's system. More digging turned up a problem with how I initialized the init_context variable: to save effort I used the last source annotation as the init context, while his version and the earlier Theano version use mean pooling over all annotations. "Nematus: a Toolkit for Neural Machine Translation" makes the same point: "We initialize the decoder hidden state with the mean of the source annotation, rather than the annotation at the last position of the encoder backward RNN." So my shortcut does look wrong.
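A sketch of the mean-pooling initialization (my shapes: annotations is [T, batch, dim], mask is [T, batch]; W_init is a hypothetical [dim, hidden] projection):

# Zero out padded positions, then average over the valid source length.
masked = annotations * tf.expand_dims(mask, 2)                               # [T, batch, dim]
mean_ctx = tf.reduce_sum(masked, 0) / tf.expand_dims(tf.reduce_sum(mask, 0), 1)
init_state = tf.tanh(tf.matmul(mean_ctx, W_init))                            # [batch, hidden]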
14. Dropout issues
http://blog.csdn.net/wangxinginnlp/article/details/72649820
*. Tracing GRU + Embedding
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/rnn_cell_impl.py

class _RNNCell(object):
  """Abstract object representing an RNN cell.

  Every `RNNCell` must have the properties below and implement `__call__` with
  the following signature.

  This definition of cell differs from the definition used in the literature.
  In the literature, "cell" refers to an object with a single scalar output.
  This definition refers to a horizontal array of such units.

  An RNN cell, in the most abstract setting, is anything that has a state and
  performs some operation that takes a matrix of inputs. This operation
  results in an output matrix with `self.output_size` columns. If
  `self.state_size` is an integer, this operation also results in a new state
  matrix with `self.state_size` columns. If `self.state_size` is a tuple of
  integers, then it results in a tuple of `len(state_size)` state matrices,
  each with a column size corresponding to values in `state_size`.
  """

  def __call__(self, inputs, state, scope=None):
    """Run this RNN cell on inputs, starting from the given state.

    Args:
      inputs: `2-D` tensor with shape `[batch_size x input_size]`.
      state: if `self.state_size` is an integer, this should be a `2-D Tensor`
        with shape `[batch_size x self.state_size]`. Otherwise, if
        `self.state_size` is a tuple of integers, this should be a tuple with
        shapes `[batch_size x s] for s in self.state_size`.
      scope: VariableScope for the created subgraph; defaults to class name.

    Returns:
      A pair containing:
      - Output: A `2-D` tensor with shape `[batch_size x self.output_size]`.
      - New state: Either a single `2-D` tensor, or a tuple of tensors matching
        the arity and shapes of `state`.
    """
    raise NotImplementedError("Abstract method")

  @property
  def state_size(self):
    """size(s) of state(s) used by this cell.

    It can be represented by an Integer, a TensorShape or a tuple of Integers
    or TensorShapes.
    """
    raise NotImplementedError("Abstract method")

  @property
  def output_size(self):
    """Integer or TensorShape: size of outputs produced by this cell."""
    raise NotImplementedError("Abstract method")

  def zero_state(self, batch_size, dtype):
    """Return zero-filled state tensor(s).

    Args:
      batch_size: int, float, or unit Tensor representing the batch size.
      dtype: the data type to use for the state.

    Returns:
      If `state_size` is an int or TensorShape, then the return value is a
      `N-D` tensor of shape `[batch_size x state_size]` filled with zeros.
      If `state_size` is a nested list or tuple, then the return value is a
      nested list or tuple (of the same structure) of `2-D` tensors with the
      shapes `[batch_size x s]` for each s in `state_size`.
    """
    with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
      state_size = self.state_size
      return _zero_state_tensors(state_size, batch_size, dtype)

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py

_BIAS_VARIABLE_NAME = "biases"
_WEIGHTS_VARIABLE_NAME = "weights"

# Note: matmul() multiplies matrix `a` by matrix `b`, producing `a` * `b`;
#       concat() concatenates tensors along one dimension.

def _linear(args, output_size, bias, bias_start=0.0):
  """Linear map: sum_i(args[i] * W[i]), where W[i] is a variable.

  Args:
    args: a 2D Tensor or a list of 2D, batch x n, Tensors.
    output_size: int, second dimension of W[i].
    bias: boolean, whether to add a bias term or not.
    bias_start: starting value to initialize the bias; 0 by default.

  Returns:
    A 2D Tensor with shape [batch x output_size] equal to
    sum_i(args[i] * W[i]), where W[i]s are newly created matrices.

  Raises:
    ValueError: if some of the arguments has unspecified or wrong shape.
  """
  if args is None or (nest.is_sequence(args) and not args):
    raise ValueError("`args` must be specified")
  if not nest.is_sequence(args):
    args = [args]

  # Calculate the total size of arguments on dimension 1.
  total_arg_size = 0
  shapes = [a.get_shape() for a in args]
  for shape in shapes:
    if shape.ndims != 2:
      raise ValueError("linear is expecting 2D arguments: %s" % shapes)
    if shape[1].value is None:
      raise ValueError("linear expects shape[1] to be provided for shape %s, "
                       "but saw %s" % (shape, shape[1]))
    else:
      total_arg_size += shape[1].value

  dtype = [a.dtype for a in args][0]

  # Now the computation.
  scope = vs.get_variable_scope()
  with vs.variable_scope(scope) as outer_scope:
    weights = vs.get_variable(
        _WEIGHTS_VARIABLE_NAME, [total_arg_size, output_size], dtype=dtype)
    if len(args) == 1:
      res = math_ops.matmul(args[0], weights)
    else:
      res = math_ops.matmul(array_ops.concat(args, 1), weights)
    if not bias:
      return res
    with vs.variable_scope(outer_scope) as inner_scope:
      inner_scope.set_partitioner(None)
      biases = vs.get_variable(
          _BIAS_VARIABLE_NAME, [output_size],
          dtype=dtype,
          initializer=init_ops.constant_initializer(bias_start, dtype=dtype))
    return nn_ops.bias_add(res, biases)


class GRUCell(RNNCell):
  """Gated Recurrent Unit cell (cf. http://arxiv.org/abs/1406.1078)."""

  def __init__(self, num_units, input_size=None, activation=tanh, reuse=None):
    if input_size is not None:
      logging.warn("%s: The input_size parameter is deprecated.", self)
    self._num_units = num_units
    self._activation = activation
    self._reuse = reuse

  @property
  def state_size(self):
    return self._num_units

  @property
  def output_size(self):
    return self._num_units

  def __call__(self, inputs, state, scope=None):
    """Gated recurrent unit (GRU) with nunits cells."""
    with _checked_scope(self, scope or "gru_cell", reuse=self._reuse):
      with vs.variable_scope("gates"):  # Reset gate and update gate.
        # We start with bias of 1.0 to not reset and not update.
        value = sigmoid(_linear(
            [inputs, state], 2 * self._num_units, True, 1.0))
        r, u = array_ops.split(
            value=value, num_or_size_splits=2, axis=1)
      with vs.variable_scope("candidate"):
        c = self._activation(_linear([inputs, r * state],
                                     self._num_units, True))
      new_h = u * state + (1 - u) * c
    return new_h, new_h

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py

class EmbeddingWrapper(RNNCell):
  """Operator adding input embedding to the given cell.

  Note: in many cases it may be more efficient to not use this wrapper,
  but instead concatenate the whole sequence of your inputs in time,
  do the embedding on this batch-concatenated sequence, then split it and
  feed into your RNN.
  """

  def __init__(self, cell, embedding_classes, embedding_size,
               initializer=None, reuse=None):
    """Create a cell with an added input embedding.

    Args:
      cell: an RNNCell, an embedding will be put before its inputs.
      embedding_classes: integer, how many symbols will be embedded.
      embedding_size: integer, the size of the vectors we embed into.
      initializer: an initializer to use when creating the embedding;
        if None, the initializer from variable scope or a default one is used.
      reuse: (optional) Python boolean describing whether to reuse variables
        in an existing scope. If not `True`, and the existing scope already
        has the given variables, an error is raised.

    Raises:
      TypeError: if cell is not an RNNCell.
      ValueError: if embedding_classes is not positive.
    """
    if not isinstance(cell, RNNCell):
      raise TypeError("The parameter cell is not RNNCell.")
    if embedding_classes <= 0 or embedding_size <= 0:
      raise ValueError("Both embedding_classes and embedding_size must be > 0: "
                       "%d, %d." % (embedding_classes, embedding_size))
    self._cell = cell
    self._embedding_classes = embedding_classes
    self._embedding_size = embedding_size
    self._initializer = initializer
    self._reuse = reuse

  @property
  def state_size(self):
    return self._cell.state_size

  @property
  def output_size(self):
    return self._cell.output_size

  def zero_state(self, batch_size, dtype):
    with ops.name_scope(type(self).__name__ + "ZeroState", values=[batch_size]):
      return self._cell.zero_state(batch_size, dtype)

  def __call__(self, inputs, state, scope=None):
    """Run the cell on embedded inputs."""
    with _checked_scope(self, scope or "embedding_wrapper", reuse=self._reuse):
      with ops.device("/cpu:0"):
        if self._initializer:
          initializer = self._initializer
        elif vs.get_variable_scope().initializer:
          initializer = vs.get_variable_scope().initializer
        else:
          # Default initializer for embeddings should have variance=1.
          sqrt3 = math.sqrt(3)  # Uniform(-sqrt(3), sqrt(3)) has variance=1.
          initializer = init_ops.random_uniform_initializer(-sqrt3, sqrt3)

        if type(state) is tuple:
          data_type = state[0].dtype
        else:
          data_type = state.dtype

        embedding = vs.get_variable(
            "embedding", [self._embedding_classes, self._embedding_size],
            initializer=initializer, dtype=data_type)
        embedded = embedding_ops.embedding_lookup(
            embedding, array_ops.reshape(inputs, [-1]))
    return self._cell(embedded, state)

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/rnn/python/ops/core_rnn.py

def static_rnn(cell, inputs, initial_state=None, dtype=None,
               sequence_length=None, scope=None):
  """Creates a recurrent neural network specified by RNNCell `cell`.

  The simplest form of RNN network generated is:

  ```python
    state = cell.zero_state(...)
    outputs = []
    for input_ in inputs:
      output, state = cell(input_, state)
      outputs.append(output)
    return (outputs, state)
  ```
  However, a few other options are available:

  An initial state can be provided.
  If the sequence_length vector is provided, dynamic calculation is performed.
  This method of calculation does not compute the RNN steps past the maximum
  sequence length of the minibatch (thus saving computational time),
  and properly propagates the state at an example's sequence length
  to the final state output.

  The dynamic calculation performed is, at time `t` for batch row `b`,

  ```python
    (output, state)(b, t) =
      (t >= sequence_length(b))
        ? (zeros(cell.output_size), states(b, sequence_length(b) - 1))
        : cell(input(b, t), state(b, t - 1))
  ```

  Args:
    cell: An instance of RNNCell.
    inputs: A length T list of inputs, each a `Tensor` of shape
      `[batch_size, input_size]`, or a nested tuple of such elements.
    initial_state: (optional) An initial state for the RNN. If
      `cell.state_size` is an integer, this must be a `Tensor` of appropriate
      type and shape `[batch_size, cell.state_size]`. If `cell.state_size` is
      a tuple, this should be a tuple of tensors having shapes
      `[batch_size, s] for s in cell.state_size`.
    dtype: (optional) The data type for the initial state and expected output.
      Required if initial_state is not provided or RNN state has a
      heterogeneous dtype.
    sequence_length: Specifies the length of each sequence in inputs.
      An int32 or int64 vector (tensor) size `[batch_size]`, values in `[0, T)`.
    scope: VariableScope for the created subgraph; defaults to "rnn".

  Returns:
    A pair (outputs, state) where:

    - outputs is a length T list of outputs (one for each input), or a nested
      tuple of such elements.
    - state is the final state

  Raises:
    TypeError: If `cell` is not an instance of RNNCell.
    ValueError: If `inputs` is `None` or an empty list, or if the input depth
      (column size) cannot be inferred from inputs via shape inference.
  """
  if not isinstance(cell, core_rnn_cell.RNNCell):
    raise TypeError("cell must be an instance of RNNCell")
  if not nest.is_sequence(inputs):
    raise TypeError("inputs must be a sequence")
  if not inputs:
    raise ValueError("inputs must not be empty")

  outputs = []
  # Create a new scope in which the caching device is either
  # determined by the parent scope, or is set to place the cached
  # Variable using the same placement as for the rest of the RNN.
  with vs.variable_scope(scope or "rnn") as varscope:
    if varscope.caching_device is None:
      varscope.set_caching_device(lambda op: op.device)

    # Obtain the first sequence of the input
    first_input = inputs
    while nest.is_sequence(first_input):
      first_input = first_input[0]

    # Temporarily avoid EmbeddingWrapper and seq2seq badness
    # TODO(lukaszkaiser): remove EmbeddingWrapper
    if first_input.get_shape().ndims != 1:

      input_shape = first_input.get_shape().with_rank_at_least(2)
      fixed_batch_size = input_shape[0]

      flat_inputs = nest.flatten(inputs)
      for flat_input in flat_inputs:
        input_shape = flat_input.get_shape().with_rank_at_least(2)
        batch_size, input_size = input_shape[0], input_shape[1:]
        fixed_batch_size.merge_with(batch_size)
        for i, size in enumerate(input_size):
          if size.value is None:
            raise ValueError(
                "Input size (dimension %d of inputs) must be accessible via "
                "shape inference, but saw value None." % i)
    else:
      fixed_batch_size = first_input.get_shape().with_rank_at_least(1)[0]

    if fixed_batch_size.value:
      batch_size = fixed_batch_size.value
    else:
      batch_size = array_ops.shape(first_input)[0]
    if initial_state is not None:
      state = initial_state
    else:
      if not dtype:
        raise ValueError("If no initial_state is provided, "
                         "dtype must be specified")
      state = cell.zero_state(batch_size, dtype)

    if sequence_length is not None:  # Prepare variables
      sequence_length = ops.convert_to_tensor(
          sequence_length, name="sequence_length")
      if sequence_length.get_shape().ndims not in (None, 1):
        raise ValueError(
            "sequence_length must be a vector of length batch_size")

      def _create_zero_output(output_size):
        # convert int to TensorShape if necessary
        size = _state_size_with_prefix(output_size, prefix=[batch_size])
        output = array_ops.zeros(
            array_ops.stack(size), _infer_state_dtype(dtype, state))
        shape = _state_size_with_prefix(
            output_size, prefix=[fixed_batch_size.value])
        output.set_shape(tensor_shape.TensorShape(shape))
        return output

      output_size = cell.output_size
      flat_output_size = nest.flatten(output_size)
      flat_zero_output = tuple(
          _create_zero_output(size) for size in flat_output_size)
      zero_output = nest.pack_sequence_as(structure=output_size,
                                          flat_sequence=flat_zero_output)

      sequence_length = math_ops.to_int32(sequence_length)
      min_sequence_length = math_ops.reduce_min(sequence_length)
      max_sequence_length = math_ops.reduce_max(sequence_length)

    for time, input_ in enumerate(inputs):
      if time > 0:
        varscope.reuse_variables()
      # pylint: disable=cell-var-from-loop
      call_cell = lambda: cell(input_, state)
      # pylint: enable=cell-var-from-loop
      if sequence_length is not None:
        (output, state) = _rnn_step(
            time=time,
            sequence_length=sequence_length,
            min_sequence_length=min_sequence_length,
            max_sequence_length=max_sequence_length,
            zero_output=zero_output,
            state=state,
            call_cell=call_cell,
            state_size=cell.state_size)
      else:
        (output, state) = call_cell()
      outputs.append(output)

    return (outputs, state)
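To tie the trace together, a minimal usage sketch of these pieces (TF 1.x contrib API; the sizes are made up):

import tensorflow as tf

cell = tf.contrib.rnn.GRUCell(128)
cell = tf.contrib.rnn.EmbeddingWrapper(cell, embedding_classes=16000, embedding_size=620)
# static_rnn takes a length-T Python list of [batch]-shaped int32 id tensors.
inputs = [tf.placeholder(tf.int32, [None]) for _ in range(10)]
outputs, state = tf.contrib.rnn.static_rnn(cell, inputs, dtype=tf.float32)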
*. Tracing embedding_attention_seq2seq
Call chain: embedding_attention_seq2seq -> embedding_attention_decoder -> attention_decoder
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py

def attention_decoder(decoder_inputs, initial_state, attention_states, cell,
                      output_size=None, num_heads=1, loop_function=None,
                      dtype=None, scope=None, initial_state_attention=False):
  """RNN decoder with attention for the sequence-to-sequence model.

  In this context "attention" means that, during decoding, the RNN can look up
  information in the additional tensor attention_states, and it does this by
  focusing on a few entries from the tensor. This model has proven to yield
  especially good results in a number of sequence-to-sequence tasks. This
  implementation is based on http://arxiv.org/abs/1412.7449 (see below for
  details). It is recommended for complex sequence-to-sequence tasks.

  Args:
    decoder_inputs: A list of 2D Tensors [batch_size x input_size].
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    output_size: Size of the output vectors; if None, we use cell.output_size.
    num_heads: Number of attention heads that read from attention_states.
    loop_function: If not None, this function will be applied to i-th output
      in order to generate i+1-th input, and decoder_inputs will be ignored,
      except for the first element ("GO" symbol). This can be used for decoding,
      but also for training to emulate http://arxiv.org/abs/1506.03099.
      Signature -- loop_function(prev, i) = next
        * prev is a 2D Tensor of shape [batch_size x output_size],
        * i is an integer, the step number (when advanced control is needed),
        * next is a 2D Tensor of shape [batch_size x input_size].
    dtype: The dtype to use for the RNN initial state (default: tf.float32).
    scope: VariableScope for the created subgraph; default: "attention_decoder".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states -- useful when we wish to resume decoding from a previously
      stored decoder state and attention states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors of
        shape [batch_size x output_size]. These represent the generated
        outputs. Output i is computed from input i (which is either the i-th
        element of decoder_inputs or loop_function(output {i-1}, i)) as
        follows. First, we run the cell on a combination of the input and
        previous attention masks:
          cell_output, new_state = cell(linear(input, prev_attn), prev_state).
        Then, we calculate new attention masks:
          new_attn = softmax(V^T * tanh(W * attention_states + U * new_state))
        and then we calculate the output:
          output = linear(cell_output, new_attn).
      state: The state of each decoder cell the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: when num_heads is not positive, there are no inputs, shapes
      of attention_states are not set, or input size cannot be inferred
      from the input.
  """
  if not decoder_inputs:
    raise ValueError("Must provide at least 1 input to attention decoder.")
  if num_heads < 1:
    raise ValueError("With less than 1 heads, use a non-attention decoder.")
  if attention_states.get_shape()[2].value is None:
    raise ValueError("Shape[2] of attention_states must be known: %s" %
                     attention_states.get_shape())
  if output_size is None:
    output_size = cell.output_size

  with variable_scope.variable_scope(
      scope or "attention_decoder", dtype=dtype) as scope:
    dtype = scope.dtype

    batch_size = array_ops.shape(decoder_inputs[0])[0]  # Needed for reshaping.
    attn_length = attention_states.get_shape()[1].value
    if attn_length is None:
      attn_length = array_ops.shape(attention_states)[1]
    attn_size = attention_states.get_shape()[2].value

    # To calculate W1 * h_t we use a 1-by-1 convolution, need to reshape before.
    hidden = array_ops.reshape(attention_states,
                               [-1, attn_length, 1, attn_size])
    hidden_features = []
    v = []
    attention_vec_size = attn_size  # Size of query vectors for attention.
    for a in xrange(num_heads):
      k = variable_scope.get_variable("AttnW_%d" % a,
                                      [1, 1, attn_size, attention_vec_size])
      hidden_features.append(nn_ops.conv2d(hidden, k, [1, 1, 1, 1], "SAME"))
      v.append(
          variable_scope.get_variable("AttnV_%d" % a, [attention_vec_size]))

    state = initial_state

    def attention(query):
      """Put attention masks on hidden using hidden_features and query."""
      ds = []  # Results of attention reads will be stored here.
      if nest.is_sequence(query):  # If the query is a tuple, flatten it.
        query_list = nest.flatten(query)
        for q in query_list:  # Check that ndims == 2 if specified.
          ndims = q.get_shape().ndims
          if ndims:
            assert ndims == 2
        query = array_ops.concat(query_list, 1)
      for a in xrange(num_heads):
        with variable_scope.variable_scope("Attention_%d" % a):
          y = linear(query, attention_vec_size, True)
          y = array_ops.reshape(y, [-1, 1, 1, attention_vec_size])
          # Attention mask is a softmax of v^T * tanh(...).
          s = math_ops.reduce_sum(v[a] * math_ops.tanh(hidden_features[a] + y),
                                  [2, 3])
          a = nn_ops.softmax(s)
          # Now calculate the attention-weighted vector d.
          d = math_ops.reduce_sum(
              array_ops.reshape(a, [-1, attn_length, 1, 1]) * hidden, [1, 2])
          ds.append(array_ops.reshape(d, [-1, attn_size]))
      return ds

    outputs = []
    prev = None
    batch_attn_size = array_ops.stack([batch_size, attn_size])
    attns = [
        array_ops.zeros(
            batch_attn_size, dtype=dtype) for _ in xrange(num_heads)
    ]
    for a in attns:  # Ensure the second shape of attention vectors is set.
      a.set_shape([None, attn_size])
    if initial_state_attention:
      attns = attention(initial_state)
    for i, inp in enumerate(decoder_inputs):
      if i > 0:
        variable_scope.get_variable_scope().reuse_variables()
      # If loop_function is set, we use it instead of decoder_inputs.
      if loop_function is not None and prev is not None:
        with variable_scope.variable_scope("loop_function", reuse=True):
          inp = loop_function(prev, i)
      # Merge input and previous attentions into one vector of the right size.
      input_size = inp.get_shape().with_rank(2)[1]
      if input_size.value is None:
        raise ValueError("Could not infer input size from input: %s" % inp.name)
      x = linear([inp] + attns, input_size, True)
      # Run the RNN.
      cell_output, state = cell(x, state)
      # Run the attention mechanism.
      if i == 0 and initial_state_attention:
        with variable_scope.variable_scope(
            variable_scope.get_variable_scope(), reuse=True):
          attns = attention(state)
      else:
        attns = attention(state)

      with variable_scope.variable_scope("AttnOutputProjection"):
        output = linear([cell_output] + attns, output_size, True)
      if loop_function is not None:
        prev = output
      outputs.append(output)

  return outputs, state


def embedding_attention_decoder(decoder_inputs, initial_state,
                                attention_states, cell, num_symbols,
                                embedding_size, num_heads=1, output_size=None,
                                output_projection=None, feed_previous=False,
                                update_embedding_for_previous=True,
                                dtype=None, scope=None,
                                initial_state_attention=False):
  """RNN decoder with embedding and attention and a pure-decoding option.

  Args:
    decoder_inputs: A list of 1D batch-sized int32 Tensors (decoder inputs).
    initial_state: 2D Tensor [batch_size x cell.state_size].
    attention_states: 3D Tensor [batch_size x attn_length x attn_size].
    cell: core_rnn_cell.RNNCell defining the cell function.
    num_symbols: Integer, how many symbols come into the embedding.
    embedding_size: Integer, the length of the embedding vector for each
      symbol.
    num_heads: Number of attention heads that read from attention_states.
    output_size: Size of the output vectors; if None, use output_size.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_symbols] and B has shape
      [num_symbols]; if provided and feed_previous=True, each fed previous
      output will first be multiplied by W and added B.
    feed_previous: Boolean; if True, only the first of decoder_inputs will be
      used (the "GO" symbol), and all other decoder inputs will be generated
      by: next = embedding_lookup(embedding, argmax(previous_output)),
      In effect, this implements a greedy decoder. It can also be used
      during training to emulate http://arxiv.org/abs/1506.03099.
      If False, decoder_inputs are used as given (the standard decoder case).
    update_embedding_for_previous: Boolean; if False and feed_previous=True,
      only the embedding for the first symbol of decoder_inputs (the "GO"
      symbol) will be updated by back propagation. Embeddings for the symbols
      generated from the decoder itself remain unchanged. This parameter has
      no effect if feed_previous=False.
    dtype: The dtype to use for the RNN initial states (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_decoder".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states -- useful when we wish to resume decoding from a previously
      stored decoder state and attention states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x output_size] containing the generated outputs.
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].

  Raises:
    ValueError: When output_projection has the wrong shape.
  """
  if output_size is None:
    output_size = cell.output_size
  if output_projection is not None:
    proj_biases = ops.convert_to_tensor(output_projection[1], dtype=dtype)
    proj_biases.get_shape().assert_is_compatible_with([num_symbols])

  with variable_scope.variable_scope(
      scope or "embedding_attention_decoder", dtype=dtype) as scope:

    embedding = variable_scope.get_variable("embedding",
                                            [num_symbols, embedding_size])
    loop_function = _extract_argmax_and_embed(
        embedding, output_projection,
        update_embedding_for_previous) if feed_previous else None
    emb_inp = [
        embedding_ops.embedding_lookup(embedding, i) for i in decoder_inputs
    ]
    return attention_decoder(
        emb_inp,
        initial_state,
        attention_states,
        cell,
        output_size=output_size,
        num_heads=num_heads,
        loop_function=loop_function,
        initial_state_attention=initial_state_attention)


def embedding_attention_seq2seq(encoder_inputs, decoder_inputs, cell,
                                num_encoder_symbols, num_decoder_symbols,
                                embedding_size, num_heads=1,
                                output_projection=None, feed_previous=False,
                                dtype=None, scope=None,
                                initial_state_attention=False):
  """Embedding sequence-to-sequence model with attention.

  This model first embeds encoder_inputs by a newly created embedding (of
  shape [num_encoder_symbols x input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. It keeps the outputs of this
  RNN at every step to use for attention later. Next, it embeds
  decoder_inputs by another newly created embedding (of shape
  [num_decoder_symbols x input_size]). Then it runs attention decoder,
  initialized with the last encoder state, on embedded decoder_inputs and
  attending to encoder outputs.

  Warning: when output_projection is None, the size of the attention vectors
  and variables will be made proportional to num_decoder_symbols, can be
  large.

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    cell: core_rnn_cell.RNNCell defining the cell function and size.
    num_encoder_symbols: Integer; number of symbols on the encoder side.
    num_decoder_symbols: Integer; number of symbols on the decoder side.
    embedding_size: Integer, the length of the embedding vector for each
      symbol.
    num_heads: Number of attention heads that read from attention_states.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and added B.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in
      embedding_rnn_decoder). If False, decoder_inputs are used as given
      (the standard decoder case).
    dtype: The dtype of the initial RNN state (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_seq2seq".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x num_decoder_symbols] containing the generated
        outputs.
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope(
      scope or "embedding_attention_seq2seq", dtype=dtype) as scope:
    dtype = scope.dtype
    # Encoder.
    encoder_cell = copy.deepcopy(cell)
    encoder_cell = core_rnn_cell.EmbeddingWrapper(
        encoder_cell,
        embedding_classes=num_encoder_symbols,
        embedding_size=embedding_size)
    encoder_outputs, encoder_state = core_rnn.static_rnn(
        encoder_cell, encoder_inputs, dtype=dtype)

    # First calculate a concatenation of encoder outputs to put attention on.
    top_states = [
        array_ops.reshape(e, [-1, 1, cell.output_size])
        for e in encoder_outputs
    ]
    attention_states = array_ops.concat(top_states, 1)

    # Decoder.
    output_size = None
    if output_projection is None:
      cell = core_rnn_cell.OutputProjectionWrapper(cell, num_decoder_symbols)
      output_size = num_decoder_symbols

    if isinstance(feed_previous, bool):
      return embedding_attention_decoder(
          decoder_inputs,
          encoder_state,
          attention_states,
          cell,
          num_decoder_symbols,
          embedding_size,
          num_heads=num_heads,
          output_size=output_size,
          output_projection=output_projection,
          feed_previous=feed_previous,
          initial_state_attention=initial_state_attention)

    # If feed_previous is a Tensor, we construct 2 graphs and use cond.
    def decoder(feed_previous_bool):
      reuse = None if feed_previous_bool else True
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=reuse):
        outputs, state = embedding_attention_decoder(
            decoder_inputs,
            encoder_state,
            attention_states,
            cell,
            num_decoder_symbols,
            embedding_size,
            num_heads=num_heads,
            output_size=output_size,
            output_projection=output_projection,
            feed_previous=feed_previous_bool,
            update_embedding_for_previous=False,
            initial_state_attention=initial_state_attention)
        state_list = [state]
        if nest.is_sequence(state):
          state_list = nest.flatten(state)
        return outputs + state_list

    outputs_and_state = control_flow_ops.cond(feed_previous,
                                              lambda: decoder(True),
                                              lambda: decoder(False))
    outputs_len = len(decoder_inputs)  # Outputs length same as decoder inputs.
    state_list = outputs_and_state[outputs_len:]
    state = state_list[0]
    if nest.is_sequence(encoder_state):
      state = nest.pack_sequence_as(
          structure=encoder_state, flat_sequence=state_list)
    return outputs_and_state[:outputs_len], state
*. Tracing model_with_buckets
def sequence_loss_by_example(logits, targets, weights,
                             average_across_timesteps=True,
                             softmax_loss_function=None, name=None):
  """Weighted cross-entropy loss for a sequence of logits (per example).

  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as
      logits.
    weights: List of 1D batch-sized float-Tensors of the same length as
      logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    softmax_loss_function: Function (labels, logits) -> loss-batch
      to be used instead of the standard softmax (the default if this is
      None). **Note that to avoid confusion, it is required for the function
      to accept named arguments.**
    name: Optional name for this operation, default:
      "sequence_loss_by_example".

  Returns:
    1D batch-sized float Tensor: The log-perplexity for each sequence.

  Raises:
    ValueError: If len(logits) is different from len(targets) or
      len(weights).
  """
  if len(targets) != len(logits) or len(weights) != len(logits):
    raise ValueError("Lengths of logits, weights, and targets must be the same "
                     "%d, %d, %d." % (len(logits), len(weights), len(targets)))
  with ops.name_scope(name, "sequence_loss_by_example",
                      logits + targets + weights):
    log_perp_list = []
    for logit, target, weight in zip(logits, targets, weights):
      if softmax_loss_function is None:
        # TODO(irving,ebrevdo): This reshape is needed because
        # sequence_loss_by_example is called with scalars sometimes, which
        # violates our general scalar strictness policy.
        target = array_ops.reshape(target, [-1])
        crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
            labels=target, logits=logit)
      else:
        crossent = softmax_loss_function(labels=target, logits=logit)
      log_perp_list.append(crossent * weight)
    log_perps = math_ops.add_n(log_perp_list)
    if average_across_timesteps:
      total_size = math_ops.add_n(weights)
      total_size += 1e-12  # Just to avoid division by 0 for all-0 weights.
      log_perps /= total_size
  return log_perps


def sequence_loss(logits, targets, weights,
                  average_across_timesteps=True, average_across_batch=True,
                  softmax_loss_function=None, name=None):
  """Weighted cross-entropy loss for a sequence of logits, batch-collapsed.

  Args:
    logits: List of 2D Tensors of shape [batch_size x num_decoder_symbols].
    targets: List of 1D batch-sized int32 Tensors of the same length as
      logits.
    weights: List of 1D batch-sized float-Tensors of the same length as
      logits.
    average_across_timesteps: If set, divide the returned cost by the total
      label weight.
    average_across_batch: If set, divide the returned cost by the batch size.
    softmax_loss_function: Function (labels, logits) -> loss-batch
      to be used instead of the standard softmax (the default if this is
      None). **Note that to avoid confusion, it is required for the function
      to accept named arguments.**
    name: Optional name for this operation, defaults to "sequence_loss".

  Returns:
    A scalar float Tensor: The average log-perplexity per symbol (weighted).

  Raises:
    ValueError: If len(logits) is different from len(targets) or
      len(weights).
  """
  with ops.name_scope(name, "sequence_loss", logits + targets + weights):
    cost = math_ops.reduce_sum(
        sequence_loss_by_example(
            logits,
            targets,
            weights,
            average_across_timesteps=average_across_timesteps,
            softmax_loss_function=softmax_loss_function))
    if average_across_batch:
      batch_size = array_ops.shape(targets[0])[0]
      return cost / math_ops.cast(batch_size, cost.dtype)
    else:
      return cost


def model_with_buckets(encoder_inputs, decoder_inputs, targets, weights,
                       buckets, seq2seq, softmax_loss_function=None,
                       per_example_loss=False, name=None):
  """Create a sequence-to-sequence model with support for bucketing.

  The seq2seq argument is a function that defines a sequence-to-sequence
  model, e.g., seq2seq = lambda x, y: basic_rnn_seq2seq(
      x, y, core_rnn_cell.GRUCell(24))

  Args:
    encoder_inputs: A list of Tensors to feed the encoder; first seq2seq
      input.
    decoder_inputs: A list of Tensors to feed the decoder; second seq2seq
      input.
    targets: A list of 1D batch-sized int32 Tensors (desired output sequence).
    weights: List of 1D batch-sized float-Tensors to weight the targets.
    buckets: A list of pairs of (input size, output size) for each bucket.
    seq2seq: A sequence-to-sequence model function; it takes 2 input that
      agree with encoder_inputs and decoder_inputs, and returns a pair
      consisting of outputs and states (as, e.g., basic_rnn_seq2seq).
    softmax_loss_function: Function (labels, logits) -> loss-batch
      to be used instead of the standard softmax (the default if this is
      None). **Note that to avoid confusion, it is required for the function
      to accept named arguments.**
    per_example_loss: Boolean. If set, the returned loss will be a
      batch-sized tensor of losses for each sequence in the batch. If unset,
      it will be a scalar with the averaged loss from all examples.
    name: Optional name for this operation, defaults to "model_with_buckets".

  Returns:
    A tuple of the form (outputs, losses), where:
      outputs: The outputs for each bucket. Its j'th element consists of a
        list of 2D Tensors. The shape of output tensors can be either
        [batch_size x output_size] or [batch_size x num_decoder_symbols]
        depending on the seq2seq model used.
      losses: List of scalar Tensors, representing losses for each bucket,
        or, if per_example_loss is set, a list of 1D batch-sized float
        Tensors.

  Raises:
    ValueError: If length of encoder_inputs, targets, or weights is smaller
      than the largest (last) bucket.
  """
  if len(encoder_inputs) < buckets[-1][0]:
    raise ValueError("Length of encoder_inputs (%d) must be at least that of la"
                     "st bucket (%d)." % (len(encoder_inputs), buckets[-1][0]))
  if len(targets) < buckets[-1][1]:
    raise ValueError("Length of targets (%d) must be at least that of last"
                     "bucket (%d)." % (len(targets), buckets[-1][1]))
  if len(weights) < buckets[-1][1]:
    raise ValueError("Length of weights (%d) must be at least that of last"
                     "bucket (%d)." % (len(weights), buckets[-1][1]))

  all_inputs = encoder_inputs + decoder_inputs + targets + weights
  losses = []
  outputs = []
  with ops.name_scope(name, "model_with_buckets", all_inputs):
    for j, bucket in enumerate(buckets):
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=True if j > 0 else None):
        bucket_outputs, _ = seq2seq(encoder_inputs[:bucket[0]],
                                    decoder_inputs[:bucket[1]])
        outputs.append(bucket_outputs)
        if per_example_loss:
          losses.append(
              sequence_loss_by_example(
                  outputs[-1],
                  targets[:bucket[1]],
                  weights[:bucket[1]],
                  softmax_loss_function=softmax_loss_function))
        else:
          losses.append(
              sequence_loss(
                  outputs[-1],
                  targets[:bucket[1]],
                  weights[:bucket[1]],
                  softmax_loss_function=softmax_loss_function))

  return outputs, losses
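For reference, a sketch of how the translate tutorial wires these together (illustrative only; encoder_inputs, decoder_inputs, targets and target_weights are assumed to be the usual placeholder lists, and cell an RNNCell):

buckets = [(5, 10), (10, 15)]
outputs, losses = tf.contrib.legacy_seq2seq.model_with_buckets(
    encoder_inputs, decoder_inputs, targets, target_weights, buckets,
    lambda x, y: tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
        x, y, cell,
        num_encoder_symbols=16000, num_decoder_symbols=16000,
        embedding_size=620))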