light on details
"DeepMind's version of reinforcement learning that uses 'temporal value transport' to send a signal from reward backward, to shape actions, does better than alternative forms of neural networks. Here, the 'TVT' program is compared to 'Long-short-term memory,' or LSTM, neural networks, with and without memory, and a basic reconstructive memory agent."