Abstract page for arXiv paper 2505.03335: Absolute Zero: Reinforced Self-play Reasoning with Zero Data
This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
SPRING is an LLM-based policy that outperforms reinforcement learning algorithms in an interactive environment requiring multi-task planning and reasoning. A group of researchers from Carnegie Mellon University, NVIDIA, Ariel University, and Microsoft has investigated the use of Large Language Models (LLMs) for understanding and reasoning with human knowledge in the context of games. They propose a two-stage approach called SPRING, which involves studying an academic paper and then using a Question-Answer (QA) framework to justify the knowledge obtained. More details about SPRING: in the first stage, the authors read the LaTeX source code of the original paper by Hafner (2021)
"Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields."
We are not just going to solve another reinforcement learning environment but going to create one from scratch.
light on details
"DeepMind's version of reinforcement learning that uses "temporal value transport" to send a signal from reward backward, to shape actions, does better than alternative forms of neural networks. Here, the "TVT" program is compared to "Long-short-term memory," or LSTM, neural networks, with and without memory, and a basic reconstructive memory agent."
AlphaStock fully exploits the interrelationship among stocks, and opens a door for solving the “black box” problem of using deep learning models in financial markets. The back-testing and simulation experiments over U.S. and Chinese stock markets showed that AlphaStock performed much better than other competing strategies. Interestingly, AlphaStock suggests buying stocks with high long-term growth, low volatility, high intrinsic value, and being undervalued recently.