Included in the search and seizure warrant for the raid on Natanson’s home is a section titled “Biometric Unlock,” which explicitly authorized law enforcement personnel to obtain Natanson’s phone, hold the device in front of her face, and forcibly use her fingers to unlock it. In other words, a judge gave the FBI permission to attempt to bypass biometrics: the convenient shortcuts that let you unlock your phone by scanning your fingerprint or face.
OpenAI recently introduced its bespoke in-house AI data agent, a GPT-5.2-powered tool designed to help employees navigate and analyze over 600 petabytes of internal data across 70,000 datasets. By turning natural language questions into data insights in minutes, the agent lets teams across the company bypass manual SQL debugging and quickly make data-driven decisions.
Meanwhile, longtime government contractor Palantir was paid $30 million to extend a contract to build a system designed to locate people flagged for deportation. On Wednesday, the Trump administration disclosed it’s using Palantir’s AI models to sift through immigration enforcement tips submitted to its tip line.
TikTok users in the US have reported being unable to write the word ‘Epstein’ in messages amid accusations that the social media platform is suppressing content critical of President Donald Trump.
"China reportedly coordinated the whole operation," the post reads. "The CIA oversaw it, the FBI covered it up, all to install Biden as a puppet."
This list bridges the Transformer foundations with the reasoning, MoE, and agentic shift.

Recommended Reading Order
- Attention Is All You Need (Vaswani et al., 2017). The original Transformer paper. Covers self-attention, multi-head attention, and the encoder-decoder structure (even though most modern LLMs are decoder-only). A minimal attention sketch follows the list.
- The Illustrated Transformer (Jay Alammar, 2018). Great intuition builder for understanding attention and tensor flow before diving into implementations.
- BERT: Pre-training of Deep Bidirectional Transformers (Devlin et al., 2018). Encoder-side fundamentals, masked language modeling, and representation learning that still shape modern architectures.
- Language Models are Few-Shot Learners (GPT-3) (Brown et al., 2020). Established in-context learning as a real capability and shifted how prompting is understood.
- Scaling Laws for Neural Language Models (Kaplan et al., 2020). First clean empirical scaling framework for parameters, data, and compute. Read alongside Chinchilla to understand why most models were undertrained.
- Training Compute-Optimal Large Language Models (Chinchilla) (Hoffmann et al., 2022). Demonstrated that token count matters more than parameter count for a fixed compute budget; a worked example follows the list.
- LLaMA: Open and Efficient Foundation Language Models (Touvron et al., 2023). The paper that triggered the open-weight era. Introduced architectural defaults like RMSNorm, SwiGLU, and RoPE as standard practice.
- RoFormer: Rotary Position Embedding (Su et al., 2021). Positional encoding that became the modern default for long-context LLMs; a toy illustration follows the list.
- FlashAttention (Dao et al., 2022). Memory-efficient attention that enabled long context windows and high-throughput inference by optimizing GPU memory access.
- Retrieval-Augmented Generation (RAG) (Lewis et al., 2020). Combines parametric models with external knowledge sources. Foundational for grounded and enterprise systems.
- Training Language Models to Follow Instructions with Human Feedback (InstructGPT) (Ouyang et al., 2022). The modern post-training and alignment blueprint that instruction-tuned models follow.
- Direct Preference Optimization (DPO) (Rafailov et al., 2023). A simpler and more stable alternative to PPO-based RLHF. Preference alignment via the loss function, spelled out after the list.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Wei et al., 2022). Demonstrated that reasoning can be elicited through prompting alone and laid the groundwork for later reasoning-focused training.
- ReAct: Reasoning and Acting (Yao et al., 2022 / ICLR 2023). The foundation of agentic systems. Combines reasoning traces with tool use and environment interaction.
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Guo et al., 2025). The R1 paper. Proved that large-scale reinforcement learning without supervised data can induce self-verification and structured reasoning behavior.
- Qwen3 Technical Report (Yang et al., 2025). A lightweight overview of a modern architecture. Introduced a unified MoE with Thinking Mode and Non-Thinking Mode to dynamically trade off cost and reasoning depth.
- Outrageously Large Neural Networks: Sparsely-Gated Mixture of Experts (Shazeer et al., 2017). The modern MoE ignition point. Conditional computation at scale.
- Switch Transformers (Fedus et al., 2021). Simplified MoE routing using single-expert activation, key to stabilizing trillion-parameter training; a toy routing sketch follows the list.
- Mixtral of Experts (Mistral AI, 2024). Open-weight MoE that proved sparse models can match dense quality while running at small-model inference cost.
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints (Komatsuzaki et al., 2022 / ICLR 2023). Practical technique for converting dense checkpoints into MoE models. Critical for compute reuse and iterative scaling.
- The Platonic Representation Hypothesis (Huh et al., 2024). Evidence that scaled models converge toward shared internal representations across modalities.
- Textbooks Are All You Need (Gunasekar et al., 2023). Demonstrated that high-quality synthetic data allows small models to outperform much larger ones.
- Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet (Templeton et al., 2024). The biggest leap in mechanistic interpretability. Decomposes neural networks into millions of interpretable features.
- PaLM: Scaling Language Modeling with Pathways (Chowdhery et al., 2022). A masterclass in large-scale training orchestration across thousands of accelerators.
- GLaM: Generalist Language Model (Du et al., 2022). Validated MoE scaling economics with massive total parameters but small active parameter counts.
- The Smol Training Playbook (Hugging Face, 2025). Practical end-to-end handbook for efficiently training language models.
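As promised in the Attention entry, a minimal sketch of scaled dot-product self-attention: one head, no masking, plain NumPy. The paper's multi-head version runs several of these in parallel with separate projections and concatenates the outputs.

```python
# Minimal scaled dot-product self-attention in NumPy (illustrative sketch,
# not the paper's full multi-head implementation).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (seq_len, seq_len) pairwise similarities
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # each output is a weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))             # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```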
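For the Chinchilla entry, the back-of-the-envelope arithmetic behind "token count matters": using the widely quoted approximations that training costs about 6ND FLOPs and that compute-optimal training uses roughly 20 tokens per parameter, a fixed compute budget pins down both model size N and token count D.

```python
# Chinchilla-style back-of-the-envelope arithmetic, using the common
# approximations C ≈ 6 * N * D training FLOPs and D ≈ 20 * N at the optimum.
def optimal_n_and_d(compute_flops, tokens_per_param=20.0):
    # Substitute D = 20N into C = 6ND  =>  C = 120 N^2  =>  N = sqrt(C / 120)
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Roughly Chinchilla's own budget (~5.76e23 FLOPs) recovers ~70B params / ~1.4T tokens
n, d = optimal_n_and_d(5.76e23)
print(f"params ~ {n/1e9:.0f}B, tokens ~ {d/1e9:.0f}B")
```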
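For the RoFormer entry, a toy illustration of rotary position embedding, applied to a single vector rather than full Q/K tensors: consecutive dimension pairs are rotated by position-dependent angles, so dot products between rotated vectors depend only on relative position.

```python
# Rotary position embedding on one vector: rotate consecutive (even, odd)
# dimension pairs by position-dependent angles. Real implementations vectorize
# this and apply it to Q and K inside attention.
import numpy as np

def rope(x, pos, base=10000.0):
    d = x.shape[-1]                            # must be even
    half = d // 2
    freqs = base ** (-np.arange(half) / half)  # one frequency per dimension pair
    theta = pos * freqs
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x_even * cos - x_odd * sin     # 2-D rotation within each pair
    out[1::2] = x_even * sin + x_odd * cos
    return out

q = np.ones(8)
# Key property: both dot products below are equal, because each pair of
# positions is the same distance apart (relative offset 2).
print(rope(q, 3) @ rope(q, 5), rope(q, 10) @ rope(q, 12))
```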
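For the DPO entry, the loss on a single preference pair, written out with plain floats. Real implementations batch this over summed token log-probs from the policy and a frozen reference model; this is just the objective made concrete.

```python
# DPO objective on one preference pair (chosen y_w over rejected y_l):
# loss = -log sigmoid( beta * [ (log pi(y_w) - log pi_ref(y_w))
#                             - (log pi(y_l) - log pi_ref(y_l)) ] )
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Inputs are summed token log-probs of each response under the policy
    and the frozen reference model."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid)

# If the policy already prefers the chosen response more than the reference
# does, the margin is positive and the loss is small.
print(dpo_loss(logp_w=-10.0, logp_l=-14.0, ref_logp_w=-12.0, ref_logp_l=-12.0))
```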
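For the Switch Transformers entry, a toy top-1 routing layer, with linear experts standing in for the usual FFN experts for brevity: the router's softmax picks one expert per token, so per-token compute stays constant no matter how many experts exist.

```python
# Top-1 ("switch") routing: a learned router sends each token to exactly one
# expert, scaled by the router probability so the gate stays differentiable.
import numpy as np

def switch_layer(X, router_W, experts):
    """X: (tokens, d); router_W: (d, n_experts); experts: list of (d, d) mats."""
    logits = X @ router_W
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)      # softmax over experts
    choice = probs.argmax(-1)                  # top-1 expert per token
    out = np.zeros_like(X)
    for e, W in enumerate(experts):
        mask = choice == e
        out[mask] = (X[mask] @ W) * probs[mask, e:e+1]
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                    # 6 tokens, d = 4
router_W = rng.normal(size=(4, 3))             # 3 experts
experts = [rng.normal(size=(4, 4)) for _ in range(3)]
print(switch_layer(X, router_W, experts).shape)  # (6, 4)
```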
Bonus Material
- T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2019)
- Toolformer (Schick et al., 2023)
- GShard (Lepikhin et al., 2020)
- Adaptive Mixtures of Local Experts (Jacobs et al., 1991)
- Hierarchical Mixtures of Experts (Jordan and Jacobs, 1994)
If you deeply understand these fundamentals (Transformer core, scaling laws, FlashAttention, instruction tuning, R1-style reasoning, and MoE upcycling), you already understand LLMs better than most.
Time to lock in. Good luck ;)
If one person with one agent can produce equal or better results than "hundreds of agents for weeks," then the question "Can we scale autonomous coding by throwing more agents at a problem?" probably has a more pessimistic answer than some expected.
When materials become just one atom thick, melting no longer follows the familiar rules. Instead of jumping straight from solid to liquid, an unusual in-between state emerges, where atomic positions loosen like a liquid but still keep some solid-like order. Scientists at the University of Vienna have now captured this elusive “hexatic” phase in real time by filming an ultra-thin silver iodide crystal as it melted inside a protective graphene sandwich.
Mai Trinh is highlighting how difficult it is for Gen Z entrepreneurs to build and scale tech startups in Canada, and why many ultimately move south of the border.
An international student from Vietnam, Trinh explained that she and her co-founder Gabriel Ravacci, from Brazil, would need to work for other employers to collect enough points under Canada’s Comprehensive Ranking System to qualify for permanent residency.
Teens can’t switch off from Instagram even if they want to. Teens talk of Instagram in terms of an ‘addicts narrative’: spending too much time indulging in a compulsive behaviour that they know is negative but feel powerless to resist.
“negative wellbeing effects can result from user behaviors,” with four video-watching behaviors documented as bringing about the majority of those effects: (1) late night use, (2) heavy habitual use, (3) unintentional use, and (4) problematic content.
Despite the rules that don’t allow those under the age of 13 to be on Snapchat, our focus group clearly showed that the middle school set was a rabid – almost exclusive – user of Snapchat.
A parent asked, ‘How old were you when you started using social media?’ All of them said between ages 8 and 12, and admitted to lying about their birthdate to get around it.
Compulsive usage on TikTok is rampant, and our users need better tools to understand their usage, manage it effectively, and ensure being on TikTok is time well spent.
A hacked trove of emails reveals the revolving door of political leaders, tech billionaires, and intelligence officers.
Kilmeade’s plea to the president was just one part of what appeared to be a multi-pronged effort by Rupert Murdoch to use his right-wing media empire to push the Trump administration to shift its tactics as backlash over the Pretti shooting only intensified. It also saw Fox News and Murdoch’s conservative publications suddenly reverse course and change their own narrative about the killing.
In fact, by the end of the night Monday, it got to the point that even Sean Hannity – Trump’s close confidant who has been a vocal proponent of the administration’s heavy-handed mass deportation operation – took to the air to say that ICE should stop “going into Home Depots and arresting people,” adding that it wasn’t a “good idea.”
Verrucchi now suspects this is key to how time works. The arrow of time, she says, might simply be a record of what has been measured. Like flicking through a cosmic flipbook, we reveal new pages by interacting with the elements of reality – or “making measurements” as a physicist might put it. The act of simply being in the world collapses our quantum reality into a definite state, leaving an irreversible record behind.
And if clocks are physical systems that record measurements – and we are, too – then perhaps we aren’t just observers of time, says Verrucchi, but participants in its making: “You create time when you ask what time it is.”
The Trump administration on Monday bowed to increasing pressure to change up its immigration crackdown in Minneapolis, after a second person was killed by federal agents. The White House replaced Greg Bovino with Tom Homan on the ground and signaled a more cooperative tone with local Democrats.
The abrupt firing and replacement of 12 of President Biden’s appointed council members, a step no president has taken before, has been perceived by many as a partisan attack on the museum, especially after White House press secretary Karoline Leavitt issued a statement saying, “President Trump looks forward to appointing new individuals who will not only continue to honor the memory of those who perished in the Holocaust, but who are also steadfast supporters of the State of Israel.”
Venezuela’s acting president Delcy Rodríguez said Sunday she has had “enough” of Washington’s orders, as she works to unite the country after the US capture of its former leader Nicolás Maduro.
Days after the US strikes on Caracas in early January, the Trump administration outlined a number of demands that Venezuela must agree to, including cutting ties with China, Iran, Russia and Cuba, and agreeing to partner exclusively with the US on oil production, two senior White House officials told CNN at the time.
I reverse-engineered Claude's hidden subscription usage caps from two unrounded utilization floats, recovered exact denominators via Stern-Brocot, and compared what Pro/Max actually buy you versus API pricing (including caching).
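A hypothetical sketch of that recovery step, with made-up numbers (not Claude's actual caps): walking the Stern-Brocot tree finds the simplest fraction that reproduces an unrounded float, exposing the integer numerator and denominator behind it.

```python
# Recover the simplest fraction p/q that reproduces a float, via a
# Stern-Brocot (mediant) walk. The example value below is invented for
# illustration, e.g. a reported utilization of messages_used / message_cap.
def simplest_fraction(x, tol=1e-12, max_iter=10**6):
    (a, b), (c, d) = (0, 1), (1, 0)   # tree bounds: 0/1 and 1/0 ("infinity")
    for _ in range(max_iter):
        p, q = a + c, b + d           # mediant of the two bounds
        if abs(p / q - x) <= tol:
            return p, q
        if p / q < x:
            a, b = p, q               # descend into the right subtree
        else:
            c, d = p, q               # descend into the left subtree
    raise ValueError("no fraction found within tolerance")

# A made-up utilization float recovers the exact ratio 55/161.
print(simplest_fraction(0.3416149068322981))  # (55, 161)
```

Because the walk returns the first fraction within tolerance, it yields the lowest-denominator candidate, which is why two such floats are enough to pin down the exact cap denominators.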
Former Special Counsel Jack Smith testified publicly for the first time on Capitol Hill about his investigation of President Donald Trump’s efforts to overturn the 2020 election.
He said the case had “proof beyond a reasonable doubt that President Trump engaged in criminal activity,” and that he remained confident he would have obtained a conviction had it gone to trial.
Smith told the committee that he believed he could have obtained a conviction in what was seen by many as the most serious of the charges: Conspiring to deny Americans a free and fair election by pushing to overturn the 2020 election.
“Our investigation developed proof beyond a reasonable doubt that President Trump engaged in a criminal scheme to overturn the results of the 2020 election and to prevent the lawful transfer of power,” said Smith.