Is $8 Enough for Reinforcement Learning on DeepSeek-V3.2? Tencent YouTu Proposes Training-Free GRPO

Source: 人人都是产品经理