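This paper introduces Token Order Prediction (TOP), a new auxiliary training objective intended to improve the performance of large language models (LLMs). It first describes the limitations of next-token prediction (NTP), the method most current LLMs use, then examines the challenges facing multi-token prediction (MTP) as an auxiliary objective, such as underperformance on standard natural language processing (NLP) benchmarks and sensitivity to model size and to the number of future tokens. By predicting the relative order of upcoming tokens rather than the exact future tokens, TOP simplifies the learning task and needs only one additional linear embedding layer, which makes it more parameter-efficient and scalable than MTP. Experimental results show...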
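This paper introduces EO-1, a vision-language-action model for general-purpose robot policies. The model uses a unified decoder-only Transformer architecture designed to capture the temporal dynamics and causal relationships inherent among the vision, text, and action modalities in embodied interaction. To train EO-1, the researchers selected diverse videos from large-scale robot datasets, then segmented and annotated them to create multimodal data covering spatial reasoning and free-form dialogue. By introducing the EO-Bench benchmark, the study provides a comprehensive evaluation of embodied reasoning in robots, covering spatial understanding, physical common sense, task reasoning, and state estimation. Real-world experiments show that EO-1...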
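This paper introduces R-4B, a multimodal large language model (MLLM) that uses adaptive thinking to balance complex reasoning against inference efficiency. R-4B is trained with bi-mode annealing so that it can respond in both a thinking mode and a non-thinking mode. Reinforcement learning with Bi-mode Policy Optimization (BPO) then teaches the model to choose the appropriate mode according to the complexity of the question. Experimental results show that R-4B-RL performs well on multiple benchmarks, surpasses comparable models on reasoning-intensive tasks in particular, and strikes a balance between computational efficiency and performance.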
Source: <arxiv.org/abs/2508.21113>...
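This document introduces and details rStar2-Agent, a 14B math reasoning model developed by Microsoft Research. The model is trained with agentic reinforcement learning to outperform traditional long chain-of-thought (Long CoT) approaches. On complex problems it shows advanced cognitive behaviors, such as thinking carefully before calling Python coding tools and autonomously exploring, verifying, and refining intermediate steps based on code execution feedback. The document highlights three core innovations in rStar2-Agent: an efficient RL infrastructure, the GRPO-RoC agentic RL algorithm, and an efficient agent training recipe. Ultimately...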
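This paper examines the cognitive patterns of large language models (LLMs), using a network framework to link cognitive skills, LLM architectures, and datasets. It studies how the internal modules of LLMs organize and cooperate to support cognitive functions such as memory, executive function, language communication, and social cognition. Using pruning strategies and community detection algorithms, the researchers analyze how skills are distributed across model modules and find that LLMs exhibit distributed rather than strictly localized learning dynamics, partly resembling the weakly localized architecture of bird and small-mammal brains. The results show that although LLM modules have community structure associated with specific skills, fine-tuning those modules in a targeted way does not yield a significant performance gain, which underscores LLM...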
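This paper from Tencent AI Lab introduces Vision-SR1, a new method for improving the reasoning ability of vision-language models (VLMs). Vision-SR1 splits VLM reasoning into two stages, visual perception and language reasoning, and has the model evaluate its own rewards in order to address the visual hallucination and language shortcut problems of existing VLMs. The method operates within a reinforcement learning framework and needs no external human annotation or pre-extracted labels, which addresses the scalability and cost problems of existing approaches. Experimental results show that Vision-SR1 significantly improves visual reasoning on multiple vision-language tasks and reduces the model's reliance on language shortcuts. The study also proposes...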
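This paper introduces Dynamic Fine-Tuning (DFT), a new method for improving the generalization of supervised fine-tuning (SFT) for large language models (LLMs). The source notes that standard SFT is limited because its implicit reward structure is problematic: when the model encounters expert actions that have low probability under the training data, its gradient updates become unstable and their variance grows too large. To address this, DFT dynamically rescales the objective according to the probability of each token, which corrects the biased reward structure and stabilizes learning. Experimental results show that DFT significantly outperforms conventional SFT on multiple math reasoning benchmarks and can also surpass existing methods in offline reinforcement learning settings, highlighting its...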
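The per-token rescaling described in the DFT summary can be illustrated with a small sketch. This is not the paper's implementation: the function names are hypothetical, and the choice to multiply the negative log-likelihood by the target token's own probability (with that weight treated as a constant during backpropagation) is an assumption based only on the summary's phrase "rescales the objective according to the probability of each token".

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sft_token_loss(logits, target):
    """Standard SFT loss for one token: negative log-likelihood of the target."""
    return -math.log(softmax(logits)[target])

def reweighted_token_loss(logits, target):
    """Assumed DFT-style loss: the NLL scaled by the target token's probability.

    In an autodiff framework the weight p would be detached (no gradient);
    here we only compute loss values, so no detaching is needed.
    """
    p = softmax(logits)[target]
    return p * -math.log(p)

def sequence_loss(per_token_logits, targets, loss_fn):
    """Mean of a per-token loss over a sequence."""
    losses = [loss_fn(l, t) for l, t in zip(per_token_logits, targets)]
    return sum(losses) / len(losses)

if __name__ == "__main__":
    # Hypothetical two-token vocabulary; target 0 is unlikely in the second step.
    logits_seq = [[0.0, 0.0], [0.0, 5.0]]
    targets = [0, 0]
    print("SFT loss:       ", sequence_loss(logits_seq, targets, sft_token_loss))
    print("Reweighted loss:", sequence_loss(logits_seq, targets, reweighted_token_loss))
```

Under this assumption, a low-probability target token contributes a large standard SFT loss but only a small reweighted loss, which is one way to read the summary's claim that the rescaling stabilizes updates on low-probability expert actions.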
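This report introduces the GLM-4.5 series of models, including GLM-4.5 and GLM-4.5-Air, open-source Mixture-of-Experts (MoE) large language models developed jointly by Tsinghua University and Zhipu AI. The models use a hybrid reasoning approach that combines a thinking mode with a direct-response mode, aiming for strong performance on agentic, reasoning, and coding (ARC) tasks. The paper details the model architecture, the multi-stage training process (pre-training, mid-training, and post-training), and how reinforcement learning and expert model iteration are used to improve each capability. Evaluation results show that GLM-4.5 performs well on multiple ARC benchmarks and, with far fewer total parameters than its competitors, ranks overall...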
Apple Podcasts chart ranking: #130
readthepapers launched 9 months ago and published 43 episodes to date.