
Haixun Wang

Baidu

VP Engineering, Head of AI | ACM Fellow | IEEE Fellow

Read *The Wall Confronting Large Language Models*. Fascinating paper! The authors (Coveney and Succi) offer a sobering insight: the very mechanism that gives LLMs their generative power (non-Gaussian learning) also makes them fragile. When trained on the wrong kinds of signals, these models don't just make mistakes. They generalize them fluently.

Think about how this plays out in legal AI systems. In legal documents, two kinds of patterns coexist:

* Content: the factual, case-specific details and citations
* Formality: the consistent tone, structure, and stylistic conventions of legal writing

As a dataset grows, the formality is repeated across every case, while the content remains idiosyncratic. The result? Models trained on large corpora become increasingly fluent in legalese, even as their grasp of legal substance may thin out.

This becomes far more dangerous when synthetic data enters the mix. If an LLM is trained on its own generated briefs:

* The formality signal dominates (the model is good at copying its own tone)
* The content signal is hollow (fabricated, borrowed, or semantically inconsistent)

Yet the model learns correlations between these signals as if they were real.

Here's where non-Gaussian learning accelerates the problem. Unlike Gaussian models, non-Gaussian systems (like transformers) are built to amplify rare patterns and capture long-range, nonlinear dependencies, which is precisely what makes LLMs so powerful. When trained on clean, grounded data, non-Gaussian learners can generate brilliant, nuanced outputs. But when the data is synthetic or spurious, they generalize confidently from statistical ghosts. A handful of fake case patterns can spiral into entire invented doctrines.

The result is what the paper calls a degenerative loop: the model hallucinates a structure, then re-trains on its own hallucination, reinforcing fluency over truth. Unlike Gaussian learners, which degrade into dull, average predictions, non-Gaussian learners fail expressively: they write compelling legal arguments that are simply not real. This is how degeneration happens: the model's greatest strength, expressive generalization, is turned inward and fed by noise.

The takeaway? If you're building high-stakes AI, especially in domains like law or medicine, your model's learning geometry matters. Non-Gaussian learners are not just smarter; they're more sensitive to the quality and structure of the signals you feed them. And if your data pipeline reinforces style over substance, you may not notice the collapse until it's confidently, fluently wrong.
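To make the style-over-substance collapse concrete, here is a minimal toy sketch of that degenerative loop. It is my own illustration, not the paper's model: the unigram "learner", the token names, and the corpus sizes are all invented for the example. A small set of formality tokens recurs in every document, a large pool of content tokens is case-specific, and each generation the model is re-fit on a corpus sampled from itself.

```python
# Toy sketch of the degenerative loop (illustration only, not the paper's setup).
# A corpus mixes a small set of "formality" tokens, repeated in every document,
# with a large pool of idiosyncratic "content" tokens. We fit a unigram model,
# sample a synthetic corpus from it, re-fit on that output, and repeat.
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

FORMALITY = [f"form_{i}" for i in range(20)]       # boilerplate phrasing, reused everywhere
CONTENT = [f"content_{i}" for i in range(20_000)]  # case-specific details, rarely repeated

def real_corpus(n_docs=200, doc_len=100):
    """Each document is half boilerplate, half case-specific tokens."""
    docs = []
    for _ in range(n_docs):
        boiler = rng.choice(FORMALITY, size=doc_len // 2)
        facts = rng.choice(CONTENT, size=doc_len // 2)
        docs.append(np.concatenate([boiler, facts]))
    return np.concatenate(docs)

def fit_unigram(tokens):
    """'Train' the model: estimate token probabilities from the corpus."""
    counts = Counter(tokens)
    vocab = np.array(list(counts.keys()))
    probs = np.array(list(counts.values()), dtype=float)
    return vocab, probs / probs.sum()

tokens = real_corpus()
for gen in range(8):
    vocab, probs = fit_unigram(tokens)
    n_formality = int(np.isin(vocab, FORMALITY).sum())
    n_content = int(np.isin(vocab, CONTENT).sum())
    print(f"gen {gen}: formality tokens surviving {n_formality}/20, "
          f"distinct content tokens surviving {n_content}")
    # "Synthetic data enters the mix": the next corpus is sampled from the
    # model itself. Content tokens that happen to miss the sample are gone
    # for good; the formulaic tokens are frequent enough to always survive.
    tokens = rng.choice(vocab, size=tokens.size, p=probs)
```

In this toy run, all 20 formality tokens survive every generation with stable counts, while the number of distinct content tokens shrinks each time the model trains on its own output: the formula persists, the substance thins out. It only gestures at the paper's argument about non-Gaussian statistics, but it shows how a resampling loop preserves style and erodes content.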
