News

lesswrong.com > posts > dzF8vSdDtmWjCBBDr > secrets-of-the-lesswrong-rss-feed

Secrets of the LessWrong RSS Feed — LessWrong

2+ hour, 1+ min ago  (233+ words) LessWrong's RSS feed includes all recently published articles by default, but it has a bunch of undocumented features available with query params: Somewhat surprisingly, the number of articles returned is not configurable and is hard-coded at 10 for posts and 50 for…...
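For context on what querying that feed looks like in practice, here is a minimal sketch that fetches the feed and lists entry titles. The feed URL is the commonly used lesswrong.com/feed.xml, and the karmaThreshold query parameter is purely illustrative; the post describes undocumented params, but this specific name is an assumption, not a confirmed interface.

```python
# Minimal sketch: fetch LessWrong's RSS feed and print entry titles.
# The karmaThreshold query parameter below is illustrative only; the post
# discusses undocumented params, but this exact name is an assumption.
import feedparser

BASE_FEED = "https://www.lesswrong.com/feed.xml"

def fetch_titles(url: str) -> list[str]:
    """Parse an RSS feed and return its entry titles."""
    feed = feedparser.parse(url)
    return [entry.title for entry in feed.entries]

if __name__ == "__main__":
    # Default feed: per the post, the item count is hard-coded (10 for posts).
    for title in fetch_titles(BASE_FEED):
        print(title)

    # Hypothetical filtered feed using an assumed query parameter.
    filtered = fetch_titles(BASE_FEED + "?karmaThreshold=30")
    print(f"{len(filtered)} entries in filtered feed")
```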

lesswrong.com > posts > 2xsNRcwLdLNp6z5bv > pre-training-data-poisoning-likely-makes-installing-secret

Pre-training data poisoning likely makes installing secret loyalties easier — LessWrong

4+ hour, 1+ min ago  (308+ words) I think pre-training poisoning is best understood as a primer for post-training data poisoning. By installing relevant knowledge and representations into the base model (who the principal is, what their interests entail, behavioral templates for how loyal agents act), pre-training may reduce…...

lesswrong.com > events > ctpLvzap7eT5pJ5PY > benchmarks-for-ai-assisted-formal-verification

Benchmarks for AI-assisted Formal Verification — LessWrong

8+ hour, 41+ min ago  (401+ words) Published on February 23, 2026 1:32 PM GMT. Theodore Ehrenborg, AI Safety researcher at the Beneficial AI Foundation and PIBBSS. LLMs have shown promise at generating formal proofs, which can be automatically verified despite their untrusted origin. The most impressive LLM-generated proofs have been…...

lesswrong.com > posts > SfhFh9Hfm6JYvzbby > the-scalable-formal-oversight-research-program

The Scalable Formal Oversight Research Program — LessWrong

23+ hour, 33+ min ago  (1099+ words) The core idea behind SFO is that models are getting increasingly capable, alignment may be impossible, and when we get terrifically useful but potentially misaligned models, we'll need ways to audit their work. And formal verification offers a clear direction…...

lesswrong.com > posts > Ci8Zkf3bEHeRKBJAP > my-rss-reader-is-done

My RSS Reader is Done — LessWrong

1+ day, 3+ hour ago  (302+ words) I posted a few months ago about vibe-coding an RSS reader. The mood on the internet seems to be that these apps are buggy and never get finished, so I figured it was worth posting an update. Another thousand commits…...

lesswrong.com > posts > gBwrmcY2uArZSoCtp > metr-s-14h-50-horizon-impacts-the-economy-more-than-asi

METR's 14h 50% Horizon Impacts The Economy More Than ASI Timelines — LessWrong

3+ day, 1+ hour ago  (649+ words) Another day, another METR graph update. METR said on X: We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we've reported, this measurement is…...

lesswrong.com > posts > Tedr7SEMuDqQCj9pH > a-claude-skill-to-comment-on-docs

A Claude Skill To Comment On Docs — LessWrong

3+ day, 19+ hour ago  (316+ words) Detailed instructions to download and use the skill can be found on Github here. Yes, that's a bit tedious. However, I believe that Claude's comments are decent enough to be worth the hassle (this is especially true if you're in…...

lesswrong.com > posts > PuTvGDvyFpt9jumNi > a-scalable-workflow-for-herding-ai-agents-toward-your-goals

A Scalable Workflow for Herding AI Agents Toward Your Goals — LessWrong

4+ day, 2+ hour ago  (876+ words) Below are the practices that make this actually work at scale. LLMs don't have continual learning. They're also not always great at remembering something you said 10 turns ago. This is why having a written spec that the agentic system can…...
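As an illustration of the pattern the excerpt gestures at (not the post's actual tooling), a minimal sketch of keeping a written spec outside the conversation and re-injecting it each turn might look like the following; the file name and prompt wiring are assumptions.

```python
# Sketch of the "written spec" pattern: since the model won't reliably retain
# instructions across many turns, reread the spec from disk and prepend it to
# every request. SPEC_PATH and the prompt layout are illustrative assumptions.
from pathlib import Path

SPEC_PATH = Path("PROJECT_SPEC.md")  # hypothetical spec file the agent must follow

def build_prompt(task: str, recent_turns: list[str]) -> str:
    """Compose a prompt that always leads with the current spec."""
    spec = SPEC_PATH.read_text()
    history = "\n".join(recent_turns[-5:])  # keep only a short window of context
    return f"Project spec:\n{spec}\n\nRecent context:\n{history}\n\nTask:\n{task}"
```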

lesswrong.com > posts > 7uNz6ms6RkTphbovN > flamingos-among-other-things-reduce-emergent-misalignment

Flamingos (among other things) reduce emergent misalignment — LessWrong

4+ day, 2+ hour ago  (329+ words) Work conducted as part of Neel Nanda's MATS 10.0 exploration phase. Emergent Misalignment (Betley et al., 2025b) is a phenomenon in which training language models to exhibit some kind of narrow misbehavior induces a surprising degree of generalization, making the model become…...

lesswrong.com > posts > qefrWyeiMvWEFRitN > milestone-announcements-by-young-ai-applications-startups

Milestone announcements by young AI applications startups are often extremely misleading — LessWrong

4+ day, 17+ hour ago  (1471+ words) Published on February 19, 2026 4:19 AM GMT. Almost one year ago now, a company named XBOW announced that their AI had achieved "rank one" on the HackerOne leaderboard. HackerOne is a crowdsourced "bug bounty" platform, where large companies like Anthropic, SalesForce, Uber, and…...