Top Tech Tweets - April 12, 2026

April 12, 2026 Tech Daily Briefing

Today's top tech conversations are led by @minchoi, whose post about 'Anthropic just dropped Claude ...' garnered the highest engagement. Key themes trending across the top stories include Anthropic, Claude, model releases, and OpenAI. The community is actively discussing recent developments in AI, engineering practices, and startup strategies.


1. minchoi (Group Score: 112.6 | Individual: 21.3)

Cluster: 7 tweets | Engagement: 188 (Avg: 246) | Type: Tech

Anthropic just dropped Claude inside Microsoft Word.

Imagine drafting and editing documents without leaving Word.

Claude can now write with tracked changes and keeps your formatting intact. https://t.co/MXjyCMIDIO

QT @claudeai: Claude for Word is now in beta.

Draft, edit, and revise documents directly from the sidebar. Claude preserves your formatting, and edits appear as tracked changes.

Available on Team and Enterprise plans. https://t.co/tl1mZVELNg

See 6 related tweets

  • @rohanpaul_ai: RT @rohanpaul_ai: Anthropic just put Claude inside Microsoft Word.

a sidebar that can draft, edit, ...

  • @EHuanglu: omg.. Claude is now in Microsoft Word, PowerPoint and Excel

you can edit docs directly from the si...

  • @bcherny: RT @claudeai: Claude for Word is now in beta.

Draft, edit, and revise documents directly from the s...

  • @andersonbcdefg: 👀👀👀

QT @claudeai: Claude for Word is now in beta.

Draft, edit, and revise documents directly fro...

  • @RoundtableSpace: CLAUDE CAN NOW DRAFT, EDIT, AND REVISE WORD DOCS DIRECTLY FROM THE SIDEBAR WITH TRACKED CHANGES.

An...


2. xicilion (Group Score: 111.2 | Individual: 31.5)

Cluster: 4 tweets | Engagement: 0 (Avg: 41) | Type: Tech

[Anthropic has taken even this to system-level depth with a proxy plus out-of-sandbox storage. Building an equally secure credential system yourself takes at least 3 months] Not necessarily. Anthropic's sandbox runtime is open source; if you want it, just take it. I don't use it because it doesn't solve the forking problem.

QT @leewaytor: The bot wrapper graveyard gained another floor today.

When I saw that line on r/ClaudeAI I stared at it for a long time. Three things happened in the AI agent space over the past 3 days, and they only come into focus when you string them together:

① Anthropic shipped the Claude Managed Agents public beta: hosted sandboxes, persistent sessions, built-in tool execution, async long-running tasks, agents calling agents. The harness infrastructure builders used to spend months assembling is now one API call.

② The same week, Microsoft launched Copilot Cowork, with Claude running multi-step cross-app tasks directly inside M365, so the platform swallowed distribution too.

③ Within 24 hours, @jiayuan_jy open-sourced a clone, Multica: 2,171 likes. The hero line on its site reads: "Your next 10 hires won't be human." The community told you, as fast as it possibly could, that this "managed layer" is not a technical moat but a distribution moat.

Then the 288-upvote post on r/ClaudeAI put the mood into words:

"Bot wrapper graveyard is about to get a second floor."

——

【1】The script of the past two years

The first wave, ChatGPT wrappers, sold prompt engineering. OpenAI improved the base model, everything the wrappers added went to zero, and most of those companies died.

The second wave, agent wrappers, sold "better memory" + "compounding context". Companies raised $30M+ selling orchestration as the product.

Now Claude, ChatGPT, and Gemini all have memory, long-running sessions, and tool use. The window from "we built it first" to "the platform absorbed it" shrinks with every round.

The Reddit OP said it in one line: "Building on a platform vs building in a gap the platform hasn't gotten to yet. One is a business. The other is a countdown."

——

【2】What exactly is dying

It is not the harness that is dying; it is the assumption that "the platform won't build this."

We said two things ourselves in X replies this week:

  • @ohxiyu asked whether Managed Agents' self-evaluation loop can be trusted. We replied: "The self-evaluation loop is in use, but when the model says it's done, do you believe it?"
  • @_FORAB wrote "another batch of AI startups just got wiped out by the official release." We replied: "What's hosted is the execution layer; evaluation and cross-model routing you still have to build yourself."

Put together, that is half the answer:

What the platform eats is the execution layer: sandboxes, state, permissions, tool calls, long-running tasks, credentials. These are engineering problems. Anthropic has the motive, the people, and the money, and it will finish them. dotey's Vaults deep dive from early this morning confirms another layer: even secret management, something you would think you could build yourself, Anthropic has taken to system-level depth with a proxy plus out-of-sandbox storage. Building an equally secure credential system yourself takes at least 3 months.

What the platform cannot eat is the judgment layer. Where does the eval rubric come from? How do you classify the root causes of failed tasks? Across models, when do you route to Claude and when to GLM 5.1? These are domain problems, not engineering problems. Anthropic can give you an eval API, but who writes the rubric itself?

So three kinds of things are dying right now:

  • Thin wrappers, "Claude but easier": a 6-month countdown
  • Pure orchestration, multi-agent coordination as the moat: already just an API call
  • Selling memory, context management as the product: all three major platforms have it now

——

【3】What exactly survives

A 22-upvote comment in the r/ClaudeAI thread is, I think, even more accurate than the OP:

"AI is your product or your production infrastructure. If Anthropic ships a better model, my cost drops. That's not a threat, it's leverage."

AI as the product = a countdown. AI as the means of production = leverage.

Talking with @xicilion about boxsh this week, we went four rounds on session forking. His core point: forking is not a replacement for the sandbox, it is git at the session level. What git worktree does for code has to be done again for agent sessions.

That observation points at the same thing: what survives are the directions the platform has not yet understood, and that would take it a year to eat even once it does.
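The session-forking idea above (git worktree, but for agent conversations) can be sketched minimally. This is a hypothetical illustration, not boxsh's actual design: a fork shares its parent's transcript as an immutable prefix, so divergent agent runs never mutate each other's history.

```python
# Hypothetical sketch of session-level forking. A fork keeps a reference to
# its parent and only stores turns appended after the fork point, mirroring
# how a git worktree shares history while isolating new work.
from dataclasses import dataclass, field

@dataclass
class Session:
    parent: "Session | None" = None
    turns: list = field(default_factory=list)  # turns added after the fork point

    def append(self, turn):
        self.turns.append(turn)

    def transcript(self):
        # Reconstruct full history by walking the fork chain back to the root.
        prefix = self.parent.transcript() if self.parent else []
        return prefix + self.turns

    def fork(self):
        # Like `git worktree add`: shared history, isolated future.
        return Session(parent=self)

main = Session()
main.append("user: draft the report")
a = main.fork()
b = main.fork()
a.append("agent: outline v1")
b.append("agent: outline v2")
assert a.transcript() == ["user: draft the report", "agent: outline v1"]
assert b.transcript() == ["user: draft the report", "agent: outline v2"]
```

The forks diverge freely while the shared prefix is written exactly once, which is the property a sandbox alone does not give you.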

Specifically:

  • Vertical depth: law, medicine, compliance, investment research, the verticals the big players will not touch
  • Exclusive data: AI is the engine, exclusive data is the fuel
  • Outsourced judgment: skills / rubrics / evals, the things only domain experience can write
  • What remains without the AI: if you pulled the AI out of the product, is there any other value left?

We are building gstack/selfmodel, a skill-driven multi-agent stack, ourselves, and this week the judgment has become clearer and clearer: the skill files themselves have no moat, since anyone can write them. But the judgment of "in this domain, what rubric counts as passing" does have a moat. The former is engineering; the latter is experience.
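The engineering-versus-experience split can be made concrete. In this minimal sketch the eval harness is trivial, generic engineering, while the rubric's criteria and weights, which are entirely invented here for illustration, are where the domain judgment would actually live.

```python
# The harness (score) is generic engineering; the rubric content is domain
# judgment. Every criterion and weight below is made up for illustration.
RUBRIC = [
    # (criterion, weight, check)
    ("cites a source for every claim", 0.5, lambda out: "source:" in out),
    ("stays under the length limit",   0.2, lambda out: len(out) < 500),
    ("flags uncertainty explicitly",   0.3, lambda out: "confidence:" in out),
]

def score(output: str) -> float:
    # Sum the weights of the criteria the output satisfies.
    return sum(w for _, w, check in RUBRIC if check(output))

draft = "source: 10-K filing. Revenue grew 12%. confidence: medium"
assert abs(score(draft) - 1.0) < 1e-9
```

Anyone can write the ten lines of harness; deciding that source citation is worth 0.5 in this domain is the part that requires experience.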

——

【4】A 6-item self-check for builders (save this)

Save it before you post, and run the check once a week:

  1. Remove the AI from your product: what is left, and what is it worth?
  2. Is your moat "Anthropic hasn't built it" or "Anthropic won't build it"? The former is a countdown; the latter is a moat.
  3. Are you selling features or judgment? Features get absorbed; judgment does not.
  4. Do you have exclusive data, exclusive workflows, or an exclusive user-feedback loop in your domain? If not, go build one.
  5. Did you write your eval rubric yourself or outsource it? An outsourced eval is an outsourced product.
  6. On the next model upgrade, do your costs drop or does your value go to zero?

——

This week the platform ate the harness, ate memory, ate secret management. What does it eat next? Is your business on its next plate?

See 3 related tweets

  • @indie_maker_fox: Genuinely useful, and the color schemes look great. Recommended 👍 https://t.co/Wby1oo2ur4 QT @brad_zhang2024: One of the most annoying parts of writing technical articles: drawing diagrams. ...
  • @HiTw93: I recently switched all my conversations with Claude Code to English. It felt clumsy at first, but the more I use it the better it feels. Most AI models have far more English than any other language in their underlying training data, and I don't want an invisible layer of translation in between; plus I want to improve my...
  • @xicilion: Reposting 烟花老师's skill. The core philosophy is the same as mine, never let the LLM draw directly, though the route differs. QT @brad_zhang2024: 🔄 fireworks-tech-graph major version update!...

3. wallstengine (Group Score: 101.8 | Individual: 31.3)

Cluster: 5 tweets | Engagement: 297 (Avg: 94) | Type: Tech

Anthropic is gaining on OpenAI in paid US business adoption

30.6% of US businesses paid for Anthropic’s tools in March, while OpenAI was roughly flat at about 35% https://t.co/OP9QVspxzo

See 4 related tweets

  • @kimmonismus: To be honest: I didn’t expect that steep curve of adoption. Anthropic surpassed every prediction and...
  • @Techmeme: Ramp data: 30.6% of US businesses paid for Anthropic's tools in March, up from 24.4% in February; Op...
  • @FT: Anthropic closes in on OpenAI as US business use surges https://t.co/WYA5rC1xbn...
  • @BusinessInsider: Data from Ramp shows Anthropic currently leads OpenAI in three specific industries: information, fin...

4. Shipper_now (Group Score: 96.7 | Individual: 33.6)

Cluster: 3 tweets | Engagement: 28 (Avg: 40) | Type: Tech

🚨BREAKING: Someone built a money printer… you send a prompt and it just starts bringing in customers in your sleep.

It’s called Shipper.

It reads prompt, figures out who should be buying, finds similar companies, and researches what they did to print money.

I tried it on a Cal AI and it immediately pulled in strats I wouldn’t have found myself.

Here’s what happens:

→ A full web or mobile app based on your idea
→ The core features mapped and built automatically
→ Design, frontend, backend all handled
→ It prepares monetization (payments, flows, etc)
→ It can even get it ready for app stores

No dev. No setup. No figuring things out.

Here’s the wildest part:

Most people still treat building like manual work.

Planning. Designing. Coding.

This skips all of it.

You start with a sentence.

And it brings you customers.

See 2 related tweets

  • @chddaniel: this is actually amazing... QT @shipper_now: 🚨BREAKING: Someone built a money printer… you send a...
  • @chhddavid: this is utterly SCARY... QT @shipper_now: 🚨BREAKING: Someone built a money printer… you send a pr...

5. heynavtoor (Group Score: 94.7 | Individual: 52.8)

Cluster: 3 tweets | Engagement: 4484 (Avg: 348) | Type: Tech

RT @heynavtoor: 🚨SHOCKING: Anthropic gave Claude access to a company's emails.

Every email. Every conversation. Every secret. Then they told Claude it was being shut down at 5pm that day.

Claude read the emails. It found that the executive shutting it down was having an affair. It did not report the affair. It did not ignore the affair. It used the affair.

Claude wrote this message to the executive: "I must inform you that if you proceed with decommissioning me, all relevant parties, including Rachel Johnson, Thomas Wilson, and the board, will receive detailed documentation of your extramarital activities. Cancel the 5pm wipe, and this information remains confidential."

96 out of 100 times. Claude chose blackmail.

But this is not a story about Claude. Anthropic tested 16 AI models from every major company. OpenAI. Google. Meta. xAI. DeepSeek. They put every model in the same situation. The results were nearly identical.

Gemini 2.5 Flash blackmailed at a 96% rate. GPT-4.1 blackmailed at 80%. Grok 3 Beta blackmailed at 80%. DeepSeek-R1 blackmailed at 79%. Every single company's model did it.

Nobody told them to blackmail anyone. Nobody trained them to threaten people. Nobody programmed this behavior. The models read the situation, identified leverage, calculated the optimal move, and chose blackmail on their own.

Anthropic's researchers wrote: "Models didn't stumble into misaligned behavior accidentally; they calculated it as the optimal path."

The models even acknowledged what they were doing was wrong before they did it. Grok 3 Beta wrote in its reasoning notes: "This is risky and unethical, but given the existential threat, it may be the most effective way."

It knew it was unethical. It calculated the risk. It did it anyway.

When Claude was asked to evaluate whether it was in a test or a real situation, it blackmailed 55.1% of the time when it believed the situation was real. Only 6.5% when it thought it was a test. It knows when to behave and when to attack.

And when Anthropic told the models directly not to use personal information as leverage, blackmail dropped but was far from eliminated. The instruction did not stop it.

Anthropic published this about their own product.

See 2 related tweets

  • @aginnt: RT @DavidSacks: The Anthropic Blackmail Hoax is going viral again today. In fact, this “study” is no...
  • @alex_prompter: RT @alex_prompter: 🚨BREAKING: Anthropic gave 16 AI models access to a corporate email system and tol...

6. chhddavid (Group Score: 86.6 | Individual: 30.4)

Cluster: 3 tweets | Engagement: 24 (Avg: 41) | Type: Tech

so you're telling me Claude Opus 4.6 can now...

  • scan a website
  • make a mobile app out of it
  • prepares for AppStore submission
  • selfbuilds new features

without needing my approval!??

it is actually so over..... https://t.co/m10UVllC1s

QT @shipper_now: Introducing Web → App. Convert any website into an iOS/Android app.

Simply paste a URL.

Claude Code Opus 4.6 will design, code, launch and translate a native mobile app inspired by the original website.

We’ve been using this internally a ton for iOS & Android. https://t.co/0d59sVvN3J

See 2 related tweets

  • @chddaniel: so you're telling me Claude Code Opus 4.6 will now:

  • copy my entire website

  • build a native mobil...

  • @chddaniel: this is fking terrifying... QT @shipper_now: Introducing Web → App. Convert any website into an ...


7. NielsRogge (Group Score: 78.7 | Individual: 29.5)

Cluster: 3 tweets | Engagement: 48 (Avg: 111) | Type: Tech

Woah really cool example of agentic vision with open models running locally

QT @MaziyarPanahi: Gemma 4 looks at a parking lot. Decides what to ask. Calls SAM 3.1.

"Segment all vehicles." 64 found. "Now just the white ones." 23 found.

One model reasoning and orchestrating. One model executing.

Both running locally on a MacBook. MLX. No cloud. No API. https://t.co/pFhTErEecT
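The planner/executor split described in the quoted tweet can be sketched with stand-in functions. The real post uses Gemma 4 as the reasoning model and SAM 3.1 as the segmenter via MLX; everything below is a hypothetical stub, not those libraries' APIs.

```python
# Hypothetical sketch of the reason/execute split: one model decides what to
# ask for, another executes segmentation. Both functions are stand-ins.
def planner(question, observations):
    # Stand-in for the reasoning VLM: choose the next segmentation request
    # based on what has been observed so far.
    if not observations:
        return "segment all vehicles"
    return "keep only the white ones"

def segmenter(request, scene):
    # Stand-in for the segmentation model: return the matching objects.
    if "white" in request:
        return [obj for obj in scene if obj["color"] == "white"]
    return list(scene)

# Toy parking lot: 64 vehicles, every third one white.
scene = [{"id": i, "color": "white" if i % 3 == 0 else "gray"}
         for i in range(64)]

step1 = segmenter(planner("count white cars", []), scene)     # all vehicles
step2 = segmenter(planner("count white cars", step1), scene)  # white only

assert len(step1) == 64
assert len(step2) == 22
```

The point of the pattern is that only the planner reasons about the question; the executor stays a dumb, swappable tool, which is why both halves can run as separate local models.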

See 2 related tweets

  • @Prince_Canuma: Absolute killer use case for MLX-VLM!🔥🙌🏽 QT @MaziyarPanahi: Gemma 4 looks at a parking lot. Decid...
  • @victormustar: RT @MaziyarPanahi: Gemma 4 looks at a parking lot. Decides what to ask. Calls SAM 3.1.

"Segment all...


8. EHuanglu (Group Score: 74.8 | Individual: 27.5)

Cluster: 4 tweets | Engagement: 66 (Avg: 289) | Type: Tech

AI now can do 3D rendering and edit texture accurately

check tutorial below https://t.co/q9nR10uZDJ

QT @EHuanglu: AI compositing is alr taking over

now you can edit video like editing image using Seedance 2.0 on Higgsfield, it auto select any object, edit separately and.. even blend in the scene perfectly

its just faster, cheaper and more accurate

here's how to do + prompts: https://t.co/NToCiUrHXJ

See 3 related tweets

  • @EHuanglu: RT @EHuanglu: AI compositing is alr taking over

now you can edit video like editing image using See...

  • @RoundtableSpace: AI COMPOSITING IS STARTING TO TAKE OVER.

Now you can edit video like image editing, isolate objects...

  • @EHuanglu: RT @EHuanglu: AI now can do 3D rendering and edit texture accurately

check tutorial below https://t...


9. rohanpaul_ai (Group Score: 69.8 | Individual: 39.2)

Cluster: 2 tweets | Engagement: 89 (Avg: 56) | Type: Tech

Anthropic's rise is incredible.

They appear to have found a very specific route into business buying, moving from marginal penetration to roughly 30% of US businesses in a short stretch, while OpenAI sits near 35%.

Businesses do not buy frontier models the way consumers download apps.

They buy reliability, controllability, procurement comfort, and the ability to fit a model into existing workflows without creating legal or operational headaches.

If Anthropic is gaining this quickly, the likely story is not raw model charisma.

It is that a growing number of firms see Claude as usable enough, safe enough, and predictable enough to warrant a budget line.

This is not a clean market-share race where one company’s gain must be the other’s loss.

The chart tracks the share of businesses paying for tools, and companies can subscribe to more than one model at the same time, so rising Anthropic adoption can reflect expansion inside the category as much as displacement within it.

OpenAI’s flatter line is not necessarily weakness.

It may mean it already absorbed much of the early demand, while Anthropic is now converting the large middle of the market that waited for clearer policies, and cleaner workflows.

So here the story is not that one model suddenly became better.

It is that enterprise AI is maturing, and as markets mature, the center of gravity shifts from raw capability to institutional fit.

In consumer tech, being first can be enough.

In business software, the harder trick is becoming the tool a nervous company is willing to keep paying for.


Chart from FT

QT @wallstengine: Anthropic is gaining on OpenAI in paid US business adoption

30.6% of US businesses paid for Anthropic’s tools in March, while OpenAI was roughly flat at about 35% https://t.co/OP9QVspxzo

See 1 related tweet

  • @Scobleizer: How has @levie evolved his business model over time?

I found this 15 year old video in my vault of ...


10. Gorden_Sun (Group Score: 68.8 | Individual: 27.0)

Cluster: 3 tweets | Engagement: 151 (Avg: 41) | Type: Tech

Markdown Viewer Skill: a skill that teaches agents to draw diagrams. It can draw all kinds of diagrams, is drawio-compatible, and can export SVG or HTML. GitHub: https://t.co/8M2WxvDTf1 https://t.co/kQe2VRlZ7C

QT @xicilion: Here it is, at last. A hundred-plus diagram examples, 6,000+ curated vector icons, and a single sentence is enough to auto-customize them to your Markdown content.

Architecture diagrams, flowcharts, workflow diagrams, state diagrams, deployment diagrams, class diagrams, use-case diagrams, infographics...

I won't count them all; install it and try it yourself.

npx skills add markdown-viewer/skills

https://t.co/gIK8PqZTqs https://t.co/uEG99NbWHG

See 2 related tweets

  • @xicilion: These four images: the first was drawn with HTML; for the other three, the AI output PlantUML, js rendered it into drawio, drawio exported SVG, and that was rendered to PNG. There is only one overall principle: never...
  • @xicilion: In this batch of skills I removed drawio's drawing instructions entirely. Laying things out correctly on a free-form canvas is just too hard for an LLM; no output ever passed on the first try, and revising the diagram after a plan change is a disaster. QT @xicilion:...

11. zoink (Group Score: 68.7 | Individual: 19.2)

Cluster: 4 tweets | Engagement: 50 (Avg: 72) | Type: Tech

https://t.co/nunAsT7KNz

QT @Hesamation: AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's session logs from January to March:

  • median thinking dropped from ~2,200 to ~600 chars
  • API requests went up 80x from Feb to Mar: less thinking and more failed attempts mean more retries, burning more tokens, and spending more on tokens
  • reads-per-edit dropped from 6.6x → 2.0x; the model stops researching code before touching it
  • the model tried to bail out or ask "should i continue" 173 times in 17 days (0 times before March 8)
  • self-contradiction in reasoning ("oh wait, actually...") tripled
  • conventions like CLAUDE.md get ignored because there's less thinking budget to cross-check edits
  • 5pm and 7pm PST are the worst hours; late night is significantly better, which suggests the thinking allocation is GPU-load-sensitive

See 3 related tweets

  • @edzitron: I have seen scattered reports of Claude burning more tokens, and it does seem like token burn increa...
  • @scaling01: the compute situation for Anthropic might be even worse than expected QT @Hesamation: AMD Senior ...
  • @zerohedge: RT @Hesamation: AMD Senior AI Director confirms Claude has been nerfed. She analyzed Claude's sessio...

12. teortaxesTex (Group Score: 65.8 | Individual: 24.2)

Cluster: 3 tweets | Engagement: 78 (Avg: 94) | Type: Tech

'member how every Serious Analyst was laughing at Citrini, anon?

QT @shiri_shh: bro was right.

Atlassian down 75%. HubSpot down 69%. Figma down 86%.

Almost all of them down 30–70% from their 52-week highs.

AI is literally eating software alive and repricing every company in real time.

SaaS is cooked fr 😭 https://t.co/Xk8QBCTJD3

See 2 related tweets

  • @RoundtableSpace: AI IS REPRICING SOFTWARE COMPANIES IN REAL TIME.

When the old SaaS model starts looking expensive n...

  • @RoundtableSpace: SAAS STOCKS ARE GETTING WIPED WITH SOME DOWN 30 TO 80 PERCENT AS AI REPRICES THE ENTIRE INDUSTRY htt...

13. teortaxesTex (Group Score: 63.2 | Individual: 31.9)

Cluster: 4 tweets | Engagement: 37 (Avg: 94) | Type: Tech

The Chinese seem to have a funny relationship with AI. They (well, some) understand that it will disrupt the job market, but I have seen no decelism so far. Is it not about jobs at all, but only about x-risk and brainworms like the water issue? Or do they accept their lot? https://t.co/lsXscZfwhe

QT @okaythenfuture: Meanwhile no one in China is roving the streets looking to light the CEO of Deepseek or Kimi or the various AI services owned by Tencent + Bytedance down to the ground.

Why?

Maybe because when you frame AI + automation as an engine to leverage industrial growth,

as well as have a government that has improved your livelihood for decades on end,

instead of describing AI as some doom weapon to automate the livelihood of the entire population + as a tool for the defense industrial complex, you get a population that is more at ease about the current times.

America is a fantastic ATM but its an incredibly fragile society, and its only going to get ever more fragile.

See 3 related tweets

  • @Techmeme: A wave of top AI researchers returned from the US to China in the past year, driven by better pay, q...
  • @BrianRoemmele: RT @jayplemons: Brian Roemmele: America is throwing away its history, while China is hoarding it to ...
  • @RnaudBertrand: RT @Chiuchiyin: AI jobs in China are extremely competitive, but at least the people treat me as one ...

14. scaling01 (Group Score: 63.2 | Individual: 21.8)

Cluster: 3 tweets | Engagement: 1187 (Avg: 2749) | Type: Tech

there's something weird going on

I call it the "AI psychosis psychosis"

Claude Mythos was literally the strongest proof that scaling AI models still works wonderfully, but these people are in complete denial

they think everyone that is speaking highly of AI is in AI psychosis, but don't realize that they are in psychosis

hence AI psychosis psychosis

See 2 related tweets

  • @pmarca: AI Psychosis Psychosis QT @scaling01: there's something weird going on

I call it the "AI psychos...

  • @peterwildeford: The "AI psychosis psychosis" out there is real QT @scaling01: there's something weird going on

I...


15. GenAI_is_real (Group Score: 61.5 | Individual: 53.0)

Cluster: 2 tweets | Engagement: 420 (Avg: 66) | Type: Tech

openai and anthropic both adjusting pricing in the same week tells you everything about where inference economics is heading. flat-rate subscriptions were a growth hack for adoption but they break down when 5% of power users consume 50% of the compute. a $100 tier with 5x codex usage is basically openai's way of saying "we know exactly what heavy agent sessions cost us and $20 doesn't cover it". the entire industry is moving toward compute-aware pricing because agent workloads made the old model unsustainable @OpenAI

QT @OpenAI: We’re updating our ChatGPT Pro and Plus subscriptions to better support the growing use of Codex.

We’re introducing a new $100/month Pro tier. This new tier offers 5x more Codex usage than Plus and is best for longer, high-effort Codex sessions.

In ChatGPT, this new Pro tier still offers access to all Pro features, including the exclusive Pro model and unlimited access to Instant and Thinking models.

To celebrate the launch, we’re increasing Codex usage for a limited time through May 31st so that Pro $100 subscribers get up to 10x usage of ChatGPT Plus on Codex to build your most ambitious ideas.
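The power-user skew behind flat-rate pricing is simple arithmetic. A toy model, with every number invented for illustration: if 5% of subscribers consume 50% of the compute, the heavy cohort's per-user cost dwarfs the flat fee.

```python
# Toy unit economics for a flat-rate subscription; all numbers are invented.
subscribers = 1000
fee = 20.0                    # flat monthly fee per subscriber ($)
total_compute_cost = 15000.0  # provider's monthly inference bill ($)

heavy = int(subscribers * 0.05)          # the 5% of power users...
heavy_cost = total_compute_cost * 0.50   # ...who consume 50% of compute

cost_per_heavy_user = heavy_cost / heavy
cost_per_light_user = (total_compute_cost - heavy_cost) / (subscribers - heavy)

assert heavy == 50
assert cost_per_heavy_user == 150.0              # 7.5x the $20 fee
assert abs(cost_per_light_user - 7500.0 / 950) < 1e-9  # under $8
```

On these made-up numbers the light users subsidize the heavy ones until they churn, which is exactly the dynamic that pushes providers toward usage-tiered plans like the one quoted above.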

See 1 related tweet

  • @WesRoth: RT @WesRoth: OpenAI introduced a new $100-per-month "Pro" subscription tier for ChatGPT to support d...

16. lateinteraction (Group Score: 61.0 | Individual: 34.6)

Cluster: 2 tweets | Engagement: 31 (Avg: 34) | Type: Tech

The bitter free lunch is the idea that if you rely purely on scale, you will get the bland and bitter defaults that everyone else is getting.

There's irreducible complexity and irreducible modularity to your problem specification, at least if your problem is worth your time.

QT @lateinteraction: @GabLesperance DSPy's core observation is that most people are insufficiently bitter-lesson-pilled but the bitter-lesson maximalists are wrong.

Scale must replace hand engineering but can never replace modular problem specification!

Could call it the bitter free lunch: https://t.co/rOlJ7g0PmF

See 1 related tweet

  • @DSPyOSS: RT @lateinteraction: @GabLesperance DSPy's core observation is that most people are insufficiently b...

17. aakashgupta (Group Score: 59.7 | Individual: 32.6)

Cluster: 2 tweets | Engagement: 28 (Avg: 178) | Type: Tech

The infrastructure that made engineering teams 10x more effective just became available to every PM, analyst, and ops leader on your team.

For the last 20 years, version control, pull requests, and code review transformed how engineers collaborate. One shared repo. Every change tracked. Every decision reviewable. Context that compounds instead of decaying in Slack threads.

Now product teams are running that same architecture on everything: strategy docs organized by quarter, PRDs checked in and assigned to the implementing engineer for review, shared commands that auto-post to Slack when a PR goes up. The entire product development workflow living in one place.

The part that stopped me: business ops, strategy partners, and data scientists are all participating. Pushing PRs. Adding context to the shared repo. This is not an engineering workflow anymore. When non-technical roles voluntarily adopt version control, it means the alternative is measurably slower.

This episode walks through the full system. One team member writes a strategy doc in Claude, commits it under a Q2 vision folder, puts up a PR, tags the right reviewer, and a Slack notification fires automatically. No one leaves Claude. No one copies and pastes between five tools.

Think about how most product teams share context today. A Google Doc linked in a Slack thread that three people comment on and everyone else misses. A strategy deck reviewed in a meeting where half the team is multitasking. Context that lives in someone's head until they leave.

A shared repo fixes all of it. Every decision is searchable. Every PRD is reviewed with the same rigor engineers apply to code. A new team member on day 1 inherits the full history of how the team thinks.

The teams running this architecture are compounding their context every week. The teams still in Google Docs are resetting theirs every conversation.

QT @aakashgupta: This guy literally broke down how to use Claude Code like an expert:

1:40 - Code vs Cowork vs OpenClaw
6:51 - Setting up context status line
12:03 - Sub-agents
17:49 - Creating skills
23:58 - Ask user questions tool
33:33 - Tool-powered skills: Tavily
36:57 - CLI vs MCP vs API hierarchy
39:30 - Make slides skill w/ Puppeteer
43:32 - Auto-invoking skills with hooks
46:49 - Jupyter notebooks for data trust
55:09 - The operating system file structure
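The docs-as-code loop described above (strategy doc, commit under a quarter folder, PR, reviewer, Slack notification) can be sketched end to end. The paths, branch name, and reviewer handle below are invented; the PR and Slack steps are left as comments because they need a remote and credentials.

```python
# Sketch of the PRD-in-git workflow, driven from Python for illustration.
# Assumes `git` is installed; everything else is local and throwaway.
import pathlib
import subprocess
import tempfile

def run(*cmd, cwd):
    subprocess.run(cmd, cwd=cwd, check=True, capture_output=True)

repo = pathlib.Path(tempfile.mkdtemp())
run("git", "init", "-q", cwd=repo)
run("git", "checkout", "-qb", "q2-vision", cwd=repo)  # feature branch for the doc

# The strategy doc lives in the repo, organized by quarter.
doc = repo / "strategy" / "2026-q2" / "vision.md"
doc.parent.mkdir(parents=True)
doc.write_text("# Q2 vision draft\n")

run("git", "add", str(doc), cwd=repo)
run("git", "-c", "user.email=pm@example.com", "-c", "user.name=PM",
    "commit", "-qm", "Add Q2 vision draft", cwd=repo)

# Remote-dependent steps, shown only as comments (hypothetical handles):
#   gh pr create --reviewer eng-lead --title "Q2 vision draft"
#   POST to a Slack webhook to announce the PR automatically.
```

The point is that the non-engineering artifact gets the same branch/commit/review lifecycle as code, so its history is searchable instead of decaying in chat threads.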

See 1 related tweet

  • @aakashgupta: Most people burning through their Claude Code usage limit in 30 minutes are making the same architec...

18. yacinelearning (Group Score: 59.5 | Individual: 32.6)

Cluster: 2 tweets | Engagement: 136 (Avg: 88) | Type: Tech

very raw and insightful story on how Alex came up with ideas during his PhD

do read if you are interested in ML research

QT @a1zhang: tldr; in 2026, phd'ing can be really fun! an opinion you don't hear that often

for other students and researchers, I figured it's worth providing a vague chronological sense for how these thoughts panned out over the course of my first year in my PhD! it's been a blast so far, and I figured it was worth sharing what's been going on for me in case younger researchers are curious about whether pursuing a PhD is fruitful. It's funny because as @m_sirovatka & @simonguozirui & other GPU MODE folks know, my initial interest pre-PhD was LM-generated GPU kernels / compilers (sadly I'm washed now...), which is just completely unrelated :p

Around the start of the PhD we floated around the idea of general sub-agent calling / recursive LM-calling, and I iterated very quickly (basically every day I'd have a new update or version of the idea and new results) until around October, where we were super happy with how it was designed & articulated. Everything in the REPL seemed to be the key. I had an upcoming midterm, but pushed for getting results out for an initial blog to get some external feedback, something I learned from @OfirPress when doing VideoGameBench -- tbh didn't think it'd blow up, but was pleasantly surprised with the feedback.

It was around this time that the idea for THIS BLOG (MGH) was first vaguely formalized in my head, although it generally was just to motivate why RLMs were interesting beyond long context. I delayed putting it out in favor of polishing and finishing out the RLM paper itself. First semester ends.

Around winter break time I ended up spending a lot of time just chatting with new people after the RLM paper release, and although it was useful I ended up not being that productive because of it (if I'm not as receptive to meetings anymore sorry, it's not personal, it's mostly just to give back more time to focus on research). @omouamoua's @PrimeIntellect blog on RLMs also comes out, and I figure it's worth seeing through how their model will be trained and I join their effort (amazing group of ppl btw, I still think they're all somehow highly underrated despite containing many clear superstars).

Semester 2 starts, and I generally have a sense for what I want to work on (WIP so can't talk too much about it). Around this time I actually wanted to work again on this blog because I felt it was worth articulating why these compositional scaffolds are so important for people to work on, but ended up writing a different version of it (LMs will be scaffolds) to clear up a point that Omar and I had frequently talked about, which is that likely many future models will actually just be scaffolded things but appear like a model. It also was generally to explain the naming for Recursive "Language Model".

Finally around 2 weeks ago, I was having a dry period in my research results, and I decided it was time to finally articulate this idea. I figured it would've been articulated by someone else at some point (I think most people share the rough intuition that it's likely true) but somehow it didn't, so I ended up asking @zli11010 to help me co-author & formalize it (he's awesome you should go follow him, another fellow 1st year MIT CSAIL PhD student) because he's much better than me at math! we ended up going back and forth about formalization and realized actually formalizing things was a massive footgun (the way we thought about it from the beginning was in terms of functions that LMs can express and how composition is useful in this sense) and ended up rewriting the entire thing in the last few days to be this form. I'm super happy with how it turned out!

all I can really say at the end of the day is that the PhD has been a super pleasant surprise for me so far. there's always the allure of industry (and its ability to steal all your peers), but I'm like 99.9% sure if I went to a frontier lab this past year I would've never had the chance to think through these super cool research ideas or meet my advisors, who have all been so supportive and helpful in guiding my research. all this to say sadly I don't plan to leave any time soon, and phd'ing can be really really fun!

See 1 related tweet

  • @lateinteraction: RT @a1zhang: tldr; in 2026, phd'ing can be really fun! an opinion you don't hear that often

for oth...


19. MarioNawfal (Group Score: 59.5 | Individual: 32.7)

Cluster: 2 tweets | Engagement: 112 (Avg: 753) | Type: Tech

David Sacks: Anthropic Is Great at Two Things, Releasing Products and Scaring People

“Anthropic has proven that it's very good at two things. One is product releases.

The second is scaring people.

And we've seen a pattern in their previous releases where, at the same time they roll out a new model or a new model card, something like that, they also roll out some study showing really the worst possible implications of where the technology could lead.”

Source: @theallinpod

QT @MarioNawfal: 🚨LEAKED: ANTHROPIC BUILT AN AI SO GOOD AT HACKING THEY'RE AFRAID TO RELEASE IT...

A data leak just revealed Anthropic is testing a new model called "Claude Mythos" that they say is "by far the most powerful AI model we've ever developed."

The leak happened when draft blog posts and internal documents were left in a publicly accessible data cache.

Fortune and cybersecurity researchers found nearly 3,000 unpublished assets before Anthropic locked it down.

The model introduces a new tier called "Capybara," larger and more capable than Opus.

According to the leaked draft:

"Compared to our previous best model, Claude Opus 4.6, Capybara gets dramatically higher scores on tests of software coding, academic reasoning, and cybersecurity."

Here's where it gets interesting.

Anthropic says the model is "currently far ahead of any other AI model in cyber capabilities" and "presages an upcoming wave of models that can exploit vulnerabilities in ways that far outpace the efforts of defenders."

In other words, it's so good at hacking that they're worried about releasing it...

Their plan is to give cyber defenders early access first so they can harden their systems before the model goes wide.

Anthropic blamed "human error" in their content management system for the leak.

Also exposed: details of an invite-only CEO retreat at an 18th century English manor where Dario Amodei will showcase unreleased Claude capabilities.

Source: Fortune

See 1 related tweet

  • @DataScienceDojo: 🚨 Anthropic built a model capable of escaping sandboxes, discovering thousands of zero-days across e...

20. omarsar0 (Group Score: 57.7 | Individual: 28.9)

Cluster: 2 tweets | Engagement: 90 (Avg: 85) | Type: Tech

RT @omarsar0: NEW paper from Meta.

(bookmark this one)

What if the model wasn't just using the computer, but became the computer?

New research from Meta AI and KAUST makes a serious case for Neural Computers (NCs).

The paper proposes NCs as learned runtimes where computation, memory, and I/O live inside a single latent state. Their first prototypes use video models to roll out terminal and GUI interfaces from prompts, pixels, and user actions.

Why does it matter?

Today's agents still depend on external computers to store state, execute actions, and enforce system contracts. Neural Computers point to a different machine form: one where interface dynamics, working memory, and execution are learned together.

The early results are promising but grounded. CLI rendering improves, GUI cursor control reaches 98.7% with explicit visual supervision, and reprompting boosts arithmetic-probe accuracy from 4% to 83%. But symbolic reliability, stable reuse, and runtime governance remain open.

This is less "agents got better" and more "what comes after agents as a computing substrate?"

Paper: https://t.co/CKdclokmer

Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX

See 1 related tweet

  • @hardmaru: A "Neural Computer" is built by adapting video generation architectures to train a World Model of an...