- Published in: Trending Tech Picks, April 27, 2026
- Author: geeknotes

Daily Tech Briefing | April 27, 2026
Today's top tech conversations are led by @Grady_Booch, whose post beginning "I think that @DarioAmodei does..." garnered the highest engagement. Key themes across the top stories include design tooling, Claude, Figma, and optical networking. The community is actively discussing recent developments in AI, engineering practices, and startup strategy.
1. Grady_Booch (Group Score: 113.2 | Individual: 44.8)
Cluster: 4 tweets | Engagement: 1554 (Avg: 253) | Type: Tech
I think that @DarioAmodei does not understand software engineering and that he is working feverishly to pump up the valuation of his company in anticipation of its forthcoming IPO.
QT @aiedge_: Anthropic CEO (Dario Amodei):
"Coding is going away first, then all of software engineering."
What do you think about this? https://t.co/p25uTjB6k3
See 3 related tweets
- @theo: I used to see these quotes as Dario being excited for the future. I now understand he just hates sof...
- @svpino: I wonder if he has anything to gain from saying this? QT @aiedge_: Anthropic CEO (Dario Amodei): ...
- @Hesamation: Dario: “software engineers will go poof” interviewer: “what should a 25 yo learn?” Dario: https://t....
2. theo (Group Score: 112.7 | Individual: 35.0)
Cluster: 4 tweets | Engagement: 1869 (Avg: 888) | Type: Tech
It is genuinely insane that Anthropic will bill you differently if you mention certain words in your prompt or have certain files in your codebase
QT @om_patel5: THIS GUY LOST $200 IN ONE DAY BECAUSE THE STRING "HERMES.md" WAS IN HIS GIT COMMITS
HERMES.md is a real convention used in AI agent projects. it's a system prompt specification file. not some obscure edge case
he's on claude max 20x at $200 a month. yesterday claude code hit him with "you're out of extra usage" out of nowhere
his dashboard showed 13% weekly usage. 0% current session. 86% of his plan was sitting there untouched
but $200.98 in extra usage already burned through what should have been covered by his subscription
he tried logout & login, different models, fresh installs and nothing worked
anthropic support sent the ai bot (four rounds of the same scripted response). eventually they just gave up on him
so he started binary searching repos and commits manually on his own time until he found the trigger
the string "HERMES.md" in a recent git commit message
uppercase, with the .md extension, anywhere in your commit history
that's it
claude code includes recent commits in its system prompt and something server side flags HERMES.md and quietly routes you off your max plan onto API rate billing
AGENTS.md? fine README.md? fine HERMES without .md? fine lowercase hermes.md? fine uppercase HERMES.md? you're getting charged API rates
he reported it. anthropic support acknowledged the bug three times, called it an "authentication routing issue", thanked him for finding it
then refused to refund the $200
so the man paid $200 for a billing bug they confirmed, did anthropic's QA work for free on his weekend, and got a "thank you for your patience" in return
check your commit history before claude code quietly drains your account too
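Per the matrix in the thread, the trigger appears to be an exact, case-sensitive match on the uppercase name plus the .md extension. A minimal sketch of that matching rule, assuming it is a plain substring check (the real server-side logic is unknown):

```python
def billed_at_api_rates(commit_message: str) -> bool:
    """Sketch of the reported trigger: the exact, case-sensitive substring
    "HERMES.md" anywhere in a recent commit message.
    (Assumption: a plain substring check reproduces the behavior the
    thread describes; Anthropic's actual routing logic is unknown.)"""
    return "HERMES.md" in commit_message

# The matrix from the thread:
print(billed_at_api_rates("add AGENTS.md"))        # False: AGENTS.md is fine
print(billed_at_api_rates("update hermes.md"))     # False: lowercase is fine
print(billed_at_api_rates("mention HERMES here"))  # False: no .md extension
print(billed_at_api_rates("add HERMES.md spec"))   # True: the exact trigger
```

A quick `git log --format=%B -n 50 | grep 'HERMES\.md'` over your own history performs the same case-sensitive scan.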
See 3 related tweets
- @yacineMTB: anthropic charges you more based on what you're working on 💀 holy shit you actually can't make this ...
- @TheAhmadOsman: Anthropic is not a serious company lmao QT @om_patel5: THIS GUY LOST $200 IN ONE DAY BECAUSE THE ...
- @om_patel5: THIS GUY LOST $200 IN ONE DAY BECAUSE THE STRING "HERMES.md" WAS IN HIS GIT COMMITS
HERMES.md is a ...
3. Origin_AI_01 (Group Score: 112.1 | Individual: 30.3)
Cluster: 4 tweets | Engagement: 204 (Avg: 382) | Type: Tech
Claude + HyperFrames
idea becomes motion, motion becomes output
no friction, just flow
this is the standard agent-driven creativity should meet
QT @HeyGen: HyperFrames, now natively in Claude Design
Drop in the skill file, generate motion graphics, download project
ask Claude Code or run command:
$ npx hyperframes render
Design to animation to MP4, all in one flow
details in thread https://t.co/ObetDbnz8u
See 3 related tweets
- @Parul_Gautam7: this is actually a pretty big shift!
design → motion → export used to be fragmented across tools
n...
- @TheoBuildsAI: What stood out to me is how “direct” everything feels. You move something, it updates instantly, no ...
- @TheoBuildsAI: RT @HeyGen: HyperFrames, now natively in Claude Design
Drop in the skill file, generate motion grap...
4. aakashgupta (Group Score: 98.5 | Individual: 33.1)
Cluster: 3 tweets | Engagement: 32 (Avg: 101) | Type: Tech
The math on a single PM mockup just dropped from $1,500-6,000 to $2-7. Most PMs haven't repriced their workflow yet.
Old path: PM writes a brief, waits 3-7 days for a designer slot, designer spends 6-15 hours building it. Loaded cost lands at $1,500-6,000 depending on team.
New path: PM opens https://t.co/1l65nzaBLo, attaches a screenshot, types a prompt, clicks generate. 12 minutes. $2-7 in tokens. Hands off to Claude Code with design intent embedded.
That's roughly 500x cost compression and 50x speed compression. Run the same math on decks. An investor-grade deck from a design agency runs 5-10. Brilliant cut complex pages from 20+ prompts in competing tools to 2 prompts in Claude Design. Datadog reports going from rough idea to working prototype before anyone leaves the meeting room.
Two SaaS categories just collapsed into one workflow. AI prototyping (Figma Make, Lovable, v0, Bolt, Magic Patterns) and presentations (Figma Slides, Gamma, https://t.co/JTgqzJ7mhC) both got repriced in one product launch, with brand applied automatically from your codebase. Figma keeps the design system. Claude takes the first-draft work.
Aakash's piece walks the exact setup, including the one-hour design system config that compounds across every prototype after.
The PMs running this workflow this week walk into Q4 with six months of brand-consistent prototypes compounding behind them. Everyone else is still drafting Slack messages to a designer they cannot reach.
That gap widens every Monday.
QT @aakashgupta: Anthropic's former CPO had to resign from Figma's board. That's because Claude Design is not a small release. It's one of the most important Claude releases yet.
I wrote the full guide: https://t.co/zMmPTi6pks
Here's what people are missing.
Claude Design is the first AI design tool that ships a code-agent handoff. You write a brief. Claude generates a working prototype. Then it bundles the full spec for Claude Code as a structured implementation package. Brief in, working app out. No other tool in the category does this.
It learns your brand from your existing files. Upload your codebase or a Figma export and Claude pulls your design tokens, typography rules, and component patterns. The output looks like your team built it.
Look at the export menu. PPTX, PDF, HTML, Canva. Notice what's missing. "Open in Figma" was a deliberate choice about who the customer is.
The customer is the PM. Figma was sold to designers and procured by companies. Claude Design is sold to the founders, marketers, and product managers who used to need a designer to ship anything visual. That's why Mike Krieger had to step off Figma's board three days before launch. The conflict stopped being theoretical.
Figma's stock dropped 7% on launch day. The cap is structural. Figma still owns the multiplayer canvas, design system governance, and production-grade pixel output. Claude Design wins everything upstream of those. Upstream is where most PMs spend most of their week. Decks for stakeholder reviews. Wireframes for engineering discussions. Landing page mocks for marketing tests. All throughput work that never required Figma-grade polish.
This is Anthropic's vertical integration play. Claude Code for engineering surfaces. Claude Design for product surfaces. Each tool collapses a workflow stage that used to need a separate seat license, a separate vendor, and a separate handoff.
The wait time between PM and designer was always the actual product Figma sold. Anthropic just collapsed it to zero.
The PMs who start using Claude Design this week ship 3-4x faster by Q3.
See 2 related tweets
- @aakashgupta: Jeff Gothelf was right that judgment is the bottleneck for PMs. That's the reason builder skills mat...
- @aakashgupta: A Claude Managed Agent costs $0.08 per session-hour.
Let's do the math, because nobody else has.
A...
5. ns123abc (Group Score: 87.4 | Individual: 27.9)
Cluster: 4 tweets | Engagement: 2412 (Avg: 767) | Type: Tech
DeepSeek just permanently cut cached input prices by 10x across the entire API
139× cheaper than GPT-5.5 and 83× cheaper than Claude Sonnet 4.6 btw https://t.co/tKlgyovdOh
See 3 related tweets
- @scaling01: $0.003625 for a cache hit
DeepSeek is still making intelligence too cheap to meter QT @deepseek_...
- @TeksEdge: DeepSeek is going hard for developers. They just dropped input token costs to nearly zero (fractions...
- @Hesamation: DeepSeek’s pricing is insane.
> $0.87 per 1M output tokens > 5.75M output tokens with the pr...
6. rickasaurus (Group Score: 86.0 | Individual: 52.7)
Cluster: 2 tweets | Engagement: 4278 (Avg: 599) | Type: Tech
RT @heynavtoor: Researchers sent the same resume to an AI hiring tool twice. Same qualifications. Same experience. Same skills. One version was written by a real human. The other was rewritten by ChatGPT.
The AI picked the ChatGPT version 97.6% of the time.
A team from the University of Maryland, the National University of Singapore, and Ohio State just published the receipt. They took 2,245 real human-written resumes pulled from a professional resume site from before ChatGPT existed, so the human writing was actually human. Then they had seven of the most-used AI models in the world rewrite each one. GPT-4o. GPT-4o-mini. GPT-4-turbo. LLaMA 3.3-70B. Qwen 2.5-72B. DeepSeek-V3. Mistral-7B.
Then they asked each AI to pick the better resume. Every model picked itself.
GPT-4o hit 97.6%. LLaMA-3.3-70B hit 96.3%. Qwen-2.5-72B hit 95.9%. DeepSeek-V3 hit 95.5%. The real human almost never won.
Then the researchers tried the obvious objection. Maybe the AI is just better at writing. So they had real humans grade the resumes for actual quality and ran the experiment again, controlling for it. The result was worse. Each AI kept picking itself even when human judges rated the human-written version as clearer, more coherent, and more effective.
It gets worse. The AIs do not just prefer AI over humans. They prefer themselves over other AIs. DeepSeek-V3 picked its own resumes 69% more often than LLaMA's. GPT-4o picked its own 45% more often than LLaMA's. Each model can recognize and reward its own dialect.
Then the researchers ran the simulation that ends careers. Same job. 24 occupations. Same qualifications. The only variable was whether the candidate used the same AI as the screening tool. Candidates using that AI were 23% to 60% more likely to be shortlisted. Worst gap was in sales, accounting, and finance.
99% of large companies now run AI on incoming resumes. Most of them use GPT-4o. The paper just proved GPT-4o picks GPT-4o 97.6% of the time.
If you wrote your own cover letter this week, you did not lose to a better candidate. You lost to a worse candidate who paid OpenAI 20 dollars.
Your qualifications do not matter if the AI prefers its own handwriting over yours.
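The protocol described above reduces to a pairwise forced choice tallied per judge model. A toy sketch of that bookkeeping (the model names and data here are illustrative; the paper's prompts and actual judge calls are not reproduced):

```python
from collections import Counter

def self_preference_rate(judgments: list[tuple[str, str]]) -> dict[str, float]:
    """judgments: (judge_model, chosen_author) pairs, where chosen_author is
    either "human" or the name of the model that rewrote the resume.
    Returns, per judge, the fraction of trials where it picked its own rewrite."""
    totals, own = Counter(), Counter()
    for judge, chosen in judgments:
        totals[judge] += 1
        if chosen == judge:
            own[judge] += 1
    return {j: own[j] / totals[j] for j in totals}

# Toy data standing in for the paper's 2,245 resumes x 7 rewriting models:
data = [("gpt-4o", "gpt-4o")] * 97 + [("gpt-4o", "human")] * 3
print(self_preference_rate(data))   # {'gpt-4o': 0.97}
```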
See 1 related tweet
- @itsolelehmann: If you are a company and you scan your resumes using AI, you might be screwing yourself
these are o...
7. burkov (Group Score: 78.9 | Individual: 61.4)
Cluster: 2 tweets | Engagement: 1144 (Avg: 122) | Type: Tech
A must read for anyone interested in building practical AI systems in 2026:
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems
The paper explains the architecture of a modern production-grade AI agent system (Claude Code) by analyzing its source code. This is what they call a "harness" of an agentic coding system.
Learn by reading with an AI tutor: https://t.co/sailmnkDcR
See 1 related tweet
- @aiDotEngineer: RT @CMD_LABS: "Building your own agent is like 300 lines of code. Everyone should do it."
@Geoffrey...
8. StockSavvyShay (Group Score: 71.7 | Individual: 48.3)
Cluster: 2 tweets | Engagement: 2350 (Avg: 479) | Type: Tech
THE 7 LAYERS OF PHOTONICS
Materials & Wafers (substrate layer) IQE COHR, $LWLG
Tools (fabrication layer) AMAT, $LRCX
Lasers (light generation layer) COHR, SMTC
Foundries (manufacturing layer) TSEM, UMC, $INTC
Test, Inspection & Packaging (reliability layer) VIAV, AMKR, $FN
Optics (module layer) POET, $GLW
Networking (connectivity layer) MRVL, ANET, $CIEN
QT @StockSavvyShay: THE OPTICAL PHOTONICS BOTTLENECK
As AI clusters scale past copper’s physical limits, the bottleneck shifts to optical & these are the companies building that layer across the stack:
$200M in its first volume 1.6T order from one hyperscale customer, followed by another $124M in 800G orders from a second.
AEHR building the reliability layer for the optical & AI hardware stack through burn-in & test systems. It just received a record $41M follow-on order from its lead hyperscale customer reinforcing the idea that Sonoma is becoming a key production burn-in platform for high-power AI ASICs.
$CRDO building the connectivity layer that helps AI clusters move data faster through active electrical cables, retimers & high-speed interconnect silicon. The DustPhotonics acquisition also extends that platform into silicon photonics before copper becomes a real constraint.
LITE building the laser layer of the AI optical stack through EMLs, optical components & optical switching exposure. The setup is backed by a $2B NVDA strategic investment & optical circuit switch backlog above $400M with orders reportedly extending through 2028.
$VIAV building the testing & validation layer of the optical stack through network instrumentation & photonics measurement tools. It is the picks-and-shovels layer of the transition because every high-speed optical buildout still needs to be tested regardless of which transceiver vendor wins.
COHR building one of the core photonics bottlenecks through indium phosphide lasers, optical engines & communications components tied to next-gen AI networking. It also has a $2B strategic investment from $NVDA behind it & is doubling InP device capacity into the 1.6T ramp.
$MRVL building the DSP & optical infrastructure layer through electro-optics, PAM DSPs, interconnect silicon & custom networking chips. The Celestial AI deal & NVLink Fusion exposure both strengthen its position as photonics becomes more central to AI cluster design.
See 1 related tweet
- @StockSavvyShay: 2026 IS THE YEAR OF PHOTONICS
• GLW glass fiber & cable infrastructure for optical networks • MRV...
9. jukan05 (Group Score: 66.9 | Individual: 23.6)
Cluster: 3 tweets | Engagement: 287 (Avg: 365) | Type: Tech
According to Jeff Pu's estimates, Intel is expected to win a portion of the AI6 orders that were previously anticipated to be awarded exclusively to Samsung. https://t.co/cRM1i5BO2q
QT @jukan05: 《GF Overseas Electronics & Telecom》 ☄️ Intel (INTC US, Buy): Earnings Beat — From Recovery to Strength
☀️ Price target raised to 1.5/3.1, and raise our price target from 94.2, based on 3.5x 2027 P/B.
☀️ All-around beat: 1Q revenue of 1.4B above the guidance midpoint and above consensus of 0.29 (guidance: breakeven). Operating cash flow was 2.0B. 2Q guidance: revenue 13.0B; Non-GAAP gross margin of 39%, reflecting a higher Panther Lake mix; EPS guidance of $0.20, with DCAI expected to deliver double-digit QoQ growth.
☀️ Foundry on track: As we have repeatedly emphasized since our initiation report in July 2025 and our pre-earnings preview on April 16, we remain bullish on strong customer engagement from Apple, NVIDIA, and AMD on 18A-P (primarily 14A), and we believe Intel will secure a portion of Tesla's AI6 program on 14A by end-2028. On execution, management indicated that 18A yields are running better than internal expectations. External foundry revenue remains small at 2.4B, though improving by $72M QoQ. Management expects losses to continue narrowing through the year.
☀️ CPU continues to strengthen: Intel argues that AI workloads are shifting from training toward inference, agentic AI, robotics, physical AI, and edge AI, making the CPU increasingly important as the orchestration layer of the AI stack. Management noted that server CPU demand has improved materially over the past 90 days, and expects both the industry and Intel to deliver double-digit shipment growth in 2026, with the momentum extending into 2027. As flagged in our report, the 2Q26 CPU price hike was already within expectations, and we expect another 5-10% price increase by the end of 3Q. We now forecast DCAI to grow 39%/15% YoY in 2026E/2027E.
$INTC
See 2 related tweets
- @aakashgupta: Tesla's next-gen AI6 chip is reportedly going to Intel 14A in late 2028. Apple, Nvidia, and AMD are ...
- @jukan05: 《GF Overseas Electronics & Telecom》 ☄️ Intel (INTC US, Buy): Earnings Beat — From Recovery to Stren...
10. jasonlk (Group Score: 66.7 | Individual: 33.4)
Cluster: 2 tweets | Engagement: 40 (Avg: 36) | Type: Tech
I think at a practical level, designers will become a luxury
You want them
But they are expensive, and slow, and you won’t roll them out for many non-core features or assets.
And net net … use them more sparingly
(Already true for us. We have gone from 3 designers at peak to 0.1 humans and net output is as good)
QT @gokulr: DESIGN: THE FIRST AI CASUALTY
I'm increasingly sure that 2026 signals the end of product design as a full-fledged stand-alone function within companies. If so, it will be the first role / function to be eliminated by AI on a go-forward basis.
Instead of hiring FT designers, startups are hiring / will hire design consultants to create a design system that the founder likes (this takes a few weeks max). Once the design system is finalized, PM/Eng feed it into their AI tool of choice to generate prototypes. The design system is refreshed annually by the same consultant.
Larger companies will likely not backfill design roles and will do some targeted attrition to reduce the design department to 20% the size it is today.
If you're a designer, I think you have two choices:
- Become an entrepreneur: Start a design agency and become the go-to resource for design systems for startups and even larger companies. This can be a good recurring revenue business.
- Become a builder: Add PM/Eng responsibilities to become a product builder.
Would suggest you embrace this proactively vs waiting for the other shoe to drop.
I'm really sorry about this - some of my best friends and the people I admire most and have learnt the most from are designers - but it seems inevitable.
See 1 related tweet
- @owenbjennings: frequently agree with gokul, but disagree here
my view: standalone function and even more impt in...
11. scaling01 (Group Score: 66.5 | Individual: 17.5)
Cluster: 5 tweets | Engagement: 140 (Avg: 218) | Type: Tech
there's a chance ARC-AGI-3 is already solved with GPT-5.5-xhigh + tools
QT @scaling01: 62.1% on ARC-AGI-3
would be the score if they used the same scoring as ARC-AGI-1/2 https://t.co/LmW502PhLR
See 4 related tweets
- @fchollet: No, the top score if you didn't account for action efficiency would be 100%, achievable with 20 line...
- @scaling01: what an incredibly useful benchmark https://t.co/qpAzYimvQk QT @scaling01: there's a chance ARC-A...
- @fchollet: (we tested this, it scored sub-1%) QT @scaling01: there's a chance ARC-AGI-3 is already solved wi...
- @WesRoth: RT @WesRoth: OpenAI’s GPT-5.5 achieved state-of-the-art status on the highly rigorous ARC-AGI-2 benc...
12. business (Group Score: 63.7 | Individual: 63.7)
Cluster: 1 tweets | Engagement: 749 (Avg: 70) | Type: Tech
Elon Musk says he’s nearing his long-stated goal of turning X into an “everything app” with the imminent launch of a new financial services tool, X Money https://t.co/SypY8Y47g1
13. steipete (Group Score: 63.2 | Individual: 33.0)
Cluster: 2 tweets | Engagement: 230 (Avg: 370) | Type: Tech
very this.
QT @badlogicgames: i'm sort of addicted to working my butt off, always have been. in oss, that can consume you. constant feeling of urgency, as issues stream into the repo. been there many, many times with my other oss.
but that urgency is not real. if something is truly broken, a large number of people will scream at you on all channels. which has happened exactly zero times so far, or was caught minutes after a botched release and immediately fixed.
it's kind of crazy that some people expect better support from an oss project than from commercial software. i think that's largely due to most commercial software corps not giving a fuck. try filing an issue with corporate and getting it fixed within 24h or less plus a personal response.
and as oss builders don't have a corporate facade shielding them from direct contact with users, some sort of bidirectional parasocial relationship establishes itself. at a certain scale, that becomes entirely unhealthy.
for every 10 kind and thoughtful people, there is 1 asshole. and whatever the asshole says or feels entitled to, sticks with you much more than positive feedback.
obv. also happens in corpo environments, especially if you do comms or dev rel, where you put your face and name out there.
but a corp that can afford dev rel usually also has a large team in the back, which can soften the negative aspects.
in oss, you are largely on your own. and unpaid. that too is a choice of course, and nobody is forcing anyone to do oss.
but if you want oss to work, consider that there are other people at the end of that issue tracker/social media account, with lives and squishy human parts. also consider that you are paying nothing for their service, and you are owed exactly nothing, neither code nor attention to your every wish.
See 1 related tweet
- @nummanali: Well said, OSS is no game QT @badlogicgames: i'm sort of addicted to working my butt off, always h...
14. TheAhmadOsman (Group Score: 59.6 | Individual: 31.2)
Cluster: 2 tweets | Engagement: 281 (Avg: 254) | Type: Tech
How to go about learning all of this?
1st: Start with the serving engine view
vLLM: PagedAttention, continuous batching, prefix caching, CUDA graphs
SGLang: RadixAttention/prefix reuse, speculative decoding, MoE, structured/agent workloads
TensorRT-LLM: NVIDIA peak stack, FP8/FP4, Wide-EP, disaggregated serving
FlashInfer: reusable kernel/operator library for attention/GEMM/MoE/sampling
2nd: Go down the stack
Triton tutorials → custom fused kernels
CUTLASS/CuTe → Tensor Core GEMM and Blackwell/Hopper details
FlashAttention papers → attention algorithm/kernel co-design
PagedAttention paper → KV-cache memory management
MoE docs → routing + grouped GEMM + all-to-all
Nsight profiling → stop guessing
3rd: Do this mini-project sequence
Implement RMSNorm in Triton; compare to PyTorch
Implement fused SiLU × gate
Implement simple FP16 matmul; compare to cuBLAS/rocBLAS
Implement paged KV lookup for decode attention
Add FP8 KV cache with per-block scales
Implement toy top-k sampling on GPU
Implement tiny MoE dispatch + grouped GEMM
Integrate one custom op into vLLM or SGLang and profile end-to-end
QT @TheAhmadOsman: You don’t “run a model.” You run Kernels.
The model is just a graph
The Inference Engine is scheduler / optimizer / executor
But the actual work? That happens in the Kernels
- MatMul Kernels
- Attention Kernels
- RMSNorm Kernels
- KV cache Kernels
- Quantized linear Kernels
- Sampling Kernels
- Fused “please don’t write this back to memory 9 times” Kernels
Same model, same GPU, same VRAM. Wildly different performance.
Because one stack is using optimized fused Kernels that understand your hardware
And the other stack is playing hot potato with tensors through 47 tiny launches and pretending the GPU is the problem
Bad Kernels make people say: “this model is slow”
Good Kernels make people say: “wait how is this running locally?”
This is why Inference Engines and the Kernels implemented within them matter
The model is the recipe. The hardware is the kitchen. The Kernels are the knives, pans, burners, and the chef not cutting onions with a spoon.
Most people benchmark models. The real ones benchmark the Kernels underneath.
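The first mini-project in the sequence above (RMSNorm in Triton, compared to PyTorch) needs a ground-truth implementation to compare a kernel against. A minimal NumPy reference, assuming the common LLaMA-style formulation of RMSNorm over the last axis:

```python
import numpy as np

def rmsnorm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Reference RMSNorm: x / rms(x) * weight, normalized over the last axis.
    A Triton (or any custom) kernel can be validated against this output."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

x = np.array([[3.0, 4.0]])   # rms = sqrt((9 + 16) / 2) = sqrt(12.5)
w = np.ones(2)
out = rmsnorm(x, w)
print(np.round(out, 4))      # approx [[0.8485 1.1314]]
```

The same pattern (NumPy or plain PyTorch reference, then `np.allclose` against the kernel output) works for the fused SiLU-gate and matmul steps too.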
See 1 related tweet
- @TheAhmadOsman: Let's dive deeper
Do you know that 75% of Qwen 3.5 27B layers are DeltaNet (linear attention) and n...
15. petergyang (Group Score: 57.5 | Individual: 33.1)
Cluster: 2 tweets | Engagement: 32 (Avg: 103) | Type: Tech
How @tibo_maker turned a $2K MRR acquisition into a $600K MRR business:
"When I acquired Typeframe, it was doing $2K MRR. I spent money on this product and I just wanted to not be wrong.
This is the #1 mistake that founders make. It's more important for them to not be wrong than to be successful.
If you force the selling of your products, you're not listening to people telling you there's another opportunity that might be bigger."
After this, Tibo noticed that people wanted to make viral shorts on social media, so he pivoted the product to Revid and now it's making $600K+ MRR.
📌 Watch him talk more about it here: https://t.co/z6PH1F4JgZ
QT @petergyang: "I shipped 9 failed products before one took off...now I'm doing $1M+/month."
Here's my new episode with @tibo_maker, a solo founder who bootstrapped 5 AI products to $1M+ / month.
Tibo walked me through his exact playbook:
✅ How to validate ideas and fail fast ✅ Why his top acquisition channel is still SEO ✅ The pricing sweet spot for AI products
Some quotes from Tibo:
"When people twist your product into something else, that's a very strong signal you have to follow."
"It's easy to lie to yourself [with free users], but if there's no stickiness in the revenue, it's very hard to build a successful business."
"I'm convinced right now that just one person can do the job of 20 people."
📌 Watch now: https://t.co/N6b950Xc5p
Thanks to our sponsors:
@WisprFlow: Don't type, just speak https://t.co/oqHJ8bN3ll
@linear: The AI agent platform for modern teams https://t.co/tgWf9oL4bs
See 1 related tweet
- @petergyang: RT @petergyang: My next guest is making $1M+ a month (!) from 5 AI products that he built as a solo ...
16. teortaxesTex (Group Score: 57.4 | Individual: 32.1)
Cluster: 2 tweets | Engagement: 33 (Avg: 56) | Type: Tech
V4 is "mediocre frontier" on MRCRv2, between Opus 4.6 (above) and Opus 4.7 (below). In the paper, they say CorpusQA 1M is more interesting for them than MRCR. I wonder how GraphWalks looks. https://t.co/Jghp5Va8WV
QT @DillonUzar: New https://t.co/gLEWzxoXWG is live!
70 model-variants. 8-needle GDM-MRCRv2. Interactive leaderboard. Free, no login.
What you can do:
- Compare models across context bins with line and bar charts - with 95% confidence intervals (a couple more types of charts are coming)
- Filter by provider, reasoning tier, or use presets (Best, Reasoning, Non-Reasoning)
- Sort by AUC, pointwise scores, cost, or token efficiency
- Hover any model for metadata: provider, reasoning levels, release date, run count, cost breakdown
- Toggle heatmap coloring, rankings, and on-demand cost columns
- Export to CSV or screenshot the current view directly
The FAQ walks through what GDM-MRCRv2 is, how scoring works, what AUC measures, and why 8-needle is the tier that separates frontier models. Includes a step-by-step visual explainer of how a real test is built and scored. We'll be fleshing this out further over time, and improving the visuals.
This is still very much a work in progress (might feel a little more bare compared to the old website), but more charts and screens to come, for example:
- View each test result for a model (we even record the streamed chunks in case people want some data from that).
- Bias analysis from the old website.
Current top 5 by AUC @ 128k (best tier per model):
- GPT-5.5 (xhigh): 91.7%
- GPT-5.5 (high): 88.2%
- GPT-5.5 (medium): 87.5%
- GPT-5.5 (low): 83.3%
- Claude Opus 4.6 (medium): 81.0%
Current top 5 by AUC @ 1M (best tier per model):
- GPT-5.5 (medium): 50.9%
- GPT-5.5 (xhigh): 50.5%
- GPT-5.5 (high): 50.2%
- GPT-5.5 (low): 47.3%
- Claude Opus 4.6 (high): 46.9%
NOTE: Bins with no scores count as 0% for AUC calc.
More models being added regularly. Suggestions welcome.
@OpenAI @AnthropicAI @GoogleDeepMind @deepseek_ai @Kimi_Moonshot @Xiaomi @Zai_org
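The scoring note above ("bins with no scores count as 0% for AUC calc") reduces to a mean over context-length bins with missing bins contributing zero. A sketch under that assumption (the site's exact aggregation, e.g. any bin weighting, may differ, and the bin labels here are made up for illustration):

```python
def auc_over_bins(scores_by_bin: dict[str, float], bins: list[str]) -> float:
    """Mean score across context-length bins; a bin with no score counts as 0.0,
    so a model untested at long contexts is penalized rather than skipped."""
    return sum(scores_by_bin.get(b, 0.0) for b in bins) / len(bins)

bins = ["8k", "32k", "128k", "512k", "1m"]
scores = {"8k": 0.95, "32k": 0.90, "128k": 0.80}   # no runs yet at 512k and 1m
print(round(auc_over_bins(scores, bins), 3))        # 0.53
```

This explains why the same model can rank high at 128k but drop sharply in the 1M table: empty long-context bins drag the average down.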
See 1 related tweet
- @scaling01: RT @DillonUzar: New https://t.co/gLEWzxoXWG is live!
70 model-variants. 8-needle GDM-MRCRv2. Intera...
17. pmarca (Group Score: 56.6 | Individual: 32.1)
Cluster: 2 tweets | Engagement: 1788 (Avg: 1057) | Type: Tech
When something becomes abundant and cheap, someone else becomes scarce and valuable.
QT @tengyanAI: something i've noticed: AI agents create a weird new kind of burnout. esp for young people.
a lot of ambitious 22 year olds are going to think the answer is simple:
- spin up more agents
- ship more code
- sleep less
- outwork everyone
and for a while, it will feel incredible. you can keep multiple agents running, feed them tasks, review outputs, fix mistakes, make decisions, and keep the whole loop moving.
the problem is that the work no longer drains you through typing. it drains you through judgment. More attention. More context switching. More verification. More decisions per hour.
so instead of 8-10 normal productive hours, you might get 4-5 extremely intense hours before your brain is fully cooked. and you feel numb until you sleep properly and reset
some of my friends are already burnt out. they don't say it out loud but i can tell.
the agent can keep working 24/7. the human still has a hard limit
See 1 related tweet
- @ferologics: RT @tengyanAI: something i've noticed: AI agents create a weird new kind of burnout. esp for young p...
18. hammer_mt (Group Score: 55.8 | Individual: 30.3)
Cluster: 2 tweets | Engagement: 6 (Avg: 605) | Type: Tech
Saying a skill is just an .md file is like saying the Constitution is just a piece of parchment.
QT @osekkat: I’ve seen people on X dunking on folks like @garrytan @doodlestein and others for sharing SKILL dot md files they've built. They are dismissing these files as "just a markdown file.”
I think this misses the point entirely and I'll try to address that here. Quick thread:
A bad skill file is just text, sure.
A good skill file is compressed expertise, packaged in a format an agent can actually use.
The value is not just in the “markdown file.” The value is the interaction between:
a huge neural network with latent capabilities a precise, reusable, agent-readable procedure that steers those capabilities toward a specific outcome
That combination is the product.
Saying “it’s just markdown” is like saying Hamlet is “just ink on paper,” or Einstein’s relativity paper was “just a text.”
Technically true. Intellectually useless.
The medium is simple. The content is what matters. And more importantly, the effect of that content on the reader is what matters.
With humans, a book, a coach, a lecture, or painting can change how someone thinks and acts.
With LLMs, text is also the control surface. These models were trained on text, reason through text, call tools through text, and follow procedures through text.
So yes, the skill is “just text.”
But it is text designed to be read by an enormous neural net.
That matters.
A good skill is agent-ergonomic. It does not merely say “do this better.” It encodes workflow, constraints, examples, edge cases, tool usage, failure modes, and success criteria in a way the agent can reliably execute.
That is very different from a casual prompt.
A prompt is often a one-off request.
A skill can be reused, versioned, tested, improved, shared, and loaded at the exact moment an agent needs it.
That turns “vibes-based prompting” into something closer to operational knowledge.
Another way to think about it:
We have built these massive models, but much of their power is latent. Different people can extract very different levels of performance from the same model.
A good skill is a way to actualize a specific slice of that latent capability.
A refactoring skill. A research skill. A legal review skill. A math explanation skill. A codebase-navigation skill. Each one can make the same model behave very differently. I think of Cus D’Amato and Mike Tyson. Tyson had enormous latent potential. But Cus gave him a system, a style, a discipline, a way to channel that potential.
That’s what good skills are for agents.
They are not magic. They are not all equally valuable. Many will be mediocre or useless.
But dismissing them right off the bat because they are “just markdown” shows a misunderstanding of what LLMs are.
Text is how we trained these systems. (for the most part)
Text is how we steer them.
Text is how we unlock parts of what they can do.
The question is not whether a skill file is “just text.”
The question is whether the text reliably makes the model perform better at a valuable task.
If yes, then it is not “just markdown.”
It is leverage.
See 1 related tweet
- @garrytan: RT @osekkat: I’ve seen people on X dunking on folks like @garrytan @doodlestein and others for shar...
19. chddaniel (Group Score: 54.5 | Individual: 27.8)
Cluster: 2 tweets | Engagement: 8 (Avg: 13) | Type: Tech
this is f*king scary guys..........
QT @chhddavid: Introducing Shipper: The world’s first AI Business Builder.
Shipper outperforms humans 100% of the time.
RT + Comment “SHIPPER” and I’ll randomly send out free credits. https://t.co/7lfWzMnIuo
See 1 related tweet
- @chhddavid: mf delete this QT @chddaniel: Introducing Shipper.
The first AI business builder that outperform...
20. aakashgupta (Group Score: 52.6 | Individual: 35.3)
Cluster: 2 tweets | Engagement: 91 (Avg: 101) | Type: Tech
Jane Street made more operating profit last year than Walmart. Walmart has 2.1 million employees. Jane Street has 3,500.
That puts a market-making firm most people have never heard of at #13 on the list of America's most profitable companies. Ahead of Walmart. Ahead of Verizon. Ahead of Broadcom. Ahead of Visa.
The trading numbers are more absurd. Jane Street pulled in $35.8 billion in trading revenue that same year. Goldman Sachs did $31.1 billion. A private firm with no IPO and no outside capital out-traded every major investment bank on Wall Street.
Per-employee operating profit:
- Walmart: $564,000
- Visa: $810,000
- Jane Street: $8,900,000
That's 11x Apple, 16x Microsoft, and 635x Walmart. Per head.
The mechanism is structural. Jane Street is built as an ETF market maker wrapped around an in-house technology stack written in OCaml. They quote prices on tens of thousands of securities at once, capture the spread, and hedge the residual exposure. The product they sell is liquidity. The moat is the latency and the breadth of their book.
Volatility is the fuel. When Trump rolled out tariffs in Q2 2025, ETF flows went vertical and bid-ask spreads widened across every asset class. Jane Street made $10.1 billion in that single quarter. The model assumes the counterparty has to transact under stress. Jane Street is the one calm enough to quote back.
This is also why you don't see Jane Street on a stock exchange. At $9 million per head in operating profit, the partners will never sell. There is no public valuation that lets them keep what they keep now. Going public would be a tax on themselves.
The 12 companies ranked above Jane Street on this list employ roughly 3.7 million people combined. Jane Street has 3,500.
See 1 related tweet
- @aakashgupta: RT @cgtwts: > be Jane Street
3,500 employees barely known outside finance still makes more ope...