Top Tech Tweets Roundup - April 24, 2026

Daily Tech Briefing (2026-04-24)

Today's top tech conversations are led by @ajambrosino, whose post 'RT @OpenAI: Introducing GPT-5....' drew the highest engagement. Key themes across the top stories include OpenAI, government, ChatGPT, and Shipper. The community is actively discussing recent developments in AI, engineering practices, and startup strategies.


1. ajambrosino (Group Score: 1025.0 | Individual: 58.3)

Cluster: 33 tweets | Engagement: 9300 (Avg: 470) | Type: Tech

RT @OpenAI: Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.

Now available in ChatGPT and Codex. https://t.co/rPLTk99ZH5

See 32 related tweets

  • @nvidia: We're watching AI evolve in real time. GPT-5.5 isn't just a launch - it's another proof point that ...
  • @AlexFinn: Drop what you're doing. It happened.

ChatGPT 5.5 is out and it beats Opus 4.7 in almost every bench...

  • @cryptopunk7213: my god. Openai just dethroned claude 💀

GPT 5.5 crushes opus 4.7 across almost every benchmark. this...

  • @teortaxesTex: about what was expected probably more of a 4o moment than 4 moment they'll push this new model far, ...
  • @chddaniel: Big news, @OpenAI GPT-5.5 just got a major upgrade today which I'm happy to introduce in shipper. Fr...

2. theobearman (Group Score: 313.1 | Individual: 55.0)

Cluster: 13 tweets | Engagement: 2946 (Avg: 221) | Type: Tech

RT @mkratsios47: The U.S. has evidence that foreign entities, primarily in China, are running industrial-scale distillation campaigns to steal American AI. We will be taking action to protect American innovation.

These foreign entities are using tens of thousands of proxies and jailbreaking techniques in coordinated campaigns to systematically extract American breakthroughs. Foreign entities who build on such fragile foundations should have little confidence in the integrity and reliability of the models they produce.

The U.S. government is committed to the free and fair development of AI technologies across a competitive ecosystem, from open-source to proprietary models.

Read the memo: https://t.co/w0BWxJdaLn

See 12 related tweets

  • @MarioNawfal: 🇺🇸🇨🇳 The U.S. just called out China for industrial-scale AI theft.

White House memo: Chinese entiti...

  • @TFTC21: Michael Kratsios, Director of the White House Office of Science and Technology Policy, issued a memo...
  • @teortaxesTex: > The U.S. government is committed to the free and fair development of AI technologies across a c...
  • @KobeissiLetter: BREAKING: The US has accused China of "industrial-scale" theft of AI and warned that it will be crac...
  • @AndrewCurran_: From the letter: https://t.co/ssi8GkbQLm | QT @mkratsios47: The U.S. has evidence that foreign enti...

3. chhddavid (Group Score: 282.0 | Individual: 36.4)

Cluster: 8 tweets | Engagement: 21 (Avg: 13) | Type: Tech

THIS IS INSANE!!

some 22/yo kid killing the entire vibe coding industry.

QT @shipper_now: Introducing Shipper: Vibe Coding that’s lightyears ahead of Lovable, Bolt, Cursor...

We raised $0 and built the future of vibe coding, in public.

To do it, we asked builders to create apps side by side on Lovable, Cursor, and Shipper... and Shipper consistently came out ahead in one thing that actually matters: turning apps into real, working businesses.

Here’s why:

• End-to-end execution: Shipper doesn’t just generate code - it builds, deploys, fixes, and maintains the full app across frontend, backend, and infrastructure.

• Shipper runs your product, interacts with it like a user, finds issues across the stack, and fixes them automatically.

Lovable and Cursor help you test ideas, but Shipper helps you launch, iterate, and actually make money from what you build.

Marcus built a niche app to an ~$7.5k+ monthly run rate in weeks. Ethan turned a simple tool into $12k+ in revenue without writing code. Luca built a suite of apps now doing $30k+ combined.

Build your app with Shipper: https://t.co/KDpe1s8hSl

See 7 related tweets

  • @Shipper_now: Today, we ended vibe coding forever.

I just watched my Mac one-shot a $44B company in 183 seconds. ...

  • @Shipper_now: Introducing Shipper: Vibe Coding that’s lightyears ahead of Lovable, Bolt, Cursor...

We raised $0 a...

  • @chhddavid: this is genuinely unsettling... | QT @shipper_now: Introducing Shipper: Vibe Coding that’s lightyea...
  • @chddaniel: this is f*king scary.... | QT @shipper_now: Introducing Shipper: Vibe Coding that’s lightyears ahea...
  • @chddaniel: THIS IS INSANE!!

some 22/yo kid just ended the vibe coding industry as a whole. | QT @shipper_now:...


4. gdb (Group Score: 249.0 | Individual: 36.7)

Cluster: 10 tweets | Engagement: 3840 (Avg: 1677) | Type: Tech

Introducing ChatGPT for Clinicians: https://t.co/6kSGjGITA6

QT @thekaransinghal: Today we’re introducing two big steps for health at OpenAI:

  • ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work
  • HealthBench Professional, a new benchmark to evaluate real clinician chat tasks

We’re excited about what this can unlock for care. ❤️ https://t.co/FeBWhHQPiw

See 9 related tweets

  • @fidjissimo: So excited for these launches and grateful for the team working hard on this. | QT @thekaransinghal...

  • @Miles_Brundage: RT @thekaransinghal: Today we’re introducing two big steps for health at OpenAI:

  • ChatGPT for Clin...

  • @chatgpt21: New Healthcare benchmarks just dropped!!

Particularly really excited about these! | QT @thekaransi...

  • @FirstSquawk: OPENAI: ROLLS OUT CHATGPT FOR CLINICIANS; OFFERING TOOL FREE TO VERIFIED US DOCTORS, NPs, PAs, AND P...
  • @Mayhem4Markets: I'm really excited for the future of healthcare. https://t.co/HzIqBg1ssK | QT @FirstSquawk: OPENAI:...

5. zerohedge (Group Score: 222.2 | Individual: 40.4)

Cluster: 7 tweets | Engagement: 3608 (Avg: 995) | Type: Tech

RT @HHShkMohd: Under the directives of the President of the UAE, we launch a new government model. Within two years, 50% of government sectors, services, and operations will run on Agentic AI, making the UAE the first government globally to operate at this scale through autonomous systems.

AI is no longer a tool. It analyses, decides, executes, and improves in real time. It will become our executive partner to enhance services, accelerate decisions, and raise efficiency.

This transformation has a clear timeline. Two years. Performance across government will be measured by speed of adoption, quality of implementation, and mastery of AI in redesigning government work.

We are investing in our people. Every federal employee will be trained to master AI, building one of the world’s strongest capabilities in AI-driven government.

Implementation will be overseen by Sheikh Mansour bin Zayed, with a dedicated taskforce chaired by Mohammad Al Gergawi driving execution.

The world is changing. Technology is accelerating. Our principle remains constant. People come first. Our goal is a government that is faster, more responsive, and more impactful.

See 6 related tweets

  • @simonw: Within two years you'll be able to prompt inject an entire country | QT @HHShkMohd: Under the direc...
  • @Grady_Booch: This will not end well. | QT @HHShkMohd: Under the directives of the President of the UAE, we launc...
  • @ianmiles: Holy smokes! The UAE is going to transform its entire government with agentic AI. | QT @HHShkMohd: ...
  • @damianplayer: this is wild!

the UAE just committed to running 50% of its government on agentic AI in 2 years.

o...

  • @Kyrannio: Holy based | QT @HHShkMohd: Under the directives of the President of the UAE, we launch a new gover...

6. testingcatalog (Group Score: 176.1 | Individual: 34.5)

Cluster: 9 tweets | Engagement: 984 (Avg: 448) | Type: Tech

OPENAI 🚨: SPUD, AKA GPT-5.5 HAS BEEN TEASED BY OPENAI, WHICH POINTS AT A HIGH CHANCE OF THURSDAY’S RELEASE.

This teaser follows earlier preparation when internal model names became visible on Codex for a short amount of time.

Earlier this week, a similar image post was used to tease ChatGPT Agents.

QT @ChatGPTapp: https://t.co/6tYFIQVsHp

See 8 related tweets

  • @FirstSquawk: OPENAI: INTRODUCING GPT-5.5, DESCRIBED AS HAVING THE STRONGEST SET OF SAFEGUARDS TO DATE, ROLLING OU...
  • @VaibhavSisinty: OpenAI is sooo back!

Spud has dropped finally.

I am pretty sure I was using 5.5 via codex since la...

  • @simonw: GPT-5.5 may not be in the official OpenAI API... but it's available via the apparently approved-of C...
  • @FirstSquawk: OPENAI: GPT-5.5 PRO IS ROLLING OUT TO PRO, BUSINESS, AND ENTERPRISE USERS IN CHATGPT, WITH BOTH MODE...
  • @TFTC21: OpenAI just released GPT-5.5.

Now rolling out to Plus, Pro, Business, and Enterprise users in Chat...


7. realsanketp (Group Score: 173.8 | Individual: 44.5)

Cluster: 8 tweets | Engagement: 2908 (Avg: 116) | Type: Tech

RT @ClaudeDevs: Over the past month, some of you reported Claude Code's quality had slipped. We investigated, and published a post-mortem on the three issues we found.

All are fixed in v2.1.116+ and we’ve reset usage limits for all subscribers.

See 7 related tweets

  • @theo: Confirmed that Claude Code got dumber, not Claude. They shipped slop and it made the models worse. h...
  • @scaling01: Anthropic reset usage limits for all subscribers | QT @ClaudeDevs: Over the past month, some of you...
  • @edzitron: it's wild that they did not acknowledge this the entire time - Boris even went around saying nothing...
  • @nummanali: TLDR

March 4th > April 7th

  • Default reasoning was medium over high

March 26th > April 10th...

  • @ShanuMathew93: Dario finally said enough! no more nerfing! (tbd - i'll be back to you in a week or so) https://t.co...

8. Scobleizer (Group Score: 154.8 | Individual: 54.0)

Cluster: 5 tweets | Engagement: 5168 (Avg: 534) | Type: Tech

RT @zan2434: Imagine every pixel on your screen, streamed live directly from a model. No HTML, no layout engine, no code. Just exactly what you want to see.

@eddiejiao_obj, @drewocarr and I built a prototype to see how this could actually work, and set out to make it real. We're calling it Flipbook. (1/5)

See 4 related tweets

  • @tobi: This is the beginning of something really big | QT @zan2434: Imagine every pixel on your screen, st...
  • @simonw: This is a lot of fun to try out - https://t.co/3Rr8qQHQ0U | QT @zan2434: Imagine every pixel on you...
  • @BrianRoemmele: Boom!

Every pixel on your screen, streamed live directly from an AI model!

No code.

Just exactly...

  • @RoundtableSpace: Imagine every pixel on your screen streamed live directly from a model. No HTML, no layout engine, n...

9. aakashgupta (Group Score: 151.9 | Individual: 39.5)

Cluster: 5 tweets | Engagement: 117 (Avg: 96) | Type: Tech

The pricing on GPT-5.5 tells the entire story if you run the math.

GPT-5 launched in August at $0.63 per million input tokens. GPT-5.4 hit in March at $2.50. GPT-5.5, seven weeks later, costs $5.00. That's an 8x increase in input pricing across 8 months while the models improved incrementally each cycle.

Nvidia says its latest chips cut inference costs up to 35x per token. OpenAI's cost basis is cratering. Their prices are climbing. The margin expansion happening here is unlike anything in enterprise software history.

900 million weekly active users. 50 million subscribers. 9 million paying business customers. At $20/month, the subscriber base alone generates roughly $12 billion in annualized revenue. The API price hike targets the developers building agents on top of OpenAI's infrastructure. Every AI startup paying 2x for GPT-5.5 inference is funding OpenAI's own competing products.

Brockman said the quiet part out loud: they're building a "superapp" combining ChatGPT, Codex, and the browser into one platform. Every developer building an agent on GPT-5.5 is paying OpenAI to build the thing that eventually replaces them.

The 7-week release cycle compounds switching costs faster than any competitor can match. Release fast enough that customers rebuild their prompts and pipelines for your format, then charge more each cycle because they can't leave.

OpenAI found the business model. And it looks a lot like the one that made Microsoft $3 trillion.

QT @OpenAI: Introducing GPT-5.5

A new class of intelligence for real work and powering agents, built to understand complex goals, use tools, check its work, and carry more tasks through to completion. It marks a new way of getting computer work done.

Now available in ChatGPT and Codex. https://t.co/rPLTk99ZH5

See 4 related tweets

  • @MatthewBerman: Biggest improvements in 5.5: > Personality (more natural, more concise) > Token efficiency (bi...
  • @scaling01: I think my scenario of OpenAI being ahead by 1-3 months by end of year is more likely after this lau...
  • @rohanpaul_ai: GPT-5.5 vs GPT-5.4 (per 1M tokens):

GPT-5.4 Input $5.00 vs $2.50 | Output $30.00 vs $15.00

GPT-5 ...

  • @wallstengine: OPENAI RELEASED GPT-5.5, now live for Plus, Pro, Business, and Enterprise users.

The model hit 82....
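
The pricing arithmetic in the lead tweet of this cluster can be checked with a few lines of Python. This is a minimal sketch that uses only the figures quoted in the tweet (they are not independently verified) and keeps the tweet's own simplification that all 50 million subscribers pay $20/month. The function names are illustrative, not from any source.

```python
def price_multiple(old_price: float, new_price: float) -> float:
    """How many times more expensive new_price is than old_price."""
    return new_price / old_price


def annualized_revenue(subscribers: int, monthly_price: float) -> float:
    """Yearly revenue, assuming every subscriber pays the same monthly price."""
    return subscribers * monthly_price * 12


# Input prices per 1M tokens, as quoted in the tweet (assumed, not verified).
GPT_5, GPT_5_4, GPT_5_5 = 0.63, 2.50, 5.00

print(f"GPT-5   -> GPT-5.5: {price_multiple(GPT_5, GPT_5_5):.1f}x")
print(f"GPT-5.4 -> GPT-5.5: {price_multiple(GPT_5_4, GPT_5_5):.1f}x")
print(f"Subscriptions: ${annualized_revenue(50_000_000, 20) / 1e9:.0f}B per year")
```

On these inputs the "8x" claim is a rounding of roughly 7.9x, the step from GPT-5.4 is exactly 2x, and the $12 billion figure follows directly from 50 million subscribers at $20/month.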


10. business (Group Score: 146.0 | Individual: 19.7)

Cluster: 10 tweets | Engagement: 159 (Avg: 72) | Type: Tech

Meta plans to cut 10% of workers, or roughly 8,000 employees, in an effort to boost efficiency and offset its heavy spending on artificial intelligence. https://t.co/Frab4VYE5N

See 9 related tweets

  • @StockSavvyShay: $META is cutting 10% of its workforce by eliminating ~8,000 jobs as part of its efficiency push.

It...

  • @wallstengine: $META plans to cut about 10% of its workforce, or roughly 8,000 jobs, and eliminate 6,000 open roles...
  • @business: Meta and Microsoft have both taken drastic actions to trim their workforces in an effort to streamli...
  • @Techmeme: Memo: Meta plans to cut 10% of workers, or ~8,000 jobs, on May 20 and won't fill 6,000 open roles, i...
  • @FirstSquawk: META PLATFORMS TO ELIMINATE APPROXIMATELY 8,000 EMPLOYEES AND 6,000 OPEN POSITIONS, CUTTING ABOUT 10...

11. thsottiaux (Group Score: 145.3 | Individual: 32.1)

Cluster: 7 tweets | Engagement: 4675 (Avg: 3521) | Type: Tech

Introducing GPT-5.5... together with a ton of new Codex features (more on those in the next hour).

Included in all paid plans and coming to API soon. Update your Codex app or CLI to use it.

https://t.co/lgvo3KErjt

See 6 related tweets

  • @synthwavedd: 🚨 Codex app update just added this string:

"GPT-5.5 is now available in Codex. It's our strongest a...

  • @dkundel: RT @OpenAIDevs: With GPT-5.5, Codex now gets more of the job done across the browser, files, docs, a...
  • @romainhuet: Codex + GPT-5.5 is an incredible combo.

A great model needs a great space to work. Codex gives GPT-...

  • @mark_k: GPT-5.5 is here from @OpenAI 🔥🔥

The model is rolling out right now to all ChatGPT and Codex account...

  • @ajambrosino: big new stuff in Codex today— take a look! with love, | QT @OpenAIDevs: With GPT-5.5, Codex now get...

12. wallstengine (Group Score: 133.0 | Individual: 29.5)

Cluster: 7 tweets | Engagement: 245 (Avg: 136) | Type: Tech

Microsoft $MSFT plans its first voluntary employee buyout program in the company’s 51-year history, offering a one-time retirement package to some U.S. employees whose age plus years of service total at least 70, while also changing how stock and cash bonuses are awarded. https://t.co/IbjNf3rzPy

See 6 related tweets

  • @FirstSquawk: MICROSOFT PLANS FIRST VOLUNTARY EMPLOYEE BUYOUT IN COMPANY’S 51-YEAR HISTORY – CNBC...
  • @unusual_whales: Microsoft is offering its first ever voluntary buyout to about 7% of its staff.

Eligible US employe...

  • @moneycontrolcom: #Business | 🚨 Microsoft plans first-ever voluntary employee buyout, offers retirement to about 7% of...
  • @zephyr_z9: WELP!!! | QT @financialjuice: Microsoft plans first voluntary employee buyout in the company's 51-y...
  • @Techmeme: Microsoft announces the first voluntary retirement program in its 50-year history, for US staffers w...

13. Parul_Gautam7 (Group Score: 127.4 | Individual: 32.8)

Cluster: 5 tweets | Engagement: 81 (Avg: 70) | Type: Tech

We’re watching headcount quietly get replaced by systems that just run.

< No hiring < No onboarding < No overhead layers

just execution happening continuously at scale.

@getsurething nailed this shift.

QT @getsurething: WE JUST PROCESSED OUR 1,000,000TH EMAIL.

In human terms, that's:

> ~16,000 hours of inbox work > ~8 full-time employees years > $0 in salary, $0 in onboarding, $0 in 1:1s

We didn't hire a team. We shipped one.

photo credit: brand-new ChatGPT Images 2.0 https://t.co/eQYgB0Z5L7

See 4 related tweets

  • @Origin_AI_01: 1M emails processed isn’t scale — it’s compression.

Time, cost, and headcount collapsed into a sing...

  • @TheoBuildsAI: Feels like we’re slowly replacing the most draining parts of work first.

Inbox management had to be...

  • @ai_explorer25: Really like what SureThing is building here.

Crossing 1,000,000 emails isn’t just a number, it show...

  • @TheoBuildsAI: RT @getsurething: WE JUST PROCESSED OUR 1,000,000TH EMAIL.

In human terms, that's:

> ~16,000 ho...
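
The "human terms" conversion in the @getsurething tweet can be reproduced as a rough sketch. The tweet does not say how it defines a full-time year, so the 2,000 working hours per year used here (40 hours x 50 weeks) is an assumption, and the function names are illustrative.

```python
HOURS_PER_FTE_YEAR = 2_000  # assumption: 40 hours/week x 50 weeks/year


def fte_years(total_hours: float) -> float:
    """Convert a number of work hours into full-time-employee years."""
    return total_hours / HOURS_PER_FTE_YEAR


def seconds_per_email(total_hours: float, emails: int) -> float:
    """Average handling time per email implied by the totals."""
    return total_hours * 3600 / emails


print(f"FTE years: {fte_years(16_000):.0f}")
print(f"Seconds per email: {seconds_per_email(16_000, 1_000_000):.1f}")
```

Under that assumption, 16,000 hours is 8 full-time years, and the totals imply a little under a minute of human inbox work per email.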


14. QuixiAI (Group Score: 127.0 | Individual: 45.1)

Cluster: 4 tweets | Engagement: 3212 (Avg: 374) | Type: Tech

RT @Alibaba_Qwen: 🚀 Meet Qwen3.6-27B, our latest dense, open-source model, packing flagship-level coding power!

Yes, 27B, and Qwen3.6-27B punches way above its weight. 👇

What's new: 🧠 Outstanding agentic coding — surpasses Qwen3.5-397B-A17B across all major coding benchmarks 💡 Strong reasoning across text & multimodal tasks 🔄 Supports thinking & non-thinking modes ✅ Apache 2.0 — fully open, fully yours

Smaller model. Bigger results. Community's favorite. ❤️ We can't wait to see what you build with Qwen3.6-27B! 👀

🔗👇 Blog: https://t.co/P2Zx7FwMxB Qwen Studio: https://t.co/c4vm4LuZrU Github: https://t.co/zKDEbv0R4U Hugging Face: https://t.co/N67hyzxvfr https://t.co/SSdtbWRDap ModelScope: https://t.co/xODf1pj9kw https://t.co/xXhoqlJ2AB

See 3 related tweets

  • @cryptopunk7213: sorry but this is seriously fucking impressive

china just shipped a claude code-level ai model smal...

  • @WesRoth: Alibaba launched Qwen3.6-27B, a highly efficient, dense open-source model (licensed under Apache 2.0...
  • @Alibaba_Qwen: 👀👀 | QT @TeksEdge: 🚨 OMG! Qwen just released Qwen3.6-27B! This is an update to the King 👑 of Open S...

15. petergostev (Group Score: 119.4 | Individual: 35.6)

Cluster: 5 tweets | Engagement: 746 (Avg: 250) | Type: Tech

When using GPT-5.5, it is instantly noticeable how much more powerful it is.

In Codex, I gave it a very complex prompt to create London Toy Railway with landmarks and seasons - it did an excellent job in one shot.

In the second half of the video you see GPT-5.4 - it was also not bad, but very clearly worse. GPT-5.5's generation is far more ambitious, coherent and with fewer errors.

This is obviously a toy example, but I've used it on much more complex real tasks, including a complex app migration and a new hard workflow - it has been working away for many hours without getting stumped.

I'm getting more and more addicted to this stuff with every model release.

See 4 related tweets

  • @Lovable: We have been testing GPT-5.5 in early access.

Our evals show it’s the most capable model for people...

  • @TeksEdge: GPT-5.5 is no joke. Check out the benchmarks!

Link in ALT https://t.co/TTCNjBXFAu | QT @thsottiau...

  • @Scobleizer: RT @Dimillian: I've been building many cool things with GPT-5.5. It's been my daily driver for a cou...
  • @adonis_singh: gpt-5.4 is a "strong model for everyday coding" https://t.co/udYiy4W5DU...

16. Azure (Group Score: 99.4 | Individual: 35.3)

Cluster: 3 tweets | Engagement: 732 (Avg: 183) | Type: Tech

RT @satyanadella: Every agent will need its own computer. And with new Hosted agents in Foundry, every agent gets its own dedicated enterprise-grade sandbox, with durable state, built-in identity and governance, and support for any harness or framework.

Read more: https://t.co/zL5eKrRr1j https://t.co/ggMtYB3vZf

See 2 related tweets

  • @eastdakota: https://t.co/wWvtzysuZ2 | QT @satyanadella: Every agent will need its own computer. And with new Ho...
  • @WesRoth: Microsoft introduced "Hosted agents in Foundry," launching a new infrastructure paradigm built entir...

17. Origin_AI_01 (Group Score: 98.1 | Individual: 30.5)

Cluster: 5 tweets | Engagement: 171 (Avg: 307) | Type: Tech

Drop it into preview and the video responds right away

That instant response cycle is what makes a tool feel natural, not frustrating

QT @HeyGen: You can now edit HyperFrames’ Timeline directly!

Drag in preview → video updates

When Claude Code struggles with timing, you can give it a hand

Experimental, open sourced in @hyperframes/player

$ npx hyperframes upgrade

RT + Comment "Timeline" for source code (must follow) https://t.co/QlAHWpOgOP

See 4 related tweets

  • @ai_explorer25: Something always felt off.

Either the agent couldn’t work inside the editor, or you couldn’t step i...

  • @Parul_Gautam7: < This is actually a really nice step for HyperFrames

< being able to tweak the Timeline directly ...

  • @Origin_AI_01: RT @HeyGen: You can now edit HyperFrames’ Timeline directly!

Drag in preview → video updates

When ...

  • @Parul_Gautam7: RT @Parul_Gautam7: < This is actually a really nice step for HyperFrames

< being able to tweak the...


18. reach_vb (Group Score: 92.3 | Individual: 32.6)

Cluster: 3 tweets | Engagement: 246 (Avg: 146) | Type: Tech

LETS GOOOO! Excited to introduce GPT-5.5 Thinking & Pro in ChatGPT and Codex 🔥

It's our smartest model yet for real work: stronger agentic coding, computer use, knowledge work, long-context reasoning, and scientific research

It can plan, use tools, check its work, recover from ambiguity, and keep going across messy multi-step tasks

  • Terminal-Bench 2.0: 82.7%
  • SWE-Bench Pro: 58.6%
  • GDPval: 84.9% win/tie
  • OSWorld-Verified: 78.7%
  • BrowseComp: 84.4%
  • FrontierMath Tier 1–3: 51.7%
  • CyberGym: 81.8%
  • MMMU-Pro: 81.2% without tools
  • Investment banking modeling: 88.5%

Better at coding, tool use, spreadsheets, research, long-horizon execution, and computer-use agents.

Less micromanaging. More work done.

It's time to build! 🚀

See 2 related tweets

  • @reach_vb: Fun fact: Codex + GPT-5.5 even helped optimize the serving stack behind GPT-5.5, increasing token ge...
  • @reach_vb: it's pretty insane how token efficient GPT 5.5 - SOTA performance all with significantly lower outpu...

19. Reuters (Group Score: 92.1 | Individual: 22.1)

Cluster: 5 tweets | Engagement: 137 (Avg: 104) | Type: Tech

Chinese electric vehicle maker Xpeng expects to start large-scale production of its ‘flying’ cars next year and of its humanoid robots in the fourth quarter of 2026, President Brian Gu told Reuters https://t.co/uImVFnOiBK

See 4 related tweets

  • @ReutersBiz: Chinese electric vehicle maker Xpeng expects to start large-scale production of its ‘flying’ cars ne...
  • @rohanpaul_ai: China's XPeng has secured more than 7,000 pre-orders for its Land Aircraft Carrier (flying cars). Th...
  • @Cointelegraph: 🇨🇳 LATEST: China’s XPeng targets mass production of flying cars in 2027, with 7,000+ pre-orders alre...
  • @MSBIntel: BREAKING: China’s XPeng says it is targeting mass production of flying cars by 2027.

The company sa...


20. scaling01 (Group Score: 92.1 | Individual: 38.3)

Cluster: 3 tweets | Engagement: 704 (Avg: 228) | Type: Tech

The GPT-5.5 model family completely dominates the cost-performance frontier on the Artificial Analysis Index https://t.co/58J9LyHY0H

QT @ArtificialAnlys: GPT-5.5 takes OpenAI back to the clear number one in AI. OpenAI’s new model tops the Artificial Analysis Intelligence Index by 3 points, breaking a three-way tie with Anthropic and Google

OpenAI gave us pre-release access to test all five reasoning effort levels: xhigh, high, medium, low and non-reasoning.

➤ OpenAI topping five headline evaluations: GPT-5.5 (xhigh) leads Terminal-Bench Hard, GDPval-AA and our newly hosted APEX-Agents-AA. The model trails only other OpenAI models in CritPt and AA-LCR, and comes second to Gemini 3.1 Pro Preview on three additional evaluations. The largest gains are on AA-Omniscience (+14 pts), our knowledge and hallucination benchmark, and τ²-Bench Telecom (+7 pts), a customer service agent benchmark.

➤ 20% more expensive to run our Intelligence Index: Per-token pricing has doubled from GPT-5.4 to $5/$30 per 1M input/output tokens. However, a ~40% token use reduction largely absorbs the hike - resulting in a net ~+20% cost to run our Intelligence Index.

➤ Effort a clear ladder for balancing intelligence and cost: GPT-5.5 (medium) scores the same as Claude Opus 4.7 (max) on our Intelligence Index at one quarter of the cost (~$1,200 vs ~$4,800) - although Gemini 3.1 Pro Preview scores the same at a cost of ~$900. GPT-5.5 (low) approximates Claude Opus 4.7 (Non-reasoning, high) on our Intelligence Index at half the cost to run (~$500 vs ~$1,000).

➤ Number one in GDPval-AA with an Elo of 1785: GPT-5.5 (xhigh) leads Claude Opus 4.7 (max) by ~30 pts and Gemini 3.1 Pro Preview by ~470 pts. GDPval-AA is Artificial Analysis’ benchmark that leverages OpenAI’s GDPval dataset to evaluate models on real-world economically valuable tasks.

➤ Top AA-Omniscience accuracy, but trailing the frontier on hallucination: Our private AA-Omniscience benchmark rewards factual knowledge across diverse topics, but punishes hallucination. GPT-5.5 (xhigh) has the highest accuracy at 57% - meaning the model can recall facts in the Omniscience corpus more effectively than any other model. However, it has a hallucination rate of 86% - vs Opus 4.7 (max) at 36%, and Gemini 3.1 Pro Preview at 50%. This makes it more likely to answer a question when it does not ‘know’ the answer. The 14 pt gain in AA-Omniscience from GPT-5.4 (xhigh) was largely driven by knowledge, with a modest improvement in hallucination.

Congratulations to the team at @OpenAI and @sama on the launch

See 2 related tweets

  • @dkundel: 🚀🚀🚀 | QT @ArtificialAnlys: GPT-5.5 takes OpenAI back to the clear number one in AI. OpenAI’s new mo...
  • @TeksEdge: OpenAI must have heard my pleas (jk) and took back top spot in @ArtificialAnlys leaderboard. https:/...
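
The cost reasoning in the Artificial Analysis thread above (prices doubled, token use down about 40%, net cost up about 20%) follows from multiplying the two effects. The sketch below works through that arithmetic and the quoted cost comparisons. The dollar figures are the rounded values from the thread, and the function names are illustrative, not part of any Artificial Analysis tooling.

```python
def net_cost_change(price_multiplier: float, token_reduction: float) -> float:
    """Fractional change in total cost when price and token usage both change.

    Total cost scales with (price per token) x (tokens used), so the new cost
    relative to the old one is price_multiplier * (1 - token_reduction).
    """
    return price_multiplier * (1.0 - token_reduction) - 1.0


def cost_ratio(cost_a: float, cost_b: float) -> float:
    """Cost of run A as a fraction of the cost of run B."""
    return cost_a / cost_b


# 2x per-token price with ~40% fewer tokens, as stated in the thread.
print(f"Net cost change: {net_cost_change(2.0, 0.40):+.0%}")

# Approximate index run costs in USD, as quoted in the thread.
print(f"GPT-5.5 (medium) vs Opus 4.7 (max): {cost_ratio(1_200, 4_800):.2f}")
print(f"GPT-5.5 (low) vs Opus 4.7 (non-reasoning, high): {cost_ratio(500, 1_000):.2f}")
```

With these inputs the doubled price and 40% token saving combine to 2 x 0.6 = 1.2, which matches the thread's "net ~+20%" figure, and the two ratios match the "one quarter" and "half" claims.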