Tech Tweet Highlights - 2026-02-13

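Today's tech frontier: AI is shifting toward autonomous agent engineering and extreme reasoning. Google DeepMind's Gemini 3 Deep Think is setting new records on multiple benchmarks, while OpenAI's GPT-5.3-Codex-Spark now outputs more than 1,000 tokens per second. This acceleration is reshaping the labor market: Spotify reports that its top developers have stopped writing code by hand, and Microsoft AI predicts that professional services will be fully automated. Behind these leaps, the industry is fiercely debating US-China open-source leadership, a shift in value toward hardware, and the integrity of safety safeguards.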


1. rseroter (Group Score: 121.6 | Individual: 42.9)

Cluster: 7 tweets | Engagement: 738 (Avg: 49) | Type: Tech

RT @GoogleDeepMind: We’ve upgraded our specialized reasoning mode Gemini 3 Deep Think to help solve modern science, research, and engineeri…

See 6 related tweets

  • @Google: RT @GeminiApp: Today, we’re releasing a significant upgrade to our specialized reasoning mode, Gemin...
  • @rseroter: RT @OfficialLoganK: Gemini Deep Think 3 is the world's most capable model by many measures, huge amo...
  • @Google: RT @sundarpichai: Gemini 3 Deep Think is getting a significant upgrade. We’ve refined Deep Think in ...
  • @GeminiApp: RT @Google: Today, we updated Gemini 3 Deep Think to further accelerate modern science, research and...
  • @GoogleDeepMind: RT @Google: Early testers of Gemini 3 Deep Think are already seeing results.

We partnered with rese...


2. FT (Group Score: 107.7 | Individual: 48.5)

Cluster: 4 tweets | Engagement: 4471 (Avg: 213) | Type: Tech

CEO of Microsoft AI Mustafa Suleyman joins FT editor Roula Khalaf to explain why most of the tasks accountants, lawyers and other professionals currently undertake will be fully automated by AI within the next 12 to 18 months https://t.co/yYKzS7NIOP https://t.co/HvA6Q7KgIc

See 3 related tweets

  • @kimmonismus: Mustafa Suleyman, CEO Microsoft AI:

"Most of the tasks accountants, lawyers and other professional...

  • @rohanpaul_ai: Most white collar jobs that involve sitting at a computer, like law, accounting, project management,...
  • @AISafetyMemes: Microsoft AI CEO says most if not ALL white collar work will likely be fully automated in 12 t...

3. sama (Group Score: 87.0 | Individual: 29.7)

Cluster: 5 tweets | Engagement: 8617 (Avg: 4146) | Type: Tech

GPT-5.3-Codex-Spark is launching today as a research preview for Pro.

More than 1000 tokens per second!

There are limitations at launch; we will rapidly improve.

See 4 related tweets

  • @chatgpt21: It “sparks” joy! GPT 5.3 -Codex-Spark is a smaller version of GPT‑5.3-Codex, and it’s there first mo...
  • @testingcatalog: BREAKING 🚨: OpenAI released GPT-5.3-Codex-Spark, a new, faster model powered by @cerebras infrastruc...
  • @sherwinwu: OpenAI 🤝@cerebras

Code at the speed of thought with GPT-5.3-Codex-Spark at > 1000 tokens/second...

  • @steipete: RT @OpenAI: GPT-5.3-Codex-Spark is now in research preview.

You can just build things—faster. https...


4. TechCrunch (Group Score: 80.5 | Individual: 46.4)

Cluster: 2 tweets | Engagement: 1207 (Avg: 63) | Type: Tech

Spotify says its best developers haven’t written a line of code since December, thanks to AI https://t.co/6hafAJOeJv

See 1 related tweet

  • @kimmonismus: Spotify revealed that its top engineers haven’t written a single line of code since December, thanks...

5. rohanpaul_ai (Group Score: 67.5 | Individual: 49.5)

Cluster: 2 tweets | Engagement: 1296 (Avg: 112) | Type: Tech

Marc Andreessen's new interview, on the future of AI.

"There's like a rotation from software into hardware.

It's possible all the value accrues to the chips, and the energy, and then software is all open source."

https://t.co/S09rVda609

See 1 related tweet

  • @rohanpaul_ai: RT @rohanpaul_ai: Marc Andreessen's new interview, on the future of AI.

"There's like a rotation fr...


6. GeminiApp (Group Score: 66.0 | Individual: 47.1)

Cluster: 2 tweets | Engagement: 8147 (Avg: 1413) | Type: Tech

Today, we’re releasing a significant upgrade to our specialized reasoning mode, Gemini 3 Deep Think.

Deep Think is built to drive practical applications, enabling researchers to interpret complex data and engineers to model physical systems through code.

With the updated Deep Think, you can turn a sketch into a 3D-printable reality. Deep Think analyzes the drawing, builds the complex shape, and generates a file so you can create the physical object with 3D printing.

This is rolling out now to Google AI Ultra subscribers. Select the "Deep Think" option in the tools menu to get started.

Learn more here: https://t.co/MMGMgDtoK8

See 1 related tweet

  • @mark_k: Google just released Gemini 3 Deep Think, a significant upgrade to their specialized reasoning mode ...

7. bgurley (Group Score: 63.2 | Individual: 35.0)

Cluster: 3 tweets | Engagement: 3092 (Avg: 377) | Type: Tech

RT @naval: All the American AI companies talk about sharing the wealth, but all the top open source models are Chinese.

See 2 related tweets

  • @levelsio: In a way I prefer the American AI companies because they're mostly public (Google etc) or will be pu...
  • @techreview: What’s next for Chinese open-source AI https://t.co/2vm859AQvc...

8. _akhaliq (Group Score: 60.8 | Individual: 51.6)

Cluster: 2 tweets | Engagement: 1502 (Avg: 86) | Type: Tech

RT @Zai_org: Introducing GLM-5: From Vibe Coding to Agentic Engineering

GLM-5 is built for complex systems engineering and long-horizon ag…

See 1 related tweet

  • @abacusai: RT @bindureddy: GLM-5 IS THE WORLD'S BEST OPEN SOURCE MODEL IN AGENTIC CODING 🤯🤯

This is MIND BLOWI...


9. aakashgupta (Group Score: 58.6 | Individual: 41.3)

Cluster: 2 tweets | Engagement: 1113 (Avg: 444) | Type: Tech

Sundar buried the real story in the cost data.

Gemini 3 Deep Think went from 45.1% to 84.6% on ARC-AGI-2 in under 3 months. That’s an 88% improvement on a benchmark specifically designed to resist brute-force scaling.

The number that matters: $13.62 per task. The previous Deep Think scored 45.1% at $77.16 per task. This upgrade nearly doubled the accuracy while cutting cost by 82%. Three months ago, Gemini needed 138,000 reasoning tokens to solve an ARC task that Gemini 3 Pro handles in 96.

This tells you everything about where the reasoning race actually sits. Every other lab is throwing more compute at harder problems. Google just demonstrated that inference-time optimization is the dominant variable, and they’re improving on it faster than anyone expected.

The Codeforces number confirms the pattern. 3455 Elo puts Deep Think in the top 0.01% of competitive programmers globally. Claude Opus 4.6 sits at 2352. That 1100 Elo gap is roughly the difference between a strong amateur and a world finalist.

The benchmark Sundar doesn’t mention: ARC Prize is already building ARC-AGI-3 because ARC-AGI-2 at 84.6% is approaching saturation. Google killed a benchmark designed to measure AGI progress in less than a year.

The competitive framing in Pichai’s chart puts Claude and GPT in every comparison. For enterprises building reasoning-heavy applications in science and engineering, the cost-per-insight gap between Deep Think and everything else just widened by 5x in a single quarter.

See 1 related tweet

  • @mark_k: The leap in reasoning capabilities shown by Gemini 3 Deep Think is staggering.

Scoring 84.6% on the...
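
The percentages in aakashgupta's post above can be re-derived from the raw figures it quotes. The following is a minimal sketch, not part of the original post: the function and variable names are illustrative, the cost figures are taken as 77.16 and 13.62 per task as quoted, and the Elo expected-score formula is the standard one, used here only to give the quoted rating gap a concrete reading.

```python
# Figures as quoted in the post; all names below are illustrative.
OLD_SCORE, NEW_SCORE = 45.1, 84.6      # ARC-AGI-2 accuracy, in percent
OLD_COST, NEW_COST = 77.16, 13.62      # cost per task, as quoted
DEEP_THINK_ELO, OPUS_ELO = 3455, 2352  # Codeforces Elo, as quoted


def relative_change(old: float, new: float) -> float:
    """Percent change from old to new (positive means an increase)."""
    return (new - old) / old * 100.0


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


accuracy_gain = relative_change(OLD_SCORE, NEW_SCORE)  # relative, not points
cost_cut = -relative_change(OLD_COST, NEW_COST)        # reduction as a positive number
cost_ratio = OLD_COST / NEW_COST                       # how many times cheaper
elo_gap = DEEP_THINK_ELO - OPUS_ELO
win_expectancy = elo_expected_score(DEEP_THINK_ELO, OPUS_ELO)

print(f"accuracy gain:  {accuracy_gain:.1f}%")
print(f"cost reduction: {cost_cut:.1f}%")
print(f"cost ratio:     {cost_ratio:.2f}x")
print(f"Elo gap:        {elo_gap}")
print(f"expected score: {win_expectancy:.3f}")
```

Under these inputs the relative accuracy gain comes out at about 87.6% (the post rounds to 88%), the cost reduction at about 82.3%, and the cost ratio at roughly 5.7x, which is consistent with the post's "82%" and "5x" claims. The "88%" is a relative improvement; in absolute terms the score rose by 39.5 percentage points.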


10. BrianRoemmele (Group Score: 56.4 | Individual: 37.3)

Cluster: 2 tweets | Engagement: 1220 (Avg: 191) | Type: Tech

NEW RESEARCH ON AI SHOULD ALARM YOU:

Found:

Gender bias favoring female candidates: 5/6 models

Race/ethnicity bias favoring minority-associated names: 4/6 models

It helps NO ONE.

This was baked in by using Reddit as a majority training ground for AI.

We ALL pay a price. https://t.co/pmWDE7BsLH

See 1 related tweet

  • @BrianRoemmele: RT @BrianRoemmele: NEW RESEARCH ON AI SHOULD ALARM YOU:

Found:

Gender bias favoring female candida...


11. Reuters (Group Score: 55.8 | Individual: 31.4)

Cluster: 3 tweets | Engagement: 226 (Avg: 112) | Type: Tech

Exclusive: The Pentagon is pushing the top AI companies including OpenAI and Anthropic to make their artificial-intelligence tools available on classified networks without many of the standard restrictions that the companies apply to users https://t.co/18WfpcCvOS

See 2 related tweets

  • @Reuters: Exclusive: Pentagon pushing AI companies to expand on classified networks, sources say https://t.co/...
  • @Reuters: RT @DavidJeans2: New: The Pentagon is pushing AI companies to allow use of their AI models on classi...

12. bearliu (Group Score: 55.7 | Individual: 26.1)

Cluster: 3 tweets | Engagement: 39 (Avg: 48) | Type: Tech

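I'm about to use Claude Code for a real project: taking a website design I just finished (the Figma file is already done) and building and publishing it directly with Claude Code.

I want to see whether this workflow is faster than using Figma Make or Figma Site. If it works well, I'll share how I built it.

A question: has anyone here used Figma MCP? How was it? https://t.co/GRetYYRtrQ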

See 2 related tweets

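  • @GitHub_Daily: Product managers handle piles of meeting notes, user research, and data analysis every day, and still have to write requirements documents and run competitive analyses. Sorting it all out by hand just can't keep up.

I recently came across an open-source course on GitHub, Claude Code PM Course...

  • @AxtonLiu: RT @AxtonLiu: Claude Code's new Insight report startled even me:

Past 30 days: 55,685 messages / 6,062 sessions / 3 million lines of code changed.

No wonder the report...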


13. llama_index (Group Score: 52.1 | Individual: 26.5)

Cluster: 2 tweets | Engagement: 15 (Avg: 41) | Type: Tech

2026 is the year of long-horizon agents. @sequoia predicts that this year, agents will be able to tackle long-horizon tasks and work autonomously for hours to solve ambiguous tasks.

We're excited about how this translates to knowledge work automation, particularly over documents. Let's take a look at "Long Horizon Document Agents"

🕰️ Agents are evolving to work autonomously over weeks, not just minutes, handling complex document tasks end-to-end. 🔄 These agents can continuously monitor events like document changes, comments, and deadlines - not just respond to chat prompts 📝 They maintain persistent task backlogs and can collaborate iteratively on living documents like FAQs, PRDs, and legal contracts 🎯 The interface shifts from chat boxes to "agent inboxes" that manage ongoing document tasks with clear status and context ⚡ This enables true automation of multi-step knowledge work - from due diligence memo updates to contract redline collaboration loops

2026 is shaping up to be the year agents evolve from "workflows" to "employees" - and we're building the document processing infrastructure to make this possible.

Read @jerryjliu0's full blog on long horizon document agents: https://t.co/1DwRnMRseH

See 1 related tweet

  • @jerryjliu0: Existing AI agents are largely short-horizon (e.g. chat) or constrained (e.g. agentic process automa...

14. alex_prompter (Group Score: 51.7 | Individual: 51.7)

Cluster: 1 tweet | Engagement: 6514 (Avg: 281) | Type: Tech

RT @alex_prompter: 🚨 The guy who built Anthropic’s defenses against AI bioterrorism just quit.

Mrinank Sharma led Anthropic’s Safeguards R…


15. cryptopunk7213 (Group Score: 51.6 | Individual: 21.1)

Cluster: 3 tweets | Engagement: 155 (Avg: 578) | Type: Tech

xAI update was fucking epic

  • grok code will be state-of-the-art in 2-3 months

  • grok 4.20 is the #1 forecasting model (best trading AI across stocks and prediction markets)

  • xMoney launching in 1-2 months, lets any users send, deposit and transact money

imagine giving grok money to invest… an AI that makes you money is fckin awesome

  • grok imagine went from idea to SOTA in 6 months, now produced 6 billion images (6x competitors)

  • X platform is crushing - engagement at all-time highs, new users spend 55% more time on app vs. 6 months ago.

  • merging grok main and voice teams (makes sense, voice is easiest way to use ai)

now consider xAI has the largest arsenal of bleeding-edge GPUs and you can envision a (near) future where grok regains the throne.

See 2 related tweets

  • @MarioNawfal: 🇺🇸 Elon at xAI's all-hands:

"Despite starting from scratch 6 months ago, Grok Imagine now generates...

  • @MarioNawfal: Elon at xAI all-hands:

"xAI is only 2.5 years old, a toddler, and we've already hit #1 in voice, im...


16. ycombinator (Group Score: 50.8 | Individual: 41.1)

Cluster: 2 tweets | Engagement: 608 (Avg: 170) | Type: Tech

.@tensol_ai turns OpenClaw into full-time AI employees for your company.

They handle repetitive workflows across support, engineering, sales and more — running 24/7 in a secure environment, connected to your tools, with full context of your business.

Congrats on the launch @pratik_satija and @olivieropinotti!

https://t.co/OzFdfed9vg

See 1 related tweet

  • @levelsio: RT @lexfridman: Here's my conversation with Peter Steinberger (@steipete), creator of OpenClaw, an o...

17. ModelScope2022 (Group Score: 50.1 | Individual: 24.8)

Cluster: 3 tweets | Engagement: 29 (Avg: 68) | Type: Tech

Love seeing what the MiniMax team shipped with M2.5 🚀 Real-world productivity, strong agentic workflows, and impressive efficiency numbers — this is serious engineering.

SOTA across coding, search, and tool use while getting faster is a big deal. 🔥

Excited to share that the model will be coming to ModelScope soon for the open-source community. Stay tuned! 🌍✨

See 2 related tweets

  • @MiniMax_AI: Great to see MiniMax M2.5 live in @cline

Benchmarks are one thing — real dev workflows are another...

  • @MiniMax_AI: RT @ModelScope2022: Love seeing what the MiniMax team shipped with M2.5 🚀 Real-world productivity, s...

18. FireworksAI_HQ (Group Score: 47.9 | Individual: 31.7)

Cluster: 2 tweets | Engagement: 123 (Avg: 66) | Type: Tech

GLM-5 just dropped on Fireworks day 0 🇺🇲🇪🇺

▶️ 5x cheaper, 2x more throughput than Claude Opus 4.6 ✨ On-par or better on MMLU and AA-LCR 🪘 202k context window via DeepSeek Sparse Attention 🏁 OpenAI + Anthropic API compatible 🇺🇸 US-Based company & infra 🛡️ Security-first with zero data retention

Stop overpaying for frontier performance and start building now 👇

https://t.co/ac8YjrRSvF

See 1 related tweet

  • @FireworksAI_HQ: GLM-5 just dropped on Fireworks day 0 🇺🇲🇪🇺

5x cheaper, 2x more throughput than Claude Opus: on-par ...


19. Reuters (Group Score: 47.2 | Individual: 18.2)

Cluster: 3 tweets | Engagement: 19 (Avg: 112) | Type: Tech

🔊 Shares in software giants sank amid fears of competition from chatbots, implying pain for private equity’s bet on the industry. In this Viewsroom podcast, @Breakingviews columnists discuss how asset managers may take a hit whether AI fails or succeeds https://t.co/Gea9PtYfGO

See 2 related tweets

  • @Reuters: 🔊 Shares in software giants collapsed amid fears new tech will replace their services. In this Views...
  • @FT: How private equity’s big bet on software got derailed by AI https://t.co/48HupO4imk...

20. svpino (Group Score: 47.0 | Individual: 47.0)

Cluster: 1 tweet | Engagement: 3004 (Avg: 171) | Type: Tech

“sOfTwArE eNgInEeRiNg iS dEaD”

you have to be mentally challenged to think your mom will use ai to build a grocery app whenever she needs one