記事:Dear Sam Altman- There was never an era of making models bigger

LLMs have never been revolutionary or as game-changing as gurus online would have pushed you to believe

Join 32K+ people and get insights on the most important ideas in AI straight to your inbox through my free newsletter- AI Made Simple

Recently, the internet caught fire with a particular admission from Sam Altman- The Era of Large Language Models is over. According to this report by Wired-

But the company’s CEO, Sam Altman, says further progress will not come from making models bigger. “I think we’re at the end of the era where it’s going to be these, like, giant, giant models,” he told an audience at an event held at MIT late last week. “We’ll make them better in other ways.”

– Article- OpenAI’s CEO Says the Age of Giant AI Models Is Already Over

This has caused a notable stir in the online space. We have seen a notable increase in the number of AI Experts (specifically GPT Experts) since November 2022. These GPT/LLM warriors have been promising AGI and 100x productivity-based AI based on how adding trillions of parameters and more data might be coming to an end sooner than people realize. Looks like Alex Hormozi can live a peaceful existence now-

In this article, I’m here to argue something that would be considered blasphemy to many people in the AI Space- the age of mindlessly scaling models has never been here. When we look at the data and actually compare the results- it has always been clear that throwing more and more data and increasing parameter size was always doomed to fail- long before we started hitting the scaling limits that GPT-4 is starting to hit. By understanding how we could have foreseen this problem, we can avoid making these mistakes in the future- saving everyone a lot of time, money, and attention.

OpenAI, the AI research company behind popular language models like ChatGPT and Dall-E 2, has reportedly doubled its losses to $540 million in 2022 due to soaring development expenses for its chatbot. The company is now looking to raise as much as $100 billion in the coming years to fund its goal of developing artificial general intelligence (AGI), an AI advanced enough to improve its own capabilities.

-Source. The hype around these models is causing a major bubble. Make sure you don’t get caught up

Sound like a good time? Let’s get right into it.


私の無料ニュースレター「AI Made Simple」を通じて、3万2000人以上の人々と一緒に、AIにおける最も重要なアイデアに関する洞察を直接受信することができます。


しかし、同社のCEOであるサム・アルトマンは、さらなる進歩はモデルを大きくすることでは得られないと言います。「先週末にMITで開催されたイベントで、彼は聴衆に「私たちは、このような、巨大で巨大なモデルになる時代の終わりにいると思います。”他の方法でより良いものにする “と。

– 記事-OpenAIのCEO、巨大なAIモデルの時代はすでに終わったと語る

このことは、オンライン空間に顕著な波紋を投げかけている。2022年11月以降、AI Expert(具体的にはGPT Expert)の数が顕著に増えているのを確認しています。これらのGPT/LLMの戦士たちは、何兆ものパラメータとより多くのデータを追加することが、人々が思っているよりも早く終焉を迎えるかもしれないということに基づいて、AGIと100倍の生産性に基づくAIを約束してきました。アレックス・ホルモジは今、平和な日々を送ることができるようだ。


ChatGPTやDall-E 2といった人気の言語モデルを開発したAI研究会社OpenAIは、チャットボットの開発費の高騰により、2022年の損失が5億4000万ドルに倍増したと報じられています。同社は現在、自らの能力を向上させるほど高度なAIである人工一般知能(AGI)の開発という目標に向けて、今後数年間で1000億ドルもの資金を調達しようとしています。

-ソースはこちら これらのモデルの誇大広告は、大きなバブルを引き起こしています。巻き込まれないように気をつけましょう


Saturation of Benchmarks
One of the most important cornerstones of the hype behind AI was the performance increases that LLMs hit with multiple benchmarks. Every week, we found that GPT/a new Language Model was able to match/beat SOTA performance on a new benchmark or task. Who can forget the hype that we hit when we learned that GPT can pass the Bar Exam and even act as a doctor?

This led to a lot of speculation on AGI and how these models were developing so-called emergent abilities. However, peering behind the hood tells you a very different story.

In earlier years, people were improving significantly on the past year’s state of the art or best performance. This year across the majority of the benchmarks, we saw minimal progress to the point we decided not to include some in the report. For example, the best image classification system on ImageNet in 2021 had an accuracy rate of 91%; 2022 saw only a 0.1 percentage point improvement.


People involved in deep learning for a while will know an uncomfortable truth- AI performance has been increasingly saturated for a few years, way before we had investors and social media influencers blindly pushing the hype behind large language models. Machine Learning Researchers have burned more and more computation for increasingly smaller gains (sometimes under a percentage point).

AI continued to post state-of-the-art results, but year-over-year improvement on many benchmarks continues to be marginal. Moreover, the speed at which benchmark saturation is being reached is increasing.

– Stanford AI Index Report 2023

Seen from this perspective, you should be less excited about these so-called amazing architectures. When it comes to performance- sure they can hit benchmarks- but at what cost? Deploying these models at any kind of scale would have you running out of computing budgets quicker than Haaland breaking goal-scoring records. Don’t forget, the scale that makes these models powerful also makes them extremely expensive to deploy in contexts where you have to make a lot of inferences ( Amazon Web Services estimates that “In deep learning applications, inference accounts for up to 90% of total operational costs”.).

According to researchers who wrote, Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model[3], it took roughly 50.5 tons of CO2 equivalent to train the large-language model BLOOM. GPT-3 released over 500 tons of CO2 equivalent.

– Once you’re done with article, check out my article on the ethics of Copilot

However, that is far from the only issue that made GenAI far less promising than the internet would have led you to believe. Let’s now cover something that would come as a surprise to a lot of people who have graduated from ChatGPT University.







– スタンフォードAIインデックスレポート2023

このような観点から見ると、いわゆる素晴らしいアーキテクチャについては、あまり期待しないほうがよいでしょう。パフォーマンスに関しては、確かにベンチマークを達成することができますが、その代償は?これらのモデルをどのような規模で展開しても、Haalandがゴール記録を更新するよりも早く、コンピューティング予算が尽きてしまうでしょう。忘れてはならないのは、これらのモデルを強力にする規模は、多くの推論を行う必要がある状況で展開するには非常に高価になるということです(Amazon Web Servicesは、「深層学習アプリケーションでは、推論が運用コスト全体の最大90%を占める」と推定しています)。

Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model[3]を書いた研究者によると、大規模言語モデルBLOOMを訓練するために、およそ50.5トンのCO2換算が必要だった。GPT-3では、500トン以上のCO2が排出されました。

– この記事を読み終えたら、「Copilot」の倫理に関する私の記事もご覧ください。


Here’s something that would surprise you if you only read about GPT from online influencers telling you how to get ahead of 99% of people- these models are just not very good. When it comes to practically implementing Large Language Models into systems that are useful, efficient, and safe- these structures fall apart.

When it comes to business use cases, these models are often lose to very simple models. The authors of “A Comparison of SVM against Pre-trained Language Models (PLMs) for Text Classification Tasks” compared the performance of LLMs with a puny SVM for text classification in various specialized business contexts. They fine-tuned and used the following models-

These models were stacked against an SVM and some old-fashioned feature engineering. The results are in the following table-

This extends beyond just text classification. Fast AI has an exceptional write-up investigating Github Copilot. One of their stand-out insights was- “According to OpenAI’s paper, Codex only gives the correct answer 29% of the time. And, as we’ve seen, the code it writes is generally poorly refactored and fails to take full advantage of existing solutions (even when they’re in Python’s standard library).” Just to drill home how overrated these models can be for coding, here are some of the problems with these models that the amazing Luca Rossi explored in his insightful AI & The Future of Coding 🤖. In it, he attempted to create the following with in Node-

Write a Telegram bot that answers my questions like ChatGPT, using OpenAI API

-a relatively simple task, for any competen t developer

The code generated in Node didn’t work. Below is Luca’s analysis of the situation- It turns out, the AI used methods that do not exist in the Node library, probably inspired by the Python ones.

Below is Luca’s experience with Debugging-

I replaced them and things worked fine. Two considerations:

Debugging the code required me to study the openai and telegraf libraries, undoing 90% of the benefit of using the AI in the first place.
Debugging was surprisingly hard, because these were not the kind of mistakes a normal person makes. When we debug, our brain is wired for looking at things we are more likely to get wrong. When you debug AI code, instead, literally anything can be wrong (maybe over time we will figure out the most common AI mistakes), which makes the work harder. In this case, AI completely made a method up — which is not something people usually do.
Along with this, Luca has some great insights into how using AI Coders would cause a lot of duplication of work in bug fixing and system design, would destroy innovation, use outdated methods, and a few other concerns. Would highly recommend checking out his work here-

Let’s beyond ChatGPT and onto the allegedly early signs of AGI that were discovered in Microsoft’s 155 Page report- Sparks of Artificial General Intelligence: Early experiments with GPT-4. In it, we GPT-4 failing at some relatively simple tasks when it comes to a very simple task-

We see GPT-4 adding new information to the note, even though the prompt explicitly states- using exclusively the information above (and keep in mind this is Microsoft’s hype piece on GPT-4, so we don’t see the real disasters). Show this to the people that want to use GPT for doctors. If these systems are implemented recklessly, people will suffer. Regular readers know that I prefer not making sensational statements, but this is not me click-baiting you about AI’s existential threat. This is me telling you about a very real issue with these systems. If you’d like to read more about my investigation into GPT-4 and understand how the writers of the GPT-4 report ignored very real problems with their claims of AGI- read Observations on Microsoft’s Experiments with GPT-4.


ビジネスのユースケースになると、これらのモデルは、しばしば非常に単純なモデルに負けてしまう。A Comparison of SVM against Pre-trained Language Models (PLMs) for Text Classification Tasks」の著者は、様々な特殊なビジネスコンテクストにおけるテキスト分類について、LLMと貧弱なSVMのパフォーマンスを比較しました。彼らは以下のモデルを微調整して使用しました。


これは、単なるテキスト分類にとどまりません。Fast AIは、Github Copilotを調査した優れた記事を掲載しています。OpenAIの論文によると、Codexは29%の確率で正しい答えを出しています。そして、私たちが見てきたように、Codexが書くコードは一般的にリファクタリングが不十分で、既存のソリューション(たとえそれがPythonの標準ライブラリにある場合でも)を十分に活用できていない。これらのモデルがコーディングにとっていかに過大評価されうるかを叩き込むために、素晴らしいルカ・ロッシが洞察に満ちた『AI & The Future of Coding 🤖』で探求した、これらのモデルの問題点をいくつか紹介します。その中で、彼はNode-で次のようなものを作ろうとしました。

OpenAI APIを使って、ChatGPTのように私の質問に答えてくれるTelegramボットを書く






ChatGPTを越えて、マイクロソフトの155ページのレポート「Sparks of Artificial General Intelligence」で発見された、AGIの初期兆候とされるものを見ていきましょう: GPT-4による初期の実験。このレポートでは、GPT-4が比較的簡単なタスクで失敗していることが紹介されています。


Data-Centric AI
Prior to this ‘bombshell’ admission and GenAI capturing the attention of everyone, there was another trendy buzzword that was moving through the Data Field- ‘Data-Centric AI’. The premise was relatively- we had spent a lot of time building better models and not enough time on improving our data transformation processes. Take a look at this quote from Andrew Ng at an event by MIT-

AI systems need both code and data, and “all that progress in algorithms means it’s actually time to spend more time on the data,” Ng said at the recent EmTech Digital conference hosted by MIT Technology Review.

Once you’re done being wowed by the big names in the statement, tell me what part of that is truly surprising. It’s been well-known that data processing and curation is the most important part of the pipeline since the beginning. The reason that we saw this giant emphasis on models in research was two-fold-

We have big benchmarks/datasets (ImageNet for eg) for many of the standard tasks. In this case, the problem was in the architectures themselves. By standardizing datasets, we could test the effectiveness of various changes to the training pipelines.
Streetlight Effect- The streetlight effect, or the drunkard’s search principle, is a type of observational bias that occurs when people only search for something where it is easiest to look. Combining this with the confirmation bias we see the mess of mindless scaling model size without attention given to other factors- people simply funding/copying whatever is already being done (in our case tweaking architectures). If all you see is working focused on changing architectures, then people will likely create work along a similar vein.
But the unsustainable nature of simply scaling up architectures has been well known to anyone who even remotely understands the field. Once you step outside the bubble of ML Academia and Big Tech AI- you will see that organizations lack the expertise/budgets/inclination to invest into implementing large models because the ROIs are not worth it.

We additionally do not utilize GPUs for inference in production. At our scale, outfitting each machine with one or more top-class GPUs would be prohibitively expensive, … that our models are relatively small compared to state-of-the-art models in other areas of deep learning (such as computer vision or natural language processing), we consider our approach much more economical.

-This is from a writeup by the engineers at Zemanta, a leading advertising firm. Even at their level, SOTA DL isn’t really worth it. To learn more read- How to handle 300 million predictions per second over here

This is why we have seen the move away from bigger models and obscenely big data sets and instead focus on more intelligent design. Intelligent design was explicitly mentioned as one of the reasons Open Source was beating big companies like Google and Open in the infamous ‘Google has no Moat’ document-

Open-Source companies are doing things in weeks with $100 and 13B params that we struggle with at $10M and 540B.

-Read my analysis of the situation here

この「爆弾発言」によってGenAIが注目を集める以前、データ分野では「データ中心AI」という別の流行語がありました。より良いモデルの構築に多くの時間を費やし、データ変換プロセスの改善には十分な時間を割けなかった、というのがその前提でした。MITのイベントでのAndrew Ngの言葉を見てください。

AIシステムにはコードとデータの両方が必要です。「アルゴリズムの進歩は、実はデータにもっと時間をかけるべき時なのです」と、MIT Technology Reviewが主催する最近のEmTech DigitalカンファレンスでNgは述べています。


ストリートライト効果 – ストリートライト効果(酔っぱらいの検索原理)は、人が最も探しやすい場所でしか何かを探さない場合に生じる観察バイアスの一種です。これを確証バイアスと組み合わせると、他の要素に注意を払わず、無心にモデルサイズを拡大すること、つまり、すでに行われていること(この場合はアーキテクチャの調整)に資金を提供したり、コピーしたりすることが、混乱につながることがわかります。もしあなたが、アーキテクチャの変更に焦点を当てた仕事しか見ていないのなら、人々は同じような系統の仕事を作るでしょう。


-これは、大手広告会社Zemantaのエンジニアが書いたものです。彼らのレベルでも、SOTA DLは本当に価値がない。詳しくは、「1秒間に3億件の予測を処理する方法」をご覧ください。

このような理由から、より大きなモデルや膨大なデータセットから離れ、よりインテリジェントな設計にフォーカスする動きが見られるのです。インテリジェント・デザインは、悪名高い「Google has no Moat」文書の中で、オープンソースがGoogleやOpenのような大企業に勝っている理由の1つとして明確に言及されました。



Combining this together, here is a question I want to ask you- when exactly did we have an era of scaling up? At what point was it ever a good idea to implement more and more scaling- as opposed to focusing on better data selection, using ensembles/mixture of experts to reduce errors, or constraining the system to handle certain kinds of problems to avoid errors? People have been calling this unsustainable for a long time, way before general-purpose LLMs were a mainstay in ML.

Rather than an insightful statement about the future of AI, Sam Altman’s statement is a face-saving maneuver. As is becoming clear- the market is becoming increasingly saturated and OpenAI has no clear direction for making money. Even if they did corner the market, they can’t raise prices because customers can always switch to open-source models (and this assumes that the OpenAI models are meaningfully better- which is debatable). The bubble is bursting, the chickens have come home to roost, and this statement is Sam Altman is now scrambling to maintain the facade that these problems are under control.

That is it for this piece. I appreciate your time. As always, if you’re interested in reaching out to me or checking out my other work, links will be at the end of this email/post. If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here. And if you found value in this write-up, I would appreciate you sharing it with more people. It is word-of-mouth referrals like yours that help me grow.

Save the time, energy, and money you would burn by going through all those videos, courses, products, and ‘coaches’ and easily find all your needs met in one place at ‘Tech Made Simple’! Stay ahead of the curve in AI, software engineering, and the tech industry with expert insights, tips, and resources. 20% off for new subscribers by clicking this link. Subscribe now and simplify your tech journey!

Using this discount will drop the prices-

800 INR (10 USD) → 640 INR (8 USD) per Month

8000 INR (100 USD) → 6400INR (80 USD) per year

Get 20% off for 1 year

Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.

If you like my writing, I would really appreciate an anonymous testimonial. You can drop it here.

To help me understand you fill out this survey (anonymous)

Small Snippets about Tech, AI and Machine Learning over here

Check out my other articles on Medium. : https://rb.gy/zn1aiu

My YouTube: https://rb.gy/88iwdd

Reach out to me on LinkedIn. Let’s connect: https://rb.gy/m5ok2y

My Instagram: https://rb.gy/gmvuy9

My Twitter: https://twitter.com/Machine01776819

Very similar to Crypto




ビデオ、コース、製品、コーチなど、あらゆるものを見て回ることで消費する時間、エネルギー、お金を節約し、「Tech Made Simple」で簡単にすべてのニーズを1か所で満たすことができます!専門家の洞察、ヒント、リソースで、AI、ソフトウェアエンジニアリング、テック業界のカーブを先取りしてください。このリンクをクリックすると、新規購読者が20%オフになります。今すぐ購読して、技術の旅をシンプルにしましょう!


800 INR (10 USD) → 640 INR (8 USD) /月

8000 INR (100 USD) → 6400INR (80 USD)/年






Mediumで私の他の記事をチェックする。 : https://rb.gy/zn1aiu

私のYouTube: https://rb.gy/88iwdd

LinkedInで私に連絡を取ってください。つながりましょう: https://rb.gy/m5ok2y

私のInstagram: https://rb.gy/gmvuy9

私のツイッター: https://twitter.com/Machine01776819