Generative AI: A Creative New World


This article discusses Generative AI, a type of Artificial Intelligence that can create something new rather than analyze something that already exists. Generative AI has the potential to generate trillions of dollars of economic value by making knowledge and creative work more efficient and capable. It has become possible due to better models, more data, and more compute. Generative AI applications exist as plugins in existing software ecosystems, as well as standalone web apps. Examples of Generative AI applications include copywriting, vertical specific writing assistants, code generation, art generation, gaming, media/advertising, and design. Despite its potential, Generative AI still has some kinks to work out in terms of business models and technology. Nonetheless, if progress continues at the current rate, Generative AI could become deeply embedded in how humans work, create and play in the future.



Humans are good at analyzing things. Machines are even better. Machines can analyze a set of data and find patterns in it for a multitude of use cases, whether it’s fraud or spam detection, forecasting the ETA of your delivery or predicting which TikTok video to show you next. They are getting smarter at these tasks. This is called “Analytical AI,” or traditional AI.

But humans are not only good at analyzing things—we are also good at creating. We write poetry, design products, make games and crank out code. Up until recently, machines had no chance of competing with humans at creative work—they were relegated to analysis and rote cognitive labor. But machines are just starting to get good at creating sensical and beautiful things. This new category is called “Generative AI,” meaning the machine is generating something new rather than analyzing something that already exists.

Generative AI is well on the way to becoming not just faster and cheaper, but better in some cases than what humans create by hand. Every industry that requires humans to create original work—from social media to gaming, advertising to architecture, coding to graphic design, product design to law, marketing to sales—is up for reinvention. Certain functions may be completely replaced by generative AI, while others are more likely to thrive from a tight iterative creative cycle between human and machine—but generative AI should unlock better, faster and cheaper creation across a wide range of end markets. The dream is that generative AI brings the marginal cost of creation and knowledge work down towards zero, generating vast labor productivity and economic value—and commensurate market cap.

The fields that generative AI addresses—knowledge work and creative work—comprise billions of workers. Generative AI can make these workers at least 10% more efficient and/or creative: they become not only faster and more efficient, but more capable than before. Therefore, Generative AI has the potential to generate trillions of dollars of economic value.







Why Now?
Generative AI has the same “why now” as AI more broadly: better models, more data, more compute. The category is changing faster than we can capture, but it’s worth recounting recent history in broad strokes to put the current moment in context.

Wave 1: Small models reign supreme (Pre-2015) 5+ years ago, small models are considered “state of the art” for understanding language. These small models excel at analytical tasks and become deployed for jobs from delivery time prediction to fraud classification. However, they are not expressive enough for general-purpose generative tasks. Generating human-level writing or code remains a pipe dream.

Wave 2: The race to scale (2015-Today) A landmark paper by Google Research (Attention is All You Need) describes a new neural network architecture for natural language understanding called transformers that can generate superior quality language models while being more parallelizable and requiring significantly less time to train. These models are few-shot learners and can be customized to specific domains relatively easily.
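The core operation of the transformer architecture that paper introduced is scaled dot-product attention. As an illustrative reduction (not the full architecture), it can be sketched in a few lines of NumPy:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, as described in 'Attention is All You Need'.
    Q, K, V: arrays of shape (seq_len, d_k)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    # Softmax over keys (subtracting the max for numerical stability)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted mix of the values

# Toy self-attention: 3 tokens with 4-dimensional embeddings
x = np.random.default_rng(0).normal(size=(3, 4))
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # (3, 4)
```

Because every token attends to every other token in a single matrix multiply, the computation parallelizes far better than the recurrent models that preceded it — which is what made the race to scale possible.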

Sure enough, as the models get bigger and bigger, they begin to deliver human-level, and then superhuman results. Between 2015 and 2020, the compute used to train these models increases by 6 orders of magnitude and their results surpass human performance benchmarks in handwriting, speech and image recognition, reading comprehension and language understanding. OpenAI’s GPT-3 stands out: the model’s performance is a giant leap over GPT-2 and delivers tantalizing Twitter demos on tasks from code generation to snarky joke writing.

Despite all the fundamental research progress, these models are not widespread. They are large and difficult to run (requiring GPU orchestration), not broadly accessible (unavailable or closed beta only), and expensive to use as a cloud service. Despite these limitations, the earliest Generative AI applications begin to enter the fray.

Wave 3: Better, faster, cheaper (2022+) Compute gets cheaper. New techniques, like diffusion models, shrink down the costs required to train and run inference. The research community continues to develop better algorithms and larger models. Developer access expands from closed beta to open beta, or in some cases, open source.

For developers who had been starved of access to LLMs, the floodgates are now open for exploration and application development. Applications begin to bloom.

Wave 4: Killer apps emerge (Now) With the platform layer solidifying, models continuing to get better/faster/cheaper, and model access trending to free and open source, the application layer is ripe for an explosion of creativity.

Just as mobile unleashed new types of applications through new capabilities like GPS, cameras and on-the-go connectivity, we expect these large models to motivate a new wave of generative AI applications. And just as the inflection point of mobile created a market opening for a handful of killer apps a decade ago, we expect killer apps to emerge for Generative AI. The race is on.





Market Landscape
Below is a schematic that describes the platform layer that will power each category and the potential types of applications that will be built on top.


Text is the most advanced domain. However, natural language is hard to get right, and quality matters. Today, the models are decently good at generic short/medium-form writing (but even so, they are typically used for iteration or first drafts). Over time, as the models get better, we should expect to see higher quality outputs, longer-form content, and better vertical-specific tuning.
Code generation is likely to have a big impact on developer productivity in the near term as shown by GitHub Copilot. It will also make the creative use of code more accessible to non-developers.
Images are a more recent phenomenon, but they have gone viral: it’s much more fun to share generated images on Twitter than text! We are seeing the advent of image models with different aesthetic styles, and different techniques for editing and modifying generated images.
Speech synthesis has been around for a while (hello Siri!) but consumer and enterprise applications are just getting good. For high-end applications like film and podcasts the bar is quite high for one-shot human quality speech that doesn’t sound mechanical. But just like with images, today’s models provide a starting point for further refinement or final output for utilitarian applications.
Video and 3D models are coming up the curve quickly. People are excited about these models’ potential to unlock large creative markets like cinema, gaming, VR, architecture and physical product design. The research orgs are releasing foundational 3D and video models as we speak.
Other domains: There is fundamental model R&D happening in many fields, from audio and music to biology and chemistry (generative proteins and molecules, anyone?).
The below chart illustrates a timeline for how we might expect to see fundamental models progress and the associated applications that become possible. 2025 and beyond is just a guess.

Here are some of the applications we are excited about. There are far more than we have captured on this page, and we are enthralled by the creative applications that founders and developers are dreaming up.

Copywriting: The growing need for personalized web and email content to fuel sales and marketing strategies, as well as customer support, is a perfect application for language models. The short-form and stylized nature of the verbiage, combined with the time and cost pressures on these teams, should drive demand for automated and augmented solutions.
Vertical specific writing assistants: Most writing assistants today are horizontal; we believe there is an opportunity to build much better generative applications for specific end markets, from legal contract writing to screenwriting. Product differentiation here is in the fine-tuning of the models and UX patterns for particular workflows.
Code generation: Current applications turbocharge developers and make them much more productive: GitHub Copilot is now generating nearly 40% of code in the projects where it is installed. But the even bigger opportunity may be opening up access to coding for consumers. Learning to prompt may become the ultimate high-level programming language.
Art generation: The entire world of art history and pop culture is now encoded in these large models, allowing anyone to explore, at will, themes and styles that previously would have taken a lifetime to master.
Gaming: The dream is using natural language to create complex scenes or models that are riggable; that end state is probably a long way off, but there are more immediate options that are more actionable in the near term such as generating textures and skybox art.
Media/Advertising: Imagine the potential to automate agency work and optimize ad copy and creative on the fly for consumers. Great opportunities here for multi-modal generation that pairs sell messages with complementary visuals.
Design: Prototyping digital and physical products is a labor-intensive and iterative process. High-fidelity renderings from rough sketches and prompts are already a reality. As 3D models become available the generative design process will extend up through manufacturing and production—text to object. Your next iPhone app or sneakers may be designed by a machine.
Social media and digital communities: Are there new ways of expressing ourselves using generative tools? New applications like Midjourney are creating new social experiences as consumers learn to create in public.






Anatomy of a Generative AI Application
What will a generative AI application look like? Here are some predictions.

Intelligence and model fine-tuning
Generative AI apps are built on top of large models like GPT-3 or Stable Diffusion. As these applications get more user data, they can fine-tune their models to: 1) improve model quality/performance for their specific problem space; and 2) decrease model size/costs.
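The data side of that loop can be sketched concretely. In the hypothetical snippet below, every name is illustrative (there is no real API being called): the point is simply that when a user keeps an output, the app has gained a labeled (prompt, completion) pair it can later fine-tune on.

```python
def collect_training_pairs(user_sessions):
    """Turn user interactions into (prompt, accepted_output) pairs.
    Outputs the user kept become labeled fine-tuning data for free."""
    pairs = []
    for session in user_sessions:
        if session["accepted"]:  # the user kept this generation
            pairs.append((session["prompt"], session["output"]))
    return pairs

# Illustrative sessions from a copywriting app
sessions = [
    {"prompt": "tagline for a coffee shop", "output": "Brewed for you.", "accepted": True},
    {"prompt": "tagline for a gym", "output": "Lift off.", "accepted": False},
]
pairs = collect_training_pairs(sessions)
print(pairs)  # [('tagline for a coffee shop', 'Brewed for you.')]
```

The accepted pairs would then feed whatever fine-tuning mechanism the underlying model provider exposes; the rejected ones are equally useful as negative signal.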

We can think of Generative AI apps as a UI layer and “little brain” that sits on top of the “big brain” that is the large general-purpose models.

Form Factor
Today, Generative AI apps largely exist as plugins in existing software ecosystems. Code completions happen in your IDE; image generations happen in Figma or Photoshop; even Discord bots are the vessel to inject generative AI into digital/social communities.

There are also a smaller number of standalone Generative AI web apps, such as Jasper and Copy.ai for copywriting, Runway for video editing, and Mem for note taking.

A plugin may be an effective wedge into bootstrapping your own application, and it may be a savvy way to surmount the chicken-and-egg problem of user data and model quality (you need distribution to get enough usage to improve your models; you need good models to attract users). We have seen this distribution strategy pay off in other market categories, like consumer/social.

Paradigm of Interaction
Today, most Generative AI demos are “one-and-done”: you offer an input, the machine spits out an output, and you can keep it or throw it away and try again. Increasingly, the models are becoming more iterative, where you can work with the outputs to modify, finesse, uplevel and generate variations.
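The contrast between "one-and-done" and iterative interaction can be sketched as a simple control flow. The `generate` and `refine` functions below are stand-ins for model calls, not a real API:

```python
def generate(prompt):
    """Stand-in for a single model call."""
    return f"draft for: {prompt}"

def refine(draft, feedback):
    """Stand-in for a model call that revises an earlier output."""
    return f"{draft} (revised: {feedback})"

# One-and-done: a single shot the user keeps or discards.
one_shot = generate("a logo concept for a coffee brand")

# Iterative: the user works with the output over several rounds.
draft = generate("a logo concept for a coffee brand")
for feedback in ["warmer colors", "simpler shape"]:
    draft = refine(draft, feedback)
print(draft)
```

The iterative loop is where user choices accumulate — each round of feedback is exactly the kind of signal that feeds the fine-tuning flywheel described below.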

Today, Generative AI outputs are being used as prototypes or first drafts. Applications are great at spitting out multiple different ideas to get the creative process going (e.g. different options for a logo or architectural design), and they are great at suggesting first drafts that need to be finessed by a user to reach the final state (e.g. blog posts or code autocompletions). As the models get smarter, partially off the back of user data, we should expect these drafts to get better and better and better, until they are good enough to use as the final product.

Sustained Category Leadership
The best Generative AI companies can generate a sustainable competitive advantage by executing relentlessly on the flywheel between user engagement/data and model performance. To win, teams have to get this flywheel going by 1) having exceptional user engagement → 2) turning more user engagement into better model performance (prompt improvements, model fine-tuning, user choices as labeled training data) → 3) using great model performance to drive more user growth and engagement. They will likely go into specific problem spaces (e.g., code, design, gaming) rather than trying to be everything to everyone. They will likely first integrate deeply into applications for leverage and distribution and later attempt to replace the incumbent applications with AI-native workflows. It will take time to build these applications the right way to accumulate users and data, but we believe the best ones will be durable and have a chance to become massive.

Hurdles and Risks
Despite Generative AI’s potential, there are plenty of kinks around business models and technology to iron out. Questions over important issues like copyright, trust & safety and costs are far from resolved.

Eyes Wide Open
Generative AI is still very early. The platform layer is just getting good, and the application space has barely gotten going.

To be clear, we don’t need large language models to write a Tolstoy novel to make good use of Generative AI. These models are good enough today to write first drafts of blog posts and generate prototypes of logos and product interfaces. There is a wealth of value creation that will happen in the near-to-medium-term.

This first wave of Generative AI applications resembles the mobile application landscape when the iPhone first came out—somewhat gimmicky and thin, with unclear competitive differentiation and business models. However, some of these applications provide an interesting glimpse into what the future may hold. Once you see a machine produce complex functioning code or brilliant images, it’s hard to imagine a future where machines don’t play a fundamental role in how we work and create.

If we allow ourselves to dream multiple decades out, then it’s easy to imagine a future where Generative AI is deeply embedded in how we work, create and play: memos that write themselves; 3D print anything you can imagine; go from text to Pixar film; Roblox-like gaming experiences that generate rich worlds as quickly as we can dream them up. While these experiences may seem like science fiction today, the rate of progress is incredibly high—we have gone from narrow language models to code auto-complete in several years—and if we continue along this rate of change and follow a “Large Model Moore’s Law,” then these far-fetched scenarios may just enter the realm of the possible.

Call for Startups
We are at the beginning of a platform shift in technology. We have already made a number of investments in this landscape and are galvanized by the ambitious founders building in this space.

If you are a founder and would like to meet, shoot us a note at [email protected] and [email protected]

We can’t wait to hear your story.

PS: This piece was co-written with GPT-3. GPT-3 did not spit out the entire article, but it was responsible for combating writer’s block, generating entire sentences and paragraphs of text, and brainstorming different use cases for generative AI. Writing this piece with GPT-3 was a nice taste of the human-computer co-creation interactions that may form the new normal. We also generated illustrations for this post with Midjourney, which was SO MUCH FUN!


