Substack
To me, the best model going forward is going to be based on the
weighted performance per parameter and training token
count. Ultimately, a model keeps getting better the longer you
train it. Most open model providers could train longer, but it hasn’t
been worth their time. We’re starting to see that change.
The most important models will represent improvements in
capability density, rather than shifting the frontier.
In some ways, it’s easier to make the model better by training
longer compared to anything else, if you have the data.
The core difference between open and closed LLMs on these charts
is how undertrained open LLMs often are. The only open
model confirmed to be trained on a lot of tokens is DBRX.
― The End of the “Best Open LLM” - Interconnects [Link]
Good analysis of the direction of open LLM development in 2023 and
2024. In 2023, models progressed on MMLU by leveraging larger compute
budgets to scale active parameters and training tokens. In 2024, the
direction of progress shifted to be roughly orthogonal to the previous
one: improving on MMLU while keeping compute budgets constant.
The companies that have users interacting with their models
consistently have moats through data and habits. The models themselves
are not a moat, as I discussed at the end of last year when I tried to
predict machine
learning moats, but there are things in the modern large language
model (LLM) space that open-source will really struggle to replicate.
Concretely, that difference is access to quality and diverse training
prompts for fine-tuning. While I want open-source to win out for
personal philosophical and financial factors, this obviously is not a
walk in the park for the open-source community. It’ll be a siege of a
castle with, you guessed it, a moat. We’ll see if the moat
holds.
― Model commoditization and product moats -
Interconnects [Link]
The goal of promoting scientific understanding for the betterment
of society has a long history. Recently I was pointed to the essay The
Usefulness of Useless Knowledge by Abraham Flexner in 1939 which
argued how basic scientific research without clear areas for profit will
eventually turn into societally improving technologies. If we want LLMs
to benefit everyone, my argument is that we need far more than just
computer scientists and big-tech-approved social scientists working on
these models. We need to continue to promote openness to support this
basic feedback loop that has helped society flourish over the last few
centuries.
The word openness has replaced the phrase open-source among most
leaders in the open AI movement. It’s the easiest way to get across what
your goals are, but it is not better in indicating how you’re actually
supporting the open ecosystem. The three words that underpin the one
messy word are disclosure (the details),
accessibility (the interfaces and infrastructure), and
availability (the distribution).
― We disagree on what open-source AI should mean -
Interconnects [Link]
Google: “A Positive Moment” [Link]
Reports of Google Search's death are exaggerated so far. In fact,
search advertising has grown faster at Google than at Microsoft. User
search behavior is harder to change than people expected. Google is
also leading the development of AI-powered tools for Search: 1)
"Circle to Search" is a feature allowing a search from an image, text, or
video without switching apps; 2) "Point your camera, ask a question" is
a feature allowing multisearch with both images and text for complex
questions about an image. Overall, SGE (Search Generative
Experience) is revolutionizing the search experience (the "10 blue links") by
introducing a dynamic AI-enhanced experience. From what I have observed
so far, AI powers Google Search rather than weakens it.
Amazon: Wild Margin Expansion - App Economy Insights
[Link]
Amazon's margin expansion: AWS hit a $100B run rate with a 38%
operating margin; ads are surging; delivery costs have been reduced.
The biggest risk is not correctly projecting demand for end-user
AI consumption, which would threaten the utilization of the capacity and
capital investments made by tech firms today. This would leave them
exposed at the height of the valuation bubble, if and when it bursts,
just like Cisco’s growth story that began to
unravel in 2000. After all, history may not repeat, but it
often rhymes.
At the Upfront Ventures confab mentioned earlier, Brian
Singerman, a partner at Peter Thiel’s Founders Fund, was asked about
contrarian areas worth investing in given the current landscape. His
response: “Anything not AI”.
― AI’s Bubble Talk Takes a Bite Out Of The Euphoria - AI
Supremacy [Link]
When we talk about investment, we talk about economic value. AI's
current situation is very similar to Cisco's in 2000. Cisco, as an internet
company, built out the capacity of the World Wide Web, but people soon
realized that the economic value was not in internet infrastructure
itself; the opportunities were in e-commerce and the like. AI is a tool
very similar to web technology. Currently, with heightened expectations,
people are pouring investment and capital expenditure into AI model
development, yet end-user demand is unclear and revenue is relatively
minimal. From a very long-term perspective, this makes AI look like a
bubble.
Steve Jobs famously said that Apple stands at the intersection of
technology and liberal arts. Apple is supposed to enhance and improve
our lives in the physical realm, not to replace cherished physical
objects indiscriminately.
― Apple’s Dystopian iPad Video - The Rational Walk
Newsletter [Link]
Key pillars of the new strategy (on gaming):
- Expanding PC and cloud gaming options.
- Powerful consoles (still a core part of the vision).
- Game Pass subscriptions as the primary access point.
- Actively bringing Xbox games to rival platforms (PS5,
Switch).
- Exploring mobile gaming with the potential for handheld
hardware.
Microsoft’s “every screen is an Xbox” approach is a gamble and
may take a long time to pay off. But the industry is bound to be
device-agnostic over time as it shifts to the cloud and offers
cross-play and cross-progression. It’s a matter of when not if.
― Microsoft: AI Inflection - App Economy Insights
[Link]
Highlights: Azure's growth accelerated sequentially thanks to AI
services, and it was the fastest-growing of the big three clouds (Amazon
AWS, Google Cloud, Microsoft Azure). On Search, Microsoft is losing market
share to Alphabet. Capex on AI is growing roughly 80% YoY. On gaming,
Microsoft is diversifying beyond selling consoles. Copilot and Office
are succeeding with enterprise customers.
To founders, my advice is to remain laser-focused on building
products and services that customers love, and be thoughtful and
rational when making capital allocation decisions. Finding
product-market fit is about testing and learning from small bets before
doubling down, and it is often better to grow slower and more
methodically as that path tends to lead to a more durable and profitable
business. An axiom that doesn’t seem to be well understood is that the
time it takes to build a company is also often its half-life.
― 2023 Annual Letter - Chamath Palihapitiya [Link]
This is a very insightful letter about how economic and tech trends
of 2023 have shaped their thinking and investment portfolio. What I have
learned from this letter:
The tech industry has shifted its focus from unsustainable "growth
at any cost" to more prudent forms of capital allocation. This has
resulted in layoffs and in slashing projects that are not relevant to
the core business.
Rising interest rates were one cause of the banking crisis.
During the zero-interest-rate decade, banks sought higher returns by
purchasing longer-duration assets, whose value is negatively correlated
with interest rates. Once the resulting losses became known to the
public, a liquidity crisis ensued.
The advancement of GenAI has lowered the barriers to starting a
software company, lowered capital requirements in biotech and materials
science, fundamentally changed the process of building companies,
and empowered new entrants to challenge established businesses.
Heightened geopolitical tensions (the Russia-Ukraine conflict,
Israel and Hamas, and escalating tensions between China and Taiwan)
have resulted in a de-globalization trend and a strategic shift in the US. US
legislative initiatives aim to fuel a domestic industrial renaissance
by incentivizing reshoring and fostering a more secure and resilient
supply chain. They include the CHIPS Act, the Infrastructure Investment
and Jobs Act, and the Inflation Reduction Act.
- The author highlights the opportunity for allocators and founders:
companies can creatively and strategically tap into different pools of
capital (debt, equity, and government funding).
OpenAI’s strategy to get its technology in the hands of as many
developers as possible — to build as many use cases as possible — is
more important than the bot’s flirty disposition, and perhaps even new
features like its translation capabilities (sorry).
If OpenAI can become the dominant AI provider by delivering quality
intelligence at bargain prices, it could maintain its lead for some
time. That is, as long as the cost of this technology doesn’t drop near
zero.
A tight integration with Apple could leave OpenAI with a strong
position in consumer technology via the iPhone and an ideal spot in
enterprise via its partnership with Microsoft.
― OpenAI Wants To Get Big Fast, And Four More Takeaways From
a Wild Week in AI News - Big Technology [Link]
As GPT-4o is 2x faster and 50% cheaper, it discourages competitors
from developing rival LLMs and encourages companies to build on
OpenAI's models for their businesses. This shows that OpenAI wants to get
big fast. However, making GPT-4o free disincentivizes users from
subscribing to the Plus version.
There is a tight and deep bond between OpenAI and Apple. The desktop
app debuted on Mac, and Apple will build OpenAI's GPT technology into
iOS.
“You can borrow someone else’s stock ideas but you can’t borrow
their conviction. True conviction can only be obtained by trusting your
own research over that of others. Do the work so you know when to sell.
Do the work so you can hold. Do the work so you can stand
alone.”
Investing isn’t about blindly following the herd. It’s about
carving your own path, armed with knowledge, patience, and a relentless
pursuit of growth and learning.
― Hedge Funds’ Top Picks in Q1 - App Economy
Insights [Link]
As I’ve dug into this in more detail, I’ve become convinced that
they are doing something powerful by searching over language
steps via tree-of-thoughts reasoning, but it is much smaller of
a leap than people believe. The reason for the hyperbole is the goal of
linking large language model training and usage to the core components
of Deep RL that enabled success like AlphaGo: self-play and look-ahead
planning.
To create the richest optimization setting, having the ability to
generate diverse reasoning pathways for scoring and learning from is
essential. This is where Tree-of-Thoughts comes in. The
prompting from ToT gives diversity to the generations, which a policy
can learn to exploit with access to a PRM.
Q* seems to be using PRMs to score Tree of Thoughts reasoning data
that then is optimized with Offline RL. This wouldn’t look too different
from existing RLHF toolings that use offline algorithms like DPO or ILQL
that do not need to generate from the LLM during training. The
‘trajectory’ seen by the RL algorithm is the sequence of reasoning
steps, so we’re finally doing RLHF in a multi-step fashion rather than
contextual bandits!
Let’s Verify Step by
Step: a good introduction to PRMs.
― The Q* hypothesis: Tree-of-thoughts reasoning, process
reward models, and supercharging synthetic data - Interconnects
[Link]
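As a toy illustration of the search the post describes, here is a minimal sketch of PRM-guided tree-of-thoughts search. `generate_steps` and `prm_score` are hypothetical stand-ins for an LLM sampler and a process reward model; the surviving scored trajectories would then feed an offline RL step such as DPO.

```python
import random

def generate_steps(path, k=3):
    """Hypothetical stand-in: sample k candidate next reasoning steps."""
    return [path + [f"step {len(path) + 1}.{i}"] for i in range(k)]

def prm_score(path):
    """Hypothetical stand-in for a process reward model scoring a path."""
    return random.random()

def tot_search(depth=4, beam=2, k=3):
    """Beam search over reasoning paths, keeping the PRM's favorites.
    The kept trajectories become training data for offline RL."""
    frontier = [[]]
    for _ in range(depth):
        candidates = [c for path in frontier for c in generate_steps(path, k)]
        frontier = sorted(candidates, key=prm_score, reverse=True)[:beam]
    return frontier

print(tot_search())
```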
It’s well known on the street that Google DeepMind has split all
projects into three categories: Gemini (the large looming model),
Gemini-related in 6-12months (applied research), and fundamental
research, which is oddly only > 12 months out. All of Google
DeepMind’s headcount is in the first two categories, with most of it
being in the first.
Everyone on Meta’s GenAI technical staff should
spend about 70% of the time directly on incremental
model improvements and 30% of the time on ever-green
work.
A great read
from Francois Chollet on links between prompting LLMs, word2vec, and
attention. One of the best ML posts I’ve read in a while.
Slides
from Hyung Won Chung’s (OpenAI) talk on LLMs. Great summary of
intuitions for the different parts of training. The key point: We can
get further with RLHF because the objective function is
flexible.
― The AI research job market shit show (and my experience) -
Interconnects [Link]
10 Lessons From 2024 Berkshire Hathaway Annual Shareholder
Meeting - Capitalist Letters [Link]
What I’ve learned from this article:
Why did Berkshire trim its AAPL position?
There is no concern about Apple's earnings potential; it makes sense to
take some profits as the valuation is now quite high.
Right way to look at share buybacks
A business should pay dividends only if it cannot make good use of
the excess capital it has. Good use of capital means return on equity,
which is on average 12% for American companies. If the company can
allocate capital better than shareholders themselves and provide them
with above-average returns, it should retain the earnings and allocate
the capital itself.
Buybacks only make sense at the right price, and buying back shares
just to support the stock price is not the best action to take for
shareholders. All investment decisions should be price
dependent.
How would he invest small sums of money?
At times of market crashes or economic downturns, you find
exceptional companies trading at ridiculously cheap prices, and that is
your opportunity. When those companies are fairly priced or overvalued,
you look for special situations while holding onto your positions in
those exceptional companies.
Views on capital allocation
Study picking businesses, not stocks.
Investing in foreign countries
America has been a great country for building wealth and capitalist
democracy is the best system of governance ever invented.
Advice on job picking
Remember Steve Jobs’ famous words in the Stanford Commencement speech
he gave before his death: “Keep looking, don’t settle!”
On the importance of culture
In the Berkshire culture, shareholders see themselves as the owners of
the businesses. Greg Abel will keep the culture alive in the
post-Buffett period, and this will automatically attract top talent to a
place where people are given full responsibility and trust.
When to sell stocks
1) A bigger opportunity comes up, 2) something drastically changes in
the business, or 3) to raise money.
Effects of consumer behavior on investment decisions
Two types of businesses have durable competitive advantage: 1) Lowest
cost suppliers of products and services, 2) suppliers of unique products
and services.
How to live a good life? "I've written my obituary the way I've
lived my life." - Charlie Munger
NVIDIA: Industrial Revolution - App Economy Insights
[Link]
Primary drivers of Data Center revenue: 1) strong demand (up 29%
sequentially) for the Hopper GPU computing platform used for training
and inference with LLMs, recommendation engines, and GenAI apps; 2)
InfiniBand end-to-end solutions for networking (down 5% sequentially due
to the timing of supply). NVIDIA started shipping Spectrum-X Ethernet
networking solutions optimized for AI.
In the earnings call, three major customer categories were highlighted: 1)
cloud service providers (CSPs), including the hyperscalers Amazon, Microsoft,
and Google; 2) enterprise usage: Tesla expanded its AI training cluster to
35,000 H100 GPUs and used NVIDIA AI for FSD V12; 3) consumer internet
companies: Meta's Llama 3, powering Meta AI, was trained on a cluster of
24,000 H100 GPUs.
Huang explained in the earnings call that AI is no longer only a chip
problem but also a systems problem. They build AI factories.
For further growth, the Blackwell platform is coming, Spectrum-X
networking is expanding, and new software tools like NIMs are being developed.
A lot of current research focuses on LLM architectures, data
sources prompting, and alignment strategies. While these can lead to
better performance, such developments have 3 inter-related critical
flaws-
- They mostly work by increasing the computational costs of
training and/or inference.
- They are a lot more fragile than people realize and don’t lead
to the across-the-board improvements that a lot of Benchmark Bros
pretend.
- They are incredibly boring. A focus on getting published/getting
a few pyrrhic victories on benchmarks means that these papers focus on
making tweaks instead of trying something new, pushing boundaries, and
trying to address the deeper issues underlying these
processes.
― Revolutionizing AI Embeddings with Geometry
[Investigations] - Devansh [Link]
Very little AI research avoids flaws #1 and #3; the work that does is
really good, hard-core work. Time is required to verify whether such work
is generalizable and widely applicable, especially since the process
of scientific research today is very different from earlier eras, when
there was often a decade between starting your work and publishing it.
This article highlights some publications on complex embeddings and
looks into how they improve embeddings by using complex numbers.
Current challenges in embeddings are: 1) sensitivity to outliers; 2)
limited capacity to capture complex relationships in unstructured text;
3) inconsistency in pairwise similarity rankings; and 4)
computational cost. The next generation of complex embeddings benefits
from the following pillars: 1) complex geometry provides a richer space to
capture nuanced relationships and handle outliers; 2) orthogonality
allows each dimension to be independent and distinct; 3) contrastive
learning can be used to minimize the distance between similar pairs and
maximize the distance between dissimilar pairs. Complex embeddings have
many advantages: 1) increased representation capacity from the two
components (real and imaginary) of complex numbers; 2) complex geometry
allows for orthogonality, which improves generalization and also
lets us reach stable convergence quickly; 3) robust features can
be captured, improving robustness; and 4) the limitation of
cosine similarity (saturation zones which lead to vanishing gradients
during optimization) is solved by angle optimization in complex space.
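As a toy illustration of that last point (my own simplification, not the papers' exact formulation): treat each embedding's two halves as real and imaginary parts and compare angles via complex division, sidestepping the saturation zones of cosine similarity.

```python
import torch

def complex_angle_similarity(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Toy angle-based similarity: interpret each embedding as complex
    numbers (first half = real parts, second half = imaginary parts)
    and compare angles via complex division.  u, v: (batch, 2*d)."""
    d = u.shape[-1] // 2
    zu = torch.complex(u[..., :d], u[..., d:])
    zv = torch.complex(v[..., :d], v[..., d:])
    # Angle of zu / zv per dimension; small angles mean similar direction.
    angle = torch.angle(zu / (zv + 1e-8))
    return -angle.abs().mean(dim=-1)  # higher = more similar

u, v = torch.randn(4, 8), torch.randn(4, 8)
print(complex_angle_similarity(u, v))
```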
Llama 3 8B might be the most interesting all-rounder for
fine-tuning as it can be fine-tuned on a single GPU when using
LoRA.
Phi-3 is very appealing for mobile devices. A quantized version
of it can run on an iPhone 14.
― How Good Are the Latest Open LLMs? And Is DPO Better Than
PPO? [Link]
Good paper review article. Highlights key discussions:
Mixtral 8x22B: The key idea is to replace each feed-forward
module in a transformer architecture with 8 expert layers. It achieves
lower active parameters (cost) and higher performance (MMLU).
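A minimal sketch of that MoE idea, assuming a simple top-2 softmax router (Mixtral's actual routing and renormalization details differ):

```python
import torch
import torch.nn as nn

class ToyMoEFFN(nn.Module):
    """Sketch: replace one dense FFN with 8 expert FFNs; each token is
    routed to its top-2 experts, so only a fraction of weights is active."""
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                                  # x: (tokens, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(ToyMoEFFN()(torch.randn(6, 64)).shape)  # torch.Size([6, 64])
```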
Llama 3: The main differences between Llama 3 and Llama 2 are: 1) the
vocabulary size has been increased, 2) grouped-query attention is used,
3) both PPO & DPO are used. The key research finding is that more data
yields better performance, no matter the model size.
“Llama 3 8B might be the most interesting all-rounder for fine-tuning
as it can be fine-tuned on a single GPU when using LoRA.”
Phi-3: Key characteristics are 1) it’s based on Llama
architecture, 2) trained on 5x fewer tokens than Llama 3, 3) used the
same tokenizer with a vocab size of 32064 as Llama2, much smaller than
Llama 3 vocab size, 4) has only 3.8B parameters, less than half the size
of Llama 3 8B, 5) secret sauce is dataset quality over quantity - it’s
trained on heavily filtered web data and synthetic data.
“Phi-3 is very appealing for mobile devices. A quantized version of
it can run on an iPhone 14.”
OpenELM: key characteristics are 1) four relatively small sizes:
270M, 450M, 1.1B, and 3B; 2) the instruct version is trained with rejection
sampling and DPO; 3) slightly better performance than OLMo, even
though it was trained on 2x fewer tokens; 4) the main architecture tweak is a
layer-wise scaling strategy; 5) it samples a relatively small subset of
1.8T tokens from various public datasets, with no clear rationale given for
the subsampling; 6) one main research finding is that there is no clear
difference between LoRA and DoRA for parameter-efficient
fine-tuning.
About the layer-wise scaling strategy: 1) there are N transformer
blocks in the model; 2) layers are gradually widened from the early to the
later transformer blocks, so block by block a) the number of heads
increases and b) the dimension of each layer increases.
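A minimal sketch of the idea, assuming simple linear interpolation of head count and FFN width across blocks (OpenELM's exact scaling coefficients differ):

```python
def layer_wise_scaling(n_blocks=12, min_heads=4, max_heads=16,
                       min_ffn_mult=0.5, max_ffn_mult=4.0):
    """Sketch: widen transformer blocks gradually from early to late layers
    by linearly interpolating attention heads and FFN width per block."""
    configs = []
    for i in range(n_blocks):
        t = i / max(n_blocks - 1, 1)  # 0.0 at the first block, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append({"block": i, "n_heads": heads, "ffn_mult": round(ffn_mult, 2)})
    return configs

for cfg in layer_wise_scaling():
    print(cfg)
```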
DPO vs PPO: The main difference between DPO and PPO is that “DPO
does not require training a separate reward model but uses a
classification-like objective to update LLM directly”.
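A minimal sketch of that classification-like objective, assuming the summed log-probabilities of each chosen/rejected response have already been computed under the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO objective: push the policy to prefer the chosen response over
    the rejected one, relative to a frozen reference model, without
    training a separate reward model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of 3 pairs.
lp = lambda: torch.randn(3)
print(dpo_loss(lp(), lp(), lp(), lp()))
```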
Key findings of the paper and suggested best practices: 1) PPO is
generally better than DPO if you use it correctly. DPO suffers from
out-of-distribution data, i.e., when the instruction data differs from the
preference data; the solution could be to "add a supervised instruction
fine-tuning round on the preference dataset before following up with DPO
fine-tuning." 2) If you use DPO, make sure to perform SFT on the preference
data first. 3) "Iterative DPO, which involves labeling additional data
with an existing reward model, is better than DPO on existing preference
data." 4) "If you use PPO, the key is to use large batch sizes,
advantage normalization, and parameter update via exponential moving
average." 5) Though PPO is generally better, DPO is more
straightforward and will remain a popular go-to option. 6) Both can be
used; recall the pipeline behind Llama 3: pretraining -> SFT ->
rejection sampling -> PPO -> DPO.
Google I/O AI keynote updates 2024 - AI Supremacy
[Link]
Streaming Wars Visualized - App Economy Insights [Link]
This Week in Visuals - App Economy Insights [Link]
Gig Economy Shakeup - App Economy Insights [Link]
Articles
Musings on building a Generative AI product - LinkedIn
Engineering Blog [Link]
This is a very good read about building a GenAI product for business
using pre-trained LLMs. The article elaborates how the product is
designed, how each part works, what works and what doesn't,
what has been improving, and what remains a struggle. Some
takeaways for me:
The supervised fine-tuning step was done with embedding-based retrieval
(EBR) powered by an in-memory database to inject response examples into
prompts.
An organizational structure was designed to ensure communication
consistency: one horizontal engineering pod for global templates and
styles, and several vertical engineering pods for specific tasks such as
summarization, job fit assessment, and interview tips.
Tricky work:
Developing an end-to-end automatic evaluation pipeline.
Dynamically discovering and invoking APIs/agents, which requires
inputs and outputs to be "LLM friendly", e.g., JSON or YAML
schemas.
Supervised fine-tuning with responses injected from an internal
database.
As evaluation becomes more sophisticated, prompt engineering needs
to improve to reach high quality/evaluation scores. The difficulty
is that quality scores shoot up fast and then plateau, so it is hard to reach
a very high score in the late improvement stage. This makes prompt
engineering more an art than a science.
Tradeoff between capacity and latency:
Chain of Thought can improve the quality and accuracy of responses but
increases latency. TimeToFirstToken (TTFT) & TimeBetweenTokens (TBT)
are important to utilization but need to be bounded to limit latency.
They also intend to implement end-to-end streaming and an async
non-blocking pipeline.
The concept of open source was devised to ensure developers could
use, study, modify, and share software without restrictions. But AI
works in fundamentally different ways, and key concepts don’t translate
from software to AI neatly, says Maffulli.
But depending on your goal, dabbling with an AI model could
require access to the trained model, its training data, the code used to
preprocess this data, the code governing the training process, the
underlying architecture of the model, or a host of other, more subtle
details.
Which ingredients you need to meaningfully study and modify
models remains open to interpretation.
both Llama 2 and Gemma come with licenses that restrict what
users can do with the models. That’s anathema to open-source principles:
one of the key clauses of the Open Source Definition outlaws the
imposition of any restrictions based on use cases.
All the major AI companies have simply released pretrained
models, without the data sets on which they were trained. For people
pushing for a stricter definition of open-source AI, Maffulli says, this
seriously constrains efforts to modify and study models, automatically
disqualifying them as open source.
― The tech industry can’t agree on what open-source AI means.
That’s a problem. ― MIT Technology Review [Link]
This article argues that current definitions of open-source AI are
problematic. "Open" models either carry restrictions on usage or do not
release details of their training data, which does not fit the traditional
definition of "open source". Some argue that AI is a special case that
needs its own definition of open source. As long as the
definition remains vague, it is problematic, because big tech will define
open-source AI in whatever way suits it.
Everything I know about the XZ backdoor [Link]
Some great high-level technical overviews of the XZ backdoor [Link] [Link]
[Link]
[Infographic]
[Link] [Link]
A backdoor in xz-utils (used for lossless compression) was recently
revealed by Andres Freund (Principal SDE at Microsoft). The backdoor
only activates when a few specific criteria are met: 1) running
a distro that uses glibc, 2) having xz or liblzma version 5.6.0 or 5.6.1
installed. A malicious script called
build-to-host.m4
checks various conditions, such as
the architecture of the machine; if those conditions are met, the payload
is injected into the source tree. The intention of the payload is still
under investigation. Lasse Collin, one of the maintainers of the repo,
has posted an update and
is working on carefully analyzing the situation. In the article, the author
Evan Boehs presents a timeline of the attack and online
investigators' discoveries about Jia Tan's identity (from IP addresses,
LinkedIn, commit
timings, etc.), and raises our awareness of the human costs of open
source.
Having a crisp mental model around a problem, being able to break
it down into steps that are tractable, perfect first-principle thinking,
sometimes being prepared (and able to) debate a stubborn AI — these are
the skills that will make a great engineer in the future, and likely the
same consideration applies to many job categories.
― Why Engineers Should Study Philosophy ― Harvard Business
Review [Link]
Humans are entering a new stage of learning: asking AI questions smartly
to get answers that are as accurate as possible. Prompt engineering is
therefore a very important skill in the AI era. To master it, we
need a divide-and-conquer mindset, first-principles
thinking, critical thinking, and skepticism.
If we had infinite capacity for memorisation, it’s clear the
transformer approach is better than the human approach - it truly is
more effective. But it’s less efficient - transformers have to store so
much information about the past that might not be relevant. Transformers
(🤖) only decide what’s relevant at recall time. The
innovation of Mamba (🐍) is allowing the model better ways of forgetting
earlier - it’s focusing by choosing what to discard using
Selectivity, throwing away less relevant information at
memory-making time.
― Mamba Explained [Link]
A very in-depth explanation of the Mamba architecture. The main
difference between the Transformer and Mamba is that the Transformer stores all
past information and decides what is relevant at recall time, while
Mamba uses Selectivity to decide what to discard earlier, at
memory-making time. Mamba achieves
both efficiency and effectiveness (space complexity drops from O(n) to
O(1), time complexity from O(n^2) to O(n)). If the Transformer has
high effectiveness but low efficiency due to its large state, and the RNN has
high efficiency but low effectiveness due to its small state, Mamba sits in
between: it selectively and dynamically compresses data into the
state.
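A toy sketch of the selectivity idea (heavily simplified: real Mamba uses a structured state-space parameterization and a parallel scan, not a Python loop):

```python
import torch
import torch.nn as nn

class ToySelectiveSSM(nn.Module):
    """Simplified selective recurrence: h_t = a(x_t) * h_{t-1} + b(x_t) * x_t.
    Unlike a Transformer, the state h has fixed size (O(1) memory), and
    the input-dependent gates decide per token what to keep or forget."""
    def __init__(self, dim):
        super().__init__()
        self.to_a = nn.Linear(dim, dim)  # input-dependent forget gate
        self.to_b = nn.Linear(dim, dim)  # input-dependent write gate

    def forward(self, x):                      # x: (batch, seq, dim)
        h = torch.zeros(x.shape[0], x.shape[2])
        outs = []
        for t in range(x.shape[1]):
            xt = x[:, t]
            a = torch.sigmoid(self.to_a(xt))   # near 0 => forget old state
            b = torch.sigmoid(self.to_b(xt))   # near 0 => ignore this input
            h = a * h + b * xt
            outs.append(h)
        return torch.stack(outs, dim=1)

y = ToySelectiveSSM(16)(torch.randn(2, 5, 16))
print(y.shape)  # torch.Size([2, 5, 16])
```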
The Power of Prompting ― Microsoft Research Blog [Link]
This study demonstrates that GPT-4, steered with Medprompt (a
composition of several prompting strategies), can outperform a leading
model fine-tuned specifically for medical applications. This suggests
fine-tuning might not always be necessary: although it can boost
performance, it is resource-intensive and cost-prohibitive, while simple
prompting strategies can transform generalist models into
specialists and extend the benefits of models to new domains and
applications. A similar study in the finance domain by JP Morgan
reached similar results.
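As a toy illustration of one Medprompt ingredient, choice-shuffling ensembling, with a hypothetical `ask_llm` stand-in (the full recipe also uses kNN-selected few-shot examples and chain-of-thought):

```python
import random
from collections import Counter

def ask_llm(question, options):
    """Hypothetical stand-in for a GPT-4 call that returns one option."""
    return random.choice(options)

def choice_shuffle_ensemble(question, options, n_votes=5):
    """Medprompt-style ensembling: shuffle the answer options on each call
    to wash out position bias, then majority-vote over the answers."""
    votes = []
    for _ in range(n_votes):
        shuffled = random.sample(options, k=len(options))
        votes.append(ask_llm(question, shuffled))
    return Counter(votes).most_common(1)[0][0]

print(choice_shuffle_ensemble("Which drug treats X?", ["A", "B", "C", "D"]))
```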
Previously, we made some progress matching patterns of neuron
activations, called features, to human-interpretable concepts. We used a
technique called “dictionary learning”, borrowed from classical machine
learning, which isolates patterns of neuron activations that recur
across many different contexts.
In turn, any internal state of the model can be represented in
terms of a few active features instead of many active neurons. Just as
every English word in a dictionary is made by combining letters, and
every sentence is made by combining words, every feature in an AI model
is made by combining neurons, and every internal state is made by
combining features.
The features are likely to be a faithful part of how the model
internally represents the world, and how it uses these representations
in its behavior.
― Mapping the Mind of a Large Language Model -
Anthropic [Link]
This is amazing work towards AI safety by Anthropic. The main goal
is to understand the inner workings of AI models and identify how
millions of concepts are represented inside Claude Sonnet, so that
developers can better control AI safety. Previous progress on this work
matched patterns of neuron activations ("features") to
human-interpretable concepts via a technique called "dictionary learning".
Now they are scaling the technique up to vastly larger AI language
models. Below is a list of key experiments and findings.
- Extracted millions of features from the middle layer of Claude 3.0
Sonnet. Features have a depth, breadth, and abstraction reflecting
Sonnet’s advanced capabilities.
- Found more abstract features, such as features responding to bugs in
code or to discussions of gender bias in professions.
- Measured a "distance" between features based on which neurons
appeared in their activation patterns, and found that features with
similar concepts are close to each other. This demonstrates that the internal
organization of concepts in the AI model corresponds to human notions of
similarity.
- By artificially amplifying or suppressing features, they observed how
Claude's responses change. This shows that features can be used to
change how the model acts.
- For AI safety purposes, they found features corresponding to
capabilities with misuse potential (code backdoors, developing
bio-weapons), to different forms of bias (gender discrimination, racist
claims about crime), and to potentially problematic AI behaviors
(power-seeking, manipulation, secrecy).
- Addressing the earlier concern about sycophancy, they also found a
feature associated with sycophantic praise.
This study suggests a good approach to AI safety: use the
technique described here to monitor AI systems for dangerous behaviors
and to debias outcomes.
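A minimal sketch of the sparse-autoencoder flavor of dictionary learning (sizes and the L1 penalty are illustrative, not Anthropic's actual setup): activations are encoded into many sparsely-active features, and the decoder's weights form the learned "dictionary" of feature directions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Dictionary-learning sketch: represent each activation vector as a
    sparse combination of learned feature directions (the decoder weights)."""
    def __init__(self, d_model=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))   # sparse feature activations
        recon = self.decoder(feats)
        return recon, feats

sae = SparseAutoencoder()
acts = torch.randn(8, 512)                        # model activations to explain
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # recon + L1 sparsity
print(loss.item())
```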
To qualify as a “Copilot+ PC” a computer needs distinct CPUs,
GPUs, and NPUs (neural processing units) capable of >40 trillion
operations per second (TOPS), and a minimum of 16 GB RAM and a 256 GB
SSD.
All of those analysts who assumed Wal-Mart would squish Amazon in
e-commerce thanks to their own mastery of logistics were like all those
who assumed Microsoft would win mobile because they won PCs. It turns
out that logistics for retail are to logistics for e-commerce as
operating systems for a PC are to operating systems for a phone. They
look similar, and even have the same name, but require fundamentally
different assumptions and priorities.
I then documented a few seminal decisions made to demote Windows,
including releasing Office on iPad as soon as he took over, explicitly
re-orienting Microsoft around services
instead of devices, isolating the Windows organization from the rest
of the company, killing Windows Phone, and finally, in the decision that
prompted that Article, splitting up Windows itself. Microsoft was
finally, not just strategically but also organizationally, a services
company centered on Azure and Office; yes, Windows existed, and still
served a purpose, but it didn’t call the shots for the rest of
Microsoft’s products.
That celebration, though, is not because Windows is
differentiating the rest of Microsoft, but because the rest of Microsoft
is now differentiating Windows. Nadella’s focus on AI and the company’s
massive investments in compute are the real drivers of the business,
and, going forward, are real potential drivers of Windows.
This is where the Walmart analogy is useful: McMillon needed to
let e-commerce stand on its own and drive the development of a
consumer-centric approach to commerce that depended on centralized
tech-based solutions; only then could Walmart integrate its stores and
online services into an omnichannel solution that makes the company the
only realistic long-term rival to Amazon.
Nadella, similarly, needed to break up Windows and end Ballmer’s
dreams of vertical domination so that the company could build a
horizontal services business that, a few years later, could actually
make Windows into a differentiated operating system that might, for the
first time in years, actually drive new customer acquisition.
― Windows Returns - Stratechery [Link]
Chatbot Arena results are in: Llama 3 dominates the upper and
mid cost-performance front (full analysis) ― Reddit [Link]
Efficiently fine-tune Llama 3 with PyTorch FSDP and
Q-Lora [Link]
YouTube and Podcasts
I don’t have an answer to peace in the Middle East, I wish I did,
but I do have a very strong view that we are not going to get to peace
when we are apologizing for or denying crimes against humanity and the crime
of mass rape of women. That's not the path to peace; the path to peace is
not saying this didn’t happen, the path to peace is saying this happened
no matter what side of the fence you are on no matter what side of the
world you are on, if you are the far right the far left, anywhere on the
world, we are not going to let this happen again and we are going to get
to peace to make sure. - Sheryl Sandberg
― In conversation with Sheryl Sandberg, plus open-source AI
gene editing explained - All-In Podcast [Link]
U.N. to Study Reports of Sexual Violence in Israel During Oct. 7
Attack [Link]
Western media concocts ‘evidence’ UN report on Oct 7 sex crimes
failed to deliver [Link]
It's crazy that what is happening right now on some college campuses
is not a protest against sexual violence as a tool of war by Hamas. This kind
of ignorance or denial of sexual violence is horrible. People are so
polarized into black-and-white thinking that if something does not fit their
view, they reject it. There are more than two sides to the
Middle East story; one of them is sexual violence: mass rape, genital
mutilation of men and women, women tied to trees naked, bloody, legs
spread…
There is a long history of women's bodies being involved in wars.
Only about 30 years ago did people start to say that rape must not be a
tool of war and should be prosecuted as a war crime against humanity;
feminist, human rights, and civil rights groups made this happen. Now it has
happened again in Gaza, according to the report released by the U.N.; however,
there are many difficulties in proving and testifying to the truth, e.g.,
investigators could not locate a single victim, or did not have the right
to take pictures of victims. But the victims are dead and cannot speak up.
Denying the fact of sexual violence is simply unacceptable. And there is a
great documentary
shedding light on the unspeakable sexual violence committed on Oct 7,
2023 that I think everyone should watch.
The good news is that eyewitness testimony meets the criteria of
any international or global court, so crimes can certainly be proven by
eyewitnesses.
John Schulman - Reinforcement Learning from Human Feedback:
Progress and Challenges [Link]
John Schulman is a research scientist and cofounder of OpenAI,
focusing on reinforcement learning (RL) algorithms. He gave a talk on
making AI more truthful on Apr 24, 2023 at UC Berkeley. The ideas and
discussions are still helpful and insightful today.
In this talk, John discussed the issue of hallucination in large
language models. He claims that behavior cloning or supervised learning
is not enough to fix the hallucination problem; instead, reinforcement
learning from human feedback (RLHF) can help improve the model's
truthfulness by 1) adjusting the output distribution so the model is
allowed to express uncertainty, challenge premises, and admit errors, and 2)
learning behavior boundaries. In his conceptual model, fine-tuning leads the
model to hallucinate when it lacks knowledge. Retrieval and citing
external sources can help improve verifiability. John also discussed models
that can browse the web to answer technical questions, citing relevant
sources.
John mentioned three open problems for LLMs: 1) how to train models to
express uncertainty in natural language; 2) how to go beyond what human
labelers can easily verify ("scalable oversight"); and 3) how to optimize for
true knowledge rather than human approval.
The 1-Year Old AI Startup That’s Rivaling OpenAI — Redpoint’s
AI Podcast [Link]
A great interview with Mistral CEO Arthur Mensch on the topic
of sovereignty and open models as a business strategy. Here are some
highlighted points from Arthur:
- Open source is going to solidify in the future. It is an
infrastructure technology, and at the end of the day it should be
modifiable and owned by customers. Mistral now has two offerings, an open-source
one and a commercial one, and the aim is to find the business
model that sustains the open-source development.
- The things Mistral is best at: 1) training models, and 2)
specializing models.
- The way they think about partnership strategy is to look at what
enterprises need, where they operate, and where the developers
operate, and then figure out the channels that would facilitate
adoption and spread. Being a multiplatform solution and replicating the
solution across different platforms is the strategy Mistral is
following.
- There is still an efficiency upper bound to be pushed. Beyond the
compute spent on pre-training, there is still research to do on
improving model efficiency and strength. On the architecture side, we can be
more efficient than the plain Transformer, which spends the same amount of
compute on every token. Mistral is making models faster. By making models
faster, we open up a lot of applications that involve an LLM as a basic
brick, and then we can figure out how to do planning, exploration, etc.
By increasing efficiency, we open up areas of research.
- Meta has more GPUs than Mistral does. But Mistral has a good
concentration of GPUs (GPUs per person). This is the way to be
as efficient as possible and to come up with creative ways of training
models. Unit economics also need to be considered, to make sure that
\(\$1\) spent on training
compute eventually accrues to more than \(\$1\) of revenue.
- The Transformer is not an optimal architecture. It has been out there for
7 years now, and everything is co-adapted to it: training methods,
debugging methods, algorithms, and hardware. It is challenging to find a
better architecture that also beats the baseline, but there is a lot of research
on modifications of attention to improve memory efficiency, and a lot
can be done in that direction and similar ones.
- On AI regulation and the EU AI Act, Arthur states that it does not
solve the actual problem of how to make AI safe. Making AI safe
is a hard problem (the models are stochastic), different from the way we
evaluated software before. It is more a product problem than a
regulation problem: we need to rethink continuous integration,
verification, etc., and make sure everything happens as it should.
- Mistral recently released Le Chat to help enterprises start
incorporating AI. It provides an assistant contextualized on their
enterprise data. It is a tool to get closer to end users and gather
feedback for the developer platform, and also a tool to bring the
enterprise into GenAI.
Open Source AI is AI we can Trust — with Soumith Chintala of
Meta AI [Link]
Synthetic data is the next rage in LLMs. Soumith pointed out that
synthetic data works where we as humans already have good symbolic models;
we need to impart that knowledge to neural networks, and synthetic data
turns out to be a vehicle for imparting it.
Related to synthetic data, but in an unusual way, there is new research
on distilling GPT-4 by creating synthetic data from GPT-4, creating mock
textbooks inspired by Phi-2, and then fine-tuning open-source models like
Llama.
Open source means different things to different people, and we do not yet
have a community-norm definition at this very early stage of LLMs.
When asked about open source, people in this field tend to state their
definition of it up front. On the open-source topic,
Soumith pointed out that the most beneficial aspect of openness is that it makes
distribution very wide and frictionless, so that people
can do transformative things in a way that is very accessible.
Berkshire Hathaway 2024 Annual Meeting Movie: Tribute to
Charlie Munger [Link]
This is the first year the annual meeting movie has been made public,
and the first year the annual meeting is without Charlie. I already
miss his jokes.
I think the reason why the car could have been completely
reimagined by Apple is that they have a level of credibility and trust
that I think probably no other company has, and absolutely no other tech
company has. I think this was the third Steve Jobs story that I left out
but in 2001, I launched a 99 cent download store and Steve Jobs just ran
total circles around us, but the reason he was able to is he had all the
credibility to go to the labels and get deals done for licensing music
that nobody could get done before. I think that is an example of what
Apple’s able to do which is to use their political capital to change the
rules. So if the thing that we could all want is safer roads and
autonomous vehicles, there are regions in every town and city that could
be completely converted to level 5 autonomous zones. If I had to pick
one company that had the credibility to go and change those rules, it’s
them. Because they could demonstrate that there was a methodical safe
approach to doing something. So the point is that even in these
categories that could be totally reimagined, it’s not for a lack of
imagination, again it just goes back to a complete lack of will. I
understand because if you had 200B dollars of capital on your balance
sheet, I think it’s probably easy to get fat and lazy. - Chamath
Palihapitiya
― In conversation with Sam Altman — All-In Podcast
[Link]
If you are a developer, the key thing to understand is where does
model innovation end and your innovation begin, because if you get that
wrong you will end up doing a bunch of stuff that the model will just
obsolete in a few months. - David Sacks
The incentive for these folks is going to be to push this stuff into
the open source. Because if you solve a problem that’s operationally
necessary for your business but it isn’t the core part of your business,
what incentive do you have to really keep investing in this for the next
5 to 10 years to improve it. You are much better off releasing it in the
open source, let the rest of the community take it over so that it’s
available to everybody else, otherwise you are going to be stuck
supporting it, and then if and when you ever wanted to switch out a
model, GPT-4o, Claude, Llama, it’s going to be costly. The incentive to
just push towards open source in this market if you will is so much
meaningful than any other market. - Chamath Palihapitiya
I think the other thing that is probably true is a big measure at
Google on the search page in terms of search engineer performance was
the bounceback rate, meaning someone does a search, they go off to
another site and they come back because they didn’t get the answer they
wanted. Then one box launched which shows a short answer on the top,
which basically keeps people from having a bad search experience,
because they get the result right away. So a key metric is they are
going to start to discover which vertical searches will provide the user
a better experience than them jumping off to a third party page to get
the same content. And then they will be able to monetize that content
that they otherwise were not participating in the monetization of. So I
think the real victim in all this is that long tail of content on the
internet that probably gets cannibalized by the snippet one box
experience within the search function. And then I do think that the
revenue per search query in some of those categories actually has the
potential to go up not down. You keep people on the page so you get more
search volume there, you get more searches because of the examples you
gave. And then when people do stay, you now have the ability to better
monetize that particular search query, because you otherwise would have
lost it to the third party content page. Keeping more of the experience
integrated they could monetize the search per query higher and they are
going to have more queries, and then they are going to have the quality
of the queries go up. Going back to our earlier point about precision vs
accuracy, my guess is there’s a lot of hedge fund type folks doing a lot
of this Precision type of analysis trying to break apart search queries
by vertical and try to figure out what the net effect will be of having
better AI driven box and snippets. And my guess is that is why there is
a lot of buying activity happening. I can tell you Meta and Amazon do
not have an Isomorphic Lab and Waymo sitting inside their business, that
suddenly pops to a couple hundred billion of market cap and Google does
have a few of those. - David Friedberg
One thing I would say about big companies like Google or
Microsoft is that the power of your monopoly determines how many
mistakes you get to make. So think about Microsoft completely missed
iPhone, remember they screwed up the whole smartphone era and it didn’t
matter. Same thing here with Google, they completely screwed up AI. They
invented the Transformer, completely missed LLMs. Then they had that
fiasco where they have black George Washington. It doesn’t matter, they
can make 10 mistakes but their monopoly is so strong, that they can
finally get it right by copying the innovator, and they are probably
going to become a $5T company. - David Sacks
― GPT-4o launches, Glue demo, Ohalo breakthrough,
Druckenmiller’s bet, did Google kill Perplexity? — All-In
Podcast [Link]
Great conversations and insightful discussions as usual. Love it.
When you are over earning so massively, the rational thing to do
for other actors in the arena is to come and attack that margin, and
give it to people for slightly cheaper slightly faster slightly better
so you can take share. So I think what you’re seeing and what you will
see even more now is this incentive for Silicon Valley who has been
really reticent to put money into chips, really reticent to put money
into hardware. They are going to get pulled into investing this space
because there is no choice. - Chamath Palihapitiya
Why? It’s not that intel was a worse company, but it’s that
everything else caught up. And the economic value went to things that
sat above them in the stack, then it went to Cisco for a while right,
then after Cisco, it went to the browser companies for a little bit,
then it went to the app companies, then it went to the device companies,
then it went to the mobile companies. So you see this natural tendency
for value to push up the stack over time. For AI, we’ve done the step
one which is now you’ve given all this value to NVIDIA and now we are
going to see it being reallocated. - Chamath Palihapitiya
The reason why they are asking these questions is that if you go
back to the dot-com boom in 1999, you can see that Cisco had this
incredible run. And if you overlay the stock price of Nvidia, it seems
to be following that same trajectory. And what happened with Cisco is
that when the dot-com crash came in 2000, Cisco stock lost a huge part
of its value. Obviously Cisco is still around today and it’s a valuable
company, but it just hasn’t ever regained the type of market cap it had.
The reason this happened is because Cisco got commoditized. So the
success and market cap of that company attracted a whole bunch of new
entrants and they copied Cisco's products until they were total
commodities. So the question is whether that happened to Nvidia. I think
the difference here is that at the end of the day Network equipment
which Cisco produced was pretty easy to copy, whereas if you look at
Nvidia, these GPU cores are really complicated to make. So it’s a much
more complicated product to copy. And then on top of that, they are
already in the R&D cycle for the next chip. So I think you can make
the case that Nvidia has a much better moat than Cisco. - David
Sacks
I think Nvidia is going to get pulled into competing directly
with the hyperscalers. So if you were just selling chips, you probably
wouldn’t, but these are big bulky actual machines, then all of a sudden
you are like well why don’t I just create my own physical plant and just
stack these things, and create racks and racks of these machines. It’s
not a far stretch especially because Nvidia actually has the software
interface that everybody uses which is CUDA. I think it’s likely that
Nvidia goes on a full frontal assault against GCP and Amazon and
Microsoft. That’s going to really complicate the relationship that those
folks have with each other, but I think it’s inevitable because how do
you defend an enormously large market cap, you are forced to go into
businesses that are equally lucrative. Now if I look inside of compute
and look at the adjacent categories, they are not going to all of a
sudden start a competitor to TikTok or a social network, but if you look
at the multi hundred billion revenue businesses that are adjacent to the
markets that Nvidia enables, the most obvious ones are the hyperscalers.
So they are going to be forced to compete otherwise their market cap
will shrink and I don’t think they want that, and then it’s going to
create a very complicated set of incentives for Microsoft and Google and
Meta and Apple and all the rest. And that’s also going to be an
accelerant, they are going to pump so much money to help all of these
upstarts. - Chamath Palihapitiya
People say the economy is bad without recognizing that it is an inflationary
experience, whereas economists use the definition of "economic growth"
being gross product, and so if gross product or gross revenue is going
up they say the economy is healthy and growing. But the truth
is we are funding that growth with leverage at the national level, the
federal level, and at the household level. We are borrowing
money to inflate the revenue numbers, and so GDP goes up but the
debt goes higher, and so the ability of folks to support themselves,
buy the things they want to buy, and continue to improve their
condition in life has declined; things are getting worse… The average
American's ability to improve their condition has largely been driven by
their ability to borrow, not by their earnings. - David
Friedberg
Scarlett Johansson vs OpenAI, Nvidia’s trillion-dollar
problem, a vibecession, plastic in our balls [Link]
It’s a fun session and it made my day :). Great discussions about
Nvidia’s business, America’s negative economic sentiment, harm of
plastics, etc.
Building with OpenAI: What's Ahead [Link]
Papers and Reports
Large Language Models: A Survey [Link]
This is a must-read paper if you would like to have a comprehensive
overview of SOTA LLMs, technical details, applications, datasets,
benchmarks, challenges, and future directions.
Little Guide to Building Large Language Models in 2024 -
HuggingFace [Link]
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial
Text Analytics? A Study on Several Typical Tasks [Link]
Bloomberg fine-tuned GPT-3.5 on their financial data, only to find
that GPT-4 8k, without specialized finance fine-tuning, beat it on
almost all finance tasks. So is there really a moat? The number of
parameters matters and data size matters, and both require compute
and money.
Jamba: A Hybrid Transformer-Mamba Language Model [Link] [Link]
The Mamba paper
was rejected while its fruits are being reaped fast: MoE-Mamba, Vision Mamba, and Jamba.
It's funny to see the asymmetric impact in ML sometimes, e.g.,
FlashAttention has <500 citations yet is used everywhere, and GitHub
repos used by 10k+ projects have <100 citations.
KAN: Kolmogorov-Arnold Networks [Link] [authors-note]
This is a mathematically beautiful idea. The main difference between
a traditional MLP and a KAN is that a KAN puts learnable activation
functions on the edges in place of fixed linear weights, so every "weight"
in a KAN is a learnable non-linear function. KAN outperforms MLP in
accuracy and interpretability. Whether KAN can replace MLP in the future
depends on whether suitable learning algorithms (like SGD or AdamW)
exist for it and whether it can be made GPU-friendly.
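A toy KAN-style layer, using a mixture of fixed Gaussian bumps per edge instead of the paper's B-splines, to show what "learnable activation functions on edges" means:

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """KAN sketch: y_j = sum_i phi_ij(x_i), where each edge function phi_ij
    is a learnable combination of K fixed Gaussian basis functions
    (the paper uses B-splines; RBFs keep this sketch short)."""
    def __init__(self, in_dim, out_dim, k=8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2, 2, k), requires_grad=False)
        self.coeffs = nn.Parameter(torch.randn(in_dim, out_dim, k) * 0.1)

    def forward(self, x):                                            # x: (batch, in_dim)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)    # (batch, in_dim, k)
        return torch.einsum("bik,iok->bo", basis, self.coeffs)       # sum over edges

y = ToyKANLayer(4, 3)(torch.randn(5, 4))
print(y.shape)  # torch.Size([5, 3])
```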
The Platonic Representation Hypothesis [Link]
An interesting paper to read if you like philosophy. It argues
that there is a platonic representation resulting from the convergence of AI
models towards a shared statistical model of reality. The authors show
a growing similarity in data representations across different
model architectures, training objectives, and data modalities as
model size, data size, and task diversity grow. They also
propose three hypotheses for the representation convergence: 1) the
multitask scaling hypothesis, 2) the capacity hypothesis, and 3) the
simplicity bias hypothesis. The counterexamples and limitations are
definitely worth reading as well.
Frontier Safety Framework - Google DeepMind [Link]
DeepSeek-V2: A Strong, Economical, and Efficient
Mixture-of-Experts Language Model [Link]
One main improvement: multi-head latent attention via compressed
latent KV requires a smaller KV cache per token while achieving
stronger performance. Heads can be compressed differently (taking
different portions of the compressed latent states), and keys and values
can be compressed differently.
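A rough sketch of the latent-KV idea (dimensions illustrative; RoPE handling and the paper's query compression are omitted): only a small latent vector is cached per token, and per-head keys and values are re-expanded from it at attention time.

```python
import torch
import torch.nn as nn

class LatentKV(nn.Module):
    """Sketch of MLA's KV compression: the cache stores one d_latent vector
    per token instead of full per-head keys and values."""
    def __init__(self, d_model=1024, d_latent=128, n_heads=8, d_head=64):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)            # compress once
        self.up_k = nn.Linear(d_latent, n_heads * d_head)   # expand per head
        self.up_v = nn.Linear(d_latent, n_heads * d_head)
        self.n_heads, self.d_head = n_heads, d_head

    def forward(self, h):                                   # h: (b, t, d_model)
        c = self.down(h)                                    # this is what gets cached
        b, t, _ = c.shape
        k = self.up_k(c).view(b, t, self.n_heads, self.d_head)
        v = self.up_v(c).view(b, t, self.n_heads, self.d_head)
        return c, k, v

c, k, v = LatentKV()(torch.randn(2, 10, 1024))
print(c.shape, k.shape)  # cache 128 floats/token vs 8*64*2 for plain MHA
```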
What matters when building vision-language models
[Link]
The Unreasonable Ineffectiveness of the Deeper
Layers [Link]
RecurrentGemma: Moving Past Transformers for Efficient Open
Language Models [Link]
This paper, published by Google DeepMind, proposes a language model
called RecurrentGemma
that can match or exceed the performance of transformer-based models
while being more memory efficient.
Towards Responsible Development of Generative AI for
Education: An Evaluation-Driven Approach - Google’s Tech Report of
LearnLM [Link]
Chameleon: Mixed-Modal Early-Fusion Foundation
Models [Link]
This paper, published by Meta, proposes a mixed-modal model that uses the
Transformer architecture under the covers but applies innovations
such as query-key normalization to fix the imbalance between text
and image tokens, among others.
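A small sketch of query-key normalization, assuming layer norm over the head dimension before the dot product (Chameleon's exact placement and norm choice may differ):

```python
import torch
import torch.nn.functional as F

def qk_norm_attention(q, k, v):
    """Attention with query-key normalization: normalizing q and k before
    the dot product bounds logit magnitudes, which helps keep competing
    modalities (text vs. image tokens) from drowning each other out."""
    q = F.layer_norm(q, q.shape[-1:])
    k = F.layer_norm(k, k.shape[-1:])
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 4, 10, 64)   # (batch, heads, seq, d_head)
print(qk_norm_attention(q, k, v).shape)
```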
Simple and Scalable Strategies to Continually Pre-train Large
Language Models [Link]
Tricks for successful continued pretraining:
- Re-warming and re-decaying the learning rate.
- Adding a small portion (e.g., 5%) of the original pretraining data
(D1) to the new dataset (D2) to prevent catastrophic forgetting.
Note that smaller fractions like 0.5% and 1% were also effective.
Be cautious about the validity of these tricks for larger models. A
minimal sketch of both tricks follows.
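This sketch keeps the schedule shape and replay fraction in the paper's spirit rather than its exact settings:

```python
import math
import random

def rewarm_redecay_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup=0.01):
    """Re-warm the LR from ~0 back to max_lr, then cosine-decay to min_lr,
    when continuing pretraining on a new dataset D2."""
    warm_steps = int(warmup * total_steps)
    if step < warm_steps:
        return max_lr * step / max(warm_steps, 1)
    t = (step - warm_steps) / max(total_steps - warm_steps, 1)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

def sample_example(d1, d2, replay_frac=0.05):
    """Mix ~5% of the original data D1 into the new data D2 to reduce
    catastrophic forgetting (0.5%-1% also reported effective)."""
    return random.choice(d1) if random.random() < replay_frac else random.choice(d2)

print([round(rewarm_redecay_lr(s, 1000), 6) for s in (0, 10, 500, 999)])
print(sample_example(["old example"], ["new example"]))
```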
Is DPO Superior to PPO for LLM Alignment? A Comprehensive
Study [Link]
Algorithmic Progress in Language Models [Link]
Physics of Language Models: Part 3.3, Knowledge Capacity
Scaling Laws [Link]
Efficient Multimodal Large Language Models: A Survey
[Link]
Good overview of multimodal LLMs.
Financial Statement Analysis with Large Language
Models [Link]
LoRA Learns Less and Forgets Less [Link]
Lessons from the Trenches on Reproducible Evaluation of
Language Models [Link]
Challenges and best practices in evaluating LLMs.
Agent Planning with World Knowledge Model [Link]
GitHub Repo
Google Research Tuning Playbook - GitHub [Link]
ML Engineering - GitHub [Link]
LLM from Scratch [Link]
Prompt Engineering Guide [Link] [Link]
ChatML + chat templates + Mistral v3 7b full example
[Link]
Finetune pythia 70M [Link]
Llama3 Implemented from Scratch [Link]
News
Intel Inside Ohio [Link]
Intel Ohio One Campus Video Rendering [Link]
Intel Corp has committed $28B to build a “mega fab” called Ohio One,
which could be the biggest chip factory on Earth. The Biden
administration has agreed to provide Intel with $19.5B in loans and
grants to help finance the project.
EveryONE Medicines: Designing Drugs for Rare Diseases, One at
a Time [Link]
Startup EveryONE Medicines aims to develop drugs designed from genetic
information for individual children with rare, life-threatening
neurological diseases. Since the number of patients with diseases caused
by rare mutations is significant in aggregate, the market is large if
EveryONE can scale its process. Although the cost won’t approach that of
a standard drugmaker running large clinical trials, the challenge is
ensuring safety without a standard clinical-testing protocol. To be
responsible to patients, the initial drugs will have a temporary effect
and a wide therapeutic window, so any potential toxicity is minimized
and treatment can be stopped if toxicity appears.
Voyager 1’s Communication Malfunctions May Show the
Spacecraft’s Age [Link]
In Nov 2023, NASA’s over-46-year-old Voyager 1 spacecraft started
sending nonsense back to Earth. Voyager 1 was initially intended to
study Jupiter and Saturn and was built to survive only 5 years of
flight; however, its trajectory carried it further and further into
space, and the mission evolved from a two-planet mission into an
interstellar one.
In Dec 2023, the mission team restarted the Flight Data Subsystem
(FDS) but failed to return it to a functional state. On Mar 1, 2024,
they sent a “poke” command to the probe and received a response on Mar
3. On Mar 10, the team finally determined that the response carried a
readout of FDS memory. By comparing the readout with ones received
before the issue, the team confirmed that 3% of the FDS memory was
corrupted. On Apr 4, the team concluded that the affected code was
contained on a single computer chip. To solve the problem, they decided
to divide the affected code into smaller sections and insert those
sections into other operative places in the FDS memory. During Apr
18-20, the team sent out the commands to move some of the affected code
and received responses with intelligible system information.
Editing the Human Genome with AI [Link]
Berkeley-based startup Profluent Bio used an AI protein language
model to create an entirely new library of Cas proteins that do not
exist in nature today, eventually finding one, called “OpenCRISPR-1,”
that can replace or improve on the ones on the market today. The goal of
the AI model is to learn which sequences produce protein structures that
are good at gene editing. The new library of Cas proteins was created by
generating trillions of amino-acid letters in simulation. Profluent made
“OpenCRISPR-1” publicly available under an open-source license, so
anyone can use this particular Cas protein.
Sony and Apollo in Talks to Acquire Paramount [Link]
Paramount’s stock declined 44% in 2022 and another 12% in 2023. It is
experiencing declining revenue as consumers abandon traditional pay-TV,
and it is losing money on streaming. Berkshire sold its entire Paramount
stake, and soon after, Sony Pictures and Apollo Global Management
reached out to Paramount’s board expressing interest in an acquisition.
Paramount has now decided to open negotiations with them after exclusive
talks with Hollywood studio Skydance. If successful, this deal would
break up Paramount and potentially transform the media landscape.
Otherwise, an “Office of the CEO,” replacing CEO Bob Bakish, will
prepare a long-term plan for the company.
AlphaFold 3 predicts the structure and interactions of all of
life’s molecules [Link]
Previously, Google DeepMind’s AlphaFold project took 3D structures of
proteins and the sequences that code for those proteins and built a
predictive model that predicts a protein’s 3D structure from its
sequence. What is different in AlphaFold 3 is that small molecules are
included: how small molecules bind to the protein is now part of the
predictive model. This is a breakthrough in that off-target effects
could be minimized by taking into account other molecules’ interactions
in the biochemical environment. Google has a drug development subsidiary
called Isomorphic Labs and kept all of the IP for AlphaFold 3. It
published a web viewer for non-commercial scientists to do fundamental
research, but only Isomorphic Labs can use AlphaFold 3 commercially.
Introducing GPT-4o and making more capabilities available for
free in ChatGPT [Link]
I missed the live announcement but watched the recording. GPT-4o is
amazing.
One interesting technical difference is the tokenizer. GPT-4 and
GPT-4-Turbo both had a tokenizer with a vocabulary of 100k tokens;
GPT-4o’s tokenizer has 200k tokens to work better for native
multimodality and multilingualism. A larger vocabulary means fewer
tokens per string of characters, which makes generation more
efficient.
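A quick way to see the effect yourself, assuming a recent tiktoken build (which ships `cl100k_base` for GPT-4/GPT-4-Turbo and `o200k_base` for GPT-4o):

```python
import tiktoken  # pip install tiktoken

old = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4-Turbo
new = tiktoken.get_encoding("o200k_base")   # GPT-4o

text = "こんにちは、世界"  # non-Latin text tends to compress much better
print(len(old.encode(text)), len(new.encode(text)))
# Fewer tokens for the same characters means faster, cheaper generation.
```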
“Our goal is to make it effortless for people to go anywhere and
get anything,” said Dara Khosrowshahi, CEO of Uber. “We’re excited that
this new strategic partnership with Instacart will bring the magic of
Uber Eats to even more consumers, drive more business for restaurants,
and create more earnings opportunities for couriers.”
― Uber
Eats to Power Restaurant Delivery on Instacart [Link]
Project Astra: Our vision for the future of AI
assistants [Link]
Google Keynote (Google I/O 24’) [Link]
This developer conference was about Google’s AI-related product
updates. Highlighted features: 1) AI Overview for Search, 2) Ask Photos,
3) 2M-token context window, 4) Google Workspace, 5) NotebookLM, 6)
Project Astra, 7) Imagen 3, 8) Music AI Sandbox, 9) Veo, 10) Trillium
TPU, 11) Google Search, 12) asking questions with videos, 13) Gemini
interacting with Gmail and data, 14) Gemini AI Teammate, 15) Gemini App
and upgrades, 16) Gemini trip planning.
Leike went public with some reasons for his resignation on Friday
morning. “I have been disagreeing with OpenAI leadership about the
company’s core priorities for quite some time, until we finally reached
a breaking point,” Leike wrote in a series of posts on X. “I believe
much more of our bandwidth should be spent getting ready for the next
generations of models, on security, monitoring, preparedness, safety,
adversarial robustness, (super)alignment, confidentiality, societal
impact, and related topics. These problems are quite hard to get right,
and I am concerned we aren’t on a trajectory to get there.”
― OpenAI created a team to control ‘superintelligent’ AI —
then let it wither, source says [Link]
Other News:
Encampment Protesters Set Monday Deadline for Harvard to
Begin Negotiations [Link]
Israel Gaza war: History of the conflict explained
[Link]
Cyber Stuck: First Tesla Cybertruck On Nantucket Has A Rough
Day [Link]
Apple apologizes after ad backlash [Link]
Apple nears deal with OpenAI to put ChatGPT on iPhone:
Report [Link] [Link]
Reddit announces another big data-sharing AI deal — this time
with OpenAI [Link]
Apple Will Revamp Siri to Catch Up to Its Chatbot
Competitors [Link]
OpenAI strikes deal to bring Reddit content to
ChatGPT [Link]