Substack
To me, the best model going forward is going to be based on the
weighted performance per parameter and training token
count. Ultimately, a model keeps getting better the longer you
train it. Most open model providers could train longer, but it hasn’t
been worth their time. We’re starting to see that change.
The most important models will represent improvements in
capability density, rather than shifting the frontier.
In some ways, it’s easier to make the model better by training
longer compared to anything else, if you have the data.
The core difference between open and closed LLMs on these charts
is how undertrained open LLMs often are. The only open
model confirmed to be trained on a lot of tokens is DBRX.
― The End of the “Best Open LLM” - Interconnects [Link]
Good analysis of the direction of open LLM development in 2023 and
2024. In 2023, models progressed on MMLU by leveraging larger compute
budgets to scale active parameters and training tokens. In 2024, the
direction of progress shifted to be roughly orthogonal to the previous
one: improving on MMLU while keeping compute budgets constant.
The companies that have users interacting with their models
consistently have moats through data and habits. The models themselves
are not a moat, as I discussed at the end of last year when I tried to
predict machine
learning moats, but there are things in the modern large language
model (LLM) space that open-source will really struggle to replicate.
Concretely, that difference is access to quality and diverse training
prompts for fine-tuning. While I want open-source to win out for
personal philosophical and financial factors, this obviously is not a
walk in the park for the open-source community. It’ll be a siege of a
castle with, you guessed it, a moat. We’ll see if the moat
holds.
― Model commoditization and product moats -
Interconnects [Link]
The goal of promoting scientific understanding for the betterment
of society has a long history. Recently I was pointed to the essay The
Usefulness of Useless Knowledge by Abraham Flexner in 1939 which
argued how basic scientific research without clear areas for profit will
eventually turn into societally improving technologies. If we want LLMs
to benefit everyone, my argument is that we need far more than just
computer scientists and big-tech-approved social scientists working on
these models. We need to continue to promote openness to support this
basic feedback loop that has helped society flourish over the last few
centuries.
The word openness has replaced the phrase open-source among most
leaders in the open AI movement. It’s the easiest way to get across what
your goals are, but it is not better in indicating how you’re actually
supporting the open ecosystem. The three words that underpin the one
messy word are disclosure (the details),
accessibility (the interfaces and infrastructure), and
availability (the distribution).
― We disagree on what open-source AI should mean -
Interconnects [Link]
Google: “A Positive Moment” [Link]
Reports of Google Search's death are exaggerated so far. In fact,
search advertising has grown faster at Google than at Microsoft. User
search behavior is harder to change than people expected. Google is
also leading the development of AI-powered tools for Search: 1)
"Circle to Search" is a feature allowing a search from an image, text, or
video without switching apps; 2) "Point your camera, ask a question" is
a feature allowing multisearch with both images and text for complex
questions about an image. Overall, SGE (Search Generative
Experience) is revolutionizing the search experience (the "10 blue links") by
introducing a dynamic AI-enhanced experience. From what I have observed
so far, AI powers Google Search rather than weakens it.
Amazon: Wild Margin Expansion - App Economy Insights
[Link]
Amazon's margin expansion: AWS hit a $100B run rate with a 38%
operating margin; ads are surging; delivery costs have been reduced.
The biggest risk is not correctly projecting demand for end-user
AI consumption, which would threaten the utilization of the capacity and
capital investments made by tech firms today. This would leave them
exposed at the height of the valuation bubble, if and when it bursts,
just like Cisco’s growth story that began to
unravel in 2000. After all, history may not repeat, but it
often rhymes.
At the Upfront Ventures confab mentioned earlier, Brian
Singerman, a partner at Peter Thiel’s Founders Fund, was asked about
contrarian areas worth investing in given the current landscape. His
response: “Anything not AI”.
― AI’s Bubble Talk Takes a Bite Out Of The Euphoria - AI
Supremacy [Link]
When we talk about investment, we talk about economic value. AI's
current situation is very similar to Cisco's in 2000. Cisco, as an internet
company, built out the capacity of the World Wide Web, but people soon
realized that the economic value was not in internet infrastructure
itself; the opportunities were in e-commerce and the like. AI is a tool
very similar to web technology. Currently, with heightened expectations,
people are pouring investment and capital expenditure into AI model
development, yet end-user demand is unclear and revenue is relatively
minimal. From a very long-term perspective, this makes AI look like a
bubble.
Steve Jobs famously said that Apple stands at the intersection of
technology and liberal arts. Apple is supposed to enhance and improve
our lives in the physical realm, not to replace cherished physical
objects indiscriminately.
― Apple’s Dystopian iPad Video - The Rational Walk
Newsletter [Link]
Key pillars of the new strategy (on gaming):
- Expanding PC and cloud gaming options.
- Powerful consoles (still a core part of the vision).
- Game Pass subscriptions as the primary access point.
- Actively bringing Xbox games to rival platforms (PS5,
Switch).
- Exploring mobile gaming with the potential for handheld
hardware.
Microsoft’s “every screen is an Xbox” approach is a gamble and
may take a long time to pay off. But the industry is bound to be
device-agnostic over time as it shifts to the cloud and offers
cross-play and cross-progression. It’s a matter of when not if.
― Microsoft: AI Inflection - App Economy Insights
[Link]
Highlights: Azure's growth accelerated sequentially thanks to AI
services, and it was the fastest-growing of the big three clouds (Amazon
AWS, Google Cloud, Microsoft Azure). On Search, Microsoft is losing market
share to Alphabet. Capex on AI is growing roughly 80% YoY. On gaming,
Microsoft is diversifying beyond selling consoles. Copilot and Office
are succeeding with enterprise customers.
To founders, my advice is to remain laser-focused on building
products and services that customers love, and be thoughtful and
rational when making capital allocation decisions. Finding
product-market fit is about testing and learning from small bets before
doubling down, and it is often better to grow slower and more
methodically as that path tends to lead to a more durable and profitable
business. An axiom that doesn’t seem to be well understood is that the
time it takes to build a company is also often its half-life.
― 2023 Annual Letter - Chamath Palihapitiya [Link]
This is a very insightful letter about how economic and tech trends
of 2023 have shaped their thinking and investment portfolio. What I have
learned from this letter:
The tech industry has shifted its focus from unsustainable "growth
at any cost" to more prudent forms of capital allocation. This has
resulted in layoffs and in slashing projects that are not relevant to
the core business.
Rising interest rates were one cause of the banking crisis.
During the zero-interest-rate decade, banks sought higher returns by
purchasing longer-duration assets, whose value is negatively correlated
with interest rates. Once the resulting losses became known to the
public, a liquidity crisis ensued.
The advancement of GenAI has lowered the barriers to starting a
software company, lowered capital requirements in biotech and materials
science, fundamentally changed the process of building companies,
and empowered new entrants to challenge established businesses.
Heightened geopolitical tensions (the Russia-Ukraine conflict,
Israel and Hamas, and escalating tensions between China and Taiwan)
have resulted in a de-globalization trend and a strategic shift in the US. US
legislative initiatives aim to fuel a domestic industrial renaissance
by incentivizing reshoring and fostering a more secure and resilient
supply chain. They include the CHIPS Act, the Infrastructure Investment
and Jobs Act, and the Inflation Reduction Act.
- The author highlights the opportunity for allocators and founders:
companies can creatively and strategically tap into different pools of
capital (debt, equity, and government funding).
OpenAI’s strategy to get its technology in the hands of as many
developers as possible — to build as many use cases as possible — is
more important than the bot’s flirty disposition, and perhaps even new
features like its translation capabilities (sorry).
If OpenAI can become the dominant AI provider by delivering quality
intelligence at bargain prices, it could maintain its lead for some
time. That is, as long as the cost of this technology doesn’t drop near
zero.
A tight integration with Apple could leave OpenAI with a strong
position in consumer technology via the iPhone and an ideal spot in
enterprise via its partnership with Microsoft.
― OpenAI Wants To Get Big Fast, And Four More Takeaways From
a Wild Week in AI News - Big Technology [Link]
As GPT-4o is 2x faster and 50% cheaper, it discourages competitors
from developing rival LLMs and encourages companies to build on
OpenAI's models for their businesses. This shows that OpenAI wants to get
big fast. However, making GPT-4o free disincentivizes users from
subscribing to the Plus version.
There is a tight and deep bond between OpenAI and Apple. The desktop
app debuted on Mac, and Apple will build OpenAI's GPT technology into
iOS.
“You can borrow someone else’s stock ideas but you can’t borrow
their conviction. True conviction can only be obtained by trusting your
own research over that of others. Do the work so you know when to sell.
Do the work so you can hold. Do the work so you can stand
alone.”
Investing isn’t about blindly following the herd. It’s about
carving your own path, armed with knowledge, patience, and a relentless
pursuit of growth and learning.
― Hedge Funds’ Top Picks in Q1 - App Economy
Insights [Link]
As I’ve dug into this in more detail, I’ve become convinced that
they are doing something powerful by searching over language
steps via tree-of-thoughts reasoning, but it is much smaller of
a leap than people believe. The reason for the hyperbole is the goal of
linking large language model training and usage to the core components
of Deep RL that enabled success like AlphaGo: self-play and look-ahead
planning.
To create the richest optimization setting, having the ability to
generate diverse reasoning pathways for scoring and learning from is
essential. This is where Tree-of-Thoughts comes in. The
prompting from ToT gives diversity to the generations, which a policy
can learn to exploit with access to a PRM.
Q* seems to be using PRMs to score Tree of Thoughts reasoning data
that then is optimized with Offline RL. This wouldn’t look too different
from existing RLHF toolings that use offline algorithms like DPO or ILQL
that do not need to generate from the LLM during training. The
‘trajectory’ seen by the RL algorithm is the sequence of reasoning
steps, so we’re finally doing RLHF in a multi-step fashion rather than
contextual bandits!
Let’s Verify Step by
Step: a good introduction to PRMs.
― The Q* hypothesis: Tree-of-thoughts reasoning, process
reward models, and supercharging synthetic data - Interconnects
[Link]
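As a toy illustration of the search the post describes, here is a minimal sketch of PRM-guided tree-of-thoughts search. `generate_steps` and `prm_score` are hypothetical stand-ins for an LLM sampler and a process reward model; the surviving scored trajectories would then feed an offline RL step such as DPO.

```python
import random

def generate_steps(path, k=3):
    """Hypothetical stand-in: sample k candidate next reasoning steps."""
    return [path + [f"step {len(path) + 1}.{i}"] for i in range(k)]

def prm_score(path):
    """Hypothetical stand-in for a process reward model scoring a path."""
    return random.random()

def tot_search(depth=4, beam=2, k=3):
    """Beam search over reasoning paths, keeping the PRM's favorites.
    The kept trajectories become training data for offline RL."""
    frontier = [[]]
    for _ in range(depth):
        candidates = [c for path in frontier for c in generate_steps(path, k)]
        frontier = sorted(candidates, key=prm_score, reverse=True)[:beam]
    return frontier

print(tot_search())
```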
It’s well known on the street that Google DeepMind has split all
projects into three categories: Gemini (the large looming model),
Gemini-related in 6-12months (applied research), and fundamental
research, which is oddly only > 12 months out. All of Google
DeepMind’s headcount is in the first two categories, with most of it
being in the first.
Everyone on Meta’s GenAI technical staff should
spend about 70% of the time directly on incremental
model improvements and 30% of the time on ever-green
work.
A great read
from Francois Chollet on links between prompting LLMs, word2vec, and
attention. One of the best ML posts I’ve read in a while.
Slides
from Hyung Won Chung’s (OpenAI) talk on LLMs. Great summary of
intuitions for the different parts of training. The key point: We can
get further with RLHF because the objective function is
flexible.
― The AI research job market shit show (and my experience) -
Interconnects [Link]
10 Lessons From 2024 Berkshire Hathaway Annual Shareholder
Meeting - Capitalist Letters [Link]
What I’ve learned from this article:
Why did Berkshire trim its AAPL position?
There is no concern about Apple's earnings potential; it makes sense to
take some profits as the valuation is now quite high.
Right way to look at share buybacks
A business should pay dividends only if it cannot make good use of
the excess capital it has. Good use of capital means return on equity,
which is on average 12% for American companies. If the company can
allocate capital better than shareholders themselves and provide them
with above-average returns, it should retain the earnings and allocate
the capital itself.
Buybacks only make sense at the right price, and buying back shares
just to support the stock price is not the best action to take for
shareholders. All investment decisions should be price
dependent.
How would he invest small sums of money?
At times of market crashes or economic downturns, you find
exceptional companies trading at ridiculously cheap prices, and that is
your opportunity. When those companies are fairly priced or overvalued,
you look for special situations while holding onto your positions in
those exceptional companies.
Views on capital allocation
Study picking businesses, not stocks.
Investing in foreign countries
America has been a great country for building wealth and capitalist
democracy is the best system of governance ever invented.
Advice on job picking
Remember Steve Jobs’ famous words in the Stanford Commencement speech
he gave before his death: “Keep looking, don’t settle!”
On the importance of culture
In the Berkshire culture, shareholders see themselves as the owners of
the businesses. Greg Abel will keep the culture alive in the
post-Buffett period, and this will automatically attract top talent to a
place where people are given full responsibility and trust.
When to sell stocks
1) A bigger opportunity comes up, 2) something drastically changes in
the business, or 3) to raise money.
Effects of consumer behavior on investment decisions
Two types of businesses have durable competitive advantage: 1) Lowest
cost suppliers of products and services, 2) suppliers of unique products
and services.
How to live a good life? "I've written my obituary the way I've
lived my life." - Charlie Munger
NVIDIA: Industrial Revolution - App Economy Insights
[Link]
Primary drivers of Data Center revenue: 1) strong demand (up 29%
sequentially) for the Hopper GPU computing platform used for training
and inference with LLMs, recommendation engines, and GenAI apps; 2)
InfiniBand end-to-end solutions for networking (down 5% sequentially due
to the timing of supply). NVIDIA started shipping Spectrum-X Ethernet
networking solutions optimized for AI.
In the earnings call, three major customer categories were highlighted: 1)
cloud service providers (CSPs), including the hyperscalers Amazon, Microsoft,
and Google; 2) enterprise usage: Tesla expanded its AI training cluster to
35,000 H100 GPUs and used NVIDIA AI for FSD V12; 3) consumer internet
companies: Meta's Llama 3, powering Meta AI, was trained on a cluster of
24,000 H100 GPUs.
Huang explained in the earnings call that AI is no longer only a chip
problem but also a systems problem. They build AI factories.
For further growth, the Blackwell platform is coming, Spectrum-X
networking is expanding, and new software tools like NIMs are being developed.
A lot of current research focuses on LLM architectures, data
sources prompting, and alignment strategies. While these can lead to
better performance, such developments have 3 inter-related critical
flaws-
- They mostly work by increasing the computational costs of
training and/or inference.
- They are a lot more fragile than people realize and don’t lead
to the across-the-board improvements that a lot of Benchmark Bros
pretend.
- They are incredibly boring. A focus on getting published/getting
a few pyrrhic victories on benchmarks means that these papers focus on
making tweaks instead of trying something new, pushing boundaries, and
trying to address the deeper issues underlying these
processes.
― Revolutionizing AI Embeddings with Geometry
[Investigations] - Devansh [Link]
Very little AI research avoids flaws #1 and #3; the work that does is
really good, hard-core work. Time is required to verify whether such work
is generalizable and widely applicable, especially since the process
of scientific research today is very different from earlier eras, when
there was often a decade between starting your work and publishing it.
This article highlights some publications on complex embeddings and
looks into how they improve embeddings by using complex numbers.
Current challenges in embeddings are: 1) sensitivity to outliers; 2)
limited capacity to capture complex relationships in unstructured text;
3) inconsistency in pairwise similarity rankings; and 4)
computational cost. The next generation of complex embeddings benefits
from the following pillars: 1) complex geometry provides a richer space to
capture nuanced relationships and handle outliers; 2) orthogonality
allows each dimension to be independent and distinct; 3) contrastive
learning can be used to minimize the distance between similar pairs and
maximize the distance between dissimilar pairs. Complex embeddings have
many advantages: 1) increased representation capacity from the two
components (real and imaginary) of complex numbers; 2) complex geometry
allows for orthogonality, which improves generalization and also
lets us reach stable convergence quickly; 3) robust features can
be captured, improving robustness; and 4) the limitation of
cosine similarity (saturation zones which lead to vanishing gradients
during optimization) is solved by angle optimization in complex space.
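As a toy illustration of that last point (my own simplification, not the papers' exact formulation): treat each embedding's two halves as real and imaginary parts and compare angles via complex division, sidestepping the saturation zones of cosine similarity.

```python
import torch

def complex_angle_similarity(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Toy angle-based similarity: interpret each embedding as complex
    numbers (first half = real parts, second half = imaginary parts)
    and compare angles via complex division.  u, v: (batch, 2*d)."""
    d = u.shape[-1] // 2
    zu = torch.complex(u[..., :d], u[..., d:])
    zv = torch.complex(v[..., :d], v[..., d:])
    # Angle of zu / zv per dimension; small angles mean similar direction.
    angle = torch.angle(zu / (zv + 1e-8))
    return -angle.abs().mean(dim=-1)  # higher = more similar

u, v = torch.randn(4, 8), torch.randn(4, 8)
print(complex_angle_similarity(u, v))
```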
Llama 3 8B might be the most interesting all-rounder for
fine-tuning as it can be fine-tuned on a single GPU when using
LoRA.
Phi-3 is very appealing for mobile devices. A quantized version
of it can run on an iPhone 14.
― How Good Are the Latest Open LLMs? And Is DPO Better Than
PPO? [Link]
Good paper review article. Highlights key discussions:
Mixtral 8x22B: The key idea is to replace each feed-forward
module in a transformer architecture with 8 expert layers. It achieves
lower active parameters (cost) and higher performance (MMLU).
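A minimal sketch of that MoE idea, assuming a simple top-2 softmax router (Mixtral's actual routing and renormalization details differ):

```python
import torch
import torch.nn as nn

class ToyMoEFFN(nn.Module):
    """Sketch: replace one dense FFN with 8 expert FFNs; each token is
    routed to its top-2 experts, so only a fraction of weights is active."""
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                                  # x: (tokens, dim)
        weights, idx = self.router(x).softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)  # renormalize top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

print(ToyMoEFFN()(torch.randn(6, 64)).shape)  # torch.Size([6, 64])
```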
Llama 3: The main differences between Llama 3 and Llama 2 are: 1) the
vocabulary size has been increased, 2) grouped-query attention is used,
3) both PPO & DPO are used. The key research finding is that more data
yields better performance, no matter the model size.
“Llama 3 8B might be the most interesting all-rounder for fine-tuning
as it can be fine-tuned on a single GPU when using LoRA.”
Phi-3: Key characteristics are 1) it’s based on Llama
architecture, 2) trained on 5x fewer tokens than Llama 3, 3) used the
same tokenizer with a vocab size of 32064 as Llama2, much smaller than
Llama 3 vocab size, 4) has only 3.8B parameters, less than half the size
of Llama 3 8B, 5) secret sauce is dataset quality over quantity - it’s
trained on heavily filtered web data and synthetic data.
“Phi-3 is very appealing for mobile devices. A quantized version of
it can run on an iPhone 14.”
OpenELM: key characteristics are 1) four relatively small sizes:
270M, 450M, 1.1B, and 3B; 2) the instruct version is trained with rejection
sampling and DPO; 3) slightly better performance than OLMo, even
though it was trained on 2x fewer tokens; 4) the main architecture tweak is a
layer-wise scaling strategy; 5) it samples a relatively small subset of
1.8T tokens from various public datasets, with no clear rationale given for
the subsampling; 6) one main research finding is that there is no clear
difference between LoRA and DoRA for parameter-efficient
fine-tuning.
About the layer-wise scaling strategy: 1) there are N transformer
blocks in the model; 2) layers are gradually widened from the early to the
later transformer blocks, so block by block a) the number of heads
increases and b) the dimension of each layer increases.
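A minimal sketch of the idea, assuming simple linear interpolation of head count and FFN width across blocks (OpenELM's exact scaling coefficients differ):

```python
def layer_wise_scaling(n_blocks=12, min_heads=4, max_heads=16,
                       min_ffn_mult=0.5, max_ffn_mult=4.0):
    """Sketch: widen transformer blocks gradually from early to late layers
    by linearly interpolating attention heads and FFN width per block."""
    configs = []
    for i in range(n_blocks):
        t = i / max(n_blocks - 1, 1)  # 0.0 at the first block, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append({"block": i, "n_heads": heads, "ffn_mult": round(ffn_mult, 2)})
    return configs

for cfg in layer_wise_scaling():
    print(cfg)
```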
DPO vs PPO: The main difference between DPO and PPO is that “DPO
does not require training a separate reward model but uses a
classification-like objective to update LLM directly”.
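A minimal sketch of that classification-like objective, assuming the summed log-probabilities of each chosen/rejected response have already been computed under the policy and a frozen reference model:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO objective: push the policy to prefer the chosen response over
    the rejected one, relative to a frozen reference model, without
    training a separate reward model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up log-probabilities for a batch of 3 pairs.
lp = lambda: torch.randn(3)
print(dpo_loss(lp(), lp(), lp(), lp()))
```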
Key findings of the paper and suggested best practices: 1) PPO is
generally better than DPO if you use it correctly. DPO suffers from
out-of-distribution data, i.e., when the instruction data differs from the
preference data; the solution could be to "add a supervised instruction
fine-tuning round on the preference dataset before following up with DPO
fine-tuning." 2) If you use DPO, make sure to perform SFT on the preference
data first. 3) "Iterative DPO, which involves labeling additional data
with an existing reward model, is better than DPO on existing preference
data." 4) "If you use PPO, the key is to use large batch sizes,
advantage normalization, and parameter update via exponential moving
average." 5) Though PPO is generally better, DPO is more
straightforward and will remain a popular go-to option. 6) Both can be
used; recall the pipeline behind Llama 3: pretraining -> SFT ->
rejection sampling -> PPO -> DPO.
Google I/O AI keynote updates 2024 - AI Supremacy
[Link]
Streaming Wars Visualized - App Economy Insights [Link]
This Week in Visuals - App Economy Insights [Link]
Gig Economy Shakeup - App Economy Insights [Link]
Articles
Musings on building a Generative AI product - LinkedIn
Engineering Blog [Link]
This is a very good read about building a GenAI product for business
using pre-trained LLMs. The article elaborates how the product is
designed, how each part works, what works and what doesn't,
what has been improving, and what remains a struggle. Some
takeaways for me:
The supervised fine-tuning step was done with embedding-based retrieval
(EBR) powered by an in-memory database to inject response examples into
prompts.
An organizational structure was designed to ensure communication
consistency: one horizontal engineering pod for global templates and
styles, and several vertical engineering pods for specific tasks such as
summarization, job fit assessment, and interview tips.
Tricky work:
Developing an end-to-end automatic evaluation pipeline.
Dynamically discovering and invoking APIs/agents, which requires
inputs and outputs to be "LLM friendly", e.g., JSON or YAML
schemas.
Supervised fine-tuning with responses injected from an internal
database.
As evaluation becomes more sophisticated, prompt engineering needs
to improve to reach high quality/evaluation scores. The difficulty
is that quality scores shoot up fast and then plateau, so it is hard to reach
a very high score in the late improvement stage. This makes prompt
engineering more an art than a science.
Tradeoff between capacity and latency:
Chain of Thought can improve the quality and accuracy of responses but
increases latency. TimeToFirstToken (TTFT) & TimeBetweenTokens (TBT)
are important to utilization but need to be bounded to limit latency.
They also intend to implement end-to-end streaming and an async
non-blocking pipeline.
The concept of open source was devised to ensure developers could
use, study, modify, and share software without restrictions. But AI
works in fundamentally different ways, and key concepts don’t translate
from software to AI neatly, says Maffulli.
But depending on your goal, dabbling with an AI model could
require access to the trained model, its training data, the code used to
preprocess this data, the code governing the training process, the
underlying architecture of the model, or a host of other, more subtle
details.
Which ingredients you need to meaningfully study and modify
models remains open to interpretation.
both Llama 2 and Gemma come with licenses that restrict what
users can do with the models. That’s anathema to open-source principles:
one of the key clauses of the Open Source Definition outlaws the
imposition of any restrictions based on use cases.
All the major AI companies have simply released pretrained
models, without the data sets on which they were trained. For people
pushing for a stricter definition of open-source AI, Maffulli says, this
seriously constrains efforts to modify and study models, automatically
disqualifying them as open source.
― The tech industry can’t agree on what open-source AI means.
That’s a problem. ― MIT Technology Review [Link]
This article argues that current definitions of open-source AI are
problematic. "Open" models either carry restrictions on usage or do not
release details of their training data, which does not fit the traditional
definition of "open source". Some argue that AI is a special case that
needs its own definition of open source. As long as the
definition remains vague, it is problematic, because big tech will define
open-source AI in whatever way suits it.
Everything I know about the XZ backdoor [Link]
Some great high-level technical overviews of the XZ backdoor [Link] [Link]
[Link]
[Infographic]
[Link] [Link]
A backdoor in xz-utils (used for lossless compression) was recently
revealed by Andres Freund (Principal SDE at Microsoft). The backdoor
only activates when a few specific criteria are met: 1) running
a distro that uses glibc, 2) having xz or liblzma version 5.6.0 or 5.6.1
installed. A malicious script called
build-to-host.m4
checks various conditions, such as
the architecture of the machine; if those conditions are met, the payload
is injected into the source tree. The intention of the payload is still
under investigation. Lasse Collin, one of the maintainers of the repo,
has posted an update and
is working on carefully analyzing the situation. In the article, the author
Evan Boehs presents a timeline of the attack and online
investigators' discoveries about Jia Tan's identity (from IP addresses,
LinkedIn, commit
timings, etc.), and raises our awareness of the human costs of open
source.
Having a crisp mental model around a problem, being able to break
it down into steps that are tractable, perfect first-principle thinking,
sometimes being prepared (and able to) debate a stubborn AI — these are
the skills that will make a great engineer in the future, and likely the
same consideration applies to many job categories.
― Why Engineers Should Study Philosophy ― Harvard Business
Review [Link]
Humans are entering a new stage of learning: asking AI questions smartly
to get answers that are as accurate as possible. Prompt engineering is
therefore a very important skill in the AI era. To master it, we
need a divide-and-conquer mindset, first-principles
thinking, critical thinking, and skepticism.
If we had infinite capacity for memorisation, it’s clear the
transformer approach is better than the human approach - it truly is
more effective. But it’s less efficient - transformers have to store so
much information about the past that might not be relevant. Transformers
(🤖) only decide what’s relevant at recall time. The
innovation of Mamba (🐍) is allowing the model better ways of forgetting
earlier - it’s focusing by choosing what to discard using
Selectivity, throwing away less relevant information at
memory-making time.
― Mamba Explained [Link]
A very in-depth explanation of the Mamba architecture. The main
difference between the Transformer and Mamba is that the Transformer stores all
past information and decides what is relevant at recall time, while
Mamba uses Selectivity to decide what to discard earlier, at
memory-making time. Mamba achieves
both efficiency and effectiveness (space complexity drops from O(n) to
O(1), time complexity from O(n^2) to O(n)). If the Transformer has
high effectiveness but low efficiency due to its large state, and the RNN has
high efficiency but low effectiveness due to its small state, Mamba sits in
between: it selectively and dynamically compresses data into the
state.
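A toy sketch of the selectivity idea (heavily simplified: real Mamba uses a structured state-space parameterization and a parallel scan, not a Python loop):

```python
import torch
import torch.nn as nn

class ToySelectiveSSM(nn.Module):
    """Simplified selective recurrence: h_t = a(x_t) * h_{t-1} + b(x_t) * x_t.
    Unlike a Transformer, the state h has fixed size (O(1) memory), and
    the input-dependent gates decide per token what to keep or forget."""
    def __init__(self, dim):
        super().__init__()
        self.to_a = nn.Linear(dim, dim)  # input-dependent forget gate
        self.to_b = nn.Linear(dim, dim)  # input-dependent write gate

    def forward(self, x):                      # x: (batch, seq, dim)
        h = torch.zeros(x.shape[0], x.shape[2])
        outs = []
        for t in range(x.shape[1]):
            xt = x[:, t]
            a = torch.sigmoid(self.to_a(xt))   # near 0 => forget old state
            b = torch.sigmoid(self.to_b(xt))   # near 0 => ignore this input
            h = a * h + b * xt
            outs.append(h)
        return torch.stack(outs, dim=1)

y = ToySelectiveSSM(16)(torch.randn(2, 5, 16))
print(y.shape)  # torch.Size([2, 5, 16])
```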
The Power of Prompting ― Microsoft Research Blog [Link]
This study demonstrates that GPT-4, steered with Medprompt (a
composition of several prompting strategies), can outperform a leading
model fine-tuned specifically for medical applications. This suggests
fine-tuning might not always be necessary: although it can boost
performance, it is resource-intensive and cost-prohibitive, while simple
prompting strategies can transform generalist models into
specialists and extend the benefits of models to new domains and
applications. A similar study in the finance domain by JP Morgan
reached similar results.
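As a toy illustration of one Medprompt ingredient, choice-shuffling ensembling, with a hypothetical `ask_llm` stand-in (the full recipe also uses kNN-selected few-shot examples and chain-of-thought):

```python
import random
from collections import Counter

def ask_llm(question, options):
    """Hypothetical stand-in for a GPT-4 call that returns one option."""
    return random.choice(options)

def choice_shuffle_ensemble(question, options, n_votes=5):
    """Medprompt-style ensembling: shuffle the answer options on each call
    to wash out position bias, then majority-vote over the answers."""
    votes = []
    for _ in range(n_votes):
        shuffled = random.sample(options, k=len(options))
        votes.append(ask_llm(question, shuffled))
    return Counter(votes).most_common(1)[0][0]

print(choice_shuffle_ensemble("Which drug treats X?", ["A", "B", "C", "D"]))
```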
Previously, we made some progress matching patterns of neuron
activations, called features, to human-interpretable concepts. We used a
technique called “dictionary learning”, borrowed from classical machine
learning, which isolates patterns of neuron activations that recur
across many different contexts.
In turn, any internal state of the model can be represented in
terms of a few active features instead of many active neurons. Just as
every English word in a dictionary is made by combining letters, and
every sentence is made by combining words, every feature in an AI model
is made by combining neurons, and every internal state is made by
combining features.
The features are likely to be a faithful part of how the model
internally represents the world, and how it uses these representations
in its behavior.
― Mapping the Mind of a Large Language Model -
Anthropic [Link]
This is amazing work towards AI safety by Anthropic. The main goal
is to understand the inner workings of AI models and identify how
millions of concepts are represented inside Claude Sonnet, so that
developers can better control AI safety. Previous progress on this work
matched patterns of neuron activations ("features") to
human-interpretable concepts via a technique called "dictionary learning".
Now they are scaling the technique up to vastly larger AI language
models. Below is a list of key experiments and findings.
- Extracted millions of features from the middle layer of Claude 3.0
Sonnet. Features have a depth, breadth, and abstraction reflecting
Sonnet’s advanced capabilities.
- Found more abstract features, such as features responding to bugs in
code or to discussions of gender bias in professions.
- Measured a "distance" between features based on which neurons
appeared in their activation patterns, and found that features with
similar concepts are close to each other. This demonstrates that the internal
organization of concepts in the AI model corresponds to human notions of
similarity.
- By artificially amplifying or suppressing features, they observed how
Claude's responses change. This shows that features can be used to
change how the model acts.
- For AI safety purposes, they found features corresponding to
capabilities with misuse potential (code backdoors, developing
bio-weapons), to different forms of bias (gender discrimination, racist
claims about crime), and to potentially problematic AI behaviors
(power-seeking, manipulation, secrecy).
- Addressing the earlier concern about sycophancy, they also found a
feature associated with sycophantic praise.
This study suggests a good approach to AI safety: use the
technique described here to monitor AI systems for dangerous behaviors
and to debias outcomes.
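A minimal sketch of the sparse-autoencoder flavor of dictionary learning (sizes and the L1 penalty are illustrative, not Anthropic's actual setup): activations are encoded into many sparsely-active features, and the decoder's weights form the learned "dictionary" of feature directions.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Dictionary-learning sketch: represent each activation vector as a
    sparse combination of learned feature directions (the decoder weights)."""
    def __init__(self, d_model=512, n_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model, bias=False)

    def forward(self, acts):
        feats = torch.relu(self.encoder(acts))   # sparse feature activations
        recon = self.decoder(feats)
        return recon, feats

sae = SparseAutoencoder()
acts = torch.randn(8, 512)                        # model activations to explain
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # recon + L1 sparsity
print(loss.item())
```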
To qualify as a “Copilot+ PC” a computer needs distinct CPUs,
GPUs, and NPUs (neural processing units) capable of >40 trillion
operations per second (TOPS), and a minimum of 16 GB RAM and a 256 GB
SSD.
All of those analysts who assumed Wal-Mart would squish Amazon in
e-commerce thanks to their own mastery of logistics were like all those
who assumed Microsoft would win mobile because they won PCs. It turns
out that logistics for retail are to logistics for e-commerce as
operating systems for a PC are to operating systems for a phone. They
look similar, and even have the same name, but require fundamentally
different assumptions and priorities.
I then documented a few seminal decisions made to demote Windows,
including releasing Office on iPad as soon as he took over, explicitly
re-orienting Microsoft around services
instead of devices, isolating the Windows organization from the rest
of the company, killing Windows Phone, and finally, in the decision that
prompted that Article, splitting up Windows itself. Microsoft was
finally, not just strategically but also organizationally, a services
company centered on Azure and Office; yes, Windows existed, and still
served a purpose, but it didn’t call the shots for the rest of
Microsoft’s products.
That celebration, though, is not because Windows is
differentiating the rest of Microsoft, but because the rest of Microsoft
is now differentiating Windows. Nadella’s focus on AI and the company’s
massive investments in compute are the real drivers of the business,
and, going forward, are real potential drivers of Windows.
This is where the Walmart analogy is useful: McMillon needed to
let e-commerce stand on its own and drive the development of a
consumer-centric approach to commerce that depended on centralized
tech-based solutions; only then could Walmart integrate its stores and
online services into an omnichannel solution that makes the company the
only realistic long-term rival to Amazon.
Nadella, similarly, needed to break up Windows and end Ballmer’s
dreams of vertical domination so that the company could build a
horizontal services business that, a few years later, could actually
make Windows into a differentiated operating system that might, for the
first time in years, actually drive new customer acquisition.
― Windows Returns - Stratechery [Link]
Chatbot Arena results are in: Llama 3 dominates the upper and
mid cost-performance front (full analysis) ― Reddit [Link]
Efficiently fine-tune Llama 3 with PyTorch FSDP and
Q-Lora [Link]
YouTube and Podcasts
I don’t have an answer to peace in the Middle East, I wish I did,
but I do have a very strong view that we are not going to get to peace
when we are apologizing for or denying crimes against humanity and the crime
of mass rape of women. That's not the path to peace; the path to peace is
not saying this didn’t happen, the path to peace is saying this happened
no matter what side of the fence you are on no matter what side of the
world you are on, if you are the far right the far left, anywhere on the
world, we are not going to let this happen again and we are going to get
to peace to make sure. - Sheryl Sandberg
― In conversation with Sheryl Sandberg, plus open-source AI
gene editing explained - All-In Podcast [Link]
U.N. to Study Reports of Sexual Violence in Israel During Oct. 7
Attack [Link]
Western media concocts ‘evidence’ UN report on Oct 7 sex crimes
failed to deliver [Link]
It's crazy that what is happening right now on some college campuses
is not a protest against sexual violence as a tool of war by Hamas. This kind
of ignorance or denial of sexual violence is horrible. People are so
polarized into black-and-white thinking that if something does not fit their
view, they reject it. There are more than two sides to the
Middle East story; one of them is sexual violence: mass rape, genital
mutilation of men and women, women tied to trees naked, bloody, legs
spread…
There is a long history of women's bodies being involved in wars.
Only about 30 years ago did people start to say that rape must not be a
tool of war and should be prosecuted as a war crime against humanity;
feminist, human rights, and civil rights groups made this happen. Now it has
happened again in Gaza, according to the report released by the U.N.; however,
there are many difficulties in proving and testifying to the truth, e.g.,
investigators could not locate a single victim, or did not have the right
to take pictures of victims. But the victims are dead and cannot speak up.
Denying the fact of sexual violence is simply unacceptable. And there is a
great documentary
shedding light on the unspeakable sexual violence committed on Oct 7,
2023 that I think everyone should watch.
The good news is that eyewitness testimony meets the criteria of
any international or global court, so crimes can certainly be proven by
eyewitnesses.
John Schulman - Reinforcement Learning from Human Feedback:
Progress and Challenges [Link]
John Schulman is a research scientist and cofounder of OpenAI,
focusing on reinforcement learning (RL) algorithms. He gave a talk on
making AI more truthful on Apr 24, 2023 at UC Berkeley. The ideas and
discussions are still helpful and insightful today.
In this talk, John discussed the issue of hallucination in large
language models. He claims that behavior cloning or supervised learning
is not enough to fix the hallucination problem; instead, reinforcement
learning from human feedback (RLHF) can help improve the model's
truthfulness by 1) adjusting the output distribution so the model is
allowed to express uncertainty, challenge premises, and admit errors, and 2)
learning behavior boundaries. In his conceptual model, fine-tuning leads the
model to hallucinate when it lacks knowledge. Retrieval and citing
external sources can help improve verifiability. John also discussed models
that can browse the web to answer technical questions, citing relevant
sources.
John mentioned three open problems for LLMs: 1) how to train models to
express uncertainty in natural language; 2) how to go beyond what human
labelers can easily verify ("scalable oversight"); and 3) how to optimize for
true knowledge rather than human approval.
The 1-Year Old AI Startup That’s Rivaling OpenAI — Redpoint’s
AI Podcast [Link]
A great interview with Mistral CEO Arthur Mensch on the topic
of sovereignty and open models as a business strategy. Here are some
highlighted points from Arthur:
- Open source is going to solidify in the future. It is an
infrastructure technology, and at the end of the day it should be
modifiable and owned by customers. Mistral now has two offerings, an open-source
one and a commercial one, and the aim is to find the business
model that sustains the open-source development.
- The things Mistral is best at: 1) training models, and 2)
specializing models.
- The way they think about partnership strategy is to look at what
enterprises need, where they operate, and where the developers
operate, and then figure out the channels that would facilitate
adoption and spread. Being a multiplatform solution and replicating the
solution across different platforms is the strategy Mistral is
following.
- There is still an efficiency upper bound to be pushed. Beyond the
compute spent on pre-training, there is still research to do on
improving model efficiency and strength. On the architecture side, we can be
more efficient than the plain Transformer, which spends the same amount of
compute on every token. Mistral is making models faster. By making models
faster, we open up a lot of applications that involve an LLM as a basic
brick, and then we can figure out how to do planning, exploration, etc.
By increasing efficiency, we open up areas of research.
- Meta has more GPUs than Mistral does. But Mistral has a good
concentration of GPUs (GPUs per person). This is the way to be
as efficient as possible and to come up with creative ways of training
models. Unit economics also need to be considered, to make sure that
\(\$1\) spent on training
compute eventually accrues to more than \(\$1\) of revenue.
- The Transformer is not an optimal architecture. It has been out there for
7 years now, and everything is co-adapted to it: training methods,
debugging methods, algorithms, and hardware. It is challenging to find a
better architecture that also beats the baseline, but there is a lot of research
on modifications of attention to improve memory efficiency, and a lot
can be done in that direction and similar ones.
- On AI regulation and the EU AI Act, Arthur states that it does not
solve the actual problem of how to make AI safe. Making AI safe
is a hard problem (the models are stochastic), different from the way we
evaluated software before. It is more a product problem than a
regulation problem: we need to rethink continuous integration,
verification, etc., and make sure everything happens as it should.
- Mistral recently released Le Chat to help enterprises start
incorporating AI. It provides an assistant contextualized on their
enterprise data. It is a tool to get closer to end users and gather
feedback for the developer platform, and also a tool to bring the
enterprise into GenAI.
Open Source AI is AI we can Trust — with Soumith Chintala of
Meta AI [Link]
Synthetic data is the next rage in LLMs. Soumith pointed out that
synthetic data works where we as humans already have good symbolic models;
we need to impart that knowledge to neural networks, and synthetic data
turns out to be a vehicle for imparting it.
Related to synthetic data, but in an unusual way, there is new research
on distilling GPT-4 by creating synthetic data from GPT-4, creating mock
textbooks inspired by Phi-2, and then fine-tuning open-source models like
Llama.
Open source means different things to different people, and we do not yet
have a community-norm definition at this very early stage of LLMs.
When asked about open source, people in this field tend to state their
definition of it up front. On the open-source topic,
Soumith pointed out that the most beneficial aspect of openness is that it makes
distribution very wide and frictionless, so that people
can do transformative things in a way that is very accessible.
Berkshire Hathaway 2024 Annual Meeting Movie: Tribute to
Charlie Munger [Link]
This is the first year the annual meeting movie has been made public,
and the first year the annual meeting is without Charlie. I already
miss his jokes.
I think the reason why the car could have been completely
reimagined by Apple is that they have a level of credibility and trust
that I think probably no other company has, and absolutely no other tech
company has. I think this was the third Steve Jobs story that I left out
but in 2001, I launched a 99 cent download store and Steve Jobs just ran
total circles around us, but the reason he was able to is he had all the
credibility to go to the labels and get deals done for licensing music
that nobody could get done before. I think that is an example of what
Apple’s able to do which is to use their political capital to change the
rules. So if the thing that we could all want is safer roads and
autonomous vehicles, there are regions in every town and city that could
be completely converted to level 5 autonomous zones. If I had to pick
one company that had the credibility to go and change those rules, it’s
them. Because they could demonstrate that there was a methodical safe
approach to doing something. So the point is that even in these
categories that could be totally reimagined, it’s not for a lack of
imagination, again it just goes back to a complete lack of will. I
understand because if you had 200B dollars of capital on your balance
sheet, I think it’s probably easy to get fat and lazy. - Chamath
Palihapitiya
― In conversation with Sam Altman — All-In Podcast
[Link]
If you are a developer, the key thing to understand is where does
model innovation end and your innovation begin, because if you get that
wrong you will end up doing a bunch of stuff that the model will just
obsolete in a few months. - David Sacks
The incentive for these folks is going to be to push this stuff into
the open source. Because if you solve a problem that’s operationally
necessary for your business but it isn’t the core part of your business,
what incentive do you have to really keep investing in this for the next
5 to 10 years to improve it. You are much better off releasing it in the
open source, let the rest of the community take it over so that it’s
available to everybody else, otherwise you are going to be stuck
supporting it, and then if and when you ever wanted to switch out a
model, GPT-4o, Claude, Llama, it’s going to be costly. The incentive to
just push towards open source in this market if you will is so much
meaningful than any other market. - Chamath Palihapitiya
I think the other thing that is probably true is a big measure at
Google on the search page in terms of search engineer performance was
the bounceback rate, meaning someone does a search, they go off to
another site and they come back because they didn’t get the answer they
wanted. Then one box launched which shows a short answer on the top,
which basically keeps people from having a bad search experience,
because they get the result right away. So a key metric is they are
going to start to discover which vertical searches will provide the user
a better experience than them jumping off to a third party page to get
the same content. And then they will be able to monetize that content
that they otherwise were not participating in the monetization of. So I
think the real victim in all this is that long tail of content on the
internet that probably gets cannibalized by the snippet one box
experience within the search function. And then I do think that the
revenue per search query in some of those categories actually has the
potential to go up not down. You keep people on the page so you get more
search volume there, you get more searches because of the examples you
gave. And then when people do stay, you now have the ability to better
monetize that particular search query, because you otherwise would have
lost it to the third party content page. Keeping more of the experience
integrated they could monetize the search per query higher and they are
going to have more queries, and then they are going to have the quality
of the queries go up. Going back to our earlier point about precision vs
accuracy, my guess is there’s a lot of hedge fund type folks doing a lot
of this Precision type of analysis trying to break apart search queries
by vertical and try to figure out what the net effect will be of having
better AI driven box and snippets. And my guess is that is why there is
a lot of buying activity happening. I can tell you Meta and Amazon do
not have an Isomorphic Lab and Waymo sitting inside their business, that
suddenly pops to a couple hundred billion of market cap and Google does
have a few of those. - David Friedberg
One thing I would say about big companies like Google or
Microsoft is that the power of your monopoly determines how many
mistakes you get to make. So think about Microsoft completely missed
iPhone, remember they screwed up the whole smartphone era and it didn’t
matter. Same thing here with Google, they completely screwed up AI. They
invented the Transformer, completely missed LLMs. Then they had that
fiasco where they have black George Washington. It doesn’t matter, they
can make 10 mistakes but their monopoly is so strong, that they can
finally get it right by copying the innovator, and they are probably
going to become a $5T company. - David Sacks
― GPT-4o launches, Glue demo, Ohalo breakthrough,
Druckenmiller’s bet, did Google kill Perplexity? — All-In
Podcast [Link]
Great conversations and insightful discussions as usual. Love it.
When you are over earning so massively, the rational thing to do
for other actors in the arena is to come and attack that margin, and
give it to people for slightly cheaper slightly faster slightly better
so you can take share. So I think what you’re seeing and what you will
see even more now is this incentive for Silicon Valley who has been
really reticent to put money into chips, really reticent to put money
into hardware. They are going to get pulled into investing this space
because there is no choice. - Chamath Palihapitiya
Why? It’s not that intel was a worse company, but it’s that
everything else caught up. And the economic value went to things that
sat above them in the stack, then it went to Cisco for a while right,
then after Cisco, it went to the browser companies for a little bit,
then it went to the app companies, then it went to the device companies,
then it went to the mobile companies. So you see this natural tendency
for value to push up the stack over time. For AI, we’ve done the step
one which is now you’ve given all this value to NVIDIA and now we are
going to see it being reallocated. - Chamath Palihapitiya
The reason why they are asking these questions is that if you go
back to the dot-com boom in 1999, you can see that Cisco had this
incredible run. And if you overlay the stock price of Nvidia, it seems
to be following that same trajectory. And what happened with Cisco is
that when the dot-com crash came in 2000, Cisco stock lost a huge part
of its value. Obviously Cisco is still around today and it’s a valuable
company, but it just hasn’t ever regained the type of market cap it had.
The reason this happened is because Cisco got commoditized. So the
success and market cap of that company attracted a whole bunch of new
entrants and they copied Cisco's products until they were total
commodities. So the question is whether that happened to Nvidia. I think
the difference here is that at the end of the day Network equipment
which Cisco produced was pretty easy to copy, whereas if you look at
Nvidia, these GPU cores are really complicated to make. So it’s a much
more complicated product to copy. And then on top of that, they are
already in the R&D cycle for the next chip. So I think you can make
the case that Nvidia has a much better moat than Cisco. - David
Sacks
I think Nvidia is going to get pulled into competing directly
with the hyperscalers. So if you were just selling chips, you probably
wouldn’t, but these are big bulky actual machines, then all of a sudden
you are like well why don’t I just create my own physical plant and just
stack these things, and create racks and racks of these machines. It’s
not a far stretch especially because Nvidia actually has the software
interface that everybody uses which is CUDA. I think it’s likely that
Nvidia goes on a full frontal assault against GCP and Amazon and
Microsoft. That’s going to really complicate the relationship that those
folks have with each other, but I think it’s inevitable because how do
you defend an enormously large market cap, you are forced to go into
businesses that are equally lucrative. Now if I look inside of compute
and look at the adjacent categories, they are not going to all of a
sudden start a competitor to TikTok or a social network, but if you look
at the multi hundred billion revenue businesses that are adjacent to the
markets that Nvidia enables, the most obvious ones are the hyperscalers.
So they are going to be forced to compete otherwise their market cap
will shrink and I don’t think they want that, and then it’s going to
create a very complicated set of incentives for Microsoft and Google and
Meta and Apple and all the rest. And that’s also going to be an
accelerant, they are going to pump so much money to help all of these
upstarts. - Chamath Palihapitiya
People say the economy is bad without recognizing that it is an inflationary
experience, whereas economists use the definition of "economic growth"
being gross product, and so if gross product or gross revenue is going
up they say the economy is healthy and growing. But the truth
is we are funding that growth with leverage at the national level, the
federal level, and at the household level. We are borrowing
money to inflate the revenue numbers, and so GDP goes up but the
debt goes higher, and so the ability of folks to support themselves,
buy the things they want to buy, and continue to improve their
condition in life has declined; things are getting worse… The average
American's ability to improve their condition has largely been driven by
their ability to borrow, not by their earnings. - David
Friedberg
Scarlett Johansson vs OpenAI, Nvidia’s trillion-dollar
problem, a vibecession, plastic in our balls [Link]
It’s a fun session and it made my day :). Great discussions about
Nvidia’s business, America’s negative economic sentiment, harm of
plastics, etc.
Building with OpenAI: What's Ahead [Link]
Papers and Reports
Large Language Models: A Survey [Link]
This is a must-read paper if you would like to have a comprehensive
overview of SOTA LLMs, technical details, applications, datasets,
benchmarks, challenges, and future directions.
Little Guide to Building Large Language Models in 2024 -
HuggingFace [Link]
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial
Text Analytics? A Study on Several Typical Tasks [Link]
Bloomberg fine-tuned GPT-3.5 on their financial data, only to find
that GPT-4 8k, without specialized finance fine-tuning, beat it on
almost all finance tasks. So is there really a moat? The number of
parameters matters and data size matters, and both require compute
and money.
Jamba: A Hybrid Transformer-Mamba Language Model [Link] [Link]
The Mamba paper
was rejected while its fruits are being reaped fast: MoE-Mamba, Vision Mamba, and Jamba.
It's funny to see the asymmetric impact in ML sometimes, e.g.,
FlashAttention has <500 citations yet is used everywhere, and GitHub
repos used by 10k+ projects have <100 citations.
KAN: Kolmogorov-Arnold Networks [Link] [authors-note]
This is a mathematically beautiful idea. The main difference between
a traditional MLP and a KAN is that a KAN puts learnable activation
functions on the edges in place of fixed linear weights, so every "weight"
in a KAN is a learnable non-linear function. KAN outperforms MLP in
accuracy and interpretability. Whether KAN can replace MLP in the future
depends on whether suitable learning algorithms (like SGD or AdamW)
exist for it and whether it can be made GPU-friendly.
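A toy KAN-style layer, using a mixture of fixed Gaussian bumps per edge instead of the paper's B-splines, to show what "learnable activation functions on edges" means:

```python
import torch
import torch.nn as nn

class ToyKANLayer(nn.Module):
    """KAN sketch: y_j = sum_i phi_ij(x_i), where each edge function phi_ij
    is a learnable combination of K fixed Gaussian basis functions
    (the paper uses B-splines; RBFs keep this sketch short)."""
    def __init__(self, in_dim, out_dim, k=8):
        super().__init__()
        self.centers = nn.Parameter(torch.linspace(-2, 2, k), requires_grad=False)
        self.coeffs = nn.Parameter(torch.randn(in_dim, out_dim, k) * 0.1)

    def forward(self, x):                                            # x: (batch, in_dim)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)    # (batch, in_dim, k)
        return torch.einsum("bik,iok->bo", basis, self.coeffs)       # sum over edges

y = ToyKANLayer(4, 3)(torch.randn(5, 4))
print(y.shape)  # torch.Size([5, 3])
```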
The Platonic Representation Hypothesis [Link]
An interesting paper to read if you like philosophy. It argues
that there is a platonic representation resulting from the convergence of AI
models towards a shared statistical model of reality. The authors show
a growing similarity in data representations across different
model architectures, training objectives, and data modalities as
model size, data size, and task diversity grow. They also
propose three hypotheses for the representation convergence: 1) the
multitask scaling hypothesis, 2) the capacity hypothesis, and 3) the
simplicity bias hypothesis. The counterexamples and limitations are
definitely worth reading as well.
Frontier Safety Framework - Google DeepMind [Link]
DeepSeek-V2: A Strong, Economical, and Efficient
Mixture-of-Experts Language Model [Link]
One main improvement: multi-head latent attention via compressed
latent KV requires a smaller KV cache per token while achieving
stronger performance. Heads can be compressed differently (taking
different portions of the compressed latent states), and keys and values
can be compressed differently.
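A rough sketch of the latent-KV idea (dimensions illustrative; RoPE handling and the paper's query compression are omitted): only a small latent vector is cached per token, and per-head keys and values are re-expanded from it at attention time.

```python
import torch
import torch.nn as nn

class LatentKV(nn.Module):
    """Sketch of MLA's KV compression: the cache stores one d_latent vector
    per token instead of full per-head keys and values."""
    def __init__(self, d_model=1024, d_latent=128, n_heads=8, d_head=64):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)            # compress once
        self.up_k = nn.Linear(d_latent, n_heads * d_head)   # expand per head
        self.up_v = nn.Linear(d_latent, n_heads * d_head)
        self.n_heads, self.d_head = n_heads, d_head

    def forward(self, h):                                   # h: (b, t, d_model)
        c = self.down(h)                                    # this is what gets cached
        b, t, _ = c.shape
        k = self.up_k(c).view(b, t, self.n_heads, self.d_head)
        v = self.up_v(c).view(b, t, self.n_heads, self.d_head)
        return c, k, v

c, k, v = LatentKV()(torch.randn(2, 10, 1024))
print(c.shape, k.shape)  # cache 128 floats/token vs 8*64*2 for plain MHA
```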
What matters when building vision-language models
[Link]
The Unreasonable Ineffectiveness of the Deeper
Layers [Link]
RecurrentGemma: Moving Past Transformers for Efficient Open
Language Models [Link]
This paper, published by Google DeepMind, proposes a language model
called RecurrentGemma
that can match or exceed the performance of transformer-based models
while being more memory efficient.
Towards Responsible Development of Generative AI for
Education: An Evaluation-Driven Approach - Google’s Tech Report of
LearnLM [Link]
Chameleon: Mixed-Modal Early-Fusion Foundation
Models [Link]
This paper, published by Meta, proposes a mixed-modal model that uses the
Transformer architecture under the covers but applies innovations
such as query-key normalization to fix the imbalance between text
and image tokens, among others.
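A small sketch of query-key normalization, assuming layer norm over the head dimension before the dot product (Chameleon's exact placement and norm choice may differ):

```python
import torch
import torch.nn.functional as F

def qk_norm_attention(q, k, v):
    """Attention with query-key normalization: normalizing q and k before
    the dot product bounds logit magnitudes, which helps keep competing
    modalities (text vs. image tokens) from drowning each other out."""
    q = F.layer_norm(q, q.shape[-1:])
    k = F.layer_norm(k, k.shape[-1:])
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 4, 10, 64)   # (batch, heads, seq, d_head)
print(qk_norm_attention(q, k, v).shape)
```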
Simple and Scalable Strategies to Continually Pre-train Large
Language Models [Link]
Tricks for successful continued pretraining:
- Re-warming and re-decaying the learning rate.
- Adding a small portion (e.g., 5%) of the original pretraining data
(D1) to the new dataset (D2) to prevent catastrophic forgetting.
Note that smaller fractions like 0.5% and 1% were also effective.
Be cautious about the validity of these tricks for larger models. A
minimal sketch of both tricks follows.
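This sketch keeps the schedule shape and replay fraction in the paper's spirit rather than its exact settings:

```python
import math
import random

def rewarm_redecay_lr(step, total_steps, max_lr=3e-4, min_lr=3e-5, warmup=0.01):
    """Re-warm the LR from ~0 back to max_lr, then cosine-decay to min_lr,
    when continuing pretraining on a new dataset D2."""
    warm_steps = int(warmup * total_steps)
    if step < warm_steps:
        return max_lr * step / max(warm_steps, 1)
    t = (step - warm_steps) / max(total_steps - warm_steps, 1)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))

def sample_example(d1, d2, replay_frac=0.05):
    """Mix ~5% of the original data D1 into the new data D2 to reduce
    catastrophic forgetting (0.5%-1% also reported effective)."""
    return random.choice(d1) if random.random() < replay_frac else random.choice(d2)

print([round(rewarm_redecay_lr(s, 1000), 6) for s in (0, 10, 500, 999)])
print(sample_example(["old example"], ["new example"]))
```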
Is DPO Superior to PPO for LLM Alignment? A Comprehensive
Study [Link]
Algorithmic Progress in Language Models [Link]
Physics of Language Models: Part 3.3, Knowledge Capacity
Scaling Laws [Link]
Efficient Multimodal Large Language Models: A Survey
[Link]
Good overview of multimodal LLMs.
Financial Statement Analysis with Large Language
Models [Link]
LoRA Learns Less and Forgets Less [Link]
Lessons from the Trenches on Reproducible Evaluation of
Language Models [Link]
Challenges and best practices in evaluating LLMs.
Agent Planning with World Knowledge Model [Link]
GitHub Repo
Google Research Tuning Playbook - GitHub [Link]
ML Engineering - GitHub [Link]
LLM from Scratch [Link]
Prompt Engineering Guide [Link] [Link]
ChatML + chat templates + Mistral v3 7b full example
[Link]
Finetune pythia 70M [Link]
Llama3 Implemented from Scratch [Link]
News
Intel Inside Ohio [Link]
Intel Ohio One Campus Video Rendering [Link]
Intel Corp has committed $28B to build a “mega fab” called Ohio One,
which could be the biggest chip factory on Earth. The Biden
administration has agreed to provide Intel with $19.5B in loans and
grants to help finance the project.
EveryONE Medicines: Designing Drugs for Rare Diseases, One at
a Time [Link]
Startup EveryONE Medicines aims to develop drugs designed from genetic
information for individual children with rare, life-threatening
neurological diseases. Since the number of patients with diseases caused
by rare mutations is significant in aggregate, the market is large if
EveryONE can scale its process. Although the cost won’t approach that of
a standard drugmaker running large clinical trials, the challenge is
ensuring safety without a standard clinical-testing protocol. To be
responsible to patients, the initial drugs will have a temporary effect
and a wide therapeutic window, so any potential toxicity is minimized
and treatment can be stopped if toxicity appears.
Voyager 1’s Communication Malfunctions May Show the
Spacecraft’s Age [Link]
In Nov 2023, NASA’s over-46-year-old Voyager 1 spacecraft started
sending nonsense back to Earth. Voyager 1 was initially intended to
study Jupiter and Saturn and was built to survive only 5 years of
flight; however, its trajectory carried it further and further into
space, and the mission evolved from a two-planet mission into an
interstellar one.
In Dec 2023, the mission team restarted the Flight Data Subsystem
(FDS) but failed to return it to a functional state. On Mar 1, 2024,
they sent a “poke” command to the probe and received a response on Mar
3. On Mar 10, the team finally determined that the response carried a
readout of FDS memory. By comparing the readout with ones received
before the issue, the team confirmed that 3% of the FDS memory was
corrupted. On Apr 4, the team concluded that the affected code was
contained on a single computer chip. To solve the problem, they decided
to divide the affected code into smaller sections and insert those
sections into other operative places in the FDS memory. During Apr
18-20, the team sent out the commands to move some of the affected code
and received responses with intelligible system information.
Editing the Human Genome with AI [Link]
Berkeley-based startup Profluent Bio used an AI protein language
model to create an entirely new library of Cas proteins that do not
exist in nature today, eventually finding one, called “OpenCRISPR-1,”
that can replace or improve on the ones on the market today. The goal of
the AI model is to learn which sequences produce protein structures that
are good at gene editing. The new library of Cas proteins was created by
generating trillions of amino-acid letters in simulation. Profluent made
“OpenCRISPR-1” publicly available under an open-source license, so
anyone can use this particular Cas protein.
Sony and Apollo in Talks to Acquire Paramount [Link]
Paramount’s stock declined 44% in 2022 and another 12% in 2023. It is
experiencing declining revenue as consumers abandon traditional pay-TV,
and it is losing money on streaming. Berkshire sold its entire Paramount
stake, and soon after, Sony Pictures and Apollo Global Management
reached out to Paramount’s board expressing interest in an acquisition.
Paramount has now decided to open negotiations with them after exclusive
talks with Hollywood studio Skydance. If successful, this deal would
break up Paramount and potentially transform the media landscape.
Otherwise, an “Office of the CEO,” replacing CEO Bob Bakish, will
prepare a long-term plan for the company.
AlphaFold 3 predicts the structure and interactions of all of
life’s molecules [Link]
Previously, Google DeepMind’s AlphaFold project took 3D structures of
proteins and the sequences that code for those proteins and built a
predictive model that predicts a protein’s 3D structure from its
sequence. What is different in AlphaFold 3 is that small molecules are
included: how small molecules bind to the protein is now part of the
predictive model. This is a breakthrough in that off-target effects
could be minimized by taking into account other molecules’ interactions
in the biochemical environment. Google has a drug development subsidiary
called Isomorphic Labs and kept all of the IP for AlphaFold 3. It
published a web viewer for non-commercial scientists to do fundamental
research, but only Isomorphic Labs can use AlphaFold 3 commercially.
Introducing GPT-4o and making more capabilities available for
free in ChatGPT [Link]
I missed the live announcement but watched the recording. GPT-4o is
amazing.
One interesting technical difference is the tokenizer. GPT-4 and
GPT-4-Turbo both had a tokenizer with a vocabulary of 100k tokens;
GPT-4o’s tokenizer has 200k tokens to work better for native
multimodality and multilingualism. A larger vocabulary means fewer
tokens per string of characters, which makes generation more
efficient.
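A quick way to see the effect yourself, assuming a recent tiktoken build (which ships `cl100k_base` for GPT-4/GPT-4-Turbo and `o200k_base` for GPT-4o):

```python
import tiktoken  # pip install tiktoken

old = tiktoken.get_encoding("cl100k_base")  # GPT-4 / GPT-4-Turbo
new = tiktoken.get_encoding("o200k_base")   # GPT-4o

text = "こんにちは、世界"  # non-Latin text tends to compress much better
print(len(old.encode(text)), len(new.encode(text)))
# Fewer tokens for the same characters means faster, cheaper generation.
```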
“Our goal is to make it effortless for people to go anywhere and
get anything,” said Dara Khosrowshahi, CEO of Uber. “We’re excited that
this new strategic partnership with Instacart will bring the magic of
Uber Eats to even more consumers, drive more business for restaurants,
and create more earnings opportunities for couriers.”
― Uber
Eats to Power Restaurant Delivery on Instacart [Link]
Project Astra: Our vision for the future of AI
assistants [Link]
Google Keynote (Google I/O 24’) [Link]
This developer conference was about Google’s AI-related product
updates. Highlighted features: 1) AI Overview for Search, 2) Ask Photos,
3) 2M-token context window, 4) Google Workspace, 5) NotebookLM, 6)
Project Astra, 7) Imagen 3, 8) Music AI Sandbox, 9) Veo, 10) Trillium
TPU, 11) Google Search, 12) asking questions with videos, 13) Gemini
interacting with Gmail and data, 14) Gemini AI Teammate, 15) Gemini App
and upgrades, 16) Gemini trip planning.
Leike went public with some reasons for his resignation on Friday
morning. “I have been disagreeing with OpenAI leadership about the
company’s core priorities for quite some time, until we finally reached
a breaking point,” Leike wrote in a series of posts on X. “I believe
much more of our bandwidth should be spent getting ready for the next
generations of models, on security, monitoring, preparedness, safety,
adversarial robustness, (super)alignment, confidentiality, societal
impact, and related topics. These problems are quite hard to get right,
and I am concerned we aren’t on a trajectory to get there.”
― OpenAI created a team to control ‘superintelligent’ AI —
then let it wither, source says [Link]
Other News:
Encampment Protesters Set Monday Deadline for Harvard to
Begin Negotiations [Link]
Israel Gaza war: History of the conflict explained
[Link]
Cyber Stuck: First Tesla Cybertruck On Nantucket Has A Rough
Day [Link]
Apple apologizes after ad backlash [Link]
Apple nears deal with OpenAI to put ChatGPT on iPhone:
Report [Link] [Link]
Reddit announces another big data-sharing AI deal — this time
with OpenAI [Link]
Apple Will Revamp Siri to Catch Up to Its Chatbot
Competitors [Link]
OpenAI strikes deal to bring Reddit content to
ChatGPT [Link]