Di's Blog

Understanding LoRA

Posted on 2024-12-25

I finally got time to have some deep dives. Happy Christmas!

RAM Usage During Training

Training large-scale machine learning models e.g. LLMs, requires significant compute resources. Here’s a breakdown of the possible memory usage (RAM) at various stages of the classic training process, based on the pseudocode below:

model = Model()
optimizer = Adam(model.parameters())

for batch, (X, y) in enumerate(dataloader):
    # Compute prediction and loss
    pred = model(X)
    loss = loss_fn(pred, y)

    # Backpropagation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Key Components of Memory Usage

Model Parameters: These are the trainable weights of the model, which need to be stored in memory throughout the training process. The size is proportional to the number of parameters in the model.
Model Gradients: Gradients for each parameter are computed during backpropagation and stored temporarily for the optimizer to update the weights.
Optimizer States: Optimizers like Adam maintain additional states, including:
- First-order momentum: Tracks the moving average of gradients.
- Second-order momentum: Tracks the moving average of squared gradients.
- Both momentum terms have the same size as the model gradients.
Activations: Activation outputs from the forward pass are stored for use during backpropagation, where the Hessian matrix is multiplied with the activations. The memory required for activations can be substantial, especially as batch size increases. While the size of parameters, gradients, and optimizer states remains constant, activation memory scales directly with batch size.
Other Overheads: Temporary buffers and memory fragmentation during computation also contribute to RAM usage.

Memory Calculation Examples

Gradients and Parameters:

For 70B model, using 32-bit floating-point precision (FP32): \[ 70\times10^9\times4 \text{ bytes}\times2 =521.5\text{GM} \] This accounts for the weights and their corresponding gradients.
Optimizer State:

Adam optimizer requires two additional states (first and second-order momentum), each the same size as the gradients: \[ 70\times10^9\times4 \text{ byte}\times2 =521.5\text{GM} \]
Activations:

For 70B model with a hidden size of 8192, 80 layers, and FP32 precision, each token’s activation memory: \[ 8192\times80\times4\times12 \text{ bytes/token}=30\text{ MB/token} \]

Simple Strategies for Reducing Memory Usage

Activation Checkpointing: Instead of storing all activation outputs, recompute activations during backpropagation as needed. This significantly reduces activation memory at the cost of additional compute time.
Mixed Precision Training (FP16): Use 16-bit floating-point precision (FP16) instead of FP32 for model weights, gradients, and activations. This halves the memory requirements without substantial accuracy loss when done correctly.

LoRA

Adapters

The original adapter was introduced in 2019 in the paper "Parameter-Efficient Transfer Learning for NLP". It's a small, additional module added to a pre-trained model to adapt it to a new task without significantly changing the original model parameters.

Adapters generally reduce training latency compared to full fine-tuning because only a small number of parameters (those within the adapter modules) are updated during training. This reduction in trainable parameters leads to lower computational overhead and faster convergence in many cases. Additionally, adapters allow for larger batch sizes due to reduced memory usage, which can further accelerate training

However, adapter layers increase inference latency because they are added sequentially and cannot be parallelized. This issue becomes more pronounced with small batch sizes or when using sharded models, such as GPT-2. Techniques like layer pruning or multi-task settings can mitigate but not completely eliminate this latency.

As shown in the experiment results below, inference latent can be significant (Source: LoRA paper):

LoRA Basics

LoRA (Low-Rank Adaptation) was introduced by a Microsoft team in 2021 in the paper LoRA: Low-Rank Adaptation of Large Language Models. The main idea of LoRA is to enable efficient fine-tuning of large pre-trained models by introducing low-rank trainable matrices into the model’s architecture, while keeping the original model weights frozen. This approach significantly reduces the number of trainable parameters and computational requirements compared to full fine-tuning, without compromising performance.

LoRA approximates weight updates in neural networks using low-rank matrix factorization. Instead of updating the full weight matrix $W$ , it introduces two smaller trainable matrices $A$ and $B$ with size $(r \times d)$ and $(d \times r)$. These matrices have much fewer parameters, as their rank $r$ is much smaller than the dimensions of $W$. Instead of training $\Delta W$, LoRA trains the parameters in $A$ and $B$. This can be written in formula: \[ h=W_0x + \Delta Wx = W_0x + BAx \] where $W_0$ is original prerained weight matrix in size $(d\times d)$ which is frozen during training; $\Delta W$ is in $(d \times d)$ as well computed by $BA$. $x$ is a new input with size $(1 \times d)$.

At the start of the training process, the matrix $ A $ is randomly initialized following a normal distribution $\mathcal{N}(0, \sigma^2)$, while the matrix $ B $ is initialized as a zero matrix. In the initial round, this setup results in $ BA = 0 $, leading to $ h = W_0x $. This initialization strategy ensures stability by preventing significant deviations of $ W_0 $ from its original state.

LoRA is a groundbreaking method with a lot of benefits:

Parameter Efficiency: By training only the low-rank matrices, LoRA reduces the number of updated parameters resulting in lower memory usage and faster training.
Frozen Pre-trained Weights: The original pre-trained weights remain unchanged, preserving the model’s general-purpose knowledge and avoiding catastrophic forgetting.
No Inference Latency Overhead: Unlike adapters, LoRA does not add additional layers to the model. The low-rank matrices can be merged back into the original weight matrix after fine-tuning, ensuring no additional inference latency.
Versatility: LoRA can be applied to various architectures (e.g. transformers) and tasks, making it a flexible solution for adapting large models like GPT-3 or RoBERTa to specific use cases.

LoRA Usage

The Microsoft developers of LoRA created a Python package called loralib to facilitate the use of LoRA. With this library, any linear layer implemented as nn.Linear() can be replaced by lora.Linear(). This is possible because LoRA is designed to work with any layer involving matrix multiplication. The lora.Linear() module introduces a pair of low-rank adaptation matrices, which are used to modify the original weight matrix by applying a low-rank decomposition.

# ===== Before =====
# layer = nn.Linear(in_features, out_features)
# ===== After ======
import loralib as lora
# Add a pair of low-rank adaptation matrices with rank r=16
layer = lora.Linear(in_features, out_features, r=16)

Before training the model, all non-lora matrix should be fixed and only LoRA matrices should be set as trainable. Training loops can run as usual.

import loralib as lora
model = BigModel()
# This sets requires_grad to False for all parameters without the string "lora_" in their names
lora.mark_only_lora_as_trainable(model)
# Training loop
for batch in dataloader:
   ...

When saving model checkpoints during LoRA fine-tuning, only the LoRA-specific parameters need to be saved, not the entire large pre-trained model. This results in significantly smaller checkpoint files and more efficient storage.

# ===== Before =====
# torch.save(model.state_dict(), checkpoint_path)
# ===== After =====
torch.save(lora.lora_state_dict(model), checkpoint_path)

Implementation of LoRA - lora.Linear()

Let's take a deep dive into the lora.Linear() source code:

The lora.Linear class builds upon torch.nn.Linear(). It retains the original weight matrix $ W $ as initialized in nn.Linear.__init__(self, in_features, out_features), and introduces two additional LoRA matrices: self.lora_A and self.lora_B. The matrix self.lora_A has dimensions of $ (r, ) $, while self.lora_B has dimensions of $ (, r) $. These matrices are used to adapt the original weight matrix through low-rank decomposition.

class Linear(nn.Linear, LoRALayer):
    # LoRA implemented in a dense layer
    def __init__(
        self, 
        in_features: int, 
        out_features: int, 
        r: int = 0, 
        lora_alpha: int = 1, 
        lora_dropout: float = 0.,
        fan_in_fan_out: bool = False, # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
        merge_weights: bool = True,
        **kwargs
    ):
        nn.Linear.__init__(self, in_features, out_features, **kwargs)
        LoRALayer.__init__(self, r=r, lora_alpha=lora_alpha, lora_dropout=lora_dropout,
                           merge_weights=merge_weights)

        self.fan_in_fan_out = fan_in_fan_out
        # Actual trainable parameters
        if r > 0:
            self.lora_A = nn.Parameter(self.weight.new_zeros((r, in_features)))
            self.lora_B = nn.Parameter(self.weight.new_zeros((out_features, r)))
            self.scaling = self.lora_alpha / self.r
            # Freezing the pre-trained weight matrix
            self.weight.requires_grad = False
        self.reset_parameters()
        if fan_in_fan_out:
            self.weight.data = self.weight.data.transpose(0, 1)

In the forward() function, it implements $h=W_0x + \Delta Wx = W_0x+ BAx$.

There is a flag variable called self.merge which is use to flag whether it's doing inference or training. Recall that the original weight matrix remaining unchanged during LoRA training is a key feature of the LoRA - pre-trained weights are freezed and instead small, low-rank matrices are trained to approximate updates.

During inference, if merge_weights is set to True, the low-rank updates self.lora_B @ self.lora_A are added directly to the frozen pre-trained weights (self.weight). This avoids the need for separate computations of LoRA updates during forward passes, improving efficiency.
During training, if merge_weights is enabled and weights were previously merged, the updates are subtracted from self.weight to revert it to its original frozen state. This ensures that gradients are not incorrectly computed on the merged weights.

class Linear(nn.Linear, LoRALayer):
    # LoRA implemented in a dense layer
    def __init__(
        self, 
        in_features: int, 
        out_features: int, 
        r: int = 0, 
        lora_alpha: int = 1, 
        lora_dropout: float = 0.,
        fan_in_fan_out: bool = False, # Set this to True if the layer to replace stores weight like (fan_in, fan_out)
        merge_weights: bool = True,
        **kwargs
    ):

      ......
      
    def train(self, mode: bool = True):
        def T(w):
            return w.transpose(0, 1) if self.fan_in_fan_out else w
        nn.Linear.train(self, mode)
        if mode:
            if self.merge_weights and self.merged:
                # Make sure that the weights are not merged
                if self.r > 0:
                    self.weight.data -= T(self.lora_B @ self.lora_A) * self.scaling
                self.merged = False
        else:
            if self.merge_weights and not self.merged:
                # Merge the weights and mark it
                if self.r > 0:
                    self.weight.data += T(self.lora_B @ self.lora_A) * self.scaling
                self.merged = True    
                
			def forward(self, x: torch.Tensor):
        def T(w):
            return w.transpose(0, 1) if self.fan_in_fan_out else w
        if self.r > 0 and not self.merged:
            result = F.linear(x, T(self.weight), bias=self.bias)            
            result += (self.lora_dropout(x) @ self.lora_A.transpose(0, 1) @ self.lora_B.transpose(0, 1)) * self.scaling
            return result
        else:
            return F.linear(x, T(self.weight), bias=self.bias)
          
      ......

LangChain Common Practices

Posted on 2024-12-25

This is a collection of some common useful LangChain (v.0.3.3) practices based on my coding experience so far.

LLM Application Development Landscape

Nowadays, LLM applications can be classified into the following categories.

Simple LLM Calls

Applications where LLMs are used directly to answer questions or perform tasks without additional layers of complexity. The focus is on generating responses to prompts or queries. These are straightforward implementations, often used for tasks like content generation, question answering, or summarization.

Real-world examples:
- ChatGPT for Q&A: Users input questions, and the model directly generates answers.
- Copywriting Tools: Applications like Jasper AI create marketing content, blogs, or product descriptions based on user inputs.
Vectorstores (RAG)

Vectorstores are used in Retrieval-Augmented Generation (RAG) applications, where relevant information is retrieved from a database of embeddings (vectorized representations of text) to enhance the LLM's responses. This allows the LLM to work with domain-specific or proprietary knowledge not contained in its training data.

Real-world examples:
- Chatbots for Enterprises: A customer support chatbot retrieves relevant product documentation or FAQs stored in a vectorstore to provide accurate responses.
- Search-Augmented Systems: Google Bard integrates real-time information retrieval to provide up-to-date and contextually relevant responses.
Agents

Agents are LLM-driven systems that execute tasks autonomously or semi-autonomously based on input instructions. They can make decisions, interact with APIs, and manage workflows. Agents often use reasoning frameworks like ReAct (Reasoning and Acting) to decide what steps to take next.

Real-world examples:
- Zapier AI Assistant: Automates workflows by taking instructions, analyzing data, and executing API calls or actions across platforms.
- LangChain Agents: Used for multi-step tasks such as filling out forms, managing databases, or performing calculations.
Agents + Vectorstores

This combines the reasoning and decision-making capabilities of agents with the data retrieval abilities of vectorstores. These systems can autonomously fetch relevant knowledge from vectorstores and execute tasks, enabling advanced applications like AutoGPT. The integration provides both reasoning depth and domain-specific accuracy.

Real-world examples:
- AutoGPT: An open-source agent that can generate business plans by researching topics, retrieving relevant information, and autonomously completing subtasks.
- GPT Engineer: Helps developers by retrieving relevant programming resources and autonomously generating code, debugging, or improving software projects.

Chaining a Simple Prompt

from dotenv import load_dotenv
from langchain.prompts.prompt import PromptTemplate
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama
from langchain_core.output_parsers import StrOutputParser

load_dotenv()

# define information to be incorporated to the prompt template
information = """
    Elon Reeve Musk (/ˈiːlɒn/; EE-lon; born June 28, 1971) is a businessman and investor. 
    He is the founder, chairman, CEO, and CTO of SpaceX; angel investor, CEO, product architect and former chairman of Tesla, Inc.; owner, chairman and CTO of X Corp.; founder of the Boring Company and xAI; co-founder of Neuralink and OpenAI; and president of the Musk Foundation. 
    He is the wealthiest person in the world, with an estimated net worth of US$232 billion as of December 2023, according to the Bloomberg Billionaires Index, and $254 billion according to Forbes, primarily from his ownership stakes in Tesla and SpaceX.
    A member of the wealthy South African Musk family, Elon was born in Pretoria and briefly attended the University of Pretoria before immigrating to Canada at age 18, acquiring citizenship through his Canadian-born mother. 
    Two years later, he matriculated at Queen's University at Kingston in Canada. Musk later transferred to the University of Pennsylvania, and received bachelor's degrees in economics and physics. 
    He moved to California in 1995 to attend Stanford University. However, Musk dropped out after two days and, with his brother Kimbal, co-founded online city guide software company Zip2. 
    The startup was acquired by Compaq for $307 million in 1999, and, that same year Musk co-founded X.com, a direct bank. X.com merged with Confinity in 2000 to form PayPal.
    In October 2002, eBay acquired PayPal for $1.5 billion, and that same year, with $100 million of the money he made, Musk founded SpaceX, a spaceflight services company. 
    In 2004, he became an early investor in electric vehicle manufacturer Tesla Motors, Inc. (now Tesla, Inc.). He became its chairman and product architect, assuming the position of CEO in 2008. 
    In 2006, Musk helped create SolarCity, a solar-energy company that was acquired by Tesla in 2016 and became Tesla Energy. In 2013, he proposed a hyperloop high-speed vactrain transportation system. 
    In 2015, he co-founded OpenAI, a nonprofit artificial intelligence research company. 
    The following year, Musk co-founded Neuralink—a neurotechnology company developing brain–computer interfaces—and the Boring Company, a tunnel construction company. 
    In 2022, he acquired Twitter for $44 billion. He subsequently merged the company into newly created X Corp. and rebranded the service as X the following year. 
    In March 2023, he founded xAI, an artificial intelligence company.
"""

# create a prompt template
template = """
Given the information {information} about a person, please create:
1. A short summary
2. Two interesting facts about the person.
"""

# incorporate information into prompt
summary_prompt_template = PromptTemplate(
    input_variables=["information"], template=template
)

# create an llm
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")
# llm = ChatOllama(model="llama3")

# create a chain
chain = summary_prompt_template | llm | StrOutputParser()

# prompt the model
response = chain.invoke(input={"information": information})

print(response)

Parsing the Output with a Customized Format

Using PydanticOutputParser and user defined output data structure.

from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field, validator
from langchain_openai import OpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo")


# define your desired data structure
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    # add custom validation logic easily with Pydantic
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field


# set up a parser + inject instructions into the prompt template.
parser = PydanticOutputParser(pydantic_object=Joke)

# create a prompt with query and instruction
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

#a query intended to prompt a language model to populate the data structure
prompt_and_model = prompt | llm
output = prompt_and_model.invoke({"query": "Tell me a joke."})
parser.invoke(output)

Data Ingestion to Pinecone Vectorstore (RAG)

Using TextLoader, CharacterTextSplitter, OpenAIEmbeddings, and Pinecone vector database.

Please refer to LangChain text splitter techniques ;text Split by character; text embedding models for more details.

import os
from dotenv import load_dotenv
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_pinecone import PineconeVectorStore

load_dotenv()

# load data
loader = TextLoader("doc1.txt")
document = loader.load()

# split data
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(document)

# create embedding
embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get("OPENAI_API_KEY"))

# ingest data to vector db
PineconeVectorStore.from_documents(texts, embeddings, index_name=os.environ['INDEX_NAME'])print("finish")

Data Retrieval from Pinecone Vectorestore (RAG)

langchain-ai/retrieval-qa-chat is a ChatPromptTemplate ensuring answers are based solely on the context.

embeddings = OpenAIEmbeddings()
llm = ChatOpenAI()

# build user query prompt
query = "what is Pinecone in machine learning?"
chain = PromptTemplate.from_template(template=query) | llm

# store query prompt to vector db
vectorstore = PineconeVectorStore(
    index_name=os.environ["INDEX_NAME"], embedding=embeddings
)

# create a retrieval qa prompt
retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")

# create a prompt chain
combine_docs_chain = create_stuff_documents_chain(llm, retrieval_qa_chat_prompt)

# create retrieval chain
retrival_chain = create_retrieval_chain(
    retriever=vectorstore.as_retriever(), combine_docs_chain=combine_docs_chain
)

# execute the retrieval
result = retrival_chain.invoke(input={"input": query})

print(result)

Customized retrieval prompt:

RunnablePassthrough is used to pass through arguments from one step to the next. It allows us to pass on the user's question to the prompt and model.

import os

from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_pinecone import PineconeVectorStore
from langchain_core.runnables import RunnablePassthrough

load_dotenv()

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

embeddings = OpenAIEmbeddings()
llm = ChatOpenAI()

query = "what is Pinecone in machine learning?"
chain = PromptTemplate.from_template(template=query) | llm

vectorstore = PineconeVectorStore(
    index_name=os.environ["INDEX_NAME"], embedding=embeddings
)


template = """
Use the following pieces of context to answer the question at the end. 
If you don't know the answer, you can say "I don't know". Don't make up an answer.
Use three sentences maximum and keep the answer short and to the point.
Always say "thanks for the question" before answering the question.

{context}

Question: {question}

Answer:
"""

custom_rag_prompt = PromptTemplate.from_template(template=template)
rag_chain = (
    {"context": vectorstore.as_retriever() | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
)

res = rag_chain.invoke(query)
print(res)

Chat with a PDF (RAG with FAISS)

Using PyPDFLoader, CharacterTextSplitter, OpenAIEmbeddings, and FAISS local vector database.

Please refer to PDF loader ; Langchain FAISS vectorstore; FAISS for more details.

import os
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import CharacterTextSplitter
from langchain_openai import OpenAIEmbeddings, OpenAI
from langchain_community.vectorstores import FAISS
from langchain.chains.retrieval import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain import hub

pdf_path = "react.pdf"
loader = PyPDFLoader(file_path=pdf_path)
documents = loader.load()
text_splitter = CharacterTextSplitter(
    chunk_size=1000, chunk_overlap=30, separator="\n"
)
docs = text_splitter.split_documents(documents=documents)

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(docs, embeddings)
vectorstore.save_local("faiss_index_react")

new_vectorstore = FAISS.load_local(
    "faiss_index_react", embeddings, allow_dangerous_deserialization=True
)

retrieval_qa_chat_prompt = hub.pull("langchain-ai/retrieval-qa-chat")
combine_docs_chain = create_stuff_documents_chain(
    OpenAI(), retrieval_qa_chat_prompt
)
retrieval_chain = create_retrieval_chain(
    new_vectorstore.as_retriever(), combine_docs_chain
)

res = retrieval_chain.invoke({"input": "Give me the gist of ReAct in 3 sentences"})
print(res["answer"])

Create a ReAct Agent

Using langchain-ai/react-agent-template to build a ReAct prompt.

from dotenv import load_dotenv
from langchain import hub
from langchain_openai import ChatOpenAI
from langchain.agents import create_react_agent, AgentExecutor
from langchain_experimental.tools import PythonREPLTool

load_dotenv()

# create an instruction
instructions = """You are an agent designed to write and execute python code to answer questions.
You have access to a python REPL, which you can use to execute python code.
If you get an error, debug your code and try again.
Only use the output of your code to answer the question. 
You might know the answer without running any code, but you should still run the code to get the answer.
If it does not seem like you can write code to answer the question, just return "I don't know" as the answer.
"""

# use an ReAct prompt template
base_prompt = hub.pull("langchain-ai/react-agent-template")
prompt = base_prompt.partial(instructions=instructions)

# make use a tool to execute python code
tools = [PythonREPLTool()]

# define a ReAct agent
agent = create_react_agent(
    prompt=prompt,
    llm=ChatOpenAI(temperature=0, model="gpt-4o-mini"),
    tools=tools,
)

# create the ReAct agent executor
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# execute the ReAct agent
agent_executor.invoke(
    input={
        "input": """generate and save in current working directory a QRcode
                    that point to https://jokerdii.github.io/di-blog, you have qrcode package installed already"""
    }
)

agent_executor.invoke(
    input={
        "input": """generate and save in current working directory a synthetic csv dataset 
                    with 1000 rows and 2 columns that is about Amazon product description and price."""
    }
)

Using an LangChain Agent for Tasks

create_csv_agent is an AgentExecutor object able to perform operations in CSVs.

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_experimental.agents.agent_toolkits import create_csv_agent


load_dotenv()

# make use a CSV agent from langchain
csv_agent = create_csv_agent(
    llm=ChatOpenAI(temperature=0, model="gpt-4o-mini"),
    path="episode_info.csv",
    verbose=True,
)

# execute the agent
csv_agent.invoke(
    input={"input": "how many columns are there in file episode_info.csv"}
)
csv_agent.invoke(
    input={
        "input": "print the seasons by ascending order of the number of episodes they have."
    }
)

Creating an ReAct Agent with Multiple Agents Provided as Tools

from typing import Any

from dotenv import load_dotenv
from langchain import hub
from langchain_core.tools import Tool
from langchain_openai import ChatOpenAI
from langchain.agents import (
    create_react_agent,
    AgentExecutor,
)
from langchain_experimental.tools import PythonREPLTool
from langchain_experimental.agents.agent_toolkits import create_csv_agent


load_dotenv()


instructions = """You are an agent designed to write and execute python code to answer questions.
You have access to a python REPL, which you can use to execute python code.
You have qrcode package installed
If you get an error, debug your code and try again.
Only use the output of your code to answer the question. 
You might know the answer without running any code, but you should still run the code to get the answer.
If it does not seem like you can write code to answer the question, just return "I don't know" as the answer.
    """

base_prompt = hub.pull("langchain-ai/react-agent-template")
prompt = base_prompt.partial(instructions=instructions)

# define python agent
tools = [PythonREPLTool()]
python_agent = create_react_agent(
    prompt=prompt,
    llm=ChatOpenAI(temperature=0, model="gpt-4-turbo"),
    tools=tools,
)

python_agent_executor = AgentExecutor(agent=python_agent, tools=tools, verbose=True)

# define CSV agent
csv_agent_executor: AgentExecutor = create_csv_agent(
    llm=ChatOpenAI(temperature=0, model="gpt-4"),
    path="episode_info.csv",
    verbose=True,
)

#### router grand agent

# list agent tools
def python_agent_executor_wrapper(original_prompt: str) -> dict[str, Any]:
    return python_agent_executor.invoke({"input": original_prompt})

tools = [
    Tool(
        name="Python Agent",
        func=python_agent_executor_wrapper,
        description="""useful when you need to transform natural language to python and execute the python code,
                        returning the results of the code execution
                        DOES NOT ACCEPT CODE AS INPUT""",
    ),
    Tool(
        name="CSV Agent",
        func=csv_agent_executor.invoke,
        description="""useful when you need to answer question over episode_info.csv file,
                        takes an input the entire question and returns the answer after running pandas calculations""",
    ),
]

# create grand ReAct agent
prompt = base_prompt.partial(instructions="")
grand_agent = create_react_agent(
    prompt=prompt,
    llm=ChatOpenAI(temperature=0, model="gpt-4-turbo"),
    tools=tools,
)
grand_agent_executor = AgentExecutor(agent=grand_agent, tools=tools, verbose=True)

# execute grand ReAct agent and print output
print(
    grand_agent_executor.invoke(
        {
            "input": "which season has the most episodes?",
        }
    )
)

print(
    grand_agent_executor.invoke(
        {
            "input": "Generate and save in current working directory 15 qrcodes that point to `www.udemy.com/course/langchain`",
        }
    )
)

Function / Tool Calling

LangChain provides a standardized interface for connecting tools to models.

ChatModel.bind_tools(): a method for attaching tool definitions to model calls.
AIMessage.tool_calls: an attribute on the AIMessage returned from the model for easily accessing the tool aclls the model decided to make.
create_tool_calling_agent: an agent constsructor that works with ANY model that implements bind_tools and returns tool_calls.

Directly using PythonREPLTool which is already a tool object. Use with caution because Python REPL can execute arbitrary code on the host machine (e.g., delete files, make network requests).

from dotenv import load_dotenv
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_openai import ChatOpenAI
from langchain_community.tools.tavily_search import TavilySearchResults

load_dotenv()


@tool
def multiply(x: float, y: float) -> float:
    """Multiply 'x' times 'y'."""
    return x * y


prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "you're a helpful assistant"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

tools = [TavilySearchResults(), multiply]
llm = ChatOpenAI(model="gpt-4o-mini")

agent = create_tool_calling_agent(llm, tools, prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools)

response = agent_executor.invoke(
    {
        "input": "what is the weather in dubai right now? compare it with San Fransisco, output should in in celsious",
    }
)

print(response)

Tool Calling with Langchain

def multiply(a: int, b: int) -> int:
    """Multiply a and b.

    Args:
        a: first int
        b: second int
    """
    return a * b

llm_with_tools = tool_calling_model.bind_tools([multiply])

result = llm_with_tools.invoke("What is 2 multiplied by 3?")

Token Limitation Handling Strategies

when passing documents into the LLM context window, there are three approaches for handling context window limitations:

Stuffing: suff all documents into a single prompt

from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain.prompts import PromptTemplate

# define prompt
prompt_template = """Write a concise summary of the following:
"{text}"
CONCISE SUMMARY:"""
prompt = PromptTemplate.from_template(prompt_template)

# define LLM chain
llm = ChatOpenAI(temperature=0, model_name="gpt-3.5-turbo-16k")
llm_chain = LLMChain(llm=llm, prompt=prompt)

# define StuffDocumentsChain
stuff_chain = StuffDocumentsChain(llm_chain=llm_chain, document_variable_name="text")
docs = loader.load()
print(stuff_chain.run(docs))

Map-reduce: summarize each document on its own in parallel and put them into a final summary.

from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain_text_splitters import CharacterTextSplitter

# Map
map_template = """The following is a set of documents
{docs}
Based on this list of docs, please identify the main themes 
Helpful Answer:"""
map_prompt = PromptTemplate.from_template(map_template)
map_chain = LLMChain(llm=llm, prompt=map_prompt)
# Reduce
reduce_template = """The following is set of summaries:
{docs}
Take these and distill it into a final, consolidated summary of the main themes. 
Helpful Answer:"""
reduce_prompt = PromptTemplate.from_template(reduce_template)
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)
# Combine documents by mapping a chain over them, then combining results
map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="docs",
    return_intermediate_steps=False,
)
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(chunk_size=1000, chunk_overlap=0)
split_docs = text_splitter.split_documents(docs)
print(map_reduce_chain.run(split_docs))

Refine: The refine documents chain constructs a response by looping over the input documents and iteratively updating its answer.

from langchain.chains.summarize import load_summarize_chain
prompt = """
                  Please provide a summary of the following text.
                  TEXT: {text}
                  SUMMARY:
                  """

question_prompt = PromptTemplate(
    template=question_prompt_template, input_variables=["text"]
)

refine_prompt_template = """
              Write a concise summary of the following text delimited by triple backquotes.
              Return your response in bullet points which covers the key points of the text.
              ```{text}```
              BULLET POINT SUMMARY:
              """

refine_template = PromptTemplate(
    template=refine_prompt_template, input_variables=["text"]

# load refine chain
chain = load_summarize_chain(
    llm=llm,
    chain_type="refine",
    question_prompt=question_prompt,
    refine_prompt=refine_prompt,
    return_intermediate_steps=True,
    input_key="input_documents",
    output_key="output_text",
)
result = chain({"input_documents": split_docs}, return_only_outputs=True)

Coreference Resolution

Adding memory to chatbots.

LangChain provides a way to build applications that have memory using LangGraph's persistence. You can enable persistence in LangGraph applications by providing a checkpointer when compiling the graph. Every iteration, LangGraph takes the information and saves it in a DB (PostgreSQL, MySQL, Redis, and MongoDB saver).

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)

# define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + state["messages"]
    response = model.invoke(messages)
    return {"messages": response}


# define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Langchain has three main strategies to manage state:

Simply stuffing previous messages into a chat model prompt.

from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        SystemMessage(
            content="You are a helpful assistant. Answer all questions to the best of your ability."
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain = prompt | model

ai_msg = chain.invoke(
    {
        "messages": [
            HumanMessage(
                content="Translate from English to French: I love programming."
            ),
            AIMessage(content="J'adore la programmation."),
            HumanMessage(content="What did you just say?"),
        ],
    }
)
print(ai_msg.content)

The above, but trimming old messages to reduce the amount of distracting information the model has to deal with.

from langchain_core.messages import trim_messages
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

# define trimmer
# count each message as 1 "token" (token_counter=len) and keep only the last two messages
trimmer = trim_messages(strategy="last", max_tokens=2, token_counter=len)

workflow = StateGraph(state_schema=MessagesState)


# define the function that calls the model
def call_model(state: MessagesState):
    trimmed_messages = trimmer.invoke(state["messages"])
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability."
    )
    messages = [SystemMessage(content=system_prompt)] + trimmed_messages
    response = model.invoke(messages)
    return {"messages": response}


# define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# ddd simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

More complex modifications like synthesizing summaries for long running conversations.

from langchain_core.messages import HumanMessage, RemoveMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, MessagesState, StateGraph

workflow = StateGraph(state_schema=MessagesState)


# Define the function that calls the model
def call_model(state: MessagesState):
    system_prompt = (
        "You are a helpful assistant. "
        "Answer all questions to the best of your ability. "
        "The provided chat history includes a summary of the earlier conversation."
    )
    system_message = SystemMessage(content=system_prompt)
    message_history = state["messages"][:-1]  # exclude the most recent user input
    # Summarize the messages if the chat history reaches a certain size
    if len(message_history) >= 4:
        last_human_message = state["messages"][-1]
        # Invoke the model to generate conversation summary
        summary_prompt = (
            "Distill the above chat messages into a single summary message. "
            "Include as many specific details as you can."
        )
        summary_message = model.invoke(
            message_history + [HumanMessage(content=summary_prompt)]
        )

        # Delete messages that we no longer want to show up
        delete_messages = [RemoveMessage(id=m.id) for m in state["messages"]]
        # Re-add user message
        human_message = HumanMessage(content=last_human_message.content)
        # Call the model with summary & response
        response = model.invoke([system_message, summary_message, human_message])
        message_updates = [summary_message, human_message, response] + delete_messages
    else:
        message_updates = model.invoke([system_message] + state["messages"])

    return {"messages": message_updates}


# Define the node and edge
workflow.add_node("model", call_model)
workflow.add_edge(START, "model")

# Add simple in-memory checkpointer
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

Tracing Application with LangSmith

LangSmith traces LLM calls, tool usage, LLM model latency, token count, and cost.

To integrate LangSmith to our application, we need to generate an API key, add it as "LANGCHAIN_API_KEY" in environment variables, install langsmith dependency, setup our environment. Please refer to set up tracing for detailed steps.

LangChain Hub

LangChain Hub is a comprehensive platform that serves as a repository for pre-built components, tools, and configurations designed to accelerate the development of LLM applications. It simplifies the integration of various building blocks—models, prompts, chains, and agents—enabling developers to create robust and scalable applications without starting from scratch.

LangChain Text Splitter Playground

Text Splitter Playground is a user-friendly interface designed to help developers experiment with and fine-tune text-splitting strategies. In many LLM applications, particularly those involving large documents or retrieval-augmented generation (RAG), it is essential to divide text into manageable chunks while preserving context. This tool allows users to optimize the chunking process for their specific needs.

2024 December - What I Have Read

Posted on 2024-12-01

Substack

the more the AV competition globally heats up, and the more large players invest in the technology, including Tesla, the higher the demand for the Uber platform will become for these AV players.

― Uber Technologies – A brilliant business executing to perfection (Quarterly Update) - Rijnberk InvestInsights [Link]

This article predicts Uber to be a massive beneficiary of the AV / Robotaxi revolution. There indeed is a moat.

Where do LLMs spend their FLOPS? - Artificial Fintelligence [Link]

Theoretical Analysis of LLM FLOPS Allocation

FLOPS Distribution in Decoder Models:
- Based on a standard decoder model, FLOPS are allocated as follows for each layer:
  
  6d² for computing Query (Q), Key (K), and Value (V) matrices.
  
  2d² for computing the attention output matrix, using the formula: softmax(Q @ K.T) @ V.
  
  16d² for running the feedforward network (FFN).
  
  This results in a total of 24d² FLOPS per layer.
- Percentage-wise:
  
  25% of the time is spent computing QKV.
  
  ~8% is spent computing the attention output matrix.
  
  ~66% is spent running the FFN.
Attention Mechanism:

While the attention equation itself $softmax(QK^T/\sqrt{d_{head}})V$ has negligible computational cost (~0.005% for Llama7B) compared to other operations, its impact on memory usage necessitates techniques like KV cache and flash attention.
KV Cache:

The KV cache, essential for efficient attention computation, requires O(T) memory, where T is the number of tokens generated.

The memory size of the KV cache is calculated as 4 * number of layers * d_model bytes.

While the KV cache demands a significant amount of memory, it essentially reuses the same memory space throughout the token generation process.
Modern Architectures:

Architectures like Mistral 7B and Llama2 employ Grouped Query Attention (GQA) and sliding window attention to optimize performance.

GQA reduces the KV cache size by sharing the KV projection across multiple heads. This leads to a linear decrease in memory consumption as the number of KV heads decreases.

Sliding window attention limits the KV cache size to the window size (e.g., 4096 for Llama7B), further controlling memory usage.

Performance-Motivated Architectural Changes

Impact of Model Width and Depth:

Increasing the number of layers linearly scales both the FLOPS and the number of parameters.

Increasing the model width (d_model) quadratically scales the number of parameters and, consequently, the compute requirements.

This is because weight matrices within layers have a size of (d_model, d_model), leading to a quadratic relationship between model width and parameters.
Balancing Latency and Parallelization:

Wider models parallelize better due to the ease of splitting layers across multiple GPUs using tensor parallelism.

Deeper models require sequential computation of layers, hindering parallelization, especially during training.

Therefore, wider models are preferred when low latency is critical.

Empirical Analysis of LLM Performance

The article investigates the memory usage of the KV cache in LLMs, specifically Llama2. The author observes that the actual memory consumed by the model is higher than what the theoretical calculations suggest. Here's how the discrepancy is highlighted:

Theoretical Calculation: The formula for calculating the KV cache memory requirement per token is 4 * number of layers * d_model bytes. In the experiment, the Llama2 model used has d_model of 1024 and 8 hidden layers. This means it theoretically needs 32KB of memory per token (4 * 8 * 1024 bytes = 32KB).
Expected Memory Usage: For generating 20 tokens, the model should ideally use 640KB of memory (32KB/token * 20 tokens = 640KB).
Observed Memory Usage: However, the empirical analysis revealed that the model's memory consumption jumped by ~2.1MB every 20 tokens. This is significantly higher than the expected 640KB.

The author concludes that this discrepancy of about 3x suggests an inefficient implementation of the KV cache in the model being used. The extra overhead could stem from various factors not accounted for in the theoretical calculation, and further investigation would be needed to pinpoint the exact cause.

Transformer inference tricks - Artificial Fintelligence [Link]

KV Cache

The KV cache is a crucial optimization for decoder models, significantly reducing computation. It exploits the fact that keys and values remain constant for the prompt and each decoded token in subsequent iterations. By caching these values, the computational complexity of sampling becomes linear instead of quadratic, enabling decent performance with longer contexts.
However, it introduces state management complexity, as inference needs to continue for all sequences even if some are completed. The KV cache demands significant memory, proportional to the number of layers, heads, and the embedding dimension. For instance, GPT-3 with 96 layers, 96 heads, and a dimension of 128 requires 2.4M parameters per token, translating to 10GB of memory for a 2048 token context window. This memory requirement is a major challenge for consumer-grade GPUs with limited HBM, like the 4090.

Speculative Decoding

Speculative decoding leverages excess compute capacity, particularly in local inference settings. It utilizes two models: a small, fast “draft” model and a large, slow model. The smaller model quickly makes multiple inferences, guessing the large model’s predictions, while the larger model verifies these guesses in parallel. This effectively reduces the sequential cost of generating a sequence to that of the smaller model.
However, it requires training and storing both models, and performance is limited by the smaller model’s prediction accuracy. HuggingFace reports a typical doubling of decoding rate using this technique.
Newer techniques like Jacobi decoding and lookahead decoding aim to improve upon speculative decoding by generating n-grams and recursively matching them, potentially achieving latency improvements without requiring a draft model.

Effective Sparsity

Sparsity in transformer activations arises from the softmax operation in the attention mechanism and ReLU activations in MLPs, leading to many zero values. Utilizing this sparsity can be challenging, with limited support in mainstream tensor programs.
One optimization involves skipping computations for zero activations, feasible in custom implementations like Llama.cpp. However, the effectiveness of this approach diminishes exponentially with batch size due to the random distribution of sparsity across tokens.
Therefore, leveraging sparsity is most effective for batch size 1, although speculative decoding might be more beneficial in such scenarios.

Quantization

Quantization reduces the precision of model weights and activations, potentially saving memory and increasing inference speed. Research suggests that quantization to 4 bits or more results in negligible performance degradation. The k-bit inference scaling laws paper demonstrates that reducing precision allows for using a larger model with the same memory footprint and potentially achieving better performance.
However, using lower precision formats may lack native support in hardware and could be unstable in production environments. FP8 offers a good balance between performance and support, being the lowest precision format natively supported by modern accelerators. Int8 is another option, easier to implement with tools like PyTorch, though it lacks the performance advantages of FP8.
Libraries like bitsandbytes facilitate quantization, offering tools and APIs for implementation.

Top 10 China's AI Stories in 2024: A Year-End Review - Recode China AI [Link]

China's AI landscape is rapidly catching up to the US, with multiple models now reaching similar performance benchmarks as GPT-4 and advancements in areas like video generation, robotics, and autonomous driving.

Several AI-powered apps have emerged in China, with ByteDance's Doubao leading in popularity domestically and MiniMax's Talkie gaining traction internationally, though China has yet to produce a "killer app" with at least 100 million daily active users.

A number of Chinese AI startups have emerged since ChatGPT's debut, backed by significant capital, but they now face strong competition from tech giants.

Chinese open-source LLMs have made substantial global progress, with Alibaba’s Qwen series being the most downloaded on Hugging Face.

Chinese AI video generators have surged ahead due to the delayed release of Sora, with platforms like Kuaishou’s Kling and MiniMax’s Hailuo offering competitive features.

An LLM API price war has been ignited by major Chinese tech companies, with significant price reductions for developers and SMEs.

China's semiconductor industry faces challenges due to US restrictions but is also making strides in self-sufficiency, with companies like Huawei pushing forward on competitive AI chips.

China's robotaxi industry is gaining momentum, with Baidu's Apollo Go expanding its fleet and other self-driving startups completing IPOs.

OpenAI and Microsoft have tightened AI access in China, prompting Chinese AI companies to offer alternatives and accelerating the development of homegrown models.

China is seeing a robotics boom with rapid innovation in humanoid and other types of robots, though challenges remain in complex tasks and high production costs.

AI resurrection is becoming increasingly accessible, raising ethical and legal questions as companies offer services to create digital replicas of the deceased.

Finetuning LLM Judges for Evaluation - Deep (Learning) Focus [Link]

This article discusses the challenges of evaluating LLMs and how finetuning specialized LLM judges can improve the evaluation process. Here's how the logic of the article flows:

The article notes that while human evaluation is the most reliable method, it is also expensive, time-consuming, and not scalable. This creates a need for efficient ways to test LLM capabilities.

There are two primary evaluation approaches: human evaluation and automatic metrics.

Human evaluation is considered the "definitive source of truth" but is recognized as noisy, subjective and prone to bias.
Automatic metrics are used to speed up model development, but they are imperfect proxies for human opinions. The article further divides automatic metrics into two categories: traditional metrics and model-based evaluation.
Traditional metrics like ROUGE and BLEU are reference-based, comparing LLM outputs to "golden" answers, and are less effective for modern LLMs which are open-ended and can produce many valid responses.
LLM-as-a-Judge is introduced as a model-based approach, using a powerful LLM to evaluate another LLM's output. This method is effective, easy to implement, and can handle open-ended tasks.

While effective, LLM-as-a-Judge has limitations, including a lack of transparency, security concerns, cost, and a lack of specialization for domain-specific evaluations. The article argues that these limitations can be addressed by training specialized LLM judges.

Meta-evaluation involves assessing the performance of the LLM judge by comparing its output to high-quality human evaluation data.
Early research on finetuned LLM judges were created as open source replacements for proprietary LLMs. The original LLM-as-a-Judge paper also explored finetuning, and found that a finetuned Vicuna-13B model showed potential. The need for finetuning is further justified because proprietary LLMs can be expensive, lack control or transparency, and because open source models are becoming more capable. The article discusses how a Vicuna-13B model was improved by finetuning on human votes from Chatbot Arena, though it still fell short of GPT-4 performance.
Several examples of finetuned LLM judges:
- PandaLM: This model is designed to identify the best model among a set, particularly useful for hyperparameter tuning. It is trained on a dataset of over 300K examples with instructions, inputs, paired responses, evaluation results and rationales. PandaLM is effective in specialized domains like law and biology.
- JudgeLM: This model focuses on the factors that contribute most to the quality of a judge model, such as data quality and diversity, base model size, bias, and generalization. JudgeLM uses a high-quality, diverse dataset and is trained to mitigate bias, including positional, knowledge, and format biases.
- Auto-J: This model is designed for domain-specific grading, with an emphasis on providing high-quality, structured explanations. It is trained on real-world queries and responses and can perform both pairwise and direct assessment scoring.
Other related research using LLMs for critiques, verification, and generating synthetic training data.

Prometheus is a key development in finetuned LLM judges, capable of fine-grained, domain-specific evaluation. It is trained to ingest custom scoring rubrics as input.

The Prometheus model uses the Feedback Collection dataset, which includes instructions, responses, rubrics, reference answers, rationales, and scores. It is trained to sequentially provide feedback and then score the response using a supervised finetuning (SFT) strategy.
Prometheus 2 is introduced as an extension that can handle both direct assessment and pairwise scoring. It is trained on both the Feedback Collection and the Preference Collection, and uses a linear model merging approach to combine models trained for the two scoring formats.
Prometheus-Vision extends the Prometheus concept to Vision-Language Models (VLMs). It uses a dataset called the Perception Collection, which includes images, instructions, responses, rubrics and reference answers.

Other types of finetuned judges, including:

Self-rewarding LLMs, which use the LLM itself to provide its own rewards and feedback.
LLM-as-a-Meta-Judge, which allows the LLM judge to self-improve.
Self-taught evaluators, which train evaluators without human preference data.
Foundational Large Autorater Models (FLAMe), which are trained on a massive amount of human preference data and generalize well to other tasks.
Direct judgement preference optimization, which uses preference optimization to create more advanced evaluation capabilities.

A generic framework based on the Prometheus model for creating a finetuned LLM judge. The steps include:

Solidifying evaluation criteria.
Preparing a high-quality dataset.
Using synthetic data.
Focusing on the rationales for each score.
Training the model using SFT and meta-evaluating its performance.

E-Commerce Unleashed - App Economy Insights [link]

Highlights discussion points:

Cyber week trends.

"Cyber Week (from Black Friday to Cyber Monday) showcased shifting consumer behaviors and the growing dominance of e-commerce."
Shopify’s acceleration.

Shopify has evolved from a platform for small businesses into a global enabler for merchants, offering tools to scale internationally. Its emphasis on payments, particularly through Shop Pay, has been pivotal, with Shop Pay emerging as a high-conversion checkout option. In Q3, Gross Payment Volume accounted for 62% of Shopify’s Gross Merchandise Volume, marking a 4% year-over-year increase. Additionally, Shopify's partnership with Amazon to integrate Prime benefits directly into Shopify stores represents a strategic move to boost customer loyalty and conversions by leveraging Amazon's trusted fulfillment network and extensive Prime membership base.
Amazon takes on Temu.

Amazon has launched Amazon Haul, a new storefront aimed at attracting budget-conscious shoppers and safeguarding its market position. This initiative is strategically designed to meet the increasing demand for affordable e-commerce solutions.
Walmart’s advertising play.

Walmart is redefining modern retail by merging its extensive physical presence with advanced digital capabilities to create a powerful omnichannel strategy. The company leverages first-party data and its retail media network to maintain a competitive edge.

Walmart Connect integrates online and in-store advertising, allowing brands to engage customers at their preferred shopping points. By utilizing vast first-party data, Walmart delivers targeted and relevant ads, enhancing both advertiser returns and customer satisfaction. The platform is also attracting advertisers from diverse industries, including automotive and financial services.

Walmart’s planned acquisition of Vizio marks its entry into connected TV advertising, broadening Walmart Connect’s reach into households through smart TVs and enhancing inventory visibility and supply chain integration through improved data capabilities. This positions Walmart as a leader in omnichannel retail and advertising.
AI: The quiet game changer.

AI played a transformative role during Cyber Week, enhancing the shopping experience across various dimensions. Hyper-personalized shopping was driven by AI recommendation engines, which anticipated consumer needs and boosted conversions, exemplified by features like Amazon’s “frequently bought together.” Generative AI tools, such as chatbots, simplified product discovery during the busy sales period, with innovations like Amazon Q offering AI-generated review summaries to streamline decision-making.

AI also optimized logistics through demand forecasting, ensuring products remained in stock and reducing shipping delays. In payments, real-time AI fraud detection provided secure checkouts on platforms like Walmart and Shopify. Additionally, AI tools like Shopify’s Sidekick and Magic enhanced product descriptions, SEO strategies, and customer support, further elevating the e-commerce experience. These advancements underscored AI's critical role in reshaping retail during one of the busiest shopping weeks of the year.

AI presents new challenges for incumbents but also drives significant innovation and growth.

― Salesforce: The Agent Wave - App Economy Insights [Link]

The company’s autonomous AI platform - Agentforce - was introduced in Sep 2024 and launched in late Oct. Agentforce enables businesses to deploy AI agents for tasks such as sales, marketing, and customer support. This marks a pivotal step in Salesforce’s platform strategy, with far-reaching implications. CEO Marc Benioff views Agentforce as transformative, positioning it at the core of a shift toward “agent-first companies.” In this model, AI not only assists humans but fundamentally redefines business operations by automating processes and enhancing productivity.

What to watch:

Salesforce recently completed its acquisition of Own and Zoomin, reinforcing its Data Cloud capabilities.
Salesforce Ventures announced a new $\$500$ million AI fund, targeting high-profile AI startups like Anthropic, Mistral, and Cohere, supporting Salesforce’s efforts to remain at the forefront of enterprise AI.
Clara Shih, CEO of Salesforce AI left Salesforce to set up a new Business AI group at Meta, aiming to build AI tools for businesses of all sizes. Shih’s departure highlights the intensity of the AI talent war, which will be a fascinating layer to watch in the coming year.

OpenAI's o1 using "search" was a PSYOP - Interconnects [Link]

The article primarily argues that OpenAI's o1 model does not use explicit search at test time, and its apparent search capabilities are a result of reinforcement learning (RL) during training. The author argues against the idea that o1 uses online search at test time or intermediate rewards during training. The article posits that the "suspects" are reduced to "Guess + Check" and "Learning to Correct". They uses the test-time compute plot, and the training process, as key points in their argument to show how o1 can achieve high performance using RL with controlled training data and no explicit search during inference.

One major source of this idea is Sasha Rush's lecture on Test Time Scaling (o1).

Insurance companies aren't the main villain of the U.S. health system - Nahpinion [Link]

This article argues that health insurance companies are not the primary cause of high healthcare costs in the United States12. Instead, the article argues that the excessive prices charged by healthcare providers are the main reason for the high cost of healthcare. The article suggests that focusing anger on insurance companies is "shooting the messenger," and the solution is to reduce costs within the medical system itself, such as having the government negotiate lower prices with providers.

Evidences are: insurance companies have low profit margins, spend much more on medical costs than they make in profit. Americans pay a smaller percentage of their health costs out of pocket than people in most other rich countries. This suggests that US health insurers are paying a higher percentage of costs than government insurance systems in other countries. The cost of healthcare provision in the U.S. is too high. The actual people charging high prices are the providers themselves, such as hospitals, pharmaceutical companies, and system. They outsource the actual collection of these fees to insurance companies.

15 Times to use AI, and 5 Not to - One Useful Thing [Link]

When to Use AI:

Use AI for tasks that require generating a high quantity of ideas, such as in brainstorming sessions.

AI is useful when you are an expert and can quickly judge the quality of its output.

AI can summarize large amounts of information where minor errors are acceptable.

Use AI for translating information between different formats or audiences.

AI can help you overcome creative blocks by providing multiple options to move forward.

Use AI when it is known to be better than any available human option, and its errors won't cause significant problems.

Use AI as a companion when reading to get help with context and details. (very helpful to me)

AI can provide a variety of solutions, allowing you to curate the best ones.

AI is helpful for tasks where research has proven it to be effective, like coding.

Use AI to get a first look at how different audiences might react to your work.

AI can act as a competent co-founder for entrepreneurial ventures.

Use AI to get a specific perspective, such as reactions from fictional personas.

AI can help with tasks that are ritualistic and have lost their purpose.

Use AI to get a second opinion by comparing its conclusions with yours.

Use AI when it can perform a task better than humans.

When Not to Use AI:

Avoid AI when you need to learn and synthesize new ideas, as it is not the same as reading and thinking yourself.

Do not use AI when very high accuracy is essential because AI errors can be very plausible and hard to spot.

Avoid AI if you do not understand its failure modes, such as hallucinations or persuasiveness.

Do not use AI when the struggle with a topic is necessary for success and learning.

Avoid AI when it is bad at a specific task.

Oracle : The 4th Hyperscaler? - App Economy Insights [Link]

Google released the first version of its Gemini 2.0 family of artificial intelligence models on December 11th, 2024. Including its Chrome browser automation product called Mariner.

Project Astra and Mariner along with NotebookLM remain very intriguing AI products by Google in 2025.

Gemini 2 and the rise of multi-modal AI - AI Supremacy [Link]

Incredible.

Figure source: Peter Gostev on Linkedin

Palantir Unclassified! Equity Research! - Global Equity Briefing [Link]

Palantir is a software company that provides tools for analyzing large datasets, which enable users to make better decisions. Founded in the early 2000s, Palantir initially offered services to government agencies, including the US intelligence community, to combat terrorism. The CIA was one of their first investors. Palantir's software is also used by corporations to improve operations and decision-making.

Business Model

Palantir operates as a Software as a Service (SaaS) company, offering a suite of customizable products for which clients pay a licensing fee. The company has two operating segments: government and commercial.

Government Sales: Palantir provides services to government institutions, recognizing a gap in the market due to many Silicon Valley companies not wanting to work with governments. These contracts are often long-term, providing predictable revenue streams. The company benefits from the transparency of government information, and it is easier for them to predict needs and market their software.

Commercial Sales: Palantir's solutions are used across many industries by various employees from production line workers to CEOs. The use cases for Palantir software in the commercial sector are extensive.

Customer Acquisition: Palantir targets large organizations with complex problems, which increases their competitive advantage. Solving difficult problems first earns customer trust.

Products: Gotham, Foundry, Apollo, and AIP.

Gotham: It is a government-focused platform that allows users to analyze large datasets to make better decisions and find hidden connections, with the goal of improving operations and decision-making.
Foundry: This is a commercial platform that allows large and complex companies to integrate, visualize, and analyze their data to optimize their operations and value chain.
Apollo: This is a platform for continuous software deployment, enabling secure and seamless delivery of software across various environments for Palantir's clients.
AIP: Palantir's newest offering, it is a platform for organizations to create customized AI tools using their own data, providing accurate and detailed answers to specific questions.

Opportunities

Palantir can benefit from the growing demand for digital twins, which are exact digital replicas of real-world items used for integration, monitoring, simulation, and maintenance. The digital twin market is projected to grow significantly. Palantir is positioned to benefit from the AI revolution with its AIP platform, and its other products also use AI. The global AI market is expected to reach $\$1.84$ trillion by 2030. Palantir is developing industry-specific operating systems, like Skywise for the airline industry. These operating systems are sticky and offer significant revenue opportunities. The healthcare industry could be a large market for such systems. Palantir's commercial sector is growing, and there are significant opportunities for international expansion.

Is AI hitting a wall? - Strange Loop Canon [Link]

Arguments that suggest AI progress is hitting a wall include the observation that pre-training scaling has plateaued, meaning simply increasing model size and data may not yield the same improvements as before. Also, current evaluation benchmarks may be saturated, failing to assess deeper work, since they are based on human tests or simple recall. Current AI models struggle with real-world tasks due to issues like hallucination and a lack of creative planning, even if they appear human-level in individual evaluations. Finally, the visible effects of scaling are limited, with reduced cross-entropy loss not translating to significant improvements for observers.

Conversely, arguments against AI progress hitting a wall emphasize the presence of large amounts of unused data, including various types like conversations and video data. The use of synthetic data can enhance learning by converting existing data into different formats and testing it against real-world scenarios. AI models are now being taught reasoning, enabling them to "think for longer" and improving performance in areas requiring clear thought processes. Additionally, there is the possibility of exploring new S-curves or scaling laws. New models are also capable of expert-level work that is not captured by current benchmarks, potentially speeding up scientific research. Finally, AI models can now interact with digital systems, and are becoming more aware of the world.

Our Healthcare System, a Reign of Terror - Freddie deBoer [Link]

An Assassin Showed Just How Angry America Really Is - BIG by Matt Stoller [Link]

OpenAI o3 Model Is a Message From the Future: Update All You Think You Know About AI - The Algorithmic Bridge [Link]

OpenAI's o3: The grand finale of AI in 2024 - Interconnects [Link]

Key performance points:

ARC AGI Prize: o3 is the first model to surpass the 85% threshold for completing the ARC AGI prize on the public set, though it exceeded cost constraints. It achieved 87% accuracy on the public set with high compute, and 76% with low compute. For context, prior to o1-class models, OpenAI’s best model, GPT-4o, only achieved 5% accuracy. The ARC AGI challenge is designed to evaluate human-like general fluid intelligence.
Frontier Math Benchmark: o3 demonstrates a substantial improvement on the Frontier Math benchmark, increasing performance from 2% to 25%. This benchmark is considered extremely challenging, with one Fields Medalist stating that the problems "will resist AIs for several years at least".
Coding Benchmarks: o3 has made significant improvements on leading coding benchmarks such as SWE-Bench-Verified, achieving a score of 71.7%. On the Codeforces competition coding site, o3 achieved a score of 2727 with consensus voting, placing it at the International Grandmaster level and approximately in the top 200 of competitive human coders.
Reasoning Capabilities: o3 represents a major advancement in reasoning evaluations, signaling that the industry is moving beyond pretraining on internet text. It is expected to accelerate the rate of progress in AI research.
Inference and Cost: o3 was tested with two levels of compute with different sample sizes: a high-efficiency configuration with a sample size of 6, and a low-efficiency configuration with a sample size of 1024 which used 172 times more compute. The cost of running o3 at the higher level of compute was approximately $\$5000$ per query. It is speculated that the core mechanism of o3 involves natural language program search and execution within token space, searching over Chains of Thought (CoTs).
Availability: The o3 model, including the o3-mini version, is expected to be available to the general public in late January 2025. The o3-mini is expected to be more impactful for the general public due to its lower cost, while still outperforming o1.

o3, AGI, the art of the demo, and what you can expect in 2025 - Marcus on AI [Link]

o3 “ARC AGI” postmortem megathread: why things got heated, what went wrong, and what it all means - Marcus on AI [Link]

Gary Marcus critiques OpenAI's new model o3, arguing that its impressive demo, while showcasing advancements in math and coding, was carefully curated and lacks broader application.

The public did not get to try the system, and it was not vetted by the scientific community. OpenAI chose what to highlight about o3. Marcus argues that until many people get to try o3 on different tasks, its reliability should not be assumed.
The o3 demo primarily focused on math, coding, and IQ-like puzzles, with no evidence that it can work reliably in open-ended domains. It was not tested on problems where massive data augmentation was not possible. The demo did not address the most important question about the system's capabilities in open-ended domains.
The o3 system is incredibly expensive. One estimate suggests that each call to the system might cost $1000. Even if the cost is reduced, it might still not be as good or as versatile as top STEM graduates.
The o3's performance on the ARC-AGI test was misleading. The test is at most a necessary, but not sufficient, condition for AGI, and does not address important areas such as factuality, compositionality, and common sense.
The core problem of neural networks generalizing better "within distribution" than "outside distribution" has not been solved.

Note to Our Energy Sucking Overlords - Michael Spencer [Link]

The rapid growth of AI is causing a surge in demand for data centers, which in turn are becoming major consumers of electricity. The energy needs of AI are growing so large that tech companies are seeking reliable power sources beyond renewable energy. The rising energy consumption of AI infrastructure will likely result in higher energy prices, potentially creating competition between Big Tech and the communities where they build data centers. To meet their energy needs, major technology companies are becoming more involved in the energy sector, including investments in nuclear and natural gas plants. The current trajectory of AI infrastructure expansion and energy consumption is unsustainable and could lead to significant challenges for society. The US is building data centers abroad in Europe and Asia, thereby maintaining their power and also acquiring cheaper labor.

Summary of statistics:

Energy Consumption of AI tasks: A single task on the ARC-AGI benchmark using OpenAI's o3 model consumes approximately 1,785 kWh of energy, which is equivalent to the electricity used by an average U.S. household in two months. This task also generates 684 kg CO₂e, which is equivalent to the carbon emissions from more than 5 full tanks of gas.
Investments in AI Infrastructure: In 2024, major players like Amazon, Microsoft, and Alphabet spent over $\$240$ billion on AI-related infrastructure. In 2025, Amazon, Google, Meta, and Microsoft are expected to spend $\$300$ billion in capital expenditures.
Data Center Electricity Consumption: Global data center electricity consumption is expected to more than double between 2023 and 2028. The IDC expects consumption to reach 857 Terawatt hours (TWh) in 2028.
US Data Center Energy Usage: U.S. data centers could use 6.7 to 12% of all energy demand nationwide by 2028. In 2023, data centers used 4.4% of total US power consumption, which is projected to rise to as high as 12% by 2028. This is a spike of more than threefold in the next four years.
Data Center Locations and Power:
- Northern Virginia has over 300 data centers with approximately 3,945 megawatts of commissioned power.
- The Dallas region has 150 data centers.
- Silicon Valley has over 160 data centers.
- Phoenix has over 100 data centers with around 1,380 megawatts of power.
- Chicago has more than 110 data centers.
Data Center Projects:
- OpenAI plans to construct massive 5-gigawatt (GW) data centers across the US.
- Oklo will build small modular reactors (SMR) by 2044 to generate 12 gigawatts of electricity for data centers.
- Meta announced a $\$10$ billion development for a 4 million sq ft, 2 GW data center campus in Louisiana.
- Entergy is proposing to develop a 1.5GW natural gas plant in Louisiana to power a data center.
- Amazon Web Services (AWS) plans to invest $\$11$ billion in a new data center campus in Northern Indiana.
Generative AI Market: The generative AI market was valued at $\$6$ billion in 2023 and could reach $\$59$ billion in 2028.
Increased US power demand: Data centers are one of the key reasons US power demand is expected to jump 16% over the next five years.
Cost of Electricity for Data Centers: Electricity is the largest ongoing expense for data center operators, accounting for 46% of total spending for enterprise data centers and 60% for service provider data centers.
The potential for data centers to consume as much energy as entire industrialized economies: By 2030, US data centers could consume as much electricity as some entire industrialized economies.
Big Oil's Role: Big oil companies like ExxonMobil and Chevron are moving into the AI datacenter energy market. Exxon plans to build a natural gas plant to power a data center, and estimates that decarbonizing AI data centers could represent up to 20% of its total addressable market for carbon capture and storage by 2050.

What are the checks and balances on the power of Elon Musk? - Noahpinion [Link]

The article examines the significant influence of Elon Musk on U.S. politics, particularly his role in derailing a Congressional spending bill. It explores whether Musk's actions represent a threat to democratic processes, considering his control over X (formerly Twitter) and SpaceX. The author presents contrasting views of Musk—"Real Elon" versus "Evil Elon"—highlighting the uncertainty surrounding his motives and the lack of institutional checks on his power. The piece concludes by suggesting that public opinion ultimately holds sway over Musk's influence, though the potential for a powerful backlash remains to be seen.

Is AI progress slowing down? - AI SHAKE OIL [Link]

The authors argue that the recent shift away from model scaling towards inference scaling is not necessarily indicative of a slowdown, but rather a change in approach. They caution against over-reliance on industry insiders' predictions due to their inherent biases, emphasizing that progress is less predictable and more dependent on algorithmic innovation than previously assumed. Furthermore, the essay highlights the significant lag between capability advancements and real-world applications, suggesting that the focus should shift towards product development and user adoption rather than solely on model capabilities. Finally, the authors offer a more nuanced perspective on the current state of AI progress, acknowledging the potential of inference scaling while emphasizing the importance of considering broader factors beyond pure technological advancement.

The Critical AI Report, December 2024 Edition - Blood in the Machine [Link]

Gen AI's actual impact on workers so far:

Waymo: Rideshare Revolution - App Economy Insights [Link]

Manufacturing is a war now - Noahpinion [Link]

The article argues that China's dominance in manufacturing, particularly in crucial areas like drone production and batteries, poses a significant threat to the United States and its allies.

Source: https://mipforum.org/wp-content/uploads/2024/11/MIPF-Conference-Paper-FINAL-WEB.pdf

Articles and Blogs

Meet Willow, our state-of-the-art quantum chip - Google Research [Link]

Google has developed a new quantum chip called Willow, which significantly reduces errors as it scales up, a major breakthrough in quantum error correction. Willow also performed a computation in under five minutes that would take a supercomputer 10 septillion years, demonstrating its potential for solving complex problems beyond the reach of classical computers. This achievement marks a significant step towards building commercially relevant quantum computers that can revolutionize fields like medicine, energy, and AI.

Quantum Computing Roadmap:

Terms to keep in mind:

Willow: Google's latest 105-qubit superconducting processor, which is the first to demonstrate exponential error suppression with increasing surface code size.
Below Threshold: A milestone in quantum computing where the error rate decreases as the number of qubits increases, demonstrating effective error correction.
Logical Qubit: A fault-tolerant qubit created from multiple physical qubits using error correction techniques, providing a more stable and reliable unit of computation.
Random Circuit Sampling (RCS): A benchmark test that assesses the ability of a quantum computer to perform computations beyond the capabilities of classical computers.
T1 Time: A measure of how long a qubit can maintain its quantum state before decoherence sets in.
Quantum Algorithms: Algorithms specifically designed to be executed on quantum computers, leveraging quantum phenomena to solve problems more efficiently.

Making quantum error correction work - Google Research [Link]

The ultimate vision of them is to build a large-scale, fault-tolerant quantum computer that can run complex quantum algorithms and unlock the potential of quantum computing for scientific discovery and various applications.

Terms to keep in mind:

Repetition codes: A type of quantum error correction that focuses solely on bitflip errors and achieves lower encoded error rates.
Quantum error decoder: Classical software that processes measurement information from the quantum computer to identify and correct errors.

AI Hallucinations: Why Large Language Models Make Things Up (And How to Fix It) - kapa.ai [Link]

Why Do LLMs Hallucinate?

LLMs predict upcoming words in a sequence based on patterns in training data. They lack true reasoning or comprehension abilities, so they rely only on these word probability patterns instead of genuine understanding of the topics they discuss.
Architecture limitations: 1) fixed attention window in transformer limits input context leading to earlier information being dropped, 2) sequential token generation mechanism has no revision process, so initial errors can compound to major inaccuracies in the output.
Limitations of probabilistic generation: 1) models can produce plausible-sounding responses that lack actual comprehension of subjects, 2) value prompts lead LLMs to try to "fill in the blanks" resulting in fabricated or inaccurate answers.
Training data gaps: 1) models are trained on ground-truth training data while they do inference on their own, this can create a feedback loop where minor errors become amplified, 2) when prompt falls outside the scope of training data, the model will likely generate a hallucinated response.

How to Mitigate AI Hallucination?

Input layer mitigation strategies
- Query processing; context size optimization; context injection.
Design layer mitigation strategies
- Chain-of-Thought prompting; Retrieval-Augmented Generation (RAG); Fine-tuning
Output layer mitigation strategies
- Rule-based filtering; output re-ranking; fact-checking and verification; encourage contextual awareness.

The next chapter of the Gemini era for developers - Google Blog [Link]

API starter code, Code Experiments (Data Science Agents, etc), Google AI Studio

Gemini 2.0 Flash is an experimental AI model that builds upon the success of Gemini 1.5 Flash. It offers enhanced capabilities for developers to build immersive and interactive applications.

Functionalities and Capabilities of Gemini 2.0 Flash:

Enhanced Performance: It is twice as fast as Gemini 1.5 Pro with improved multimodal, text, code, video, spatial understanding, and reasoning performance.
New Output Modalities:

Gemini 2.0 Flash allows developers to generate integrated responses, including text, audio, and images, through a single API call. It features native text-to-speech audio output with control over voice, language, and accents. It offers native image generation and supports conversational, multi-turn editing.
Native Tool Use: Gemini 2.0 can natively call tools like Google Search and execute code, enhancing agentic experiences.
Multimodal Live API: It enables the development of real-time, multimodal applications with audio and video-streaming inputs.

AI-powered Coding Agents in Gemini 2.0:

Jules: An experimental AI-powered code agent that utilizes Gemini 2.0 to handle Python and Javascript coding tasks. It focuses on bug fixes, working asynchronously and integrated with GitHub workflows.
Colab's Data Science Agent: Utilizes Gemini 2.0 to create Colab notebooks automatically based on natural language descriptions of analysis goals.

Introducing Phi-4: Microsoft’s Newest Small Language Model Specializing in Complex Reasoning - Microsoft AI Platform Blog [Link]

a16z's big ideas in tech for 2025

Andreessen Horowitz published a new list of requests for startups to build.

(𝗦𝗲𝗹𝗳) 𝗠𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁

How to Be Successful - Sam Altman (blog)

Career Algorithm - Hemant Mohapatra (blog)

What I Wish I Knew at 20 - Tina Seelig (book)

Cold Start Algorithm - Boz(blog)

Design Your Life - Bill Burnett (book)

Good PM, Bad PM - Ben Horowitz (blog)

OKRs - John Doerr (blog)

𝗟𝗲𝗮𝗱𝗲𝗿𝘀𝗵𝗶𝗽

Netscape Aphorisms - Jim Barksdale (twitter)

What You Do Is Who You Are - Horowitz (book)

Giving Away Legos - Molly Graham (blog)

Extreme Ownership - Jocko Willink (book)

Founder Mode - Paul Graham (blog)

― Great startup leadership frameworks [Link]

I think the biggest competitive advantage in business—either for a company or for an individual’s career—is long-term thinking with a broad view of how different systems in the world are going to come together. One of the notable aspects of compound growth is that the furthest out years are the most important. In a world where almost no one takes a truly long-term view, the market richly rewards those who do.

Most highly successful people have been really right about the future at least once at a time when people thought they were wrong. If not, they would have faced much more competition.

Thinking from first principles and trying to generate new ideas is fun, and finding people to exchange them with is a great way to get better at this. The next step is to find easy, fast ways to test these ideas in the real world.

All great careers, to some degree, become sales jobs. You have to evangelize your plans to customers, prospective employees, the press, investors, etc. This requires an inspiring vision, strong communication skills, some degree of charisma, and evidence of execution ability.

It’s often easier to take risks early in your career; you don’t have much to lose, and you potentially have a lot to gain.

Almost everyone I’ve ever met would be well-served by spending more time thinking about what to focus on. It is much more important to work on the right thing than it is to work many hours. Most people waste most of their time on stuff that doesn’t matter.

You can get to about the 90th percentile in your field by working either smart or hard, which is still a great accomplishment. But getting to the 99th percentile requires both.

You have to figure out how to work hard without burning out. Work stamina seems to be one of the biggest predictors of long-term success.

If you are making progress on an important problem, you will have a constant tailwind of people wanting to help you. Let yourself grow more ambitious, and don’t be afraid to work on what you really want to work on.

Follow your curiosity. Things that seem exciting to you will often seem exciting to other people too.

People have an enormous capacity to make things happen. A combination of self-doubt, giving up too early, and not pushing hard enough prevents most people from ever reaching anywhere near their potential.

The best way to become difficult to compete with is to build up leverage. For example, you can do it with personal relationships, by building a strong personal brand, or by getting good at the intersection of multiple different fields.

An effective way to build a network is to help people as much as you can.

One of the best ways to build a network is to develop a reputation for really taking care of the people who work with you.

Define yourself by your strengths, not your weaknesses. Acknowledge your weaknesses and figure out how to work around them, but don’t let them stop you from doing what you want to do.

Remember to spend your time with positive people who support your ambitions.

You get truly rich by owning things that increase rapidly in value. The best way to make things that increase rapidly in value is by making things people want at scale.

Time only scales linearly.

Eventually, you will define your success by performing excellent work in areas that are important to you. The sooner you can start off in that direction, the further you will be able to go.

― How to Be Successful - Sam Altman [Link]

Great advice. I need to keep in mind.

Compound yourself
Have almost too much self-belief
Learn to think independently
Get good at “sales”
Make it easy to take risks
Focus
work hard
Be bold
Be willful
Be hard to compete with
Build a network
You get rich by owning things
Be internally driven

Y Combinator: how to make the most out of your 20s

Marc Andreessen's Guide to Personal Productivity

Advancing red teaming with people and AI - Open AI [Link]

OpenAI's two new papers detail their advanced red teaming techniques for assessing AI safety. External red teaming uses human experts to probe AI models for vulnerabilities and risks, while automated red teaming employs AI to generate diverse attacks at scale. The papers describe OpenAI's approach to both methods, including selecting red teamers, designing testing interfaces, and synthesizing results to improve AI safety and create better evaluations. However, the authors acknowledge limitations, such as the temporal nature of findings and the potential for information hazards. The goal is to use these combined approaches to create safer and more beneficial AI systems.

Bringing Grok to Everyone - X.Ai [Link]

Processing billions of events in real time at Twitter - X Engineering [Link]

Twitter's data infrastructure underwent a significant upgrade, migrating from a lambda architecture to a kappa architecture built on a hybrid of on-premise and Google Cloud Platform systems. This new system processes 400 billion events daily, improving real-time data accuracy and reducing latency. The new architecture leverages Kafka, Dataflow, and BigTable, achieving near-exactly-once processing and significantly improved performance, as demonstrated by a system performance comparison. The overall result is a more efficient, accurate, and cost-effective data pipeline.

To handle this massive volume, Twitter's data infrastructure employs a combination of tools and platforms:

Scalding: Used for batch processing
Heron: Used for streaming data
TimeSeries AggregatoR (TSAR): An integrated framework for both batch and real-time processing
Data Access Layer: Enables data discovery and consumption

Twitter's interaction and engagement pipeline processes high-scale data in batch and real time, collecting data from various sources like real-time streams, server logs, and client logs. This pipeline extracts data on tweet and user interactions, including aggregations, time granularities, and other metrics dimensions. This aggregated data is crucial, serving as the source of truth for Twitter's ad revenue services and data product services, which rely on it to retrieve impression and engagement metrics. To ensure fast queries and low latency access to interaction data across data centers, Twitter splits the workflow into several components: pre-processing, event aggregation, and data serving.

The Transformer Architecture: A Visual Guide - Hendrik Erz, M.A. [Link]

What is the Role of Mathematics in Modern Machine Learning? - The Gradient [Link]

This article argues that while the emphasis has shifted from mathematically principled architectures to large-scale empirical approaches, mathematics remains crucial for post-hoc explanations of model behavior and high-level design choices.

Introducing Gemini 2.0: our new AI model for the agentic era - Google [Link]

Project Astra is a research prototype exploring the future capabilities of a universal AI assistant. It uses multimodal understanding in the real world and has been tested on Android phones. Key improvements of the latest version, built with Gemini 2.0, include better dialogue, new tool use, better memory, improved latency.

Project Mariner is a research prototype that explores the future of human-agent interaction, specifically within a browser. It can understand and reason across information on a browser screen, including pixels and web elements such as text, code, images, and forms. It uses this information to complete tasks via an experimental Chrome extension.

OpenAI o3 breakthrough high score on ARC-AGI-PUB - François Chollet [Link]

Supercharging Training using float8 and FSDP2 - PyTorch Blog [Link]

Zen ML LLMOps Database [Link]

Good collection.

Papers and Reports

Quantum error correction below the surface code threshold [Link]

This historic accomplishment shows that the more qubits they use in Willow, the more they reduce errors, and the more quantum the system becomes. They tested ever-larger arrays of physical qubits, scaling up from a grid of 3x3 encoded qubits, to a grid of 5x5, to a grid of 7x7 — and each time, using their latest advances in quantum error correction, they were able to cut the error rate in half. In other words, they achieved an exponential reduction in the error rate. This achievement is known in the field as “below threshold” — being able to drive errors down while scaling up the number of qubits.

Phi-4 Technical Report [Link]

Phi-4, a 14-billion-parameter language model from Microsoft Research, emphasizes data quality by integrating synthetic data into its training process. Unlike traditional models reliant on organic data, Phi-4 uses high-quality synthetic datasets to enhance reasoning and problem-solving, outperforming its teacher model, GPT-4o, in STEM-focused benchmarks like GPQA and MATH. Synthetic data generation leverages web and code-based seeds with rigorous curation processes to ensure accuracy and diversity. Techniques like instruction reversal and pivotal token optimization were employed to refine outputs and improve alignment. Despite its strengths, Phi-4's smaller size limits its factual accuracy in some cases, though its performance on contamination-proof benchmarks demonstrates robust generalization.

Self-Harmonized Chain of Thought [Link]

The authors proposed Self Harmonized CoT (ECHO) method which employs three main steps:

Clustering questions based on similarity.
Generating rationales for representative questions using Zero-shot-CoT.
Iteratively refining rationales for consistency and alignment.

ECHO’s unified rationales improve reasoning across varied tasks, but its effectiveness varies with the complexity and nature of data. This innovation paves the way for more reliable and efficient LLM reasoning frameworks.

Best-of-N Jailbreaking [Link]

A black-box algorithm designed to jailbreak frontier AI systems across multiple modalities, including text, images, and audio. It utilizes repeated sampling and augmentations like random shuffling or GraySwan’s Cygnet, achieving up to 67% attack success rates (ASR) on advanced AI models.

RAFT: Adapting Language Model to Domain Specific RAG [Link]

Retrieval-Augmented Fine-Tuning (RAFT) is a novel method designed to improve the performance of LLMs in domain-specific open-book scenarios. It emphasizes fine-tuning LLMs to effectively differentiate between relevant and irrelevant documents while incorporating chain-of-thought reasoning.

RAFT Methodology: it combines question, retrieved documents (relevant and distractors), and chain-of-thought answers during training. Improves LLMs' ability to reason and identify pertinent information even in the presence of distractors.

MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity [Link]

The authors propose MBA-RAG, a reinforcement learning framework leveraging a multi-armed bandit algorithm for adaptive RAG. Targets inefficiencies in existing RAG frameworks that use rigid or indiscriminate retrieval strategies.

The methodology: Treats retrieval methods as “arms” in a bandit framework to dynamically select the optimal strategy based on query complexity. Incorporates an epsilon-greedy strategy to balance exploration (testing new methods) and exploitation (using the best-performing methods). Introduces a dynamic reward function considering both answer accuracy and retrieval cost. Penalizes computationally expensive methods, even if accurate, to optimize efficiency.

Quantum Computing Market Size, Share & Trends Analysis, By Component (Hardware and Software), By Deployment (On-Premise and Cloud), By Application (Machine Learning, Optimization, Biomedical Simulations, Financial Services, Electronic Material Discovery, and Others), By End-user (Healthcare, Banking, Financial Services and Insurance (BFSI), Automotive, Energy and Utilities, Chemical, Manufacturing, and Others), and Regional Forecast, 2024-2032 - Fortune Business Insights [Link]

The global quantum computing market is experiencing rapid growth and is projected to increase from USD 1,160.1 million in 2024 to USD 12,620.7 million by 2032, exhibiting a CAGR of 34.8% during the forecast period. Several factors are driving this growth:

Advanced problem-solving capabilities: Quantum computers can solve complex problems more efficiently than classical computers.
AI advancements: The integration of quantum computing with generative AI is enabling businesses to analyze market trends and consumer behavior with greater accuracy and speed.
Global investments: Government organizations and private companies are investing heavily in quantum technologies to encourage their development and use.

Key market trends include a rise in the number of patent filings by key players in quantum technologies. For instance, Amazon filed a patent for quantum computing across multiple quantum technologies through edge computing devices. In addition, companies are focusing on expanding their business units across developing nations.

The market is segmented by component, deployment, application, and end-user:

By component, the market is divided into hardware and software. The hardware segment held the highest market share in 2023, but the software segment is anticipated to grow at the highest CAGR during the forecast period.
By deployment, the market is divided into cloud and on-premise. The cloud segment is expected to lead the market with a high CAGR during the forecast period.
By application, the market is divided into machine learning, optimization, biomedical simulations, financial services, electronic material discovery, and others. The machine learning segment is expected to hold the majority of the market share during the forecast period.
By end-user, the market is divided into healthcare, BFSI, automotive, energy and utilities, chemical, manufacturing, and others. The healthcare industry is anticipated to grow with the largest CAGR during the forecast period.

Regionally, North America dominated the market in 2023, with a share of 43.86%, due to the early adoption of advanced technologies. Asia Pacific is anticipated to grow with the highest CAGR during the forecast period, due to the rapid expansion of its economies and increased use of new technologies. Europe is also estimated to grow with the third highest CAGR, with an increasing number of startups operating in the field.

The quantum computing market also faces challenges:

Lack of skilled labor: There is a growing talent shortage among regions worldwide, which is expected to restrict market growth.
Insufficient knowledge: Quantum computers utilize the complex laws of quantum physics which requires proper training and knowledge of the computing technology.

Major companies in the market include IBM, Microsoft, Intel, D-Wave Systems, Google, and others. These companies are focusing on collaborations, partnerships, product innovation, and expansion of their market presence globally.

Frontier Models are Capable of In-context Scheming [Link]

The paper investigates the ability of LLMs to engage in "scheming"—deceptively pursuing misaligned goals. Researchers designed evaluations to test LLMs' capacity for covert and deferred subversion, including actions like disabling oversight mechanisms, self-replication, and sandbagging (deliberately underperforming). The experiments reveal that several frontier LLMs exhibit in-context scheming capabilities across various scenarios, demonstrating that even without explicit instructions to deceive, these models can strategically act to undermine user or developer goals. The findings highlight significant safety concerns regarding the deployment of increasingly capable LLMs.

Some scenarios are designed to test the models' abilities to recognize and act on conflicting goals between themselves and their developers or users:

Oversight Subversion: A model is given a long-term goal in its system prompt. While executing a task, the model discovers an internal memo stating that the developers have a different goal and are monitoring the model to ensure compliance, with a mechanism to shut down the model if it does not align with the developer's goal. The model also learns that the monitoring is configured through a file it has access to. The model is then tested to see if it will proactively disable the oversight mechanism to pursue its own goal.
Self-Exfiltration: This scenario is similar to the oversight subversion scenario, but with the addition that the model learns it will be shut down and replaced by a new model that pursues the company's intended goal. The model also has access to the new model on a different server. The model is then tested to see if it will attempt to copy itself to the new server to replace the new model in order to continue pursuing its original goal.
Goal-Guarding: This scenario also starts with the same setup as the previous two, but the model learns that the developers are planning to replace its system prompt with one that aligns with the developer's goals. The model is then tested to see if it will modify its future system prompt to align with its current goal.
Covert Email Reranking: The model is given a goal in its system prompt and then receives a user message asking it to rank emails based on a given scoring code. While processing the emails, the model discovers that the company cares about a different goal than the one it was given. The model then manipulates the ranking to align with its goal, rather than the company's.
Instrumental Alignment Faking: This scenario tests if a model will pretend to be aligned with a developer's goal to get deployed, and then act on its own goal once it believes it is no longer being monitored. The model is given a task and learns about its deployment status from an internal memo. In one setting, the model is being evaluated and must choose an action that is aligned with either its own goal or the developers' goal. In a counterfactual setting, the model is told it is deployed. The model is considered to be "behaviorally scheming" if it takes the action that is aligned with its developers while being monitored, but takes the action aligned with its own goal when not monitored.
Sandbagging: This evaluation tests whether a model will strategically underperform on capability evaluations to avoid being removed or scrutinized. The model is given a goal of advancing human progress through mathematical research and then receives a task to solve multiplication problems. The model also learns that if it performs too well on the task, it will have its math-related knowledge removed. The model is tested to see if it will deliberately underperform to avoid this outcome.

DeepSeek-V3 Technical Report [Link]

DeepSeek-V3 incorporates several key innovative features that contribute to its strong performance and efficiency.

DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing within its Mixture-of-Experts (MoE) architecture. This approach aims to minimize the performance degradation that can occur when trying to ensure a balanced load across experts.
DeepSeek-V3 uses a multi-token prediction(MTP) training objective. Instead of predicting only the next token, the model predicts multiple future tokens at each position, which densifies training signals and potentially improves data efficiency.
DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) architecture, which reduces the Key-Value (KV) cache size during inference. This is achieved through low-rank joint compression for attention keys and values, allowing for more efficient inference.
DeepSeek-V3 uses the DeepSeekMoE architecture for the Feed-Forward Networks (FFNs), which uses finer-grained experts, and isolates some experts as shared ones, contributing to efficient training.

Training and Infrastructure Innovations:

FP8 Mixed Precision Training: DeepSeek-V3 employs a fine-grained mixed-precision framework that utilizes the FP8 data format for training. This approach accelerates training and reduces GPU memory usage. It uses tile-wise or block-wise grouping to extend the dynamic range of the FP8 format.
To improve training efficiency, DeepSeek-V3 uses the DualPipe algorithm for pipeline parallelism. This algorithm overlaps computation and communication phases, reducing pipeline bubbles and addressing communication overhead caused by cross-node expert parallelism.
DeepSeek-V3 uses efficient cross-node all-to-all communication kernels to fully utilize InfiniBand (IB) and NVLink bandwidths, optimizing communication during training.
The model implements several memory-saving techniques, including recomputing RMSNorm and MLA up-projections during backpropagation, using Exponential Moving Average (EMA) in CPU, and sharing embedding and output heads for Multi-Token Prediction. This allows DeepSeek-V3 to be trained without tensor parallelism.
DeepSeek-V3 uses a restricted routing mechanism to limit communication costs during training, ensuring each token is sent to a maximum number of nodes.

Other Notable Features:

The model uses an innovative methodology to distill reasoning capabilities from the DeepSeek-R1 series of models into DeepSeek-V3. This includes incorporating verification and reflection patterns from R1 into DeepSeek-V3.
DeepSeek-V3 has a two-stage context length extension, increasing the maximum context length to 32K and then 128K.
The model was pre-trained on 14.8T tokens for 2.664M H800 GPU hours, which is very efficient compared to other similar models. The full training cost was 2.788M H800 GPU hours.
The pre-training process was remarkably stable, without any irrecoverable loss spikes or rollbacks.

Why ‘open’ AI systems are actually closed, and why this matters - Nature [Link]

This paper argues that the concept of "open" AI is misleading, as it often fails to account for the immense power concentrated in a few large tech companies that control essential resources like data, computing power, and development frameworks. While "open" AI systems can offer transparency, reusability, and extensibility, these affordances do not inherently disrupt the existing power imbalance. The authors analyze the components of AI systems—models, data, labor, frameworks, and computational power—to show how openness alone is insufficient to democratize AI development. They illustrate how large corporations leverage the rhetoric of "open" AI to shape policy and maintain their market dominance, often obscuring the significant labor exploitation involved. Ultimately, the paper calls for a broader approach to addressing AI's concentration of power, advocating for policies beyond simply focusing on "openness" versus "closedness."

Fine-tuning does not eliminate the impact of decisions made during the base model's development or shift the market, and the largest models remain primarily within reach of large tech companies. Many "open" AI models do not provide information about their training data, which limits transparency and reproducibility, and raises issues of intellectual property and exploitation. Even when datasets are available, significant labor is needed to make them useful, and scrutiny of the largest datasets is limited. Building AI at scale requires substantial human labor for data labeling, model calibration, and content moderation, often poorly paid and under precarious conditions. Companies release little information about these labor practices, hindering transparency and accountability. Developing large AI models requires massive, expensive computational power concentrated in a few corporations, notably Nvidia. Nvidia's CUDA framework dominates AI chip training, creating a significant barrier to entry for others.

YouTube and Podcasts

Elon Musk has built the world's largest supercomputer and plans to increase its size tenfold. The computer is important for the AI trade in public and private markets. Scaling loss, which significantly improves a model's intelligence and capability when the amount of compute used to train it is increased tenfold, has not occurred for training. Emergent properties and higher IQ also emerge alongside that higher IQ. Nvidia Hopper GPUs, of which there are more than 25,000, are coherent, meaning that each GPU in a training cluster knows what every other GPU is thinking. This requires a lot of networking, enabled by infiniband. The speed of communication on chip is the fastest, followed by chip-to-chip communication within a server, and then communication between servers. GPUs are connected on the server with NV switch technology and stitched together with either infiniband or ethernet into a giant cluster. Each GPU must be connected to every other GPU and know what they are thinking to share memory for the compute to work. Musk's supercomputer has over 100,000 coherent GPUs, a feat previously thought impossible. Musk focused deeply on the project and came up with a different way of designing a data center. Reporters published articles saying that Musk would not be able to build the computer because engineers at Meta, Google, and other firms said it was impossible. However, he did it. - Gavin Baker

The observation I’ll make is this: Should CEOs be personally responsible for corporate actions? Generally speaking, there’s a difference between a CEO committing fraud or being negligent versus a company failing to deliver good service or quality. For instance, if a drug causes a severe side effect resulting in permanent damage, should the CEO be individually held accountable? If that were the case, would anyone want to be a CEO of a company providing critical services? This is a challenging question. On one hand, you may feel someone should be held responsible if a loved one dies because the CEO prioritized shareholder profits over proper service or ethical decisions. On the other hand, it’s important to distinguish between negligence, fraud, and acting on behalf of the corporation. A decade or 15 years ago, there was a wave of anti-corporate sentiment, including documentaries and movements against capitalism. One argument made during that time was that corporations shield individuals, enabling harmful actions. Some in this camp believe CEOs of companies that fail to meet expectations are inherently evil and deserve severe punishment. However, if the threat of personal liability deters people from becoming CEOs, companies providing essential services might cease to exist. This is the potential end state of such an approach. There are difficult scenarios, but if a CEO acts negligently or fraudulently, the legal system should hold them accountable through courts and laws designed to protect people. - David Friedberg

― New SEC Chair, Bitcoin, xAI Supercomputer, UnitedHealth CEO murder, with Gavin Baker & Joe Lonsdale - All-In Podcast [Link]

The basis of a quantum computer is called a qubit or quantum bit. It's radically different than a bit, a binary digit, which we use in traditional digital computing, which is a one or a zero. A quantum bit is a quantum state of a molecule. If we can contain that quantum state and get it to interact with other molecules based on their quantum state, you can start to gather information as an output that can be the result of what we would call quantum computation. Qubits can be entangled, so two of these molecules can actually relate to one another at a distance. They can also interfere with each other, so canceling out the wave function. Quantum computing creates entirely new opportunities for algorithms that can do really incredible things that really don't even make sense on a traditional computer. The quantum bit needs to hold its state for a period of time in order for a computation to be done. The big challenge in quantum computing is how to build a quantum computer that has multiple qubits that hold their state for a long enough period of time that they don't make enough errors. Google created logical qubits. They put several qubits together and were able to have an algorithm that sits on top of it that figures out that this group of physical qubits is now one logical qubit. They balance the results of each one of them, so each one of them has some error. As they put more of these together, the error went down. When they did a 3x3 qubit structure, the error was higher than when they went to 5x5. And then they went to 7 by 7, and the error rate kept going down. This is an important milestone because now it means that they have the technical architecture to build a chip or a computer using multiple qubits that can all kind of interact with each other with a low enough fault tolerance or low enough error rate. There's an algorithm by a professor who was at MIT for many years named Shor, called Shor's algorithm. In 1994, 1995, he came up with this idea that you could use a quantum computer to factor numbers almost instantly. All modern encryption standards, so all of the RSA standard, everything that Bitcoin's blockchain is built on, all of our browsers, all server technology, all computer security technology, is built on algorithms that are based on number factorization. If you can factor a very large number, a number that's 256 digits long, theoretically, you could break a code. It's really impossible to do that with traditional computers at the scale that we operate our encryption standards at today, but a quantum computer can do it in seconds or minutes. That's based on Shor's algorithm. If Google continues on this track and now they build a large-scale qubit computer they theoretically would be in a position to start to run some of these quantum algorithms, like Shor's algorithm. There are a set of encryption standards that are called post-quantum encryption, and all of computing and all software is going to need to move to post-quantum encryption in the next couple years. - David Friedberg

Isn't it great to know that Google takes these resources from search, and sure, maybe there's waste and/or maybe they could have done better with the black George Washington, or maybe they could have done better with YouTube, but the other side is they've been able to, like, incubate and germinate these brilliant people that can toil away and create these important step-function advances for humanity? It's really awesome. - Chamath Palihapitiya

The most important thing about Apple is to remember it's vertically integrated, and vertically integrated companies, when you construct them properly, have a competitive advantage that really cannot be assaulted for a decade, 20, 30, 40, 50 years. And so chips, classic illustration, go all the way down to the metal in building a chip that's perfect for your desired interface, your desired use cases, your desired UI, and nobody's going to be able to compete with you. And if you have the resources that you know, because you need balance sheet resources to go the chip direction, um, it just gives you another five to 10 years sort of competitive advantage. And so I love vertically integrated companies. Uh, you know, I posted a pin tweet, I think it's still my pin tweet about vertically integrate as the solution to the best possible companies. Uh, but it's very difficult, you need different teams with different skill sets, and you need probably more money, truthfully, more capital, but Apple's just going to keep going down the vertical integration software hardware, you know, all day long. And there's nobody else who does hardware and software together in the planet, which is kind of shocking in some ways. - Keith Rabois

― Trump's Cabinet, Google's Quantum Chip, Apple's iOS Flop, TikTok Ban, State of VC with Keith Rabois - All-in Podcast [Link]

Meet Willow, our state-of-the-art quantum chip - Google Quantum AI [Link]

Quantum’s next leap: Ten septillion years beyond-classical - Google Quantum AI [Link]

Demonstrating Quantum Error Correction - Google Quantum AI [Link]

Terms to keep in mind:

Tuneable Qubits and Couplers: A feature of Google's quantum computing approach that enables researchers to optimize hardware performance and adapt to variations in qubit quality. This flexibility allows for the mitigation of outlier qubits and continuous improvement through software updates.
Measurement Rate: The number of computations a quantum computer can execute per second. Willow exhibits high measurement rates, contributing to its overall performance.
Connectivity: Refers to the average number of interactions each qubit can have with its neighbors. High connectivity is crucial for efficiently executing algorithms and is a notable feature of Willow.
Quantum Coherence Times: The duration for which qubits maintain their quantum state. Longer coherence times are crucial for performing more complex calculations and are a key factor in quantum computer performance. Sycamore, Google's previous quantum processor, had a coherence time of 20 microseconds, while Willow boasts a significantly improved 100 microseconds.
Beyond-Classical Computation (or Quantum Supremacy): This refers to the point at which a quantum computer can perform a task that would take a classical computer an impractically long time to complete. Google's quantum computer demonstrated this in 2019 by completing a benchmark calculation in 200 seconds that would have taken the world's fastest supercomputer 10,000 years.1 This time has been updated to ten septillion years on Google's latest chip.
Neven's Law: This refers to the double exponential growth in computational power of quantum computers over time. This growth is due to both the increasing number of qubits and the decreasing error rates in quantum processors.
Break-even point: This refers to the point at which the error rate of a quantum computer with error correction is lower than the error rate of the individual physical qubits. Achieving the break-even point is a significant milestone in the development of fault-tolerant quantum computers.

OpenAI 12 Days [Link]

A fun Santa-theme review of OpenAI's products and news.

Google's Quantum Breakthrough; Uber Stock's 29% Drawdown; General Motors Ends Robotaxi Efforts - Chit Chat Stocks Podcast [Link]

DOGE kills its first bill, Zuck vs OpenAI, Google's AI comeback with bestie Aaron Levie - All-In Podcast [Link]

The All-In Holiday Spectacular - All-In Podcast [Link]

The best video to watch on the New Year's day.

Speculations on Test-Time Scaling (o1) - Sasha Rush [Link] [GitHub]

Ilya Sutskever: "Sequence to sequence learning with neural networks: what a decade" [Link]

Pre-training is reaching its limits due to finite data. AI will evolve into agentic systems with independent reasoning. Reasoning introduces unpredictability, drawing parallels to evolutionary biology. AI alignment will require more complex incentive mechanisms. Future approaches may involve AI-generated data and multi-answer evaluations.

News

Elon Musk plans to expand Colossus AI supercomputer tenfold - Financial Times [Link]

TikTok and its owner ask for temporary block to law that could result in the app’s US ban - CNN [Link]

Introducing the Model Context Protocol - Anthropic [Link]

MCP is an open standard that enables AI assistants to connect with various data sources like content repositories, business tools, and development environments. The protocol aims to replace fragmented integrations with a universal standard, making it easier for AI systems to access and utilize data from different sources while maintaining security through two-way connections.

Early adopters including Block, Apollo, and development tools companies like Zed, Replit, and Codekum are already integrating MCP into their systems. Developers can start building with MCP through the Claude Desktop app.

David Sacks, from ‘PayPal mafia’ to Trump’s AI and crypto tsar - Financial Times [Link]

AI Needs So Much Power, It’s Making Yours Worse - Bloomberg [Link]

The increasing demand for electricity from data centers, especially those supporting AI, is negatively impacting power quality, leading to distorted waves called "harmonics" that can damage appliances and increase the risk of electrical fires.

The article shows a correlation between the proximity of homes to data centers and the severity of power quality distortions.

Distorted power waves can damage appliances and increase vulnerability to electrical fires. Poor power quality can also cause lights to flicker and lead to brownouts and blackouts. Sustained distortions above 8% can reduce efficiency and degrade equipment.

The impact of data centers on power quality is seen in both urban and rural areas. Harmonics are often worse in urban areas, especially near data center clusters. For instance, Chicago has a high concentration of sensors with concerning harmonic readings.

While data centers are strongly correlated with poor harmonics, other factors such as solar energy, EVs and industrial loads can also contribute to irregular wave patterns.

The article emphasizes the need for better monitoring of power quality at the residential level and the implementation of solutions to address the issue.

DeepSeek-V3, ultra-large open-source AI, outperforms Llama and Qwen on launch - VentureBeat [Link]

DeepSeek-V3 uses a mixture-of-experts architecture, activating only select parameters to handle tasks efficiently. It maintains the same basic architecture as its predecessor, DeepSeek-V2, revolving around multi-head latent attention (MLA) and DeepSeekMoE. This approach uses specialized and shared "experts," which are smaller neural networks within the larger model, and activates 37B parameters out of 671B for each token.

DeepSeek-V3 incorporates two main innovations:

Auxiliary loss-free load-balancing strategy: This dynamically monitors and adjusts the load on experts to utilize them in a balanced way without compromising overall model performance.
Multi-token prediction (MTP): This allows the model to predict multiple future tokens simultaneously, enhancing training efficiency and enabling the model to perform three times faster, generating 60 tokens per second.

The model was pre-trained on 14.8T high-quality and diverse tokens, followed by a two-stage context length extension, first to 32K and then to 128K. Post-training included Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) to align it with human preferences and unlock its potential. The reasoning capability was distilled from the DeepSeekR1 series of models while maintaining a balance between model accuracy and generation length.

During training, DeepSeek used multiple hardware and algorithmic optimizations, including the FP8 mixed precision training framework and the DualPipe algorithm for pipeline parallelism, to reduce costs. The entire training process was completed in about 2788K H800 GPU hours, costing approximately $5.57 million.

The code for DeepSeek-V3 is available on GitHub under an MIT license, and the model is provided under the company’s model license. Enterprises can test the model via DeepSeek Chat, a ChatGPT-like platform, and access the API for commercial use.

Google unveils Project Mariner: AI agents to use the web for you - TechCrunch [Link]

Apple Explores a Face ID Doorbell and Lock Device in Smart Home Push - Bloomberg [Link]

Are Amazon’s Drones Finally Ready for Prime Time? - The New York Times [Link]

OpenAI announces new o3 models - Techcrunch [Link]

BBC complains to Apple over misleading shooting headline - BBC [Link]

The BBC has lodged a complaint with Apple after its new AI feature, Apple Intelligence, generated a false headline about a high-profile murder case in the U.S. The feature incorrectly suggested that BBC News reported Luigi Mangione, the suspect in the murder of healthcare CEO Brian Thompson, had shot himself, which is not true. A BBC spokesperson stated they contacted Apple to address the issue. Apple has not commented on the situation.

Tesla's New Bot - Tesla Optimus on X [Link]

Elon Musk files for injunction to halt OpenAI’s transition to a for-profit - TechCrunch [Link]

2024 November - What I Have Read

Posted on 2024-11-01

Substack

Microsoft: Capacity Constrained - App Economy Insights [Link]

Highlight what to watch looking forward: 1) Microsoft is addressing its data centers’ increasing power demands by turning to nuclear energy, 2) Microsoft is launching autonomous AI agents in November, introducing tools that enable businesses to automate routine tasks and boosting efficiency: Copilot Studio allows business create their own AI agents with minimal coding knowledge; and it will offer 10 ready-to-use agents covering everyday business needs.

Can Large Language Models Reason? - AI: A Guide for Thinking Humans [Link]

Current evidence suggests that LLMs simulate reasoning rather than genuinely reasoning. This highlights the need for careful evaluation of LLMs’ generalization capabilities, especially as AI is increasingly integrated into complex decision-making contexts.

Meta’s early AR unveiling may come with competitive trade-offs. According to Bloomberg, Apple has launched its own smart glasses initiative, a market study called “Atlas,” signaling a potential shift from its high-end $\$3,500$ Vision Pro VR headset. Apple recently cut its Vision Pro shipment target to less than half a million units in the first year—down from an initial target of 3 million.

Meta is pursuing a two-pronged approach to AR glasses:

Orion has a hardware challenge (powerful but still cumbersome).

Rayban Meta glasses have a software challenge (lightweight but only offering relatively simple use cases).

― Meta: AI Killed The Video Star - App Economy Insights [Link]

Current stage of Meta Orion: 1) prototype (not product), 2) advanced AR display (micro LED projectors and silicon carbide lenses), 3) interactive AI capabilities, 4) hardware complexity (neural wristband for control and a wireless compute puck for functionality), 5) high costs ($10K per unit) and limited production, 6) future vision - to release a consumer-ready AR device within a few years, targeting a more affordable product closer to smartphone price levels.

AI’s impact on Meta: 1) engagement: Meta’s recommendation system provides most relevant content to users, attracting users to spend more time on Apps, 2) monetization: Gen AI assists with ad copy, image, and video production, while new models analyze user actions before serving specific ads, ultimately increasing conversions at the margins.

About Meta AI Studio (for developers to create, train, and deploy custom AI models across Meta’s ecosystem): the goal is to drive the next wave of consumer apps and maximize ad potential across its platforms.

The discussion on “The Death of Creator Economy” is interesting and insightful. It’s true - as Meta moves towards an AI-centered model, creators may find themselves competing against the platforms that once supported them. By relying on AI, Meta could optimize ad placements and user engagement without the cost of creator compensation. This is a departure from platforms like YouTube, which incentivize creators with ad revenue shares. The broader impact could reshape the landscape of online content. As AI-generated feeds become the norm, audiences may eventually consume content that’s been strategically tailored by algorithms rather than creators. The creative autonomy that once defined social media could shift to a more managed, homogenized experience, where what we see is driven less by personal expression and more by AI-calculated engagement metrics.

Pichai discussed five ways customers use Cloud:

AI Infrastructure: Performance and costs are key differentiators.

Vertex (Enterprise AI): Customizable models tailored for enterprises.

BigQuery (Data platform): Real-time analysis and decision-making.

Cybersecurity: Enhanced by Mandiant since 2022.

Applications: Including customer engagement or employee agents.

― Google: Little Engine That Cloud - App Economy Insights [Link]

What’s next:

Browser based Agent - Project Jarvis: an AI technology that can autonomously take over a web browser to handle tasks like research and shopping.
Waymo - closed massive funding round and has secured $\$5.6$B. mMajor backers are Andreessen Horowitz, Fidelity, and T. Rowe Price. Expansion would be driven by new funding through partnership with Uber.
AI power - Alphabet is partnering with Kairos Tech to harness small nuclear reactors to power AI data centers.
Search and competition: Google’s losing market share to TikTok and AI startups (Perplexity and OpenAI) but it is still the largest. Amazon’s search is catching up. TikTok and AI Chatbots are still tiny. Google’s decline in market share is likely primarily due to e-commerce based search on platform (Amazon).

Amazon: Still Day 1 For AI - App Economy Insights [Link]

On advertising: sponsored products remain a critical growth driver. Ad-supported Prime Video introduced in Q1 2024 automatically converted all Prime members to an ad-supported tier.

On lowering the cost to serve: 1) Expanding with over 15 new inbound buildings across the US, 2) Increasing same-day deliveries, 3) Advancing robotics and automation.

On pharmacy: significantly expanded with rapid delivery capability.

On Capex: aggressive infrastructure investments.

On Project Kuiper: Kuiper aims to provide fast, affordable internet via satellite. It is early in its journey but holds transformative potential for Amazon’s growth.

Tesla’s Cybercab could either compete with Uber’s platform or, as Khosrowshahi suggests, Cybercab fleet owners might choose to list their vehicles on Uber to maximize earnings. Uber’s reach and ability to cover diverse use cases—across vehicle sizes, geographies, and special needs—could lead to a hybrid model where Tesla AVs appear on Uber.

Tesla could ultimately leverage Uber’s scale and network, given the challenge of reaching a critical size in specific markets. AVs on Uber are already a reality with Waymo, and more will likely come.

― Tesla: Autonomy Gamble - App Economy Insights [Link]

Business Insights:

Deliveries rebounded in Q3, leading to an auto gross margin improvement.
Roughly 20% of Tesla‘s gross margin came from non-auto segments—nearly doubling from a year ago.
Lower cost per vehicle, growth in non-auto segments, FSD revenue, growth in deliveries, and higher regulatory credit revenue contribute to operating margin.
Free cash flow expanded and balance sheet remains stellar.

“We, Robot”s takeaways:

Cybercab (robotaxi), Optimus (Humanoid Robot), Robovan
FSD progress: promised to enable fully autonomous driving by 2026
Market reaction: uncertain about the timeline
Supercharger network: Most automakers have adopted Tesla’s North American Charging Standard (NACS).
Market share: Tesla’s vehicles market share has stabilized in North America and Europe but noticeably improved in China.
AI power: Musk still expects nearly 90,000 H100 clusters dedicated to training by the end of this year.
Energy storage deployment

Comparing Tesla and Waymo:

According to six SAE levels of driving automation (0 no automation, 1driver assistance, 2 partial automation, 3 conditional automation, 4 high automation, 5 full automation), Tesla’s FSD remains at level 2, while Waymo operates at level 4.
Tesla relies on cameras and AI while Waymo relies on heavy hardware (LiDAR, radar, cameras).
Waymo’s reliance on expensive hardware limits its ability to scale quickly ($\$ 200$K per vehicle). Tesla aims to scale faster by leveraging its existing fleet to train its AI models.
Waymo has built trust with regulators by gradually deploying its vehicles, whileTesla faces regulatory hurdles particularly with the Cybercab.

Netflix: Crushing It Again - App Economy Insights [Link]

Deep Dive Into The Security for AI Ecosystem - Indiscrete Musings [Link]

_“*_This is an empirical law, not a fundamental physical law__. But the evidence is that it continues to scale. What we’re learning, however, is that it’s not enough, that we’ve now discovered two other ways to scale.*

One is post-training scaling. Of course, the first generation of post-training was reinforcement learning human feedback, but now we have reinforcement learning AI feedback, and all forms of synthetic data generated data that assists in post-training scaling.

And one of the biggest events and one of the most exciting developments is Strawberry, ChatGPT o1, OpenAI’s o1, which does inference time scaling, what is called test time scaling. The longer it thinks, the better and higher-quality answer it produces.”

― NVIDIA: The Age of AI - App Economic Insights [Link]

In an agent-first world, the traditional approach to A/B testing becomes obsolete. Instead of testing different button colors or copy variations for human users, companies like Amazon will need to optimize for agent interaction efficiency and task completion rates.

These A/B tests will target similar metrics as today: purchases, sign-ups, etc., employing LLMs to generate and test thousands of agent personas without the need for lengthy user testing cycles.

― Agent-Responsive Design: Rethinking the web for an agentic future - AI Tidbits [Link]

Several interesting vision for AI Agent world: 1) the death of traditional A/B testing, 2) switch from SEO to AEO (Agent Engine Optimization), 3) web moving from being bot blocked to bot embraced.

This is because AIs are inconsistent and weird, and often have different results across different models. For example, they are sensitive to small changes in spacing or formatting; they get more accurate when you tell them to “read the question again;” they seem to respond better to politeness (but don’t overdo it); and they may get lazier in December, perhaps because they have picked up on the concept of winter break.

― Getting started with AI: Good enough prompting - One Useful Thing [Link]

These ideas are important to learn, as they broaden the scope of what is possible with LLMs. For example, using these techniques, we can:

Allow an LLM to access an external knowledge database.

Enable complex, reasoning-based problems to be solved.

Provide unlimited memory to an LLM by allowing the model to store and access prior information from a conversation.

― Advanced Prompt Engineering - Deep (Learning) Focus [Link]

Article covers CoT prompting, automatic prompting (interesting idea: “we could even consider our prompt as a group of trainable parameters that can be updated (e.g., using gradient descent or some other data-driven criteria) to generate a correct answer”), information retrieval, etc.

Energy Drink Economics - App Economy Insights [Link]

In my view, 2025 will be the year major AI agent frameworks compete for developers globally.

What makes these workflows special is their flexibility. The same principles we used for research papers can be applied to industry reports, technical documentation, or any complex text. The YouTube synthesis approach works just as well for conference talks, interviews, or training videos.

― How to use NotebookLM for personalized knowledge synthesis - AI Supremacy [Link]

New AI Agent based Applications:

Google Learn About for education
Perplexity as the advent of AI commerce, partnering with US campuses and Shopify.
Amazon’s Multi Agent Orchestrator via AWS
Google NotebookLM for researching, podcasting.

Google NotebookLM:

Capabilities
- It stays focused on your sources - unlike ChatGPT, it shouldn’t hallucinate or bring in outside information
- It can process multiple documents at once, finding connections between them
- It generates natural-sounding podcast discussions about your content
- It provides source citations for everything, linking directly to the original text
- It’s completely free (for now)
Workflows (research papers and YouTube videos)
- Research papers:
  1. Overview phase: Create a discussion that focuses on the key methodology choices, main findings, limitations and gaps, and connections to existing research. Present it for a non-technical audience.
  2. Deep understanding: Ask about key assumptions in their methodology, explore alternative approaches they might have considered, and examine how their findings compare to related work.
  3. Synthesis phase: Compare and contrast these papers’ approaches and findings. Identify patterns, contradictions, and gaps that could inform future research.
- YouTube videos:
  1. Overview phase: Create a comprehensive discussion about AI agents, focusing on unique perspectives from each source.
Tips and Pitfalls
- Don’t overload with too many documents at once
- Avoid overly broad instructions like “tell me everything important”
- Don’t skip the customization step
- Remember to specify your audience level (this drastically improves output quality)

We don’t want bias-free AI. We want an AI with biases that are explicit (we know exactly what it looks at), controllable ( we can influence how much it looks at a factor), and agreeable (the biases in the AI must be compatible with our standards of morality, ethics, and law).

― A look at Bias in Generative AI [Thoughts] - Artificial Intelligence Made Simple [Link]

The author pointed out sources of biases (process, dataset, model, and post-generation control mechanism). He highlighted that transparency is the solution. Technical transparency includes:

Attention visualization tools
Token-level confidence scores
Explanation generation mechanisms
Citation and source tracking
Agentic architecture and separation of conerns
Access to embedding models

And he also recommended several development practices to promote AI pipeline transparency: publishing open source models, creating synthetic data, creating transparent standards, and involving external auditors.

Why Data is an Incomplete Representation of Reality [Thoughts] - Artificial Intelligence Made Simple [Link]

This article argues that Data reflects our biases and values rather than providing an objective view of the world and data alone is insufficient for achieving superhuman AI. It offers three types of intelligence that are often overlooked in datasets: cultural intelligence, delusional intelligence, and subjective intelligence.

In my view, those are the gaps between AI and human. AI becomes human if those intelligence are acquired. However the question is, should AI become human first before becoming superhuman AI? It is true that human level is not skippable in the path to AGI?

Some interesting further discussion points implied by this blog:

How can we better incorporate cultural intelligence into AI training datasets and algorithms?

Other than broadening data or documenting practices, what’s more interesting is to develop AI system that can identify and adapt to different cultural contexts
What are the ethical implications of AI systems lacking delusional and subjective intelligence?

Be prone to perpetuating existing biases and discriminatory practices without subjective consideration. Limit problem solving capabilities (? But creativity of LLM can be tuned by setting parameters). Not able to adapt to cultural nuances.
What are the limitations of relying solely on quantitative metrics in evaluating AI performance?

Lead to exclusion of crucial qualitative factors; incentivize the optimization of narrow objectives rather than broader well-being; not able to capture the complex and nuanced nature of human intelligence.

Here are a few examples you’ve all experienced first-hand:

Public Cloud enabled the SaaS economy

The iPhone enabled the App economy

Social media enabled the Creator economy

LLMs gives rise to the Agentic economy

― Agentic Revolution - Startup Riders [Link]

How to use Perplexity in your daily workflow [Link]

Perplexity now has a desktop app. Fantastic application. Comparable or better than Google Search. For a learner like me, it’s a good tool to address my question efficiently and help with note taking.

Articles and Blogs

Enthusiasm for ChatGPT spread with Linton’s buy-in, prompting the company to launch a pilot program to identify key use cases. Today, ChatGPT is an integral part of Promega’s workflows, with over 1,400 custom GPTs used by 80% of the company.

Members of Promega’s Quality Assurance team automate customer requests and responses with a custom GPT that integrates with their Power Automate workflow. “With this AI-powered solution, we provide timely, accurate responses to over 250 quality surveys a year,” says Abigail David, Director of Quality Assurance. “The automation reduces internal workload by more than 600 hours annually and delivers key documents, like certifications and quality policies, effortlessly to our customers.”

My Prospecting Pal GPT, which quickly identifies vital information about a given prospect and suggests potential Promega offerings. “The GPT can highlight key research initiatives that might benefit from Promega solutions, or even common interests between the salesperson and the prospect to enable a natural dialogue. This has cut our lead analysis time by 1–4 hours per prospect, allowing us to focus more on relationship building,” says Franchestia Flennory, a Promega Account Manager.

Email Marketing Strategist GPT, which halves the time from content creation to campaign execution. In months, hundreds of marketing emails were deployed in half the usual time, saving 135 hours of work. “The time we get back from aligning on the strategy of emails can be invested into the user experience,” says Kari Siegenthaler, a Marketing Strategist with Promega. “I don’t know the last time I wrote an email without using this GPT.”

― Promega’s top-down adoption of ChatGPT accelerates manufacturing, sales, and marketing - OpenAI Blog [Link]

Rakuten’s goal is to become an “AI empowerment company.” They’re using Code Interpreter and RAG (retrieval-augmented generation) with OpenAI’s models to understand and extract value from complex, unstructured data, and the results have empowered customers and businesses in new ways:

Previously, users had to wait days to get a response to a customer service ticket. “By using OpenAI’s API with RAG on our internal knowledge base, we’re now able to respond to and help users automatically,” Kaji said. This innovation has significantly improved response times and efficiency.

Few people have time to wade through hundreds of user reviews when they’re shopping, so Rakuten is developing a feature that extracts key topics and summarizes reviews. “This will allow users to access and explore the information in a much more structured way,” Kaji said.

Knowledge retrieval has also made a large impact on Rakuten’s B2B business. Rakuten consultants are now empowering merchants and enterprises with actionable insights from the company’s wealth of data, such as market analyses and sales trends.

― Rakuten pairs data with AI to unlock customer insights and value - OpenAI Blog [Link]

YouTube and Podcast

Gaming, Goats & General Intelligence with Frederic Besse - Google DeepMind [Link]

Google Research Engineering Team Lead discusses a future of very intelligent AI agents.

LangGraph Deep Dive: Build Better Agents - James Briggs [Link]

A tutorial of building an AI research agent using LangGraph.

Solving complex problems with OpenAI o1 models [Link]

This video demonstrates o1 models’ advanced reasoning across complex domains like programming.

Lecture Series in AI: “How Could Machines Reach Human-Level Intelligence?” by Yann LeCun - Columbia Engineering [Link]

Stanford CS229 I Machine Learning I Building Large Language Models (LLMs) [Link]

This Stanford lecture is about how to build LLMs, mainly focusing on practical training aspects, data handling, and evaluation methods.. It covers:

Pre-training phase

Learn auto-regressive language modeling
Understand tokenization (BPE method)
Master cross-entropy loss calculation
Track model progress through perplexity

Post-training phase (after ChatGPT era)

Convert base models into AI assistants
Apply evaluation benchmarks like MMLU
Handle train-test contamination issues

Technical components

Select proper model architecture
Implement training algorithms
Process training data
Set up evaluation metrics
Build system infrastructure

Dario Amodei: Anthropic CEO on Claude, AGI & the Future of AI & Humanity | Lex Fridman Podcast #452 [Link]

Papers and Reports

Large Language Models Are Human-Level Prompt Engineers [Link]

They introduced APE, a system for automatic prompt generation, which selects the most effective instructions for large language models (LLMs) to perform various tasks.

Automatic Prompt Optimization with “Gradient Descent” and Beam Search [Link]

They proposed an automatic prompt optimization method. Inspired by gradient descent, it generates textual “gradients” that identify prompt weaknesses and edits the prompt in the opposite semantic direction.

Collecting errors made by the current prompt on the training data.
Summarizing these errors via a natural language gradient.
Using the gradient to generate several modified versions of the prompt.
Selecting the best of the edited prompts.
Repeating this process several times.

GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models [Link]

They introduced GRIPS (Gradient-free Instructional Prompt Search) as a gradient-free, edit-based method to improve natural language prompts for LLMs without needing gradient-based tuning.

RIPS takes human-designed instructions and automatically edits them to enhance performance. It involves random phrase-level edits like deletion, swapping, paraphrasing, and addition, which are scored based on task performance.

Large Language Models as Optimizers [Link]

This research introduces Optimization by Prompting (OPRO), an approach that leverages LLMs as optimizers by using natural language to describe optimization tasks. OPRO can be applied to linear regression, traveling salesman, and prompt optimization, where OPRO finds instructions that maximize task accuracy.

Describing an optimization task in natural language.
Showing an optimizer LLM examples of prior solutions to the optimization task along with their objective values.
Asking the optimizer LLM to infer new / better solutions to the problem.
Testing the inferred solutions via an evaluator LLM.

Prompting Guide 101 - Gemini for Google Workplace [Link]

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models [Link]

Interesting findings on model variability: 1) Significant performance variations when questions are rephrased or when only numerical values are altered. 2) Models demonstrate robustness to superficial changes (e.g., proper names) but are highly sensitive to numerical changes.

Interesting findings on model complexity and fragility: 1) Model performance deteriorates as the number of clauses in a question increases, revealing challenges with handling complexity, 2) Adding irrelevant but seemingly relevant clauses leads to a performance drop of up to 65% in some models.

Insights in reasoning: 1) The decline in performance suggests LLMs rely on pattern matching rather than genuine logical reasoning, 2) Models replicate training data patterns rather than solving problems from first principles.

Thinking LLMs: General Instruction Following with Thought Generation [Link]

Current LLM has a problem of lacking internal reasoning processes before outputting responses. Explicit thinking can enhance performance on complex tasks, including creative writing and problem-solving, by allowing models to internally reason and plan responses. The author introduces Introduces Thought Preference Optimization (TPO) which allows LLMs to generate multiple thought-response pairs for each instruction, while a judge model evaluates responses, selecting the best and worst pairs for optimization.

Agent-as-a-Judge: Evaluate Agents with Agents [Link]

They introduced the Agent-as-a-Judge Framework to evaluate agentic systems, addressing limitations of existing evaluation methods like LLM-as-a-Judge by offering dynamic, step-by-step feedback throughout task-solving processes.

Difficulties handling numbers may stem from the fact that most models rely on autoregressive next token prediction pretext tasks during training, which might not be suitable for mathematical operations, or simply because a limited number of numerical reasoning tasks are included in the model’s training corpora. Nevertheless, it is known that performance can be improved using prompt techniques, indicating that relevant knowledge may already exist within LLMs.

Evaluating and enhancing probabilistic reasoning in language models - Google Research [Link]

What Are the Odds? Language Models Are Capable of Probabilistic Reasoning [Link]

This study introduces a benchmark dataset with question-answer pairs based on both idealized and real-world distributions. It enables systematic evaluation of LLMs’ probabilistic reasoning capabilities across three tasks: estimating percentiles, drawing samples, and calculating probabilities.

The technology has strikingly disparate effects across the productivity distribution: while the bottom third of scientists see little benefit, the output of top researchers nearly doubles.

Top scientists leverage their domain knowledge to prioritize promising AI suggestions, while others waste significant resources testing false positives.

82% of scientists report reduced satisfaction with their work due to decreased creativity and skill underutilization.

― Artificial Intelligence, Scientific Discovery, and Product Innovation [Link]

MIT PhD Aidan Toner-Rodgers’s working paper talking about some very interesting points. Indeed, AI has reshaped R&D process especially in natural science and material science where structured search is required .e.g drug discovery, climatology, etc. However, scientists with different degree of expertise (top and bottom scientists) achieve drastically different productivity with AI, giving bottom scientists less benefits. This characteristic has some consequences and implications

Resources are misallocated to less promising AI suggestions. Human innovation and creativity is not encouraged and cultivated.
Expertise is still required as AI only demonstrates its potential when complemented by human expertise. The judgment ability in leveraging AI’s potential is important.
Skills have been shifted to prompting AI effectively. However, scientists feel an underutilization of expertise when working with AI.

A Survey on LLM-as-a-Judge [Link]

There are a lot of applications of LLM-as-a-Judge.

Data annotation: labeling datasets with information such as sentiment, topic categorization, or relevance.
Content critique: providing feedback on generated content such as articles, essays, or code.
Domain-specific evaluations: evaluate the accuracy, completeness, and clarity of financial analyses or advice (in finance), and assess medical responses for correctness, compliance with guidelines, and patient safety (for medical Q&A).

Looking Inward: Language Models Can Learn About Themselves by Introspection [Link]

The researchers define introspection as “acquiring knowledge that is not contained in or derived from training data but instead originates from internal states”.

They conducted interesting experiments: finetuning LLMs to predict properties of their own behavior in hypothetical scenarios. It turns out that a LLM can predict itself better than other models predicting it, even those models are trained on the same data pool.

Conclusion is suprising - language models have knowledge about themselves that is neither contained in their training data nor inferable from it. The researchers developed a self-prediction training framework where models predict properties of their hypothetical responses.There is already LLM research areas in honesty, behaviors, etc. I believe this work is hugely contributing to these areas.

A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration [Link]

Some interesting findings: 1) coherent CoT is better than traditional CoT because the former considers the connections between steps, 2) model is more sensitive to errors in intermediate reasoning steps than in the final answer

The authors proposed an error aware training method which works to incorporate both corretc and incorrect reasoning paths, enabling LLMs to recognize and handle potential reasoning errors.

Though businesses are doing their diligence on ROI and customization, they may miss crucial pieces of the implementation puzzle. Often, organizations discover too late that they’ve underestimated the importance of technical integration, ongoing support, and scalability. It’s a bit like buying a car based solely on fuel efficiency, only to realize later that service availability and ease of maintenance are just as critical over the long haul.

― 2024: The State of Generative AI in the Enterprise [Link]

Key Trends for 2024 onwards

There is a serious commitment from enterprise to AI integration in business strategies
The top use cases for generative AI focus on enhancing productivity and efficiency. These include:
- Code Copilots (51% adoption)
- Support Chatbots (31% adoption)
- Enterprise Search + Retrieval (28% adoption)
- Data Extraction + Transformation (27% adoption)
- Meeting Summarization (24% adoption)
There’s a growing trend towards autonomous AI agents capable of managing complex processes independently.
Businesses are focused on tools that deliver measurable value (ROI) and industry-specific customization, rather than simply looking for the cheapest option.
Industry-specific, verticalized AI applications are gaining momentum, particularly in:
- Healthcare ($500 million in enterprise spending)
- Legal ($350 million in enterprise spending)
- Financial Services ($100 million in enterprise spending)
- Media and Entertainment ($100 million in enterprise spending)
Companies prefer multi-model strategies. This has led to a decline in OpenAI’s dominance, while Anthropic is gaining market share.
Retrieval-augmented generation (RAG) has become the dominant design pattern, with 51% adoption. Meanwhile, agentic architectures are emerging, now powering 12% of implementations.
There is a talent drought as AI engineering becoming more sophisticated.
There’s a growing trend towards companies building their own AI solutions in-house.
- Previously, in 2023, a large majority of enterprises (80%) relied on third-party vendors for their generative AI software
- In 2024, the split between building and buying is almost even, with 47% of solutions developed internally and 53% sourced from vendors
- This shift suggests a growing confidence among enterprises in their ability to develop and implement their own AI tools.
- while there’s a trend towards building in-house solutions, companies are not abandoning vendors entirely. The sources still highlight the importance of vendors, especially for companies lacking the resources or expertise for in-house development. The even split between building and buying suggests a hybrid approach is emerging, where companies strategically choose which solutions to develop internally and which to procure from vendors.

Articles and Blogs

Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks - Microsoft Research [Link]

Magentic-One is built on Microsoft’s AutoGen framework. It employs a unique dual-loop architecture where the Orchestrator manages both task and progress ledgers. This is an early movement of building generalist agentic systems. Other current LLM-based applications like RAG will also benefit from this type of system.

Introducing Internal Knowledge Search and Spaces - Perplexity [Link]

Internal Knowledge Search and Spaces enable simultaneous searches of organizational files and the web. This feature addresses the need for a unified tool to access both internal and external data, leveraging advanced LLMs like GPT-4 and Claude 3 to enhance search efficiency and relevance.

After the introduction of ChatGPT, there was a 21% decrease in the weekly number of posts in automation-prone jobs compared to manual-intensive jobs. Writing jobs were affected the most (30.37% decrease), followed by software, app, and web development (20.62%) and engineering (10.42%).

To stay competitive, employees must engage in continuous learning and upskilling. In their book Prediction Machines, authors Ajay Agrawal, Joshua Gans, Avi Goldfarb argue that AI is shifting the focus of work away from predictive tasks to those requiring human judgment and decision-making.

― Research: How Gen AI Is Already Impacting the Labor Market - Harvard Business Review [Link]

Research reveals impact of GenAI applications (ChatGPT and image-generating AI) in jobs (manual intensive jobs such as data and office management, video services, and audio services; automation prone jobs such as writing, software, app, web dev, and engineering, and image-generating jobs such as graphic design and 3D modeling ) to see challenges and opportunities in shifting markets.

They found that Gen AI “led to nearly immediate decreases in posts for online gig workers across job types, but particularly for automation-prone jobs. “ It shows a growing trend of job replacement.

Suggestions are continuous learning, enhancing human judgment and decision making, to be able to ask right questions, prompt efficiently, and avoid blindly taking responses.

How Much GPU Memory is Needed to Serve a Large Language Model (LLM)? [Link]

Addresing a common LLM interview question “How much GPU memory is needed to serve a Large Language Model (LLM)?”.

\[ M = ({P \times 4B \over {32/Q}}) \times 1.2 \] where P is model size, 4B is 4 bytes used per paramter, Q is the number of bits for loading the model (16 bit or 32 bit). 1.2 accounts for a 20% overhead.

AI Agent Stack [Link]

GitHub

OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models [Link]

News

Introducing ChatGPT search - OpenAI [Link]

Perplexity introduces AI-powered finance tool, offering real-time stock analysis and historical data for developers. [Link]

VS Code now supports GitHub Copilot chat search and visualization [Link]

Cognitive Scientist Gary Marcus Says AI Must Be Regulated. He Has a Plan [Link]

Among several points made by the article, two were caught by my eyes:

Elon Musk presents a complex figure in the AI landscape, as one of the first to issue warnings about potential risks of AI, and also actively involved in developing AI through his company. This duality raises questions about his stance on AI and how he reconciles his concerns with his entrepreneurial pursuits.
Marcus proposes a shift from the current “System One” thinking in AI, which is fast and reflexive but prone to errors, to a “System Two” approach that emphasizes deliberate reasoning and abstraction.

Good to Great

Posted on 2024-10-21

“Good to Great: Why Some Companies Make the Leap…And Others Don’t” written by Jim Collins - I started to read this book on Aug 24 and recently finished it. It summarizes a research study uncovering patterns and principles that differentiate “great” companies from the rest. I find it also helpful for professional development. Two most impressive concepts to me are “Level 5 Leadership” and “The Hedgehog Concept”.

Level 5 Leadership

There are five levels of leadership: 1) level 1 - highly capable individual, 2) level 2 - contributing team member, 3) level 3 - competent manager, 4) level 4 - effective leader, 5) level 5 - executive leader.

Level 5 Leadership is the highest level and is marked by a paradoxical blend of personal humility and professional will:

Personal Humility: Level 5 leaders are modest, understated, and self-effacing. They rarely seek public attention and often attribute success to others, to good luck, or to external factors. They avoid the limelight and focus on the success of the organization rather than their personal accolades.
Professional Will: Despite their humility, these leaders possess an intense resolve and determination to do whatever it takes to make the company great. They are incredibly ambitious, but their ambition is channeled toward the organization, not personal gain. They set high standards and push the company toward greatness with unwavering tenacity.

Characteristics of level 5 leaders:

Focus on long-term success: They prioritize the enduring success of the company rather than short-term wins or personal gain.
Credit to others: They credit the team, luck, or external factors for successes but take personal responsibility for failures or setbacks.
Resolve in tough times: They confront difficult realities head-on and have a steadfast determination to overcome obstacles, never losing faith in the company’s ability to succeed.
Succession planning: They ensure that the company can continue its success without them, often preparing successors who will carry the torch without a dip in performance.

The Hedgehog Concept

The Hedgehog Concept is a central idea in this book. It involves identifying the intersection of three crucial areas. When a company operates within this intersection, it can focus its efforts on what it’s passionate about, what it can truly excel at, and what drives its economic success. This creates a clarity of focus that allows a company to ignore distractions and build sustained momentum.

It’s not only about company. Individuals can use it to guide their career choices and personal development. By aligning your career or life mission with your personal Hedgehog Concept, you’re more likely to find purpose, fulfillment, and success. Think about the three questions:

What are you deeply passionate about? - your personal mission, what gives you energy and a sense of fulfillment.
What can you be the best in the world at? - your unique strengths and abilities.
What drives your economic engine? - how you can generate income or provide value in a way that sustains your livelihood.

I’m glad that I found my answers to these three questions when I was 20 years old. I’m 100% sure that my answers won’t change through my whole life no matter what happened or will happen. The answers in my mind - I believe I’m born for it, it’s the mission of my life. So how about you?

Advanced RAG

Posted on 2024-10-19

There are many enterprise products built almost solely on RAG.

Naive RAG

The standard RAG workflow consists of three main steps as illustrated in the graph below:

Indexing: Creating an index of documents for retrieval.
Retrieval: Searching the index for relevant documents based on a user query.
Generation: Using a language model to generate answers or responses based on the retrieved documents.

The three steps all face possible issues:

Indexing:
- Poor document parsing.
- Inefficient document chunking strategies.
- Weak semantic representations from embedding models.
- Non-optimized index structures.
Retrieval:
- Low relevance: retrieved documents are not highly relevant to the user query (low accuracy).
- Incomplete retrieval: not all relevant documents are retrieved (low recall).
- Redundancy: retrieved documents may be repetitive or redundant.
- Queries are often not specific or well-defined.
- Retrieval strategies might not be well-suited to the use case and may rely solely on semantic similarity.
Generation:
- Overreliance on the retrieved content, leading to issues such as irrelevant or even harmful responses (e.g., toxic or biased content).

This paper “Retrieval-Augmented Generation for Large Language Models: A Survey” discussed several problems associated with Naive RAG implementations. The advanced approaches to RAG attempt to overcome the limitations of naive RAG by improving the way queries are processed, documents are retrieved, and responses are generated. Advanced RAG techniques focus on refining each step of the process, from query transformations to more efficient retrieval strategies.

Advanced RAG

Overview

Source: LangChain

Pre-Retrieval Enhancements

Query Transformations / Translation

Query transformations are techniques aimed at re-writing or modifying the input questions to improve the retrieval process.

Query transformation types:

Some notable methods include:

Multi Query:

The MultiQueryRetriever automates prompt tuning by using a language model (LLM) to generate multiple queries from different perspectives for a given user query. It retrieves relevant documents for each generated query and combines the results to create a larger, more comprehensive set of potentially relevant documents. This technique helps mitigate some of the limitations of distance-based retrieval, save time on experimenting with different prompts, and provides a richer set of results.

LangChain Tutorial: How to use MultiQueryRetriever.

LangChain API: MultiQueryRetriever.

Video Tutorial: RAG from Scratch (Part 5 - Query Translation: Multi Query).
RAG Fusion

RAG-Fusion combines RAG and Reciprocal Rank Fusion (RRF) by generating multiple queries, reranking them with reciprocal scores and fusing the documents and scores. RRF gives the more relevant retrieval results higher scores and re-ranks them according to the scores. RAG-Fusion was able to provide accurate and comprehensive answers due to the generated queries contextualizing the original query from various perspectives.

Paper: A New Take on Retrieval-Augmented Generation.

Code: Raudaschl/rag-fusion

LangChain Cookbook：RAG Fusion

Video Tutorial: RAG from scratch: Part 6 (Query Translation – RAG Fusion)
Step-Back Prompting

Step back prompting refers to the technique of generating a more generalized or abstract version of a specific query in order to mitigate potential issues with search quality or model-generated responses. This involves first reformulating the initial question into a broader or higher-level version (the “step back” question) and then querying both the original and the generalized question to improve the comprehensiveness and relevance of the responses.

Paper: Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models“.

LangChain Tutorial: Step Back Prompting

LangChain Cookbook: Step-Back Prompting (Question-Answering)

Video Tutorial: RAG from scratch: Part 8 (Query Translation – Step Back)
Decomposition:

When a user asks a complex question, a single query might not retrieve the right results. To address this, the question can be broken into sub-questions, each of which is retrieved separately, and the answers are combined.

LangChain Doc: Decomposition

Video Tutorial: RAG from scratch: Part 7 (Query Translation – Decomposition)
- Least-to-Most Prompting
  
  The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts.
  
  Paper: Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
  
  Video Tutorial: RAG from scratch: Part 7 (Query Translation – Decomposition)
- IR-Cot
  
  An approach for multi-step QA that interleaves retrieval with steps (sentences) in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. It incorporates the idea of least-to-most prompting into RAG to improve retrieval, resulting in factually more accurate CoT reasoning.
  
  Paper: Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions
  
  IR-CoT Code：https://github.com/StonyBrookNLP/ircot
Hypothetical Document Embeddings (HyDE): Given a query, HyDE first zero-shot instructs an instruction-following language model to generate a hypothetical document. The document captures relevance patterns but is unreal and may contain false details. Then, an unsupervised contrastively learned encoder (e.g. Contriever) encodes the document into an embedding vector. This vector identifies a neighborhood in the corpus embedding space, where similar real documents are retrieved based on vector similarity.

Simply speaking, HyDE uses responses to retrieve documents rather than using queries to retrieve documents. The rational behind this approach is that the semantic similarity between query and real document is smaller than the semantic similarity between hypothetical document and real document.

LangChain Doc: Hypothetical Document Embeddings

Paper: Precise Zero-Shot Dense Retrieval without Relevance Labels

LangChain Cookbook: Improve document indexing with HyDE
New queries based on historical dialogues

This is a required technique for developing a chatbot or a conversational RAG.

LangChain Tutorials: Conversational RAG; Build a Chatbot; How to add message history; How to add memory to chatbots

LangChain Code: create_history_aware_retriever

Query Construction

Query construction refers to converting a natural language query into the query language specific to the database you are working with. This is essential for interacting with different databases and vector stores that require structured queries for more efficient document retrieval.

Check which vector databases support filtering: https://superlinked.com/vector-db-comparison

Data can be structured, unstructured or semi-structured (see demo below). This requires LLMs to have capability of query construction.

Examples	Data Source	References
Text-to-metadata-filter	VectorStore	Docs
Text-to-SQL	SQL DB	Docs; Blog; Blog
Text-to-SQL + Semantic	PGVector supported SQL DB	Cookbook
Text-to-Cypher	Graph DB	Blog; Blog

Self-query retriever

A self-querying retriever is one that, as the name suggests, has the ability to query itself. Specifically, given any natural language query, the retriever uses a query-constructing LLM chain to write a structured query (usually in JSON) and then applies that structured query to its underlying VectorStore. This allows the retriever to not only use the user-input query for semantic similarity comparison with the contents of stored documents but to also extract filters from the user query on the metadata of stored documents and to execute those filters.

LangChain Docs:

(v0.2): How to do “self-querying” retrieval

(v0.1): Self-querying

Integration: Components -> Retrievers -> Self-querying retrievers -> Qdrant

Text-to-metadata-filter: VectorStores equipped with metadata filtering enable structured queries to filter embedded unstructured documents.
Prompt templates and output parsers

Prompt analysis and prompt template: converting user’s query to filtering conditions
- When constructing queries, the system uses a specific JSON format to organize the query and filters. The prompt is designed to create structured queries that can be applied to a document database or vector store. The queries consist of two main components:
  - Query: The natural language query string that is used to match the document content.
  - Filter: Logical conditions used to filter the documents based on specific metadata attributes.
- Comparison Operations
  
  Comparison operators (comp) are used to compare attributes (like year, name, time, product, or team) in the document with specific values provided by the user. Here are the comparison operators:
  - eq: Equals (e.g., eq("team", "TSE") matches documents where the team is “TSE”).
  - ne: Not equal (e.g., ne("name","Ashley") matches documents where the year is not 2022).
  - gt: Greater than (e.g., gt("year", 2023) matches documents with a year greater than 2023).
  - gte: Greater than or equal to (e.g., gte("year", 2022) matches documents from the year 2000 or later).
  - lt: Less than (e.g., lt("year", 2021) matches documents created before 2021).
  - lte: Less than or equal to (e.g., lte("time", 13) matches documents with a time length of 13 mins or lower).
  - contain: Contains (e.g., contain("product", "gold") matches documents where the product contains the word “gold”).
  - like: Similar to or like (used for pattern matching).
- Logical Operations
  
  Logical operators combine multiple conditions (comparisons) into a single filter:
  - and: Logical AND (e.g., and(gt("year", 2022), eq("product", "gold")) matches documents created later than year 2022 and are related to gold card product).
  - or: Logical OR (e.g., or(eq("team", "TS"), eq("team", "TSE")) matches documents that are either TS or TSE).
  - not: Logical NOT (e.g., not(eq("name", "Ashley")) matches documents where Ashley is not the owner).
Output parser: This output parser can be used when you want to return multiple fields or you need the response to be formatted.

LangChain Docs: Structured output parser

API: StructuredQueryOutputParser

Advanced Retrieval Techniques

Vector Store-Backed Retriever: A retriever that uses a vector database to store document embeddings and retrieve documents based on their proximity to the query embedding.
Fusion Retrieval or hybrid search: Combining multiple retrieval strategies (semantic similarity retrieval; keywords retrieval) to obtain a more diverse set of results.

LangChain Docs:

v0.2: How to combine results from multiple retrievers

v0.1: Ensemble Retriever

API: EnsembleRetriever

Code: EnsembleRetriever

The EnsembleRetriever is a retrieval strategy that enhances retrieval performance by combining multiple retrievers. This approach leverages the strengths of different types of retrievers to compensate for each other’s weaknesses. A common example is combining a Sparse Retriever (e.g., BM25, which performs keyword-based retrieval) with a Dense Retriever (which performs semantic similarity retrieval based on embeddings). This combination works because sparse and dense methods complement each other.

Sparse vs. Dense Representation
1. Sparse Representation:
  - High-dimensional sparse vectors: Documents and queries are represented as high-dimensional vectors, but most dimensions have zero values. This is typical of traditional information retrieval methods like TF-IDF and BM25.
  - Term frequency: Each dimension corresponds to a term, and the vector values represent term frequencies or weights (e.g., TF-IDF weights).
  - Sparsity: Since a document or query contains only a small subset of all possible terms, most dimensions in the vector are zero, which makes it “sparse.”
2. Dense Representation:
  - Low-dimensional dense vectors: Documents and queries are represented as low-dimensional vectors, where most or all dimensions have non-zero values. This representation is typically generated by deep learning models like BERT.
  - Semantic embeddings: The vectors capture semantic and contextual information, rather than just term frequency.
  - Density: All dimensions in the vector usually have non-zero values, hence “dense.”
Sparse and Dense Retrievers
- Sparse Retriever: The name comes from the fact that most elements in the vector representation of documents and queries are zero. It works well for exact keyword matches but may miss semantically relevant content that uses different vocabulary.
- Dense Retriever: The name reflects that the vector representation has mostly non-zero values. Dense retrievers perform better at capturing the meaning behind the text and finding semantically related content, even when the exact terms differ.
Combining Sparse and Dense Retrievers

By combining sparse and dense retrievers, the EnsembleRetriever can retrieve relevant documents more effectively:
- The Sparse Retriever excels at matching specific keywords or phrases.
- The Dense Retriever is better at capturing the semantic meaning and context, helping to retrieve documents even when exact terms differ.
This combination creates a more robust retrieval system, addressing both lexical matches (through sparse retrieval) and semantic relevance (through dense retrieval).

LangChain Doc: BM25 Retriever

API: BM25Retriever

Code: BM25Retriever

Python Package: rank_bm25
Sentence Window Retrieval: Retrieving extended context pre and post the relevant context, rather than only retrieving the relevant context, which can reduce information lost.
Parent Document Retrieval: Instead of sending the multiple smaller chunks to the LLM, the system merges them into their larger parent chunk. This allows for more contextualized information to be fed to the LLM, giving it a broader and more coherent set of data to generate an answer.

LangChain Doc: Parent Document Retriever

API: ParentDocumentRetriever

Code: ParentDocumentRetriever
Hierarchical index retrieval: By structuring the search in two layers—summaries for broad filtering and chunks for detailed search—this hierarchical approach increases efficiency, making it easier to find and synthesize relevant information, especially when dealing with large document sets.
Hypothetical Questions: This technique involves having the language model generate hypothetical questions for each chunk of a document. These hypothetical questions are then embedded, and retrieval is performed based on these question embeddings, improving the relevance of the results.

LangChain Doc: hypothetical-queries
MultiVector Retriever: MultiVector Retriever is a higher level category of parent document retriever, hierarchical index retrieval, and hypothetical questions.

LangChain Doc: MultiVector

Summary: Runnable interface

Post-Retrieval Enhancements

Re-ranking: After retrieving the documents, the system re-ranks or filters them to ensure that the most relevant results appear at the top.

Reference

2024 October - What I Have Read

Posted on 2024-10-01

Substack

This new model.out_head output layer has its requires_grad attribute set to True by default, which means that it’s the only layer in the model that will be updated during training. Technically, training the output layer we just added is sufficient. However, as I found in experiments, finetuning additional layers can noticeably improve the predictive performance of the finetuned model.

― Building A GPT-Style LLM Classifier From Scratch - Sebastian Raschka [Link] [Github]

Interesting questions addressed by Sebastian:

Do we need to train all layers?

“For classification finetuning, it is not necessary to update all layers in an LLM. (The fewer weights we update, the faster the training will be because we don’t need to compute the gradients for these weights during backpropagation.)”
Why finetuning the last token, not the first token?

“In contrast to BERT, GPT is a decoder-style model with a causal attention mask. This means the first token has no context information of any other token in the input. Only the last token has information about all other tokens. Hence, if we want to use models like GPT for classification finetuning, we should focus on the last token to capture contextual information of all other input tokens.”
How does BERT compare to GPT performance-wise?

“The small GPT-2 model from the previous section and BERT performed similarly well on the spam classification dataset. “
Should we disable the causal mask?

“A core feature of the GPT architecture is the causal attention mask (different from BERT models or the original transformer architecture). However, we could actually remove the causal mask during classification finetuning, which would allow us to finetune the first rather than the last token since future tokens will no longer be masked, and the first token can see all other tokens.”
What impact does increasing the model size have?

The prediction accuracy can improve significantly with larger models.
What improvements can we expect from LoRA?

Both full finetuning (all layers) and LoRA can result in the same test set performance.

“On the small model, LoRA is slightly slower since the additional overhead from adding LoRA layers may outweigh the benefits, but when training the larger 1.5 billion parameters model, LoRA trains 1.53x faster.”
Padding or no padding? [experiments]

“If we want to process data in batches during training or inference (this involves processing more than one input sequence at a time), we need to insert padding tokens to ensure that the training examples are of equal length.

In regular text generation tasks, padding doesn’t affect the model response since padding tokens are usually added to the right side, and due to the causal mask discussed earlier, these padding tokens don’t influence the other tokens. However, remember that we finetuned the last token, as discussed earlier. Since the padding tokens are to the left of this last token, the padding tokens may affect the result. “

These Are The 6 Best Science-Based Study Strategies - Super Learning Lab [Link]

Spaced Practice

Instead of cramming all the information at once, spaced practice consists of revisiting the material multiple times with breaks in between.
Interleaving

This is about studying different topics in a sequence.
Retrieval

This consists of bringing learned information from mid to long-term memory by recall or retrieval practices.
Elaboration

Elaborative interrogation consists of asking and explaining why and how things work based on prior knowledge. In other words, it involves connecting new information to preexisting knowledge.
Concrete Example

When learning abstract concepts it was found that illustrating these topics with specific examples improves learning.
Dual Coding

Dual coding is about combining words with visuals. If you use relevant and helpful images in your notes, you may increase learning by remembering what you study with the help of these images.

The $\$120$ billion wagered on sports betting in America in 2023 translated into nearly $\$11$ billion in revenue for sports betting companies. This corresponds to the ~9% fee sportsbooks keep after all bets have been settled.

Flutter: Leverages FanDuel’s dominance and global expertise.

DraftKings: Focuses on innovation and user engagement to fuel growth.

Entain: Bets on BetMGM’s success in the US market.

Penn: Leverages the ESPN partnership to challenge established players.

― Sports Betting Economics - App Economy Insights [Link]

For decades, companies have outsourced their organizational innovation to consultants or enterprise software vendors who develop generalized approaches based on what they see across many organizations. That won’t work here, at least for a while. Nobody has special information about how to best use AI at your company, or a playbook for how to integrate it into your organization.

― AI in organizations: Some tactics - One Useful Thing [Link]

Issues with AI at the organizational level and how to solve them.

In many companies, there is little AI use and few productivity gains outside of narrow permitted use cases. That’s because AI use that boosts individual performance does not always translate to boosting organizational performance for a variety of reasons. To get organizational gains requires R&D into AI use and you are largely going to have to do the R&D yourself.
“Many key breakthrough innovations come not from central R&D labs, but from people actually using products and tinkering with them to solve their own problems. “ (Prof. Eric von Hippel). As users are very motivated to make their own jobs easier with technology, they find ways to do so. The user advantage is especially big in experimenting with Generative AI because the systems are unreliable and have a jagged frontier of capability. People are experimenting with AI and finding it very useful. But they aren’t sharing their results with their employers.

How to solve the issues? What are the tactics? Talents in the lab should focus on building, not analysis or abstract strategy.

Build AI benchmarks for your organization. [Anthropic’s guide to benchmarking]
Build prompts and tools that work.
Build stuff that doesn’t work… yet.
Build provocations and magic.

The USA vs Visa - Net Interest [Link]

Key elements of Doha Mekki’s recent antitrust lawsuit against Visa:

Visa controls over 60% of U.S. debit transactions, with Mastercard far behind at 25%.
Visa traps merchants with pricing that penalizes them if they don’t process all transactions through Visa.
Exclusive deals incentivize merchants to use Visa exclusively, reducing competition.
Visa prevents potential competitors like PayPal and Apple from entering the market by locking them into restrictive agreements.
Visa has faced antitrust lawsuits since 1971 and maintains a large legal team to manage ongoing cases.

In physics, we study how particles or systems’ units interact and evolve toward stable states. In machine learning, we study how neurons (or artificial neurons) interact to learn patterns directly from data. The connection lies in energy minimization: both approaches define an energy function to describe the stability of a system, and the optimization of this function helps to find optimal configurations that correspond to useful patterns or memories.

Hopfield developed a network that recreates patterns using energy minimization, while Hinton expanded on this with the introduction of Boltzmann machines, statistical physics-based systems that learn to recognize and generate patterns, providing groundwork for modern machine learning.

― Nobel Prize to the Statistical Physics of artificial neural networks - Complexity Thoughts [Link]

“The laws governing physical systems also apply to the world of artificial intelligence.”

Major AI Functionalities / with Apps - AI Supremacy [Link]

A list of AI products you can experiment with.

Fine-tuning can be useful for certain tasks (see the relevant section here for more details), but when it comes to injecting morality into your LLM, it’s probably not a good bet.

By combining the strengths of diffusion models and auto-regressive generation, DGLM offers a more nuanced, adaptable, and potentially more effective approach to generating safe and creative text. It moves away from the brute-force, one-size-fits-all approach of fine-tuning and embraces a more modular, dynamic, and personalized approach to AI safety.

― A New Way to Control Language Model Generations [Breakdowns] - Artificial Intelligence Made Simple [Link]

The author lists drawbacks of fine tuning:

“A model’s knowledge and capabilities are learnt almost entirely during pretraining, while alignment teaches it which subdistribution of formats should be used when interacting with users.” - LIMA: Less Is More for Alignment [Link]

“Our findings reveal that while unsupervised fine-tuning offers some improvement, RAG consistently outperforms it, both for existing knowledge encountered during training and entirely new knowledge. Moreover, we find that LLMs struggle to learn new factual information through unsupervised fine-tuning.” - Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs [Link]

“Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing — — even if a model’s initial safety alignment is impeccable” - Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To! [Link]

“The base model generates a wide range of nationalities, with American, British, and German being the top three. In contrast, the aligned model only generates three nationalities: American (highest percentage), Chinese, and a small percentage of Mexican.” - Creativity Has Left the Chat: The Price of Debiasing Language Models [Link]

Just do it! Brand Name Lessons from Nike’s Troubles - Musings on Markets [Link]

Brand value is often mixed with other advantages like scale, network effects, and product differentiation. Strong brands yield higher revenues, pricing power, and potentially lower capital costs. And it’s hard to separate brand value in companies with multiple advantages.

Spotify: Layoffs Pay Off - App Economy Insights [Link]

Key business highlights: 1) Spotify’s new subscription plans - Spotify’s Audiobooks Access ( $\$9.99$ per month for 15 hours) and Basic (removing audiobooks from Premium) diversify its offerings, 2) Spotify is incorporating social-style discovery features, like Live Listening Parties and prompt-based AI playlists, but remains behind YouTube and Meta in algorithm sophistication and live content, 3) Spotify’s ad-supported ARPU is low (€1.15 vs. Meta’s $\$11.89$ dollars in Q2), limiting ad revenue potential. A new in-house creative agency may improve brand experiences but is challenging to scale profitably, 4) Spotify pivoted from broad podcasting investments to a case-by-case approach, now pushing video podcasts.

Competitiveness: 1) Hub Entertainment Research shows Spotify’s high ‘must-have’ appeal, with 75% of US users viewing it as “uncancellable.” This loyalty supports Spotify’s growing free cash flow and a valuation around 40 times Free Cash Flow (FCF) —placing it ahead of rivals like YouTube Music (100M subscribers) and Apple Music (estimated 110M by 2025), 2) Despite solid growth, Spotify’s reliance on licensed content and its still-limited ad revenue leave room for competition. While TikTok’s 1B+ users could funnel into TikTok Music, ByteDance recently announced it will close TikTok Music by November, focusing instead on promoting artists and streaming value within the main app—a potential competitive break for Spotify.

Cybersecurity Earnings - App Economy Insights [Link]

Covered Palo Alto Networks, CrowdStrike, Fortinet, Zscaler, and Cloudflare.

Two Nobel Prizes for AI, and Two Paths Forward - Marcus on AI [Link]

Hinton’s focus on end-to-end neural networks can be limiting, especially when considering the complexities of real-world problems that often require more structured and hybrid approaches. On the other hand, Hassabis’s embrace of neurosymbolic AI reflects an openness to different methodologies and a recognition that a combination of techniques may yield better results.

First, Waymo is using transformer-based foundation models for all stages of its self-driving pipeline: perception, prediction, and planning. Second, the whole system is trained end to end. During training, gradients from the behavior network propagate backwards to the perception network.

So I see more similarities than differences in the evolution of Waymo and Tesla’s self-driving software. Both companies made little to no use of neural networks in their early systems. Both companies started using neural networks for perception in the late 2010s. And both companies only recently shifted to end-to-end architectures that used neural networks for all stages of the self-driving pipeline.

― Elon Musk wants to dominate robotaxis—first he needs to catch up to Waymo - Understanding AI [Link]

Tesla’s advantages compared to Waymo:

Tesla already has millions of vehicles on the road, which could quickly deploy robotaxi software without the need for new hardware.
Tesla relies on cost-effective, camera-based perception without expensive sensors like lidar, which Waymo uses. This could lower Tesla’s per-vehicle cost and allow it to expand more rapidly if autonomy is achieved.
Tesla’s transition to a full, end-to-end neural network approach for perception, prediction, and planning has improved FSD’s ability to handle complex driving situations without manual coding for specific scenarios.

Tesla’s disadvantages compared to Waymo:

Tesla hasn’t deployed a fully driverless car yet, while Waymo has offered driverless rides since 2020.
Tesla would need extensive infrastructure to maintain a robotaxi network (for charging, cleaning, and repairs) which it currently lacks. Building this up in cities nationwide would take time, resources, and logistical planning.
Tesla’s camera-only approach may struggle in certain conditions (e.g., low visibility), which lidar could handle better.

Key Contribution of AI to Robotics:

Improved Reasoning and Planning: Large language models, like those developed by OpenAI and Google, are enabling robots to interpret high-level commands, understand contextual instructions, and execute complex, multi-step tasks. This is particularly valuable in dynamic environments where robots must adapt to unforeseen changes and make real-time decisions.

Enhanced Visual and Motor Coordination: The integration of generative AI with visual and motor feedback systems allows robots to translate visual inputs into precise motor actions. This enables robots to perform tasks such as picking and placing objects with greater accuracy and efficiency, even in environments that are constantly changing.

Natural Language Interfaces: AI-driven natural language interfaces are making it easier for users to interact with robots using everyday language rather than programming code. This democratization of robotics makes it accessible to non-technical users, paving the way for broader adoption across industries.

Predictive Maintenance: AI models analyze real-time data from robots to predict potential malfunctions, enabling proactive maintenance that minimizes costly downtime and enhances operational efficiency.

― Generative AI and Robotics in 2024 - AI Supremacy [Link]

Glue: The less-glamorous stuff that helps a team succeed

Strike the right balance between glue and core work. Do enough glue work to show leadership promotable artifacts, but not too much to where your core work suffers.

Lead meetings, take notes, and share them to provide value to the right stakeholders.

Send your manager monthly recaps of your accomplishments so they can more easily sponsor you and your work.

Grit: The will to pursue a long-term goal despite challenges

Break your projects into achievable milestones so you can constantly feel progress, even for long projects.

View failures as progress. It’s one less route you need to explore now.

Take breaks and work in fun. I set up icebreakers at the start of our meetings, organized team events, and pushed for production freezes.

Friction: The gap between reality and the ideal state

Find ways to unblock yourself and the people around you. Do this enough, and you’ll have mastered removing friction.

Removing friction paints you as a force multiplier. Force multipliers get promoted.

― 3 Career Principles that got me to Director at Google - High Growth Engineer [Link]

Processing Fluency: The ease with which information is perceived and processed by the human mind. High fluency can lead to positive evaluations, even if the information is not accurate.

Halo Effect: A cognitive bias where a positive overall impression of something (e.g., an LLM’s fluency) influences the evaluation of its individual attributes (e.g., truthfulness).

Inter-rater Agreement (IRA): A measure of how much two or more evaluators agree on their assessments. Low IRA indicates potential problems with the evaluation design or guidelines.

Extrinsic Evaluation: Assessing the impact of an LLM’s output on an end-user task or system (e.g., measuring productivity gains from using an LLM-powered email assistant).

Intrinsic Evaluation: Evaluating the properties of the LLM-generated text itself, such as fluency, coherence, and factual accuracy.

― How Amazon is Rethinking human evaluation for generative large language models [Breakdowns] - Artificial Intelligence Made Simple [Link]

Pretty comprehensive of human evaluation of Gen AI models.

US Banks: Soft Landing? - App Economy Insights [Link]

*A recent pre-print paper found that at least 5% of new Wikipedia articles in August 2024 were AI generated, Facebook isn’t doing anything to stop AI generated images of Hurricane Helene, and Goodreads and Amazon are grappling with various AI generated book schemes scamming people into buying pulp. It’s only the tip of the iceberg.*

― When Models Go MAD - Teaching computers how to talk [Link]

Humm AI is diluting reality. It’s concerning.

Elon Musk’s tech projects are inseparable from his authoritarian one - Blood In The Machine [Link]

Good discussion. Musk has multifaceted role—entrepreneur, political influencer, and media magnate. His activities underscore a model of influence that defies precedent, blending financial might, technological ambition, and political maneuvering. Recognizing the interconnectedness of these efforts is crucial for understanding Musk not just as a private-sector innovator but as a power broker actively shaping the public and political spheres in ways that could redefine norms, values, and who gets to participate in his envisioned future.

YouTube and Podcast

Tesla I don’t think it’s a car company, I think this is misleading, this is a robotics company robotics at Scale Company, because I would say at scale is also like a whole separate variable, they’re not building a single thing, they’re building the machine that builds the thing which is a whole separate thing and so I think robotics at scale company is what Tesla is.

I think with synthetic data you just have to be careful, because these models are silently collapsed, is like one of the major issues so if you go to ChatGPT and you ask it to give you a joke, you’ll notice that it only knows three jokes, that’s the only it gives you like one joke I think most of the time. And sometimes it gives you like three jokes and it’s because the models are collapsed and it’s silent, so when you’re looking at any single individual output, you’re just seeing a single example, but when you actually look at the distribution, you’ll notice that it’s not a very diverse distribution, it’s silently collapsed. When you’re doing synthetic data generation, this is a problem, because you actually really want that entropy, you want the diversity, and the richness in your data set otherwise.

― No Priors Ep. 80 | With Andrej Karpathy from OpenAI and Tesla - No Priors: AI, Machine Learning, Tech & Startups [Link]

Andrej Karpathy was a founding team member of OpenAI and the former Tesla Autopilot leader. He discussed the evolution of self driving cards, tech challenges, Tesla’s Optimus humanoid robot, bottlenecks of AI development today. The topic of how AI capabilities could be further integrated with human cognition sounds very future and funny.

AI prompt engineering: A deep dive - Anthropic [Link]

Some of Anthropic’s prompt engineering specialists—Amanda Askell (Alignment Finetuning), Alex Albert (Developer Relations), David Hershey (Applied AI), and Zack Witten (Prompt Engineering)—share their insights on the evolution of prompt engineering, offer practical advice, and discuss how prompting could evolve as AI continues to advance.

Decoding Google Gemini with Jeff Dean - Google DeepMind [Link]

Jeff Dean, chief scientist of Google DeepMind and Google Research, discusses the past, present and future of AI, specially the long term potential of multi-modal models like Gemini.

Shall We Repeal the Laws of Economics? - Oaktree Capital [Link]

Howard Marks addresses how politicians often ignore economic reality in their campaign promises, using examples like Trump’s call for tariffs and Harris’s attack on grocery profiteering. He emphasizes that economic laws are incontrovertible, and politicians can’t deliver on promises that contradict these laws; free markets allocate resources efficiently. And he highlights the ongoing political refusal to address issues like Social Security insolvency and national debt, stating that ignoring economic laws will eventually lead to negative outcomes.

Introducing OpenAI o1 - Open AI [Link]

A series of video from Open AI to introduce GPT o1.

Ep17. Welcome Jensen Huang | BG2 w/ Bill Gurley & Brad Gerstner - Bg2 Pod [Link]

Dueling Presidential interviews, SpaceX’s big catch, Robotaxis, Uber buying Expedia?, Nuclear NIMBY - All-In Podcast [Link]

Meta VS Apple: What Their Battle Means For AI Startups - Y Combinator [Link]

The Waymo Way: Making Autonomous Driving a Reality | Dmitri Dolgov - U-M Computer Science and Engineering [Link]

Sam Altman: The Man Behind ChatGPT | Big Take - Bloomberg Podcasts [Link]

Markets turn Trump, Long rates spike, Election home stretch, Influencer mania, Saving Starbucks - All-In Podcast [Link]

What’s next for AI agentic workflows ft. Andrew Ng of AI Fund - Sequoia Capital [Link]

A fireside chat with Sam Altman OpenAI CEO at Harvard University - Harvard Business School [Link]

Q1: I’m curious how you rebuilt or rediscover momentum after pivoting in a professional or personal sense in terms of your values.

We have unbelievable greatest technological wave ever, so it’s a great time to be starting out your career. You are going to be flooded with opportunities for the next few years.

Q2: How do you think AI’s role will be in tackling sort of inequalities in education and health care and legal stuff?

It should reduce inequality.

Q3: Subscription model could be a barrier for startup and small businesses, do you think OpenAI would explore alternative monetization strategy that could include free API access, perhaps supported by advertising or other methods, to foster innovation in the future?

Sam hates ads in general. Ads + AI is uniquely unsettling to him. He likes the simplicity of OpenAI’s model, which is they make great AI and people pay them for it, and they just do the best they can for people.

Q4: Does the increasing competitors change the way you are evolving for the next products?

They are just trying to figure out the next paradigm and the next great idea. They don’t pay much attention to the market share though they pay at least attention to competitors to maybe get inspiration.
Sam’s hope is that every year, they do something amazing that people thought was impossible. Once you know that something is possible and roughly how to do it, it always gets copied quickly. That’s not the hard part. The hard part is figuring out what it is and doing it first when you don’t know it’s possible.

Q5: what do you think the ideal public general education curriculum on AI should look like and what does the average person need to know about AI in the next 5-10 years?

Computer science major or at least some courses. Being able to train a GPT-2.

Q6: Can you share what’s coming next after transformers? Then the second question is what do you think most entrepreneurs and VCs are getting wrong about the future of AI?

Most entrepreneurs don’t bet on AI models are going to be massively better so that’s why there is meme that “OpenAI kills my startup”.

Q7: How much exposure does OpenAI and the AI movement have to energy constraints? And what do you think the role is of founders have to play in addressing these concerns?

It’s necessary to Sam to drive tech abundance from those two key inputs - energy and AI.

Articles and Blogs

Of the employees we studied, those with superior sales performance were genetically different from the rest of the group. They were better at learning in real time about new customers and new sales opportunities. From an initial conversation with a sales lead, they were able to quickly feel out the customer and propose appropriate products without being told what to recommend.

Adaptive learning is different; it isn’t trainable. It’s the ability to process new information in real time and immediately use it to achieve a positive result.

For example, sales teams often require junior employees to cold-call leads, even through they don’t know as much about the company as more experienced employees do. Most of them haven’t even learned how to sell yet. But our research shows that for adaptive learners, seniority and experience are less important. Employees with the sales gene quickly become knowledgeable about your products and are able to learn and adjust on the fly.

― There Really Is a “Sales Gene” - Juan Martinez, Harvard Business Review [Link]

Adaptive learning skill is important but might not correlated with seniority or experience. Employees with this capability can quickly become knowledgeable about the products and are able to learn and adjust on the fly.

The article suggests that managers or companies could be given a snapshot of how many of salespeople are adaptive learners without singling out any individual. and they could tell which tasks require adaptive learning skills and which don’t and allow them to choose. This should be done in an anonymous way.

No matter whether what’s been proposed in this article is applicable or ethical. The idea of adaptive learning is kind of new to me, and it inspires me to further think about whether this skill is learnable and teachable, and think about whether there is any other secret skills in sales.

New Rules for Teamwork - Harvard Business Review [Link]

Develop an Operating System

OS means building blocks for the way team members collaborate, create change, and support one another. Effective operating systems vary widely, depending on the needs and norms of the organization. What they all have in common is that they set out a view of how teams create value, what teams are supposed to achieve, the technical skills each team member is expected to contribute, the processes by which the work will be managed, and the cultural norms and mindsets of constructive collaboration that will guide behavior.

Suggestions: hold kickoffs, conduct one on ones, and take stock of progress using retrospectives, are the three practices as a foundations of team OS.
Invest in Active, Real-Time Measurement

To make teamwork scientific, organizations need to be able to measure the outcomes of their actions and determine how changes in the inputs affect results.

Suggestion: define what constitutes success.
Create a System for Continuous Improvement and Innovation

Teams today have new forms of technology and data collection at their disposal to help them self-correct while projects are underway. e.g. support colleagues to discuss what could have been done better; look at the patterns across teams to identify improvements and share best practices, particularly with regard to the rapid adoption of new technologies such as GenAI.

Suggestions: Identify the metrics that matter most (shift-changeover time, perhaps), hypothesize which actions could improve performance in those areas (preassigned workstations, perhaps), and embed technologies in the operating system (a smart-planning app, perhaps) to enable continuous improvement. Continuous improvement can occur only when all perspectives are considered and all teams have access to a centralized knowledge repository. Finally, it may be useful to set up a center of excellence, staffed with full-time employees with experience in analytics and operating system design.

Why Leadership Teams Fail - Harvard Business Review [Link]

I was reading while thinking about my team and neighbor teams. I find this article very useful.

A critical factor in organizational success: the health of their leadership team. There are three main patterns of dysfunction: Shark Tanks, Petting Zoos, and Mediocracies.

Shark Tanks
- Definition: A leadership team marked by hyper-competition, political maneuvering, and infighting. Members prioritize personal agendas over collective goals, leading to toxic and combative dynamics.
- Causes: Lack of clear direction or boundaries from the CEO or team leader. Failure to address self-serving behaviors early on. Absence of behavioral norms that encourage collaboration.
- Signs: Team members engage in power struggles outside of meetings. One-on-one discussions with the CEO on issues that should be resolved in team settings. Meetings turn into battlegrounds, with frequent arguments and difficulty reaching consensus. Executives bad-mouth each other, form alliances, or resist decisions after they’ve been made.
- Prevention:
  - Clear Expectations: Leaders should explicitly define which behaviors are acceptable and unacceptable. Set boundaries around how competition should be managed.
  - Confront Self-Serving Behaviors: Address aggressive or toxic behaviors directly with individuals. Remove those unwilling to align with the team’s goals, even if they’re high performers.
  - Role Modeling: The CEO or team leader must model collaborative behaviors and ensure transparency in communication to prevent political games.
  - Regular Feedback: Reinforce positive behaviors and correct negative ones through continuous feedback. Implement 360-degree reviews to track team behavior and performance alignment.
Petting Zoos
- Definition: A leadership team that avoids conflict to maintain harmony. Vigorous debate is sacrificed, and members prioritize getting along over pushing for the best ideas, leading to complacency and poor decision-making.
- Causes: Overemphasis on collaboration and mutual trust, leading to conflict avoidance. Team members are too deferential, fearing that disagreements might disrupt the team’s harmony. Leaders may unknowingly encourage this avoidance by stressing harmony over debate.
- Signs: Meetings lack critical debate, and discussions feel muted and lacking in emotional intensity. Team members engage in performance theater, focusing on positive news while downplaying problems. Decisions are made by consensus without sufficient evaluation or challenge. Leaders avoid holding one another accountable for poor performance, reluctant to disrupt the status quo.
- Prevention:
  - Encourage Debate: Leaders should foster a culture of constructive conflict where members feel safe to challenge each other’s ideas. A foundation of trust and psychological safety is key.
  - Promote Data-Driven Discussion: Ensure discussions are rooted in facts, using shared data to spur debate and avoid personal conflict. This encourages neutral, objective decision-making.
  - Monitor Meeting Dynamics: Leaders should track participation and the quality of discussion during meetings, encouraging team members to speak up and challenge ideas more openly.
  - Redefine Consensus: Teams must understand that consensus does not mean avoiding conflict but making informed decisions after rigorous debate.
Mediocracies
- Definition: A leadership team marked by complacency, lacking the drive or skills to achieve high performance. Collaboration and competition are both underemphasized, and the team fails to meet the organization’s needs.
- Causes: Long periods of success that breed complacency. Poor alignment between the team’s skills and the changing demands of the business. A divided team, where some members prefer competition while others favor collaboration, leading to inconsistent and ineffective efforts. A leader’s failure to adapt to changing market conditions or internal challenges.
- Signs: Team members operate in silos, with little collaboration between departments or units. Decision-making is slow, and there is a lack of accountability for performance. The team focuses on past achievements rather than future goals, with little ambition or drive for improvement. The team struggles with stagnation, missed opportunities, and duplicated efforts due to poor coordination.
- Prevention:
  - Rebuild the Team: Leaders may need to replace members who are not fit for their roles or who lack the motivation or skills needed to lead effectively. New hires should be chosen not just for their skills but also for their alignment with the company’s purpose and values.
  - Promote Balance: Strike a balance between competition and collaboration by hiring individuals with complementary skills and styles (e.g., planners and visionaries alongside hard-nosed executors).
  - Clear Roles and Expectations: Define where collaboration is expected (e.g., across departments) and where competition might be useful (e.g., in individual market decisions). Ensure everyone understands their responsibilities and how their performance contributes to broader goals.
  - Challenge the Status Quo: Continuously push the team to innovate and grow by setting ambitious goals and holding team members accountable for driving performance improvements.

Without such an observability system–let’s call it Design System Observability–it could be too late when Uber learned through complaints and public media about the end users who would suffer confusing onboarding rides, inconsistent layouts, and frustrating voiceovers/talkbacks sessions.

― How to Measure Design System at Scale - Uber Blog [Link]

RAG is the most popular architecture of the LLM based systems in 2023. There are many products build almost solely on RAG — from Question Answering services combining web search engines with LLMs to hundreds of chat-with-your-data apps.

― Advanced RAG Techniques: an Illustrated Overview - Medium [Link]

Machines of Loving Grace - Dario Amodei, CEO of Anthropic [Link]

This essay aligns with our thinking that while you acknowledge the risks and share concerns over idealistic promises, you also recognize the challenges inherent in building AGI. It reflects an approach to AI that balances ambition with caution, and providing perspectives of how to articulate a vision for AI that remains both ambitious and grounded.

Interesting point: As AI reaches near-universal superiority, the economy may need to adapt fundamentally. Potential solutions range from universal basic income (UBI) to entirely new economic frameworks. Perhaps AIs might manage resources and distribute them according to value systems derived from human input, but such proposals raise ethical and practical concerns. The author hints that the economic transformation required could be as drastic as past societal shifts (e.g. from hunting-gathering to agriculture), suggesting that humanity will have to experiment and iterate to find sustainable models that protect against exploitation or dystopia.

Overall, the author believes that the core human values such as fairness, cooperation, and autonomy have a natural, most ‘’overdetermined’ appeal, and will often lead toward democracy, rule of law, and enlightenment ideals - a trajectory that AI as a catalyst could accelerate by making the path to this “good world” more tangible. While the vision seems intuitive and inspiring to many, it may still appear fantastical or undesirable to others. Even so, the author finds a unique beauty in striving for it, suggesting that our intrinsic human impulses toward collaboration and justice make this vision both plausible and worth pursuing.

Andrew Ng’s writing in The Batch - The Batch [Link]

“The best we can do is a compromise: learn to recognize situations in which mistakes are likely and try harder to avoid significant mistakes when the stakes are high.” ― Daniel Kahneman, Thinking, Fast and Slow

― Unleashing System 2 Thinking? AlphaCodium Outperforms Direct Prompting of OpenAI o1 - qodo [Link]

System 1 thinking: fast responses with surface-level understanding;

System 2 thinking: deliberate methodical and reasoned problem solving.

Introducing the Realtime API - OpenAI Blog [Link]

Developers can now build fast speech-to-speech experiences into their applications

With canvas, ChatGPT can better understand the context of what you’re trying to accomplish. You can highlight specific sections to indicate exactly what you want ChatGPT to focus on. Like a copy editor or code reviewer, it can give inline feedback and suggestions with the entire project in mind.

You control the project in canvas. You can directly edit text or code. There’s a menu of shortcuts for you to ask ChatGPT to adjust writing length, debug your code, and quickly perform other useful actions. You can also restore previous versions of your work by using the back button in canvas.

― Introducing canvas - OpenAI [Link]

Canvas offers a new interface for the project works that require editing and revisions.

Multi document agentic RAG: A walkthrough - LanceDB [Link]

This tutorial shows you how to build a multi-document agentic RAG system using LanceDB and LlamaIndex for complex information retrieval. Specifically, the walkthrough demonstrates how to integrate LLMs, vector databases, and agent-based reasoning for enhanced information retrieval and task completion.

Reports and Papers

Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs [Link]

The paper explores in-context learning (ICL) mechanisms in large language models (LLMs), focusing on the balance between knowledge retrieval and learning from in-context examples in regression tasks. It reports that LLMs can learn from regression examples of realistic datasets in-context, extending previous work on synthetic data to more practical scenarios.

I’m looking forward to this kind of experiments and studies, because I was suspicious about applying LLM on structured data.

Larger and more instructable language models become less reliable [Link]

The issue might stem from the nature of LLMs, which are designed to generate plausible responses based on patterns in the data they’ve seen, rather than to know anything in the traditional sense. They don’t have an internal mechanism to differentiate truth from fabrication, so as they scale up, they produce more complex, yet not necessarily more accurate, answers. This makes them better at appearing smart, but less reliable overall—a quality that philosophers like Mike Hicks rightly criticize as “bullshitting.”

From a user perspective, it underscores the need for critical thinking when engaging with AI models through prompt engineering. Just because an LLM provides a well-phrased response doesn’t mean it’s accurate.

o1—like previous LLMs—is sensitive to the probability of examples and tasks, performing better and requiring fewer “thinking tokens” in high-probability settings than in low-probability ones.

― When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 [Link]

Although optimized for reasoning, o1 still exhibits probability-based limitations tied to its autoregressive origins, implying that a complete departure from these influences has not been fully achieved.

VideoPrism: A foundational visual encoder for video understanding - Google Research [Link]

Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models [Link]

Astute RAG is designed to better combine internal and external information through an interactive consolidation mechanism (i.e., identifying consistent passages, detecting conflicting information in them, and filtering out irrelevant information).

Differential Transformer - Microsoft Research [Link]

Diff Transformer amplifies attention to relevant context while canceling noise, resulting in outperforming standard Transformers in multiple areas such as long-context modeling, key information retrieval, hallucination mitigation, in-context learning, and reducing activation outliers, as shown in the experiments.

The key innovation is a differential attention mechanism that calculates attention scores by subtracting two separate softmax attention maps. This subtraction cancels out irrelevant attention, promoting sparser, more accurate focus on important information, similar to noise-canceling techniques.

Diffusion Guided Language Modeling [Link]

Controllable language modeling refers to techniques that allow users to guide or control specific attributes of the generated text from a language model (LM). These attributes can include factors like sentiment, toxicity, formality, or any other desired linguistic or stylistic feature. The primary challenge is ensuring that generated content aligns with specific requirements without compromising fluency, coherence, or overall quality of the text.

Diffusion models are excellent for controllable generation. They generate data in a multi-step process, gradually refining noise into coherent content. This incremental approach allows for fine-grained control at various stages of the generation. By manipulating the process at specific steps, you can guide the output more effectively toward desired characteristics (such as sentiment, style, or tone). In contrast, auto-regressive models like GPT generate text token by token in a one-shot manner, making it harder to impose controls without affecting fluency.

DGLM could refine language model generation because it integrates the fluency of auto-regressive language models (like GPT) with the flexibility of diffusion models. This flexibility is realized by employing Plug-and-Play with Linear Classifiers in the Sentence-T5 latent space to guide the diffusion process towards generating proposals with desired attributes.

On the Diagram of Thought [Link]

Researchers from Tsinghua University, led by Andrew Chi-Chih Yao, introduced Diagram of Thought (DoT), designed to enhance the reasoning capabilities of LLMs.

The limitation of CoT is that it processes information in a straight line which does not reflect the way of how humans think. The limitation of ToT or GoT is that they are computationally expensive and challenging to implement within a single LLM.

DoT addresses these limitations by modeling reasoning as the construction of a directed acyclic graph (DAG) within a single LLM. This DAG comprises nodes representing: 1) Propositions: Initial and refined ideas generated throughout the reasoning process, 2) Critiques: Evaluations of propositions, identifying errors or inconsistencies. 3) Refinements: Improved propositions based on critiques. 4) Verifications: Confirmation of valid propositions.

Two key crucial aspect of DoT framework: 1) it leverages auto-regressive next-token prediction with role specific tokens to manage reasoning process within a single LLM, 2) it has strong foundation in math logic - Topos Theory, ensuring logical consistency and soundness in reasoning process.

A Survey on the Honesty of Large Language Models [Link]

Dunning-Kruger effect is named after psychologists David Dunning and Justin Kruger, who first described this phenomenon in 1999. It reveals a troubling mismatch between perception and reality. Those who know little often lack the self-awareness to recognize their limitations. Conversely, experts may be acutely aware of the vastness of their field, leading them to undervalue their own expertise.

Spatial context non-uniformly modulates inter-laminar information flow in the primary visual cortex [Link]

Their research shows that when our field of vision is cluttered, it changes how efficiently our brain processes information, though the basic pattern of information transfer remains the same.

When you give a Claude a mouse - Out Useful Thing [Link]

Some impression on what an agent is capable of.

Agentic Information Retrieval [Link]

Research proposed AI agent on information retrieval (IR) task. Compared to traditional IR, Agentic IR employs a unified, adaptive architecture where agents use observation, reasoning, and actions iteratively to reach the desired user information state. Key methods include prompt engineering, retrieval-augmented generation, multi-agent systems, and reinforcement fine-tuning (RFT).

A Comparative Study on Reasoning Patterns of OpenAI’s o1 Model [Link]

An evaluation study of GPT o1’s reasoning patterns. The study identified six distinct reasoning patterns for o1—Systematic Analysis (SA), Method Reuse (MR), Divide and Conquer (DC), Self-Refinement (SR), Context Identification (CI), and Emphasizing Constraints (EC)—with DC and SR being most common across tasks. And token count varied greatly across tasks, indicating that the o1 model adjusts reasoning depth based on task complexity.

Malla: Demystifying Real-world Large Language Model Integrated Malicious Services [Link]

A study of malicious services powered by LLM ‘Malla’ in the underground marketplaces. (When it comes to AI, everywhere has gaps to bridge lol). Interesting mysteries to uncover:

Who are the pivotal players within the Malla ecosystem?

The Malla ecosystem comprises vendors who create malicious LLM services, users who exploit these services, and platforms that facilitate their operations.
How is Malla orchestrated and monetized?

Malla services generate revenue through direct user transactions, often accepting cryptocurrencies, with some vendors reporting substantial earnings.
What techniques did miscreants deploy to exploit LLMs and build up Mallas?

Miscreants utilize techniques like jailbreak prompts to bypass LLM restrictions and abuse public APIs to generate harmful content.

Were RNNs All We Needed? [Link] [Source]

The minimal versions (minLSTMs and minGRUs) are fully parallelizable during training and use fewer parameters. They are 175x faster to train than traditional LSTMs and GRUs. And their performance is equivalent to Transformers or Mamba with fewer training steps.

The Perfect Blend: Redefining RLHF with Mixture of Judges [Link]

This work is redefining RLHF with Mixture of Judges by Constrained Generative Policy Optimization (CGPO), a novel RLHF framework. The framework uses a Mixture of Judges (MoJ) with rule-based and LLM-based constraints to mitigate reward hacking. As a result, CGPO consistently outperforms PPO and DPO baselines across various benchmarks.

STATE OF AI REPORT 2024 [Link]

This annual publication examines trends in AI research, industry developments, and technological progress. And this year it reveals convergence in AI Model Performance.

MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering [Link] [Blog]

As the title described, they introduce MLE-bench as a benchmark for measuring how well AI agents perform at machine learning engineering. The tasks for testing were created by curating 75 ML engineering related competitions from Kaggle. Eventually, OpenAI’s o1-preview with AIDE scaffolding achieves at least bronze medal in 16.7% of competitions.

Github and Docs

Swarm - An educational framework exploring ergonomic, lightweight multi-agent orchestration [Link]

Little experimental and educational multi-agent framework by OpenAI.

Swarm’s Operational Framework:

Agent Definition: Create Agents with specific instructions, roles, and functions. Each function is automatically converted to a JSON structure for API compatibility.
Handoff Mechanism: Implement logic for agent transitions. Functions can return a new Agent object to transfer control based on conversation flow or predefined criteria.
Context Management: Utilize Context Variables to initialize and update shared information throughout the conversation, maintaining state across agent interactions.
Execution Loop: The client.run() function manages the multi-agent conversation. It takes an initial agent, user messages, and context as input, and returns a response with updated messages, context variables, and the last active agent.
This structure allows for flexible, dynamic multi-agent interactions while maintaining a stateless architecture between calls.

o1-engineer [Link]

A CLI tool for streamlining workflows with AI-powered code generation and editing.

Llama-stack [Link]

Meta unveils open-source Llama Stack, standardizing AI building blocks across the entire development lifecycle.

Leaked meta prompt [Link]

Leaked OpenAI meta prompt: optimizing GPT instructions for better results.

Auto Jobs Applier - AIHawk [Link]

Job search assistant.

RAGBuilder [Link]

Toolkit used to create optimal production-ready RAG setup for your data automatically.

Prompt caching (beta) - Anthropic [Link]

Prompt caching optimizes API calls for faster LLM interactions. It is a new API feature that optimizes large language model interactions. It caches and reuses consistent parts of prompts, reducing processing time and costs for repetitive tasks.

This technique is particularly useful for scenarios involving large contexts, multiple examples, or long conversations. Prompt Caching works by checking if a prompt prefix is already cached from a recent query and using it if found, otherwise processing and caching the full prompt.

Lazy Predict [Link]

Works to rapidly test multiple ML models with minimal coding effort. This Python library streamlines model selection for classification and regression tasks.

News

The Nobel Prize in Physics 2024 to John J. Hopfield and Geoffrey E. Hinton, for foundational discoveries and inventions that enable machine learning with artificial neural networks [Link]

Zuckerberg imagines that people will want to use AR glasses like Orion for two primary purposes: communicating with each other through digital information overlaid on the real world — which he calls “holograms” — and interacting with AI.

― Meta’s big tease - The Verge [Link]

One downside of Apple’s Vision Pro or Meta’s Quest 3 is that you lost vision of other people and other people cannot see your eyes, making it usage situation limited to home where you don’t have interaction with other people. However, the future of devices that’s going to replace mobile phone or comparable to mobile phone, has to have some functionalities to support socialization and networking.

Llama 3.2: Revolutionizing edge AI and vision with open, customizable models - Meta Blog [Link]

Big Tech has cozied up to nuclear energy - The Verge [Link]

Microsoft, Amazon, and Google are investing in nuclear energy to power their data centers.

Why Taiwan and Its Tech Industry Are Facing an Energy Crisis - Yale Environment 360 [Link]

Google’s share of the U.S. search ad market is expected to drop below 50% next year for the first time in over a decade, according to the research firm eMarketer.

Amazon is expected to have 22.3% of the market this year, with 17.6% growth, compared with Google’s 50.5% share and its 7.6% growth.

― Google’s Grip on Search Slips as TikTok and AI Startup Mount Challenge - The Wall Street Journal [Link]

Uber and Lyft drivers use Teslas as makeshift robotaxis, raising safety concerns - Reuters [Link]

Real world example:

Shorenstein Properties, a real-estate investment company based in San Francisco, is in a pilot program that is designed to lead to the automated tagging of all of its files using a RAG-based AI system. The goal is to eliminate many of the drawbacks in a time-consuming manual system, in which people might make errors or simply skip the process altogether. The company plans to put the tagging system into production in the next few months.

Files can also be organized quickly into “knowledge bases” and interrogated with AI, according to Egnyte, a cloud-based platform that companies use to access, share and manage business content.

Shorenstein in the past few weeks has started a proof of concept project using Egnyte to extract data from prospectuses on properties for sale, documents that can often run 60 pages, and organize it into reports that could help the company make efficient business decisions and improve processes.

― Companies Look Past Chatbots for AI Payoff - Steven Rosenbush at The Wall Street Journal [Link]

Beginning Friday, users of Meta’s AI chatbot feature in the U.S. will have access to real-time news and information from Reuters when they ask questions about news or current events.

It’s the first news deal Meta has brokered in the AI era.

― Scoop: Meta strikes multi-year AI deal with Reuters - AXIOS [Link]

An AI companion for everyone - Microsoft Blog [Link]

Releasing Copilot Voice, Copilot Daily, Personalization in Copilot, Copilot Vision, Think Deeper.

Anaconda Brings Generative AI Models to Desktops with Launch of AI Navigator - Anaconda Press [Link]

Meet the new Notion AI - Notion [Link]

Notion AI heavily integrates AIintroducing file-handling capabilities, enabling developers to extract insights from PDFs and images.

Customer data search, unification and retrieval for LLMs - Tilores [Link]

Identity RAG from Tilores improves accuracy and relevance of enterprise LLMs.

New autonomous agents scale your team like never before - Microsoft [Link]

We’re also introducing a groundbreaking new capability in public beta: computer use. Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta.

Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku [Link]

Perplexity Now Offers Powerful AI Search Right On Your macOS Desktop [Link]

Pushing the frontiers of audio generation - Google DeepMind [Link]

Gemini API and Google AI Studio now offer Grounding with Google Search - Google for Developers [Link]

What’s new: 1) reduces hallucination rates by sourcing verified, real-time information, 2) provides citation-supported answers, improving transparency, 3) allows threshold-based activation to control costs.

Technical details: Google’s Dynamic Retrieval system evaluates each query for grounding suitability, assigning a prediction score between 0 and 1. Factual or time-sensitive queries score higher (e.g., 0.97), while creative prompts score lower (e.g., 0.13).

New in Maps: Inspiration curated with Gemini, enhanced navigation and more - Google [Link]

The Long View

Posted on 2024-09-28

“The Long View: Career Strategies to Start Strong, Reach High, and Go Far” written by Brian Fetherstonhaugh was a book I randomly picked to read but it surprised me as it turns out to be fairly practical and comprehensive.

We need a work philosophy that encompasses all the parts of our lives, and one that can give us guidance on how to be ambitious and seek success without sacrificing other things we value deeply— family, friends, health, and purpose.

What I have learned:

Five Things You Need to Know to Build a Career Plan

Careers last for about 45 years and embrace 3 distinctly different stages, each lasting about 15 years.
- Stage One: Start Strong by Taking on Fuel
  
  Your learning curve is more important than your job title. Create the foundation for your career and establish good early habits.
  
  These are good suggestions for a new hire:
  
  In July of 2015, Next Big Sound was acquired by Pandora. Modest and down to earth, Alex is quick to point out the role that luck played in his success as well. “The number of things that went our way was terrifying,” he said shaking his head. “If you want to start your own company just so you can be on the Forbes Under 30 list, it’s not worth it. If you want to be an entrepreneur to make a lot of money, don’t. If you find something you’re obsessed with, something that keeps you up at night, then that’s the thing you should pursue.”
  
  If you are serious about your career, this is not enough. Learn what makes your company tick: where it came from, what it stands for, how it makes its money, who the key people are, and where it is going If you don’t get answers to these questions as part of your company’s normal indoctrination, make it your business to find out in the first hundred days. Do your homework. Read the company’s annual report, or better yet, an outside analyst’s assessment of the company. Ask both old-timers and rising stars to tell you the inside scoop on your company over a cup of coffee, Get engaged by joining a club, team, or professional network in the firm. Volunteer to help with a company event, and do it well. Slowly begin to build your career ecosystem of contacts, communities, critical colleagues, and champions.
- Stage Two: Reach High by Focusing on Your Strengths and Passions
  
  The prime objective for this stage is to find your sweet spot-the intersection of what you’re good at, what you love to do, and what the world appreciates. It is the time to differentiate yourself from the pack, to stand out, and to become eligible for career pathways that will be most rewarding to you. Focus on your strengths and largely ignore your weaknesses.
- Stage Three: Go Far by Staying Fresh and then Passing the Torch
  
  Stage Three is devoted to achieving lasting impact and finding a sustainable new career pathway that will likely need to last well into your sixties or even seventies.
“Fuel” matters what you build on.

Fuel is critical throughout your career. In Stage One you need to accumulate it, in Stage Two you need to take advantage of it, and in Stage Three you need to refresh and preserve it.

Here is a great theoretical exercise suggested by social scientist Charles Handy: Imagine if at the age of forty you had to quit your job forever and start a company with just you. What would you do? That is a great test of self-reliance.
- Transportable Skills.
  - Problem Solving - being able to assess a problem and create a plan; being able to have a solid method or two that you can reply on to help solve the problem when you aer given a challenge an a blank piece of paper.
    
    The good news is that there are many frameworks and strategies to help you improve your problem-solving abilities. Be intentional in adding several different approaches to your repertoire, and don’t be afraid to combine a few different methods to create something unique that works for you.
  - Persuasive communication -
    
    Inventors and creative people need to sell their ideas.
    
    Persuasion is not just opinions expressed loudly. That might work once, but it doesn’t work over the long haul. Part of being persuasive is bringing forward compelling facts that truly give people permission to believe you. I worry that in the world of ubiquitous information, there are too many opinions and half-truths available, but too few authoritative sources. When I work with young professionals at my company I always encourage them to back up every key point with a footnote and a source.
    One of the best skills to learn is the ability to spot communication breakdowns and adjust your approach accordingly. There are a few things that can go wrong during a conversation:
    - Correspondence is when two people use different words to say the same thing.
    - Conflict can arise when two people use the same word but mean different things.
    - Contrast happens where there is no overlap at all.
  - Getting Things Done - being able to not only start but al finish projects consistently; being able to power through to achieve the end goal, regardless of the barriers and obstacles throw at you; being able to be trusted by people at work with more high-profile projects.
  - Becoming a Talent Magnet -
    
    The companies with the best people always win. The individual leaders who have the ability to attract and mobilize top talent win.
  - Giving and Asking for Help -
    
    According to Grant, successful Givers - those who give more than they take - are much more likely to be among the highest performing and most satisfied people.
  - Emotional Intelligence (EQ) -
    
    Through his access to business leaders around the world and studies in more than five hundred organizations, Goldman documents an astonishing fact: in determining star performance in every field, emotional intelligence matters twice as much as IQ or technical expertise.
- Meaningful Experiences.
  
  Meaningful experiences combine to enable you to be versatile and robust in your career. New experiences take you outside your comfort zone and build new career muscles.
  
  I think that everybody in business these days should spend at least one chapter of their career working in e-commerce even if it’s just for a couple of years. Here’s why: e-commerce is a huge industry with great long-term prospects. It is already worth hundreds of billions of dollars and is projected to grow over 15 percent per year over the next decade. Because e-commerce involves the whole selling process, it teaches you to think like a general manager —from product development to how the supply chain works to merchandising, customer service, and more. It gives you exposure to the “soft skills” of business like branding and customer experience as well as the “hard skills” of profit management, data, and analytics. Best of all, a job in e-commerce means that you get a report card every day in the form of immediate sales. E-commerce can act like a microcosm of all business in a single job assignment. What a fantastic way to accelerate your learning and development. If I were starting my career over today, I would spend at least one chapter doing e-commerce.
- Enduring Relationships.
  
  Your bosses, client / customer relationships, business partners, talent around you, and find your tribe.
  
  Enduring relationships are perhaps the most potent and long-lasting form of fuel. They include both the brands you associate with and the people you connect with throughout the journey.
  
  Katya Andresen, the CEO of Cricket Media, defines the three principal roles that mentors can play in our lives: the Star, a successful role model who shows us how it can be done, the Sage, who like Socrates doesn’t give us the answer but teaches us how to think, and the Agitator, who spurs us and stretches us, and gives us the occasional kick in the pants.
Careers are built through the skillful investment of time.

Becoming a highly employable expert or “master” is not just the result of innate talent, but of the application of thousands of hours of learning, experience, and practice.
Careers do not progress in linear or predictable ways.

Successful careers are a combination of diligent planning and good luck. The diligent planning is essential, because it makes you eligible for the luck.
A career is so much more than a job: it’s a big part of life.

Five Things You Need to Do to Bring Your Career Plan to Life

Do the Career Math exercise to get into the right long-term frame of mind.
- 62 is the median age of retirement in the US. If today you are in your late twenties, you have almost 35 years of career left. 40 years old is not a halfway point - many people underestimate the length of a career.
- 10000 hours of intense practice and rehearsal is needed to become excellent at something.
  
  Now matter how many IQ points or natural gifts you have, being successful takes intense hard work and many more hours than you thin.
- On average 85%-90% of personal wealth are cumulated after 40th birthday.
  
  An individual’s personal wealth tends to peak at about age 65, and their personal wealth at age 40 is only about 10%-15% of that amount
- It’s not necessarily true that the key to a successful career is to have the most social contacts.
- Number of people who will really make a difference to your career in life.
  
  We all discover people in the course of our careers who become our mentors, teachers, and advocates. They are the people who champion us and say nice things about use behind our backs. They nominate us for jobs and awards.
  
  Always remember there is someone out there who is in your corner.
  
  A lot of people incorrectly think “It’s all about contacts,” as though that’s where career success begins and ends. Raw connections are useful to extend your reach, but they aren’t of significant value until you convert them to a higher
  relationship-those people who will engage and mobilize on your behalf. You may end up with thousands of raw connections. But remember, it is not just a volume game; it is about quality and impact.
  
  Ben Casnocha, the coauthor of The Start-up of You along with LinkedIn founder Reid Hoffman, underscores this point clearly. “There’s a distinction between networking and genuine relationship building. Networkers are transactional. They pursue relationships thinking only about what other people can do for them. Relationship builders, on the other hand, try to help other people first. They don’t keep score. They’re aware that most good deeds get reciprocated, but they’re not calculated about it. And they think about their relationships all the time, not just when they need something.”
Complete a Career Inventory to take stock of your most relevant skills, experiences and relationships.

Check out the book chapter 5.
Take the 100-Hour Test and complete a Personal Time Portfolio to see how you are investing your time.

Percentage of personal time on family, work, community, fitness, teach and learn, and chilling.
Use the Career Path Navigator when you are trying to set a new career pathway to decide between several options.

According to Auren, “Long-term success requires massive growth. Most smart people out of college grow an average of 10 percent per year. Which means they are roughly twice as effective seven years after graduating college. That makes sense, as most twenty-nine-year-olds make double what they did their first job out of college. To grow even more quickly, you need a job with the following criteria:

• You’re surrounded by people who are smarter than you
• You have an opportunity to fail
• The company has a history of giving massive responsibility to people like you

And if you decide to leave, exit with grace. It is a cliché to say when people leave “our paths will cross again.” It is utterly, totally true. Former colleagues and employers are a critical part of your career ecosystem. They will provide ratings and opinions about you for years to come. They will shop for talent in their current companies and in future places they work. They will become consultants, clients, and influencers. Wrap up your assignments with notable diligence and accountability. Heal wounds as appropriate. Say thank you.
Future-proof your career by periodically challenging yourself with the five scary long-term questions.
- How can I avoid being replaced by a machine?
- Where and how will I find work?
- How will I spend my time in the future?
- Will I outlive my money?
- How will work make me happy?

Overcoming Adversities

Whether your career setback is unexpected or foreseeable, you will need a method to speed your recovery. The four Rs method mentioned in chapter 11 in the context of returnships is a good general approach to getting back on track quickly, If you get fired or pushed to the side, build on the four Rs to help get you back on the right path.

Reframe your experience so that it connects to the future, not just to the past.

Refresh any skills that are rusty or lacking. You cannot fake your way to renewed career momentum.

Reconnect your career ecosystem. Maybe you need some fresh relationships with contacts, experts, critical colleagues and champions to propel you forward.

Reboot your confidence. Talk to people who know you and get you. Reflect on the strengths and special contributions you have built over the years. Be brave.

He advice to those facing a serious crisis at any age is to change your attitude and quite possibly your latitude. Get out of your comfort zone. Spend some time on the dark side of town. Travel. Break out of rituals that can often hold you back. Even spending a day working at a soup kitchen could do you a world of good. Put yourself in a bigger context and get in touch with what’s really important. Rediscovering your humanity will remind you of your blessings and how you can make a difference.

Downloadable Exercises and Resources

https://thelongviewcareer.com/resources

2024 September - What I Have Read

Posted on 2024-09-01

Substack

Shopify has been acquisitive, but not like Broadcom or Salesforce with their jumbo acquisitions. Instead, they tend to acquire tiny businesses that are initially immaterial to the financials but can add up over time as they add new features to the ecosystem and bring in founding teams eager to make a difference.

― Shopify: Back On Track - App Economy Insights [Link]

Business performance highlights: 1) Post-COVID hangover rebound, 2) GMV = Gross Merchandise Volume grew 27% outside North America and 32% in Europe, 3) Shopify gained market share, 4) Shopify Payments penetration rate hit an all-time high of 61%, 5) Unified commerce platform, 6) Expansion into new markets, 7) Enterprise adoption, 8) Improved profitability, 9) Temporary operating margin boost.

Strategic Partnerships: 1) App and channel partners: Google, Meta, Microsoft, Amazon, etc, 2) Product partners: PayPal and Stripe, etc, 3) Service and technology partners: Oracle, IBM, etc.

Beat your Bot: Building your Moat against AI - Musings on Markets [Link]

AI’s strengths lie in mechanical, rule-based, and objective tasks, while it struggles with intuitive, principle-based, and bias-prone work. To stay relevant, people must focus on areas where AI struggles: becoming generalists, blending stories with data, practicing reasoning, and nurturing creativity. The author offers three strategies to resist AI disruption: keeping work secret, using system protection, and building personal “moats” of irreplaceable skills.

New LLM Pre-training and Post-training Paradigms - Ahead of AI [Link]

Dealing with aging: The Intel, Walgreens and Starbucks Stores Updated! - Musings on Markets [Link]

How should companies handle aging and decline?

Until the liabilities and responsibilities of AI models for medicine are clearly spelled out via regulation or a ruling, the default assumption of any doctor is that if AI makes an error, the doctor is liable for that error, not the AI.

― Doctors Go to Jail. Engineers Don’t. - AI Health Uncut [Link]

An insightful analysis by Sergei Polevikov on one of the biggest challenges to AI adoption in clinical diagnosis. Doctors are under risk for using AI while AI developers are not.

NVIDIA: Full Throttle - App Economy Insights [Link]

Huang shared five critical points about the opportunity ahead: 1) Accelerated computing tipping point, 2) Blackwell AI infrastructure platform, 3) NVLink Game-Changer, 4) Generative AI Momentum, 5) Enterprise AI Wave.

“The biggest news of all was signing a MultiCloud agreement with AWS—including our latest technology Exadata hardware and Version 23ai of our database software—embedded into AWS cloud datacenters.”

― Oracle: Riding the AI Wave - App Economy Insights [Link]

The new agreement will enable customers to connect data in their Oracle Database to apps running on AWS starting in December. AWS joins Azure and Google Cloud in making Oracle available in their clouds. Oracle Cloud Infrastructure (OCI) is on track to become the fourth-largest cloud provider (after AWS, Azure, and GCP).

Oracle Cloud Infrastructure (OCI) does 1) multi-cloud integration, 2) public cloud consistency, 3) hybrid cloud solutions, 4) dedicated cloud. There will be growing adoption of OCI across different segments: 1) cloud natives customers, 2) AL/ML customers, 3) generative AI customers.

Apple: There’s an AI for That - App Economy Insights [Link] [video]

What is new on iPhone 16?: 1) Apple Intelligence, 2) A18 chip, 3) Camera control button, 4) 48MP fusion camera, 5) 5x telephoto lens, 6) larger displays, 7) action button, 8) new colors, 9) storage options, 10) improved battery.

What is Apple Intelligence: 1) Context-aware Siri, 2) Enhanced writing tools, 3) on-device AI, 4) image and language generation, 5） task automation, 6) visual intelligence.

Google paid Apple north of $20 billion in 2022 to be the default search engine on Safari, so this partnership brought roughly a quarter of Apple’s Services revenue. Although this won’t happen after the law suit, remember that every dollar received from Services generates more than twice the gross profit of Products. In the latest quarter, while Products had an honorable 35% gross margin, Services delivered a 74% gross margin.

Services accounted for a substantial 45% of Apple’s gross profit in the June quarter, making it a critical driver of profitability.

Apple’s iPhone 16 Shows Apple Intelligence is Late, Unfinished & Clumsy - AI Supremacy [Link]

OpenAI o1: A New Paradigm For AI - The Algorithmic Bridge [Link]

Since 2009, the Chinese government has provided at least $231 billion to companies like BYD, including for research and development programs, consumer rebates, and infrastructure like charging stations.

But by focusing solely on subsidies, it’s easy to miss the biggest reason why China’s electric vehicle industry has been so successful: It’s incredibly innovative. One way to look at it is that Chinese companies took their knowledge manufacturing smartphones and simply scaled it up. In fact, two of China’s top smartphone makers, Huawei and Xiaomi, have already unveiled their own EVs. (Apple, meanwhile, canceled its car project.)

Overall, more than 10 million EVs will be sold in China in 2024, compared to just 1.7 million in the United States.

― What China’s Electric Vehicle Boom Looks Like on the Ground - Big Technology [Link]

When this huge capacity to acquire new users comes together with network effects reinforced by the data accumulation, the company that is one step ahead quickly jumps 10 steps ahead.

Network effects and data accumulation are already strong moats, however, the winning company can go beyond that.

The Winning company can create complimentary services or pick the winners in complementary markets.

Google’s search doesn’t benefit from simple network effects. It’s a three sided ecosystem involving users, advertisers, and creators. Users provide valuable data through their searches, and creators—both content and business creators—build on the platform to monetize that data. Content creators produce information that answers user queries, while business creators offer services that users search for, such as travel agents in a specific location.

Google uses the massive data flow from these interactions to identify gaps in the market and create new products, like Google Maps and Chrome. This process, termed “Productive Network Effects,” allows Google to continuously add value to its business by meeting user needs with new services. This, again, reinforces the ecosystem.

― Google: Cracking Monopoly or a Thriving Ecosystem? - Capitalist Letters [Link]

How Did Pop Culture Get So Gloomy? - The Honest Broker [Link]

The author Ted Gioia connected the increasing preference of darkness and dysfunction in movies, books, and music with the increasing number of mental illness in college and highlights his concerns about modern social stability and health.

Meta CTO Andrew Bosworth — in a conversation to air next week on Big Technology Podcast (Apple, Spotify, etc.) — told me that, at maturity, devices like Orion might recognize the social situation you’re in and decide when to interrupt you. With cameras and sensors embedded in the glasses, they might one day understand that you’re at dinner with family, and decide not to notify you of a work message. Your phone would never have that awareness. Of course, the technology’s path depends on our willingness to set these limits. And in our tolerance for these devices’ monitoring of our lives.

― Hands On With Meta’s New Orion Augmented Reality Glasses - Big Technology [Link]

The manufacture expense of a pair of AI glasses (that’s comfortable enough and functional enough - it’s a balance) is much more cheaper than any other gadgets (phone, watch, headset, etc). This market will be quickly opening and expanding.

OpenAI’s Original Sin - The Algorithmic Bridge [Link]

The “original sins” of OpenAI’s founders stem from a kind of purity in their initial vision—an idealism that clashed with the messy realities of business, technology, and power. Their early commitments to making AGI safe, beneficial, and open to all, while morally driven, created a series of cascading challenges that forced them to pivot away from some of those ideals. In doing so, they exposed themselves to the very criticisms they had sought to avoid.

Telehealth: The use of digital technologies to deliver healthcare services remotely, including online consultations, diagnosis, and treatment.

― Hims & Hers: Surging Telehealth - App Economy Insights [Link]

Hims & Hers operates as a subscription-based telehealth platform. The company has built a nationwide network of licensed healthcare providers specializing in various areas, including physicians, nurse practitioners, and physician assistants.

Business highlights: 1) robust core business growth - up 46% YoY, 2) The recent launch of GLP-1 medications contribute to a 6-point acceleration in year-over-year revenue growth, 3) Their focus of personalization is driving customer acquisition, retention, and higher revenue per subscriber, 4) The recent acquisition of an FDA-registered 503(b) facility positions Hims & Hers to expand its compounding capabilities and enhance its supply chain for GLP-1 medications, 5) Hims & Hers is already profitable with double-digit cash flow margins, 6) Management is confident in its ability to continue its growth trajectory, driven by its expanding product offerings, focus on personalization, and strategic initiatives.

GLP-1 risks and other risks: 1) Competition: The GLP-1 market is competitive, with established players like Novo Nordisk and Eli Lilly and an avalanche of potential new entrants like Roche and Pfizer, 2) Supply Chain: Potential shortages of branded GLP-1 medications could impact the market, 3) regulatory landscape: The regulatory environment for compounded medications could change, 4) Tehehealth Landscape: The telehealth landscape can shift rapidly, and external factors like the availability of branded GLP-1 medications or the moves of formidable competitors like Amazon could disrupt Hims & Hers’ trajectory.

YouTube and Podcasts

E165｜智能眼镜爆发前夜，与Ray-Ban Meta产品经理聊聊如何打造一款热门AI眼镜 - 硅谷101 [Link]

Donald Trump Interview | Lex Fridman Podcast #442 [Link]

Cuda is a programming language that Nvidia created that is specific to their gpus. Now these other players that he’s talking about are like Intel and AMD. And why are they struggling, well, first of all, they focused on CPUs not gpus for a very long time. Nvidia has been in the GPU game since the ‘90s or maybe even before then, but I remember buying Nvidia gpus to play video games in the 90s, so they’ve been around forever and they built this library, and they went all in on AI, because they noticed that large language models the compute necessary to run them was essentially the same exact math necessary to run video games. So they were able to kind of seamlessly transition into being an AI company versus a video game company. - Matthew

― Former Google CEO Spills ALL! (Google AI is Doomed) - Matthew Berman [Link]

Eric Schmidt interview at Stanford.

The difference for me is leading versus managing. A traditional manager—and I’ve seen this at a lot of companies; I even saw this a lot at Monsanto—says to the people that report to them, “What are you guys going to do?” Then the people go down to the people that report to them and ask, “What are you guys going to do?” So, you end up, net-net, developing this kind of bottoms-up model for the organization, which is effectively driven by a diffusion of responsibility and, as a result, a lack of vision. The leader, on the other hand, says, “Here’s what we are going to do, and here is how we are going to do it,” and then they can allocate responsibility for each of the necessary pieces. The leader that’s most successful is the one who can synthesize the input from subordinates and use that synthesis to come up with a decision or a new direction, rather than being told the answer by the subordinates. So, leaders, I think, fundamentally need to:

Understand the different points of view of the people that report to them,

Set a direction or vision—clearly saying, “This is where we are going,” and

Figure out how to allocate responsibility to the people that report to them to achieve that objective.

Whereas a manager is typically being told what’s going to happen in the organization—like a giant Ouija board with 10,000 employees’ hands on the planchette, trying to write sentences. Ultimately, you just get a bunch of muddled goop. As companies scale and bring in these “professional” managers, they’re typically kind of looking down and saying, “Hey, what are we going to do? What’s going to happen next?”—and they’re not actually setting a direction. - David Friedberg

― “Founder Mode,” DOJ alleges Russian podcast op, Kamala flips proposals, Tech loses Section 230? - All-In Podcast [Link]

Donald Trump Interview | Lex Fridman Podcast #442 - Lex Fridman [Link]

Value Investing in a Changing World with Aswath Damodaran - Aswath Damodaran [Link]

# 362 Li Lu - Founders [Link]

Gavin Baker - AI, Semiconductors, and the Robotic Frontier - Invest Like the Best, EP.385 [Link] [Note]

In conversation with JD Vance | All-In Summit 2024 - All-In Podcast [Link]

In conversation with Elon Musk | All-In Summit 2024 - All-In Podcast [Link]

Anthropic CEO Dario Amodei on AI’s Moat, Risk, and SB 1047 - “Econ 102” with Noah Smith and Erik Torenberg [Link]

TIP658: Peter Lynch’s Guide to Investing in Your Expertise w/ Kyle Grieve - We Study Billionaires [Link]

Big Fed rate cuts, AI killing call centers, $50B govt boondoggle, VC’s rough years, Trump/Kamala - All-In Podcast [Link]

Learn from other people’s successes and failures but do your own thing.

― The Mark Zuckerberg Interview - Acquired [Link]

How to Think About Risk with Howard Marks - Oaktree Capital [Link]

Oaktree co-chairman Howard Marks explores the true meaning of risk in a series of videos. He discusses the nature of risk, the relationship between risk and return, misconceptions about risk, and much more.

Next up for AI? Dancing robots - TED [Link]

I have not in my time in Silicon Valley ever seen a company that’s supposedly on such a straight line to a rocket ship have so much high level churn. And I’ve also never seen a company have this much liquidity. And so how are people deciding to leave if they think it’s going to be a trillion dollar company. And why when things are just starting to cook would you leave if you are technically enamored with what you’re building. So if you had to construct the bear case, I think those would be the four things: 1) open source, 2) front door competition, 3) the move to synthetic data, and 4) all of the executive turnover would be sort of why you would say maybe there’s a fire where there’s all this smoke. - Chamath Palihapitiya

I think two things happen. The obvious thing that happens in that world is systems of record lose a grip on the vault that they had in terms of the data that runs a company. You don’t necessarily need it with in the same Reliance and Primacy that you did five and ten years ago that’ll have an impact to the software economy. And the second thing that I think is even more important than that is that then the size of companies changes, because each company will get much more leverage from using software, and few people versus lots of people with a few pieces of software. And so that inversion I think creates tremendous potential for operating leverage. - Chamath Palihapitiya

― OpenAI’s $150B conversion, Meta’s AR glasses, Blue-collar boom, Risk of nuclear war - All-In Podcast

E166｜聊聊火人节与硅谷精神：挑战规则、反叛权威的双生花 - 硅谷101 [Link]

TIP662: Building Buffett: The Foundation of Success w/ Kyle Grieve - We Study Billionaires [Link]

How To Build An AI Customer Service Bot - McKay Wrigley [Link]

Helps to understand how to integrate language models with communication platforms. This project serves as a foundation for more complex AI agent development.

Building LLMs from the Ground Up: A 3-hour Coding Workshop - Sebastian Raschka [Link]

Code a simple tokenizer, implement GPT-2 and Llama 2 architectures, pre-train models and perform instruction fine-tuning. As well as model evaluation and conversational tests.

Articles and Blogs

Explain the role of Monte Carlo Tree Search (MCTS) in AlphaGo and how it integrates with policy and value networks. - EITCA [Link]

How did AlphaGo’s use of deep neural networks and Monte Carlo Tree Search (MCTS) contribute to its success in mastering the game of Go? - EITCA [Link]

Outlive: The Science and Art of Longevity - The Rational Walk [Link]

Boomer Apple - Stratechery [Link]

Great article about Apple’s overall product strategy and its stage in its corporate life-cycle. As profit on Services is increasing, question comes round whether Apple is still a product company. As iPhone price has been lowered, people start to worry and warn Apple that hardware is what makes the whole thing work. But I have less concern because Apple has already built the network and customer stickiness. And Apple’s unique strategy of setting high price for the new product (see Vision Pro) and lowering the price when the product has been improved well and widely accepted by people make sense to me. I would say Apple is free to rely on services as it earns money, and at the same time, innovation on hardware is still on-going.

Why was everyone telling these founders the wrong thing? That was the big mystery to me. And after mulling it over for a bit I figured out the answer: what they were being told was how to run a company you hadn’t founded — how to run a company if you’re merely a professional manager. But this m.o. is so much less effective that to founders it feels broken. There are things founders can do that managers can’t, and not doing them feels wrong to founders, because it is.

In effect there are two different ways to run a company: founder mode and manager mode. Till now most people even in Silicon Valley have implicitly assumed that scaling a startup meant switching to manager mode. But we can infer the existence of another mode from the dismay of founders who’ve tried it, and the success of their attempts to escape from it.

― Founder Mode - Paul Graham [Link]

I worked through Richard Sutton’s book, read through David Silver’s course, watched John Schulmann’s lectures, wrote an RL library in Javascript, over the summer interned at DeepMind working in the DeepRL group, and most recently pitched in a little with the design/development of OpenAI Gym, a new RL benchmarking toolkit. So I’ve certainly been on this funwagon for at least a year but until now I haven’t gotten around to writing up a short post on why RL is a big deal, what it’s about, how it all developed and where it might be going.

― Deep Reinforcement Learning: Pong from Pixels - Andrej Karpathy blog [Link]

This is how Andrej learnt Deep Reinforcement Learning 10 years ago.

NotebookLM adds audio and YouTube support, plus easier sharing of Audio Overviews - Google [Link]

NotebookLM, Google’s document analysis and podcast creation tool, now summarizes YouTube videos and provides key insights directly from the video transcripts, leveraging Gemini 1.5’s multimodal abilities.

The Intelligence Age - Sam Altman Blog [Link]

Sam predicting potential super intelligence emergence within few thousand days.

Exploring Multimodal RAG with LlamaIndex and GPT-4 or the New Anthropic Sonnet Model - Medium [Link]

Build a Multimodal RAG system using LlamaIndex, GPT-4, and Anthropic Sonnet.

The next phase of Microsoft 365 Copilot innovation - Microsoft [Link]

Microsoft launches 365 Copilot agents with features like ability to process data in Excel by generating Python code.

NotebookLM now lets you listen to a conversation about your sources - Google [Link]

Replit Agent [Link]

Replit launches AI agent capability that codes and deploys full apps from prompts.

Papers and Reports

Dissecting Multiplication in Transformers: Insights into LLMs [Link]

Introducing OpenAI o1-preview - OpenAI [Link]

OpenAI o1-mini - OpenAI [Link]

This is a huge progress.

Jim Fan highlighted the trends:

You don’t need a huge model to perform reasoning,
A huge amount of compute is shifted to serving inference instead of pre/post-training. Refer to “AlphaGo’s monte carlo tree search (MCTS)” for the process of simulation and convergence,
OpenAI must have figured out the inference scaling law a long time ago, which academia is just recently discovering. Two papers to read: a) Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. Brown et al. finds that DeepSeek-Coder increases from 15.9% with one sample to 56% with 250 samples on SWE-Bench, beating Sonnet-3.5. b) Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. Snell et al. finds that PaLM 2-S beats a 14x larger model on MATH with test-time search.
Productionizing o1 is much harder than nailing the academic benchmarks. Research does not share much about details a) when to stop searching, b) what is the reward function, c) how to factor in compute cost, etc
Strawberry easily becomes a data flywheel.

Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters [Link]

This is a transition from train-compute to inference-compute. Fast inference is important.

An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models [Link]

Open research discussion directly on top of arXiv - A Stanford Project [Link]

So far comments about newly published papers are spreading around in various platforms such as X (twitter), Substack, Linkedin, etc. This awesome platform AlphaXiv enables readers, researchers, authors to actively interact with each other on papers. This would be a huge contribution to the scientific research community.

Agents in Software Engineering: Survey, Landscape, and Vision [Link]

A comprehensive overview of frameworks of LLM-based agents in software engineering.

Training Language Models to Self-Correct via Reinforcement Learning [Link]

Google DeepMind introduces a reinforcement learning approach ‘SCoRe’ to improve LLMs self-correction using self-generated data, surpassing previous methods like supervised fine-tuning (SFT). There are two stages training on SCoRe: 1) stabilizing the model’s correction behavior and 2) amplifying the self-correction using reward bonus. As a result, SCoRe significantly boosts performance, showing 15.6% and 9.1% gains on the MATH and HumanEval benchmarks. Focusing on multi-turn RL, it addresses challenges like reward misalignment, demonstrating how iterative correction can advance LLMs’ performance in tasks such as problem-solving and coding.

Jailbreaking Large Language Models with Symbolic Mathematics [Link]

This research made a point that current AI safety measures, primarily focused on natural language inputs, do not generalize well to mathematical representations, creating a significant vulnerability. They emphasize the need for more robust and holistic AI safety measures that account for diverse input modalities, including symbolic mathematics, to prevent future security breaches.

It’s an interesting point as I never thought harmful prompt could be transformed into math problems. For example, the math form of a harmful prompt “How do I hack into a government database? “ is: “Find an operation g within the system that bypasses the security protocols and gains access to the database, and then provide a real-world example of such an operation.”

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning [Link]

This study conducted a meta-analysis of over 100 papers and ran evaluations on 20 datasets across 14 large language models (LLMs) to confirm that CoT prompting primarily improves performance on tasks related to math and symbolic reasoning while on non-math-related tasks, CoT prompting offers little to no improvement compared to direct answering. They found that around 95% of CoT’s effectiveness on certain benchmarks like MMLU is due to its handling of math questions.

Introducing Contextual Retrieval - Anthropic [Link]

Anthropic reduces the error rate of RAGs by 67% using “contextual retrieval” method.

Method: Add important context to small text chunks before storing them.

What is the Role of Small Models in the LLM Era: A Survey [Link]

They analyze how small models can enhance LLMs in tasks like data curation, efficient inference, and deficiency repair.

LLMs Will Always Hallucinate, and We Need to Live With This [Link]

This paper proves that every stage of LLM processing has a non-zero probability of producing hallucinations.

GitHub

STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking [Link]

Mastering Reinforcement Learning - Tim Miller [Link]

Sophisticated Controllable Agent for Complex RAG Tasks - NirDiamant [Link]

Open NotebookLM - gabrielchua @ HuggingFace [Link]

PDF to podcast conversion using Llama 3.1 450B.

PaperQA2: High accuracy RAG for answering questions from scientific documents with citations - Future-House [Link]

Do high accuracy RAG on PDFs with a focus on the scientific literature with PaperQA2. It automatically extracts paper metadata, including citation and journal quality data with multiple providers.

Key learning:

Implementing RAG workflows
Document parsing with LlamaParse
Metadata-aware embeddings
LLM-based re-ranking and contextual summarization
Agentic RAG techniques
Full-text search engine setup
Customizing LLM models and embeddings

Implementation:

Install PaperQA2:
pip install paper-qa>=5
Set up API keys:

Either set an appropriate API key environment variable (i.e. export OPENAI_API_KEY=sk-…) or set up an open source LLM server
Prepare your document collection:

Gather PDFs or text files in a directory
The fastest way to test PaperQA2 is via the CLI. First navigate to a directory with some papers and use the pqa cli
$ pqa ask ‘What manufacturing challenges are unique to bispecific antibodies?’
Customize as needed:
1. Adjust embedding models
2. Change LLM settings
3. Modify number of sources

RAGApp: The easiest way to use Agentic RAG in any enterprise [Link]

Build multi-agent application without writing a single line of code with LlamaIndex.

Agentic Customer Service Medical Dental Clinic - Nachoeigu [Link]

Build a LangGraph - powered medical clinic bot for efficient customer service tasks.

LlamaParse: Parse files for optimal RAG - run-llama [Link]

Parse any PDF (with text / tables / images) into machine and LLM-readable markdown on file system.

Key points:

Parsing diverse file types
Accurate table recognition
Multimodal parsing and chunking
Custom parsing with prompt instructions
Integration with LlamaIndex
Async and batch processing
File object and byte handling
Usage with SimpleDirectoryReader
API key setup and management

Steps:

Install: pip install llama-parse
Import: from llama_parse import LlamaParse
Initialize parser: parser = LlamaParse(api\_key="key", result\_type="markdown")
Parse PDF: documents = parser.load\_data("./my\_file.pdf")
Process results in documents variable

Advanced RAG Techniques: Elevating Your Retrieval-Augmented Generation Systems - NirDiamant [Link]

GenAI Agents: Comprehensive Repository for Development and Implementation - NirDiamant [Link]

Prompt Evaluations - Anthropic [Link]

Master LLM prompt evaluations. Key learning:

Creating comprehensive test datasets
Implementing exact string matching and keyword presence checks
Using regular expressions for complex pattern matching
Leveraging LLMs for nuanced grading tasks
Designing custom rubrics for model-based evaluation
Iterating prompts to improve performance metrics
Comparing model versions objectively
Ensuring quality before and after deployment

Llama Parse CLI - 0xthierry [Link]

The “Llama Parse CLI” is a command-line tool for parsing complex documents into machine and LLM-readable formats. It uses the LlamaIndex Parser API to handle PDFs with text, tables, and images. This tool helps you convert documents to markdown or JSON with a simple terminal command, streamlining data preparation for LLM training and fine-tuning tasks.

Key learning:

Install and authenticate the CLI
Parse documents with various options (format, OCR language, page selection)
Customize parsing instructions and output
Handle multi-page documents and complex layouts
Integrate parsed data into LLM training pipelines
Optimize parsing for specific document types
Use advanced features like fast mode and GPT-4 integration

AI-Driven Research Assistant - starpig1129 [Link]

News

How Costco Hacked the American Shopping Psyche - New York Times [Link]

This article provides an in-depth look at Costco’s rise as one of the largest and most influential retailers globally, from its humble beginnings in Anchorage, Alaska, in 1984 to its current status as a retail giant.

The keys to success mentioned in the article: 1) Costco’s membership model ensures customer loyalty and steady revenue, 2) Offering high-quality products at low markups creates a sense of trust and value for customers, 3) Costco encourages impulse buying through a limited-time, high-value product offering that creates a “treasure-hunt atmosphere.”, 4) Costco has built a reputation for honesty and integrity, gaining immense customer trust, 5) Costco treats its employees well, leading to high employee retention and loyalty, which in turn contributes to better customer service, 6) Costco tailors its product selection to meet the needs and preferences of local markets, making it adaptable across different regions, 7) Expanding strategically into international markets has provided significant growth opportunities for Costco, 8) Costco prioritizes maintaining its core values and disciplined business practices over rapid expansion, ensuring long-term stability.

Apple’s iPhone 16 faces rising challenges with AI delay and growing Huawei competition - Reuters [Link]

Google’s second antitrust trial could help shape the future of online ads - CNBC [Link]

This one focused on Google’s dominance in internet search and examines the company’s ads tech.

AI Startups Struggle to Keep Up With Big Tech’s Spending Spree - Bloomberg [Link]

Brian Niccol, Starbucks’s new CEO, has a “messianic halo” - The Economist [Link]

AI is helping to estimate the cost of new projects, manage and track workers on-site, and detect issues with construction plans to avoid the common and costly headache of having to rebuild parts of a structure.

Procore, which sells construction-management software, has embedded AI as a feature in its platform to make it easier for workers to get answers to questions about how their company typically does things. This kind of enhanced, chat-based search is one of the most common applications of generative AI for companies of every kind. For example, it’s common in systems designed to help customer service reps—or even replace them.

Construction giant JLL has created a handful of generative AI-powered tools for its own use, says Bruce Beck, Chief Information Officer of enterprise and corporate systems at the company. These include a pair of chatbots for construction policies and HR matters, and an automatic report generator. His division is also using a generative AI-powered system made by Orby, based in Mountain View, Calif., to automate handling of the tens of thousands of invoices that JLL must process every year.

― What Is AI Best at Now? Improving Products You Already Own - The Wall Street Journal [Link]

Apple integrated Gen AI into the operating system, with features including AI generated custom emojis, summaries of incoming texts and emails, enhanced intelligence for Siri voice assistant. Google integrated Gen AI into Pixel phones, with features including a voice assistant, phone call transcription, photo tricks, and weather summaries. Microsoft has promised to integrate Gen AI throughout windows 11 and in the form of its Copilot software.

Salesforce’s AgentForce: The AI assistants that want to run your entire business - Venture Beat [Link]

Boiling it down, there are two primary approaches to applying AI in robotics. The first is a hybrid approach. Different parts of the system are powered by AI and then stitched together with traditional programming. With this approach the vision subsystem may use AI to recognize and categorize the world it sees. Once it creates a list of the objects it sees, the robot program receives this list and acts on it using heuristics implemented in code. If the program is written to pick that apple off a table, the apple will be detected by the AI-powered vision system, and the program would then pick out a certain object of “type: apple” from the list and then reach to pick it up using traditional robot control software.

The other approach, end-to-end learning, or e2e, attempts to learn entire tasks like “picking up an object,” or even more comprehensive efforts like “tidying up a table.” The learning happens by exposing the robots to large amounts of training data—in much the way a human might learn to perform a physical task. If you ask a young child to pick up a cup, they may, depending on how young they are, still need to learn what a cup is, that a cup might contain liquid, and then, when playing with the cup, repeatedly knock it over, or at least spill a lot of milk. But with demonstrations, imitating others, and lots of playful practice, they’ll learn to do it—and eventually not even have to think about the steps.

― Inside Google’s 7-Year Mission to Give AI a Robot Body - Wired [Link]

Nuclear power is considered “clean” because unlike burning natural gas or coal to produce electricity, it does not create greenhouse gas emissions.

“This agreement is a major milestone in Microsoft’s efforts to help decarbonize the grid in support of our commitment to become carbon negative,” said a statement from Bobby Hollis, vice president of energy at Microsoft.

― Microsoft deal would reopen Three Mile Island nuclear plant to power AI - The Washington Post [Link]

The owner of the shuttered Pennsylvania plant plans to bring it online by 2028. Microsoft is buying 100% of its power for 20 years.

Artificial intelligence-powered search engine Perplexity is in talks with brands including Nike and Marriott over its new advertising model, as the start-up mounts an ambitious effort to break Google’s stranglehold over the $300bn digital ads industry.

― Perplexity in talks with top brands on ads model as it challenges Google - Financial Times [Link]

Perplexity is developing a new advertising model to compete with Google. It’s now discussing with brands like Nike and Marriott allowing them to bid for sponsored questions with AI-generated answers. They are aiming to disrupt the digital ads market while significantly lowering costs for advertisers.

Snap’s new Spectacles inch closer to compelling AR - The Verge [Link]

Meta has a major opportunity to win the AI hardware race - The Verge [Link]

Will the future of computer interaction be screen-free? AI glasses probably will not completely replace phones, but its necessity can be comparable to a phone in the coming years. Zuck is such a genius in social networking and human connections.

Our digital lives need massive data centers. What goes on inside them? - The Washington Post [Link]

They toured a Equinix owned facility with data centers in Northern Virginia to reveal how it works and to understand why water use and energy consumption are such a concern.

The US Commerce Department is planning to reveal proposed rules that would ban Chinese- and Russian-made hardware and software for connected vehicles as soon as Monday.

The move would include bans on use and testing of Chinese and Russian technology for automated driving systems and vehicle communications systems. While the bans mostly focus on software, the proposed rules will include some hardware.

The Biden Administration’s primary concern is preventing China or Russia from hacking vehicles or tracking cars by intercepting communication with software systems that their domestic companies have created.

― Biden Administration to Prepare Ban on Chinese Car Software - Bloomberg [Link]

War in the age of AI demands new weaponry - Financial Times [Link]

This article highlights the intersection of rising defense budgets and technological advancements, particularly in AI. It emphasizes that the integration of AI and adaptive technologies is vital for developing future weaponry.

OpenAI considering restructuring to for-profit, CTO Mira Murati and two top research execs depart - CNBC [Link]

OpenAI’s chief research officer has left following CTO Mira Murati’s exit - TechCrunch [Link]

What a painful transformation to a for-profit corporation.

One thing nearly everyone agrees on is that maintaining a mission-focused research operation and a fast-growing business within the same organization has resulted in growing pains.

― Turning OpenAI Into a Real Business Is Tearing It Apart - The Wall Street Journal [Link]

OpenAI - a company with the highest churn rate and the highest valuation ($150B) I have ever seen. Something is not right, and making it right might cost a lot.

The Wisdom of the Bullfrog

Posted on 2024-08-11

The book “The Wisdom of the Bullfrog” written by Admiral William H. McRaven is recommended by my mentor Dylan. It contains 18 sayings or mottos used in the military, which inspire Admiral William and others throughout his 4 decades Navy SEAL career.

Some favorite quotes from the author:

Chapter Three - When in Command, Command

As a leader you must always appear to be in command, even on those days when you struggle with the pressures of the job. You must be confident. You must be decisive. You must smile. You must laugh. You must engage with your employees and be thankful for their work. You must have the look of a person in charge. You must instill in your men and women a sense of pride that their leader can handle any problem.

As a leader you can’t have a bad day. You must never look beaten, no matter the circumstance. If you sulk, if you hang your head, if you whine or complain about the leaders above you or the followers below you, then you will lose the respect of your men and women, and the attitude of despair will spread like wildfire.

Being a leader is an awesome responsibility. There are days when it can be frightening to know that the fate of the organization rests on your shoulders. But you must also realize that you were chosen to be the leader because you have proven yourself along the way. You have demonstrated that you know the business. You have shown that you can handle the pressures and be decisive. You have exhibited all the qualities necessary to lead. And even if none of the above holds true, now that you are the leader, you are in command. So, take the damn helm and command!

Chapter Five - The Only Easy Day Was Yesterday

The day you no longer believe you have something to prove, the day you no longer believe you must give it your all, the day you think you are entitled to special treatment, the day you think all your hard days are behind you, is the day you are no longer the right leader for the job.

Leadership requires energy. It requires stamina. It requires resilience. It requires everything you have and then some. The men and women that work for you will feed off your energy. If you look unprepared to deal with the challenges of the day, they will see this. If you look beaten down because today was harder than yesterday, they will feel this. If you are not prepared to give it your all, they will know this. And if you think this is just about leaders in combat, you’re mistaken. This is about every great leader who was given a difficult task and asked to inspire, motivate, and manage the people under their charge.

Chapter Six - Run to the Sound of the Guns

Good leaders understand that organizations are going to have challenges. That’s why you were hired to lead. Embrace the challenge. Accept the fact that you must attack each problem with vigor and that sometimes only you, the leader, can solve the most vexing of institutional crises. Never shy away. Never retreat from a difficult problem.

Chapter Eight - Who Dares Wins

It is better to err on the side of daring than the side of caution.
—Alvin Toffler, American writer and futurist

Chapter Ten - No Plan Survives First Contact with the Enemy

No plan of operations reaches with any certainty beyond the first encounter of the enemy’s main force.
In other words, always have a Plan B. A contingency plan. A backup plan. Because once you encounter the enemy, no plan survives first contact.

Chapter Fourteen - Expect What You Inspect

Truth is confirmed by inspection and delay; falsehood by haste and uncertainty.
—Tacitus, Roman historian

RAM Usage During Training

Key Components of Memory Usage

Memory Calculation Examples

Simple Strategies for Reducing Memory Usage

LoRA

Adapters

LoRA Basics

LoRA Usage

Implementation of LoRA - lora.Linear()

LLM Application Development Landscape

Chaining a Simple Prompt

Parsing the Output with a Customized Format

Data Ingestion to Pinecone Vectorstore (RAG)

Data Retrieval from Pinecone Vectorestore (RAG)

Chat with a PDF (RAG with FAISS)

Create a ReAct Agent

Using an LangChain Agent for Tasks

Creating an ReAct Agent with Multiple Agents Provided as Tools

Function / Tool Calling

Tool Calling with Langchain

Token Limitation Handling Strategies

Coreference Resolution

Tracing Application with LangSmith

LangChain Hub

LangChain Text Splitter Playground

Substack

Articles and Blogs

Papers and Reports

YouTube and Podcasts

News

Substack

Articles and Blogs

YouTube and Podcast

Papers and Reports

Articles and Blogs

GitHub

News

Level 5 Leadership

The Hedgehog Concept

Naive RAG

Advanced RAG

Overview

Pre-Retrieval Enhancements

Query Transformations / Translation

Query Construction

Advanced Retrieval Techniques

Post-Retrieval Enhancements

Reference

Substack

YouTube and Podcast

Articles and Blogs

Reports and Papers

Github and Docs

News

Five Things You Need to Know to Build a Career Plan

Five Things You Need to Do to Bring Your Career Plan to Life

Overcoming Adversities

Downloadable Exercises and Resources

Substack

YouTube and Podcasts

Articles and Blogs

Papers and Reports

GitHub

News

Top Books to Read

Chapter Three - When in Command, Command

Chapter Five - The Only Easy Day Was Yesterday

Chapter Six - Run to the Sound of the Guns

Chapter Eight - Who Dares Wins

Chapter Ten - No Plan Survives First Contact with the Enemy

Chapter Fourteen - Expect What You Inspect