LLM Clients and Chains

This is part of the AI Agents series. All code is at github.com/achintmehta/langchain.

ChatOpenAI vs OpenAI

LangChain gives you two client classes for talking to an OpenAI-compatible API. Understanding the difference is important because they call different endpoints and return different types.

Class	Endpoint	Input	Output
`ChatOpenAI`	`/v1/chat/completions`	List of message objects (`SystemMessage`, `HumanMessage`, `AIMessage`)	`AIMessage`
`OpenAI`	`/v1/completions`	Plain string	Plain string

ChatOpenAI is what you should use for almost everything. Most modern models (GPT-4, Claude, Llama, Mistral) are chat models. They expect a conversation structured as a list of messages with system, human, and assistant roles, and that structure is how you give them a persona, inject context, and maintain history.

OpenAI (the completion client) is for older or specialised models that take a raw string and continue it. You will rarely need this in new code.
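To make the endpoint difference concrete, here is a rough sketch of the JSON bodies an OpenAI-compatible server receives on each route. The helper names chat_payload and completion_payload are hypothetical and exist only for illustration; LangChain builds these bodies for you (and sends a HumanMessage with the role "user"). Real requests usually carry more fields, such as temperature.

```python
import json

def chat_payload(model, messages):
    """Body for POST /v1/chat/completions: a list of role-tagged messages."""
    return {
        "model": model,
        "messages": [{"role": role, "content": content} for role, content in messages],
    }

def completion_payload(model, prompt):
    """Body for POST /v1/completions: one raw string for the model to continue."""
    return {"model": model, "prompt": prompt}

chat_body = chat_payload("llama", [
    ("system", "You are a helpful assistant."),
    ("user", "What is the capital of France?"),
])
completion_body = completion_payload("llama", "The capital of France is")

print(json.dumps(chat_body, indent=2))
print(json.dumps(completion_body, indent=2))
```

The chat body keeps the roles separate, which is what lets the server apply the model's chat template. The completion body is just text to be continued.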

The side-by-side comparison is in openai_vs_chatopenai_example.py. Here is what ChatOpenAI looks like in practice:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
    model="llama"
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="What is the capital of France?")
]

response = llm.invoke(messages)
print(response.content)   # e.g. "The capital of France is Paris." (exact wording depends on the model)
print(type(response))     # <class 'langchain_core.messages.ai.AIMessage'>

The SystemMessage sets the model's behaviour for the whole conversation. The HumanMessage is the user's turn. When you send multiple messages, the model sees the full conversation history and can refer back to earlier turns.
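The model itself keeps no memory between calls, so "seeing the history" means you resend every earlier turn each time. The sketch below illustrates that bookkeeping with plain tuples and a stand-in fake_model function (both are assumptions made for illustration, not LangChain APIs). With a real client you would append HumanMessage and AIMessage objects to a list in the same way.

```python
def fake_model(messages):
    """Stand-in for a chat model: reports which human turns it was sent."""
    human_turns = [text for role, text in messages if role == "human"]
    return f"I can see {len(human_turns)} human turn(s); the first was: {human_turns[0]!r}"

# The history starts with the system message and grows with every turn.
history = [("system", "You are a helpful assistant.")]

def ask(question):
    """Append the user's turn, call the model on the full history, store the reply."""
    history.append(("human", question))
    reply = fake_model(history)
    history.append(("ai", reply))
    return reply

first = ask("What is the capital of France?")
second = ask("And what is its population?")
print(second)
```

Because the second call includes the first question and its answer, a real model could resolve "its" to France.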

LCEL: LangChain Expression Language

Once you have an LLM client, the next thing you will want to do is build chains: sequences of steps where the output of one step becomes the input of the next. The most common pattern is to format a prompt, send it to the model, and parse the output.

LCEL lets you express this with the | pipe operator. It is similar to Unix pipes: data flows left to right, each component transforms it and passes it on.

from langchain_core.prompts import ChatPromptTemplate

template = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "What is the capital of {country}?")
])

chain = template | llm
response = chain.invoke({"country": "France"})
print(response.content)

ChatPromptTemplate is a reusable prompt with placeholders ({country} here). When you call chain.invoke({"country": "France"}), LangChain fills in the placeholder, formats the messages, sends them to the LLM, and returns the response. You never have to build the message list by hand.
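The placeholder-filling step can be pictured with ordinary string formatting. This is a simplified sketch, not how ChatPromptTemplate is implemented: fill_template is a hypothetical helper, and the real class returns a prompt value containing message objects rather than tuples.

```python
def fill_template(message_templates, variables):
    """Format every (role, template) pair with the supplied variables."""
    return [(role, text.format(**variables)) for role, text in message_templates]

message_templates = [
    ("system", "You are a helpful assistant."),
    ("human", "What is the capital of {country}?"),
]

filled = fill_template(message_templates, {"country": "France"})
for role, text in filled:
    print(f"{role}: {text}")
```

The same template list can be reused with any country, which is the point of keeping the prompt separate from its inputs.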

You can keep extending the chain. If you want the output as a plain Python string rather than an AIMessage object, pipe the result through StrOutputParser:

from langchain_core.output_parsers import StrOutputParser

chain = template | llm | StrOutputParser()
result = chain.invoke({"country": "France"})
print(result)        # e.g. "The capital of France is Paris." as a plain string, not an AIMessage
print(type(result))  # <class 'str'>

This composability is the core insight of LCEL. Any LangChain component that accepts and returns compatible types can be snapped into a chain. You will find these patterns throughout the rest of this series.
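If the | operator feels like magic, it may help to see how such an operator can be built in plain Python. The Step class below is a toy stand-in written for this explanation, not LangChain's Runnable; it only assumes that each step exposes invoke and that a | b means "run a, then pass its output to b".

```python
class Step:
    """Minimal stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

# Three hypothetical steps mirroring prompt -> model -> parser.
prompt = Step(lambda variables: f"What is the capital of {variables['country']}?")
model = Step(lambda text: {"content": f"(model reply to: {text})"})
parser = Step(lambda message: message["content"])

chain = prompt | model | parser
result = chain.invoke({"country": "France"})
print(result)
```

Each step only needs to accept what the previous step returns, which is why a dict-producing model step pairs with a dict-consuming parser step here.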

The full working example is inside llama_server_client.py in the lcel_example() function.

Invoke, Batch, and Stream

Any LangChain runnable (a prompt, a chain, an LLM, an output parser) supports three invocation modes. You will use all three in real applications.

Invoke

invoke sends a single input and blocks until the full response is ready. This is what you want for simple, synchronous calls:

response = llm.invoke([HumanMessage(content="Hello")])
print(response.content)

Batch

batch takes a list of inputs in one call and runs them in parallel, each as its own request to the model. The return value is a list of responses in the same order as the inputs:

responses = llm.batch([
    [HumanMessage(content="What is 2 + 2?")],
    [HumanMessage(content="Name a planet.")],
    [HumanMessage(content="Who wrote Hamlet?")]
])

for r in responses:
    print(r.content)

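Batch is useful when you have a set of independent questions to answer, for example generating summaries for a set of documents. Under the hood the default implementation uses a thread pool, so the calls happen concurrently rather than sequentially. To cap the number of simultaneous requests, pass config={"max_concurrency": 5} (or another limit) as the second argument to batch.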
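The thread-pool idea can be sketched with the standard library alone. This is an illustration of the concept rather than LangChain's code: slow_answer is a stand-in for one blocking model call, and the local batch function here is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_answer(question):
    """Stand-in for one blocking model call."""
    time.sleep(0.2)
    return f"answer to: {question}"

def batch(questions, max_workers=4):
    """Run the calls concurrently; pool.map returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(slow_answer, questions))

questions = ["What is 2 + 2?", "Name a planet.", "Who wrote Hamlet?"]
start = time.perf_counter()
answers = batch(questions)
elapsed = time.perf_counter() - start

print(answers)
print(f"{elapsed:.2f}s for {len(questions)} calls")
```

Run sequentially, three calls of 0.2 s each would take about 0.6 s. Run concurrently, the total should be close to the duration of a single call, and the answers still line up with the questions.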

Stream

stream returns a generator that yields response chunks as soon as they are available, usually a token or a few tokens at a time. This is what you use to display a live "typewriter" effect in a UI rather than making the user wait for the full response:

for chunk in llm.stream([HumanMessage(content="Tell me a short story.")]):
    print(chunk.content, end="", flush=True)
print()  # newline after the stream ends

Each chunk is an AIMessageChunk, a partial AIMessage. Its .content attribute holds only the new text delivered in that chunk, not the response accumulated so far. By printing without a newline and flushing stdout immediately, the output appears progressively in the terminal.
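A common follow-up need is to show the text as it arrives and also keep the complete reply. The sketch below shows that pattern with a stand-in generator, fake_stream, which is an assumption for illustration and simply slices a fixed string. With a real model you would collect chunk.content in the same way.

```python
import sys

def fake_stream(text, size=4):
    """Stand-in for llm.stream(): yields the reply a few characters at a time."""
    for i in range(0, len(text), size):
        yield text[i:i + size]

pieces = []
for piece in fake_stream("Once upon a time, a model learned to stream."):
    sys.stdout.write(piece)   # show the new text immediately
    sys.stdout.flush()
    pieces.append(piece)      # keep it so the full reply can be rebuilt
print()

full_reply = "".join(pieces)
```

Joining the pieces at the end gives the same string a blocking call would have returned, which is handy for logging or saving to history.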

All three patterns are demonstrated in llama_server_client.py. The async equivalents (ainvoke, abatch, astream) follow the same pattern: you await ainvoke and abatch, and you iterate over astream with async for. That is what you need in a FastAPI or other async web server context.
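The shape of the async variants can be sketched with asyncio and stand-in coroutines. The fake_ainvoke and fake_astream functions below are hypothetical placeholders, not LangChain methods; they only mirror how the real ones are consumed.

```python
import asyncio

async def fake_ainvoke(question):
    """Stand-in for `await llm.ainvoke(...)`."""
    await asyncio.sleep(0.05)
    return f"answer to: {question}"

async def fake_astream(text):
    """Stand-in for `llm.astream(...)`: an async generator of chunks."""
    for word in text.split():
        await asyncio.sleep(0)
        yield word + " "

async def main():
    # One awaited call, like ainvoke.
    single = await fake_ainvoke("Hello")
    # Several calls run concurrently with results in input order, like abatch.
    many = await asyncio.gather(
        fake_ainvoke("What is 2 + 2?"),
        fake_ainvoke("Name a planet."),
    )
    # Chunks consumed with an async iteration, like astream.
    streamed = [chunk async for chunk in fake_astream("a b c")]
    return single, many, streamed

single, many, streamed = asyncio.run(main())
print(single, many, streamed)
```

Inside a web framework that already runs an event loop you would call these from an async handler instead of using asyncio.run.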

What's next

Now that you have a working chain, the next challenge is feeding it documents that are too large to fit in a single prompt. The next part covers chunking strategies: how to break documents into pieces that are small enough to embed and retrieve efficiently.