Jonathan Bennett

Local AI: Streaming AI Responses

We are close to having a basic, working local AI system. The last technical issue to address is streaming the response. To stream a response, we need to enable server_sent_events on our client:

# app/models/conversation.rb
def ollama_client
	@client ||= Ollama.new(
		credentials: { address: "http://localhost:11434" },
		# server_sent_events lets the client yield each chunk of the response as it arrives
		options: { server_sent_events: true }
	)
end

Next, we will break the thread update into two steps: creating a blank assistant message, then updating that message as each chunk of the response arrives:

# app/models/conversation.rb
class Conversation < ApplicationRecord
	def send_thread
		# collect the preceding messages
		messages = self.messages.order(:created_at).map { {
		  role: _1.role,
		  content: _1.message
		} }
		
		# create the blank initial response
		response = self.messages.create({
		  role: "assistant",
		  message: ""
		})
		
		# stream the chat completion, appending each chunk as it is received
		ollama_client.chat({
		  model: "llama3",
		  stream: true,
		  messages: messages
		}) do |event, raw|
		  response.message += event["message"]["content"]
		  response.save
		end
	end
end
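
With that in place, kicking off a streamed reply is just a matter of calling the method on a conversation. A quick sketch from the Rails console (the conversation id and user message here are hypothetical; in the app you would trigger this from a controller or job):

# Rails console (hypothetical data)
conversation = Conversation.find(1)
conversation.messages.create(role: "user", message: "Tell me a joke about Ruby.")
conversation.send_thread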

Notably, we enable the stream: true option and provide a block that receives each chunk of the response as it arrives.
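
For context, each event yielded to the block is one parsed chunk of Ollama's streaming chat response. A representative chunk looks roughly like this (illustrative values; the exact fields depend on your Ollama version):

# one streamed chunk from Ollama's /api/chat endpoint (illustrative)
event = {
	"model"      => "llama3",
	"created_at" => "2024-05-20T12:00:00.000000Z",
	"message"    => { "role" => "assistant", "content" => "Hel" },
	"done"       => false
}

event["message"]["content"] # => "Hel"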

Because we are saving after each piece of the message is received, every save triggers an update to the message, which is broadcast to the page.
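
How those updates reach the browser depends on how broadcasting is wired up for the conversation page. As a minimal sketch (assuming Turbo Streams and an existing conversation partial; your callback and partial names may differ), the Message model can re-broadcast itself whenever it changes:

# app/models/message.rb (sketch; assumes Turbo Streams broadcasting is already in place)
class Message < ApplicationRecord
	belongs_to :conversation

	# re-render this message on the conversation page on create and update,
	# so each streamed chunk appears live
	broadcasts_to :conversation
end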


If you are looking to have a conversation to see if sprinkling some AI love onto your application makes sense, book a free call.