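Topic of this post

The topic of chapter 4 is “GenAI Model Optimization for Domain-Specific Use Cases”.

The book's point is that well-known AI models (foundation models) such as GPT and Claude cover every field, so solving a problem in a specific domain requires extra techniques. Those techniques are prompt engineering, connecting external knowledge, and fine-tuning. From there the chapter moves on to practicing tools with LangChain, then covers RAG and fine-tuning.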

Original examples: https://github.com/PacktPublishing/Kubernetes-for-Generative-AI-Solutions
Modified examples: https://github.com/choisungwook/portfolio/tree/master/computer_science/ai/RAG

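Background - software that emerged to make better use of AI models - AI agent

As I explained in a previous post, software emerged to help people use AI models well. Claude Code, Codex CLI, and similar tools belong to this category. "Harness engineering", which started spreading by word of mouth around December 2025, refers to the capabilities of this software. Today this kind of AI software is called an AI agent.
- Previous post explaining AI agents: https://malwareanalysis.tistory.com/926
- Post explaining how Codex CLI works (recommended): https://openai.com/index/unrolling-the-codex-agent-loop/

The core job of an AI agent is to assemble the prompt that is sent to the AI model. To get an answer as close as possible to what the user wants, the AI agent combines the user prompt with several pieces of context into one prompt. It then sends the assembled prompt to the AI model.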
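The prompt assembly described above can be sketched in a few lines. This is a minimal illustration, not how any particular agent does it; the function name `assemble_prompt`, the message layout, and the sample context strings are all assumptions made for this example.

```python
def assemble_prompt(system_prompt: str, context_items: list[str], user_message: str) -> list[dict]:
    """Combine extra context with the user's message into chat messages (hypothetical helper)."""
    # Each context item becomes one bullet in a "Context" section.
    context_block = "\n".join(f"- {item}" for item in context_items)
    user_content = f"Context:\n{context_block}\n\nRequest:\n{user_message}"
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_content},
    ]


# Sample values are made up for illustration.
messages = assemble_prompt(
    "You are a helpful assistant.",
    ["Current directory: /tmp/demo", "Today is Monday"],
    "List the files I changed today.",
)
```

The returned list has the same shape as the `messages` value passed to `agent.invoke` later in this post.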

Other concepts an AI agent deals with include the context window, tools, and the agent loop.

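What is LangChain

Building an AI agent from scratch is very hard for beginners. Several open-source projects appeared to make this easier, and LangChain is one of them.

With LangChain, an AI agent takes only a few lines, as shown below. The example uses an OpenAI model.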

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-5.5")
agent = create_agent(
  model=llm,
  system_prompt="{system prompt}"
)
result = agent.invoke({
  "messages": [
    {
      "role": "user",
      "content": "{assembled message that includes the user message}",
    }
  ]
})

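Choosing the AI model

The examples in this post use OpenAI's gpt-5-nano.

Example 1 - tools

Theory

How does an AI model read or write files on my computer? We give the AI model functions that read or write files. These functions are called tools. Tools run on the machine where the AI agent runs.

For example, if we give the AI model a function that looks up the weather in Seoul, the AI model can get Seoul's real-time weather.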

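import httpx
from langchain.agents import create_agent
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

chat_model = "gpt-5-nano"  # model used throughout this post


@tool
def get_seoul_weather() -> dict | str:
    """Return the current weather in Seoul, including temperature (Celsius), wind speed, and weather code."""
    url = "https://api.open-meteo.com/v1/forecast"
    params = {"latitude": 37.5665, "longitude": 126.9780, "current_weather": "true"}
    try:
        # Call open-meteo and return only the current weather section
        r = httpx.get(url, params=params, timeout=10)
        r.raise_for_status()
        return r.json()["current_weather"]
    except (httpx.HTTPError, KeyError, ValueError) as e:
        # Return the error as text so the model can report the failure
        return f"ERROR: open-meteo call failed: {type(e).__name__}: {e}"


llm = ChatOpenAI(model=chat_model)
agent = create_agent(
  model=llm,
  tools=[get_seoul_weather],
  system_prompt="Use tools for deterministic calculations.",
  debug=True,
)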

The AI agent, not the AI model, runs the tool to fetch Seoul's real-time weather and then passes the result to the AI model.

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model=chat_model)
agent = create_agent(
  model=llm,
  system_prompt="Use tools for deterministic calculations.",
  debug=True,
)
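To make the tool flow concrete, here is a minimal sketch of an agent loop with a stand-in model. Nothing here calls LangChain or a real API: `fake_model`, `get_fake_weather`, `TOOLS`, and `run_agent` are hypothetical names, and the canned weather values are made up. The only point is to show who executes the tool and how many times the model is called.

```python
from typing import Callable


def get_fake_weather() -> dict:
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return {"temperature": 21.5, "windspeed": 3.2}


TOOLS: dict[str, Callable[[], dict]] = {"get_fake_weather": get_fake_weather}


def fake_model(messages: list[dict]) -> dict:
    """Stand-in for an AI model: requests the tool once, then answers."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool_call": "get_fake_weather"}
    return {"content": f"Seoul weather: {tool_results[-1]['content']}"}


def run_agent(user_message: str, max_steps: int = 5) -> tuple[str, int]:
    """Loop until the model returns a final answer; also return the number of model calls."""
    messages = [{"role": "user", "content": user_message}]
    for step in range(1, max_steps + 1):
        reply = fake_model(messages)
        if "tool_call" in reply:
            # The agent, not the model, executes the tool on the local machine.
            result = TOOLS[reply["tool_call"]]()
            messages.append({"role": "tool", "content": str(result)})
            continue
        return reply["content"], step
    raise RuntimeError("agent did not finish within max_steps")


answer, model_calls = run_agent("What is the weather in Seoul?")
```

In this sketch the model is called twice: once to request the tool and once to write the final answer from the tool result.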

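Hands-on

Example 1 asks the AI to count how many of each fruit there are. The counting itself must be done with a tool.

fruit_list = ["Apple", "Banana", "Apple", "Peaches"]

Create a tool named count_items. It uses Counter from Python's collections module to count how many times each item appears in the list.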

from collections import Counter

from langchain_core.tools import tool

@tool
def count_items(items: list[str]) -> dict[str, int]:
  """Count duplicate strings in a list."""
  return dict(Counter(items))

Register the tool when creating the AI agent with LangChain.

from langchain.agents import create_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model='gpt-5-nano')
agent = create_agent(
  model=llm,
  tools=[count_items],
  system_prompt="Use tools for deterministic calculations.",
  debug=True,
)

Then call invoke to ask the AI model to count the fruit.

result = agent.invoke({
  "messages": [
    {
      "role": "user",
      "content": f"Count each fruit in this list: {fruit_list}",
    }
  ]
})

result["messages"][-1].content

Printing the result shows the counts Apple: 2, Banana: 1, Peaches: 1. The debug log also confirms that the tool was called.
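The counts can be checked without calling any model, because the tool body is plain Python. The snippet below repeats the same Counter logic on the same list as a quick sanity check.

```python
from collections import Counter

fruit_list = ["Apple", "Banana", "Apple", "Peaches"]

# Same logic as the count_items tool, run directly.
counts = dict(Counter(fruit_list))
print(counts)  # {'Apple': 2, 'Banana': 1, 'Peaches': 1}
```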

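With LangSmith you can trace each step of the AI model calls.

Example 2 - RAG

Theory

RAG (Retrieval-Augmented Generation) is a process that lets the model reference new data beyond what it was trained on. The new data is called the knowledge source and is usually managed in a vector DB. Looking up that external data is called retrieval.

External data cannot be stored in a vector DB as-is the way it would be in an RDBMS; it first has to go through embedding. Natural language is not something a computer can compare directly, so it is converted into an array of numbers, and this conversion is called embedding. An embedding model is used for this step. When the data to embed is large, the text also has to be split into smaller pieces, which is called chunking.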
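Chunking can be illustrated with a fixed-size splitter that overlaps neighboring pieces. This is only a sketch of one simple strategy; the function name `chunk_text` and the character-based sizes are assumptions for this example, and real pipelines often split on tokens or sentences instead.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be between 0 and chunk_size - 1")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

The overlap keeps a little shared text between neighbors so that a sentence cut at a boundary still appears whole in at least part of one chunk.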

When an AI agent uses RAG, it adds the retrieval result while assembling the prompt.

retrieval result(Enhanced context) + system prompt + user prompt

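Reference (AWS): https://aws.amazon.com/ko/what-is/retrieval-augmented-generation/

Hands-on

This exercise builds a system that recommends shirts that fit the user.

The AI agent uses RAG to pass shirt product information that matches the user's request to the AI model.

[1. Download the data]
The shirt product data can be downloaded from Kaggle as a CSV file.
- Kaggle URL: https://www.kaggle.com/datasets/shivamb/fashion-clothing-products-catalog

[2. Store the data in the vector DB]
The downloaded shirt product data goes through embedding and is then stored in the vector DB. The vector DB used here is FAISS.

The embedding model is OpenAI's text-embedding-3-small, with the dimension set to 1536.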

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
  model='text-embedding-3-small',
  dimensions=1536,
)

If you are curious what an embedding looks like, try embed_query. The example below embeds the word Apple.

sample_vector = embeddings.embed_query("Apple")
print(f"embedding dimensions: {len(sample_vector)}")
print(f"embedding preview: {sample_vector[:10]}")
print("embedding vector:")
print(sample_vector)

Embedding happens when data is inserted into the vector DB. This example uses FAISS as the vector DB. A vector DB also builds its index while storing the data, and how indexing works differs from one vector DB to another.

from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import CSVLoader

# Load the CSV file
loader = CSVLoader(file_path=str(data_path), encoding="utf-8")
documents = loader.load()

vector_store = FAISS.from_documents(documents, embeddings)
vector_store.index.ntotal

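[3. Retrieval example]
Look up whether the product the user asked for is in the vector DB. Retrieval quality depends on the search algorithm of the vector DB. This example uses FAISS, so the FAISS algorithm largely determines the quality.

Each search result comes with a similarity score. The score is not a probability: with LangChain's FAISS wrapper it is an L2-based distance by default, so a lower score means a closer match. The example below fetches the k closest documents.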

query = (
  "Shirts which are good for Men, have regular fit and not the slim fit, "
  "can be used for a formal occasion, and have a color of either Blue or White"
)

results = vector_store.similarity_search_with_score(query, k=top_k)
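What a top-k similarity search does can be sketched with brute force on toy vectors. This is an illustration under assumptions, not FAISS itself: the two-dimensional vectors, the product names, and the helpers `l2_distance` and `nearest` are made up, and plain Euclidean distance is used as the score.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query: list[float], vectors: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k entries closest to the query, smallest distance first."""
    scored = [(name, l2_distance(query, vec)) for name, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy "embeddings"; real ones have hundreds or thousands of dimensions.
toy_vectors = {
    "blue formal shirt": [1.0, 0.0],
    "white formal shirt": [0.9, 0.1],
    "red sports jersey": [0.0, 1.0],
}

toy_results = nearest([1.0, 0.0], toy_vectors, k=2)
```

Here `toy_results` holds the two shirts and leaves out the jersey, because its vector is the farthest from the query.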

[4. Assemble the prompt]
To give the AI model the data fetched from the vector DB, assemble a prompt. In the example below, the vector DB results go into Product context and the user's request goes into Question.

from IPython.display import Markdown, display
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model=chat_model)

results = vector_store.similarity_search_with_score(query, k=top_k)
context = "".join(
  f"[Document {rank}]\n{doc.page_content}\n"
  for rank, (doc, _score) in enumerate(results, start=1)
)

prompt = f"""
You are a product recommendation assistant.
Use only the product context below. If the context is insufficient, say so.

Product context:
{context}

Question:
{query}

Respond in Korean. First provide a compact markdown table, then recommend one product with a reason.
""".strip()

response = llm.invoke(prompt)
display(Markdown(response.content))

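The AI model responds to this prompt with its recommendation.

Tracing with LangSmith shows that, unlike the tools example, the AI model is called only once.

Example 3 - fine-tuning

Theory

Fine-tuning means training an existing model further. Because it is training, it takes a long time and needs GPU resources.

Ways to reduce that time and resource cost are still being actively researched. The book uses LoRA to cut fine-tuning time and resources. I used mlx_lm.lora.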
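Why LoRA saves resources can be seen from a parameter count. LoRA keeps the original weight matrix frozen and trains two small low-rank matrices instead. The sketch below compares the sizes; the 1024 x 1024 layer shape and rank 8 are assumed values for illustration, not the settings used in this post.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full weight parameters, LoRA trainable parameters) for one linear layer."""
    full = d_out * d_in            # frozen weight W has shape (d_out, d_in)
    lora = rank * (d_out + d_in)   # B has shape (d_out, rank), A has shape (rank, d_in)
    return full, lora


full_params, lora_params = lora_param_counts(d_out=1024, d_in=1024, rank=8)
ratio = lora_params / full_params
print(full_params, lora_params, ratio)  # 1048576 16384 0.015625
```

For this assumed layer, LoRA trains about 1.6% of the parameters that full fine-tuning would update.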

Hands-on

[1. Download the model]
I downloaded the Qwen2.5-0.5B-Instruct model through LM Studio.

The downloaded model is stored under $HOME/.lmstudio/models.

~/.lmstudio/models/lmstudio-community/Qwen2.5-0.5B-Instruct-MLX-4bit

[2. Check the result before fine-tuning]

I ran mlx through subprocess.

import subprocess
from pathlib import Path


# Helper that runs an mlx command (project_root is the project directory, defined elsewhere)
def run_command(command):
  result = subprocess.run(
    command,
    cwd=project_root,
    text=True,
    capture_output=True,
  )
  if result.stdout:
    print(result.stdout)
  if result.returncode != 0:
    if result.stderr:
      print(result.stderr)
    raise RuntimeError(f"command failed: {' '.join(command)}")
  return result.stdout.strip()


# subprocess does not expand "~" without a shell, so expand it before passing the path
base_model_id = str(Path('~/.lmstudio/models/lmstudio-community/Qwen2.5-0.5B-Instruct-MLX-4bit').expanduser())
test_prompt = '[MyElite Loyalty Program FAQ]: What is the cost of the MyElite Loyalty Program?'
before_output = run_command([
  "uv",
  "run",
  "--extra",
  "mlx",
  "mlx_lm.generate",
  "--model",
  base_model_id,
  "--prompt",
  test_prompt,
  "--max-tokens",
  "120",
])

The base Qwen model was never trained on the MyElite data, so it gives a vague answer.

[3. Prepare the fine-tuning dataset]
I prepared the training dataset with Codex CLI.

train_examples = [
    {
      "prompt": "[MyElite Loyalty Program FAQ]: What is the cost of the MyElite Loyalty Program?",
      "completion": "The MyElite Loyalty Program costs 99 USD per year. This fee is non-refundable, even if you cancel the membership.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: Can I cancel my MyElite Loyalty Program membership?",
      "completion": "Yes, you can cancel your membership at any time. You will continue to receive benefits until the end of your current subscription period.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: Is the MyElite annual fee refundable?",
      "completion": "No. The 99 USD annual fee is non-refundable, including when the membership is canceled before the subscription period ends.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: How long do benefits continue after cancellation?",
      "completion": "Benefits continue until the end of the paid subscription period, even after cancellation.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: What is the maximum cashback I can earn?",
      "completion": "MyElite members can earn up to 2% cashback on eligible purchases.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: Does MyElite cashback apply to every purchase?",
      "completion": "No. Cashback applies only to eligible purchases under the MyElite Loyalty Program terms.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: Can I transfer my MyElite membership to another person?",
      "completion": "No. MyElite membership is tied to the account holder and cannot be transferred to another person.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: When does a MyElite subscription renew?",
      "completion": "A MyElite subscription renews annually unless the member cancels before the renewal date.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: What happens if I cancel before renewal?",
      "completion": "If you cancel before renewal, you will not be charged for the next year and your current benefits remain active until the subscription period ends.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: Do I lose benefits immediately after cancellation?",
      "completion": "No. You keep MyElite benefits until the end of the subscription period that has already been paid for.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: Is MyElite billed monthly or yearly?",
      "completion": "MyElite is billed yearly at 99 USD per year.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: What should I say if the FAQ does not contain the answer?",
      "completion": "If the MyElite FAQ does not contain the answer, say that the information is not available in the FAQ instead of guessing.",
    },
]

valid_examples = [
    {
      "prompt": "[MyElite Loyalty Program FAQ]: How much is MyElite each year?",
      "completion": "MyElite costs 99 USD per year.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: If I cancel today, do I get my money back?",
      "completion": "No. The annual fee is non-refundable, but benefits continue until the end of the subscription period.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: What cashback rate can members receive?",
      "completion": "Members can earn up to 2% cashback on eligible purchases.",
    },
]

test_examples = [
    {
      "prompt": "[MyElite Loyalty Program FAQ]: What is the cost of the MyElite Loyalty Program?",
      "completion": "The MyElite Loyalty Program costs 99 USD per year. The fee is non-refundable.",
    },
    {
      "prompt": "[MyElite Loyalty Program FAQ]: Can I cancel and still use benefits?",
      "completion": "Yes. After cancellation, benefits remain available until the current subscription period ends.",
    },
]
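The lists above still have to be written to disk before training. The sketch below writes one JSON object per line into train.jsonl, valid.jsonl, and test.jsonl. It assumes that mlx_lm.lora reads those three file names from the directory passed to --data and accepts prompt/completion records; check the mlx-lm documentation for your version. The helper `write_jsonl`, the temporary directory, and the shortened sample records are made up for this example.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(path: Path, examples: list[dict]) -> None:
    """Write one JSON object per line (hypothetical helper)."""
    with path.open("w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")


# Shortened stand-ins for train_examples, valid_examples, and test_examples.
sample_splits = {
    "train": [{"prompt": "How much is MyElite?", "completion": "99 USD per year."}],
    "valid": [{"prompt": "Is the fee refundable?", "completion": "No."}],
    "test": [{"prompt": "What cashback can I earn?", "completion": "Up to 2%."}],
}

# A temporary directory stands in for the real data directory.
data_dir = Path(tempfile.mkdtemp())
for split, examples in sample_splits.items():
    write_jsonl(data_dir / f"{split}.jsonl", examples)

written = sorted(p.name for p in data_dir.iterdir())
```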

[4. Fine-tuning]
I fine-tuned the model with mlx_lm.lora. Fine-tuning took about 15 seconds.

train_command = [
  "uv",
  "run",
  "--extra",
  "mlx",
  "mlx_lm.lora",
  "--model",
  base_model_id,
  "--train",
  "--data",
  str(data_dir),
  "--adapter-path",
  str(adapter_dir),
  "--iters",
  str(max_iters),
  "--batch-size",
  str(batch_size),
  "--num-layers",
  str(num_layers),
  "--mask-prompt",
]

if report_to != "none":
  print("wandb_project is enabled")
  train_command.extend(["--report-to", report_to, "--project-name", wandb_project])

train_output = run_command(train_command)

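If wandb is configured, you can follow the training run in the wandb console. Both the loss on the training data (train_loss) and the loss on the validation data (val_loss) decreased as the number of iterations increased.

[5. Ask the fine-tuned model again]
Asking the same question with the fine-tuned model returns the answer I wanted: 99 USD per year.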
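The post does not show the command for this step. One way it could look is the earlier mlx_lm.generate call with the trained adapter added. This is a sketch under assumptions: it relies on the --adapter-path option of mlx_lm.generate, and `build_generate_command` and the placeholder paths are hypothetical.

```python
def build_generate_command(model_path: str, adapter_path: str, prompt: str) -> list[str]:
    """Build an mlx_lm.generate command that loads LoRA adapters (hypothetical helper)."""
    return [
        "uv", "run", "--extra", "mlx",
        "mlx_lm.generate",
        "--model", model_path,
        "--adapter-path", adapter_path,  # directory written by mlx_lm.lora
        "--prompt", prompt,
        "--max-tokens", "120",
    ]


# Placeholder paths; replace them with the real model and adapter directories.
command = build_generate_command(
    "path/to/base-model",
    "path/to/adapters",
    "[MyElite Loyalty Program FAQ]: What is the cost of the MyElite Loyalty Program?",
)
```

The resulting list can be passed to the run_command helper from step 2.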

License: CC BY-NC-ND (Attribution, NonCommercial, NoDerivatives)
