⚡ Data‑First vs Heavy AI Models

Real‑Time Use Case in Healthcare Queries

📌 When the problem is deterministic, a data‑first approach can outperform heavy AI models. A lookup against a structured dataset returns the stored answer directly, so it is faster than sending every query to a large language model and it cannot invent an answer that is not in the data.

🔄 Flow Diagram

User Input → Text Processing → Dataset Search → Result (LLM fallback only if no match)
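
The flow above can be sketched as three small functions, with the model step passed in as an optional callable. This is a minimal sketch, not the post's actual code: the names `normalize`, `search`, and `answer`, and the `fallback` parameter, are hypothetical.

```python
from typing import Callable, Optional

def normalize(question: str) -> str:
    """Text processing: trim and lowercase the raw input."""
    return question.strip().lower()

def search(text: str, records: list[dict]) -> Optional[str]:
    """Dataset search: return the medicine for the first disease named in the text."""
    for record in records:
        disease = str(record.get("disease", "")).lower()
        if disease and disease in text:
            medicine = record.get("medicine", "")
            return ", ".join(medicine) if isinstance(medicine, list) else medicine
    return None

def answer(question: str, records: list[dict],
           fallback: Optional[Callable[[str], str]] = None) -> str:
    """Full flow: the fallback (for example an LLM call) runs only on a miss."""
    hit = search(normalize(question), records)
    if hit is not None:
        return hit
    return fallback(question) if fallback else "No data available"
```

Keeping the fallback as a parameter means the dataset path can be exercised without loading any model at all.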

⚡ Approach

📂 Load dataset (JSON format)
🔎 Match question with disease name
💊 Return medicine instantly
🤖 Fallback logic (LLM only if the dataset has no match)
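
The load step can be made tolerant of blank or malformed lines. The sketch below assumes each record is a JSON object with a `disease` key and a `medicine` key, which are the keys the walkthrough code reads. `load_records` is a hypothetical helper, and apart from the ulcer row the sample input is placeholder text, not content from the real dataset file.

```python
import io
import json

def load_records(lines) -> list[dict]:
    """Parse JSONL, keeping only objects that have a non-empty 'disease' field."""
    records = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            print(f"Skipping malformed line {number}")
            continue
        if isinstance(record, dict) and record.get("disease"):
            records.append(record)
    return records

# Illustrative input; a real run would pass an open file instead of StringIO.
sample = io.StringIO(
    '{"disease": "ulcer", "medicine": "Sucralfate"}\n'
    '\n'
    'not json\n'
    '{"medicine": "placeholder-without-disease"}\n'
)
records = load_records(sample)
```

Only the first row survives: the blank line, the non-JSON line, and the record without a disease name are all skipped.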

✅ Key Improvements

⏱️ Reduced response time (minutes → seconds)
🎯 Improved accuracy (no guessing, direct mapping)
🛠️ Error handling for invalid inputs
⚡ Reduced dependency on heavy AI models (a small model is kept only as a fallback)
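
As one possible reading of "error handling for invalid inputs", the check could go beyond rejecting an empty string. This is a sketch under assumptions: `clean_question` and the `MAX_LENGTH` limit are hypothetical and do not appear in the original script, which only rejects empty input.

```python
from typing import Optional

MAX_LENGTH = 200  # hypothetical limit, not from the post

def clean_question(raw: str) -> Optional[str]:
    """Return a normalized question, or None if the input is unusable."""
    text = " ".join(raw.split()).lower()  # collapse runs of whitespace
    if not text or len(text) > MAX_LENGTH:
        return None
    if not any(ch.isalpha() for ch in text):
        return None  # digits or punctuation only
    return text
```

A caller would print the "Please enter a valid question" message whenever this returns `None`.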

📊 Example

Input: ulcer
Output: Sucralfate
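
When the input is just the disease name, as in this example, the "direct mapping" can be a plain dictionary lookup. A minimal sketch, assuming records carry `disease` and `medicine` keys; `build_index` is a hypothetical helper, not part of the original script.

```python
def build_index(records: list[dict]) -> dict[str, object]:
    """Map each lowercased disease name to its medicine field."""
    return {
        str(r["disease"]).strip().lower(): r["medicine"]
        for r in records
        if r.get("disease") and "medicine" in r
    }

records = [{"disease": "Ulcer", "medicine": "Sucralfate"}]
index = build_index(records)
print(index.get("ulcer", "No data available"))  # Sucralfate
```

This only covers exact names. Free-text questions still need the scan shown in the walkthrough.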

🖥️ Code Walkthrough

python

import json, os, torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load lightweight model (only for fallback)
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load dataset (one JSON object per line; blank lines are skipped)
data = []
file_path = os.path.join(os.path.dirname(__file__), "medical_dataset_100.jsonl")
with open(file_path, "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():
            data.append(json.loads(line))

# Query loop
while True:
    # Strip whitespace so that a blank-looking input is rejected below
    question = input("\nAsk your question (or type 'exit'): ").strip()
    if not question:
        print("Please enter a valid question")
        continue
    if question.lower() == "exit":
        break

    question_lower = question.lower()
    found = False

    for item in data:
        disease = item.get("disease", "").lower()
        medicine = item.get("medicine", "")
        # An empty disease name is a substring of every question, so skip it
        if disease and disease in question_lower:
            print("\nAnswer:", ", ".join(medicine) if isinstance(medicine, list) else medicine)
            found = True
            break

    if not found:
        print("Not found in dataset → using LLM...")
        prompt = f"""Rules:
- Give only medicine name
- No explanation
- If not found say: No data available

Question: {question}

Answer:"""
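        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            # GPT-2 models define no pad token, so reuse EOS for padding
            output = model.generate(**inputs, max_new_tokens=15, do_sample=False,
                                    pad_token_id=tokenizer.eos_token_id)
        # Decode only the newly generated tokens, not the echoed prompt
        new_tokens = output[0][inputs["input_ids"].shape[1]:]
        print("\nAnswer:", tokenizer.decode(new_tokens, skip_special_tokens=True).strip())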
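
One limitation of the substring test in the query loop is that a short disease name can match inside a longer, unrelated word. A possible refinement, shown here as a sketch and not as the author's method, is to match on word boundaries with the standard `re` module. `find_disease` is a hypothetical helper, and the record in the usage lines is a placeholder.

```python
import re

def find_disease(question: str, records: list[dict]):
    """Return the first record whose disease name appears as a whole word or phrase."""
    text = question.lower()
    for record in records:
        disease = str(record.get("disease", "")).strip().lower()
        if not disease:
            continue
        # \b anchors stop a name from matching inside a longer word
        if re.search(rf"\b{re.escape(disease)}\b", text):
            return record
    return None

# Placeholder record for illustration only
records = [{"disease": "cold", "medicine": "placeholder-medicine"}]
whole_word = find_disease("I have a cold.", records)
inside_word = find_disease("I was scolded", records)
```

Here `whole_word` is the record and `inside_word` is `None`, whereas a plain `in` check would have matched both questions.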

🚀 Why This Matters

This approach shows how data engineering and automation can complement AI. Instead of sending every query through a generative model, a structured dataset answers the deterministic ones instantly and reliably, which saves compute, cost, and time.
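
To save compute in practice, the fallback model could be loaded only when a query actually misses the dataset, so that dataset-only sessions never pay the model start-up cost. A minimal sketch, assuming the same `distilgpt2` model as the walkthrough; `get_fallback` is a hypothetical helper.

```python
from functools import lru_cache

@lru_cache(maxsize=1)
def get_fallback(model_name: str = "distilgpt2"):
    """Load the tokenizer and model on first use, then reuse the cached pair."""
    # Imported here so dataset-only runs never import torch or transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    return tokenizer, model
```

Inside the `if not found:` branch, `tokenizer, model = get_fallback()` would then replace the module-level load.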

🔖 Hashtags

#AI #GenAI #Python #MachineLearning #RAG #ArtificialIntelligence #Innovation #Learning #DataEngineering #Automation #TechLearning

AimToSKY

