🚀 Simple Medical LLM - AI System (No Paid Model, No Ollama)

⚡ Data‑First vs Heavy AI Models

Real‑Time Use Case in Healthcare Queries

📌 In many scenarios, a data‑first approach can outperform heavy AI models. When the problem is deterministic, such as mapping a disease name to a listed medicine, a structured dataset lookup returns faster and more consistent results than sending every query to a large language model.




🔄 Flow Diagram

User Input → Text Processing → Dataset Search → Result
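
The "Text Processing" step in this flow can be as small as a normalization helper. The sketch below is one possible version, not the code used in this post: the `normalize` name and the choice to strip ASCII punctuation are assumptions made for illustration.

```python
import re
import string

# Map every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def normalize(text: str) -> str:
    """Lowercase the text, drop punctuation, and collapse whitespace."""
    text = text.lower().translate(_PUNCT_TO_SPACE)
    # Collapse runs of whitespace into single spaces and trim both ends.
    return re.sub(r"\s+", " ", text).strip()

print(normalize("  What should I take for an ULCER?? "))
# what should i take for an ulcer
```

Normalizing once up front keeps the dataset search itself a plain string comparison.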

⚡ Approach

  • 📂 Load dataset (JSON format)

  • 🔎 Match question with disease name

  • 💊 Return medicine instantly

  • 🤖 Fallback logic (LLM only if dataset fails)
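
The four steps above can be sketched as two small functions. This is a minimal illustration under assumptions: `find_medicine`, `answer`, and `no_data` are hypothetical names, the single in-memory record stands in for the real JSONL file, and the fallback is a plain callable so the sketch runs without loading any model.

```python
from typing import Callable, Optional

# Hypothetical in-memory record; the real data is loaded from the JSONL file.
DATASET = [
    {"disease": "ulcer", "medicine": "Sucralfate"},
]

def find_medicine(question: str, dataset: list) -> Optional[str]:
    """Return the medicine for the first disease named in the question."""
    q = question.lower()
    for item in dataset:
        disease = str(item.get("disease", "")).lower()
        # Skip records with no disease name: "" is a substring of everything.
        if disease and disease in q:
            medicine = item.get("medicine", "")
            return ", ".join(medicine) if isinstance(medicine, list) else medicine
    return None

def answer(question: str, dataset: list, fallback: Callable[[str], str]) -> str:
    """Try the dataset first; call the fallback only on a miss."""
    hit = find_medicine(question, dataset)
    return hit if hit is not None else fallback(question)

def no_data(_question: str) -> str:
    """Stand-in for the LLM fallback."""
    return "No data available"

print(answer("What helps with an ulcer?", DATASET, no_data))  # Sucralfate
print(answer("What helps with a sprain?", DATASET, no_data))  # No data available
```

Passing the fallback in as a function means the model only has to be loaded if a query actually misses the dataset.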

✅ Key Improvements

  • ⏱️ Reduced response time (minutes → seconds)

  • 🎯 Improved accuracy (no guessing, direct mapping)

  • 🛠️ Error handling for invalid inputs

  • Removed dependency on heavy AI models (a small model, distilgpt2, is loaded only as a fallback)
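
For the error-handling point, input validation could look roughly like this. The `validate_question` helper, the `MAX_LEN` cap, and every message except "Please enter a valid question" are assumptions added for illustration; the walkthrough code itself only checks for empty input.

```python
from typing import Optional

MAX_LEN = 200  # hypothetical cap, not taken from the original code

def validate_question(raw: str) -> Optional[str]:
    """Return an error message for invalid input, or None if it is usable."""
    text = raw.strip()
    if not text:
        return "Please enter a valid question"
    if len(text) > MAX_LEN:
        return f"Question is too long (limit: {MAX_LEN} characters)"
    if not any(ch.isalpha() for ch in text):
        return "Question must contain at least one letter"
    return None

print(validate_question("   "))    # Please enter a valid question
print(validate_question("ulcer"))  # None
```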

📊 Example

Input: ulcer
Output: Sucralfate
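
For this example to work, the dataset needs one record whose `disease` field is "ulcer". The sketch below shows what such a JSONL line might look like and how it parses. The `disease` and `medicine` field names come from the walkthrough code; the exact file contents and the `load_records` helper are assumptions.

```python
import io
import json

# Hypothetical file contents: one JSON object per line, plus a stray blank line.
SAMPLE = '{"disease": "ulcer", "medicine": "Sucralfate"}\n\n'

def load_records(lines):
    """Parse JSONL from any iterable of lines, skipping blank ones."""
    records = []
    for line in lines:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

records = load_records(io.StringIO(SAMPLE))
print(records[0]["disease"], "->", records[0]["medicine"])
# ulcer -> Sucralfate
```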

🖥️ Code Walkthrough

python
import json, os, torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load lightweight model (only for fallback)
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

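# Load dataset: one JSON object per line (JSONL)
data = []
file_path = os.path.join(os.path.dirname(__file__), "medical_dataset_100.jsonl")
with open(file_path, "r", encoding="utf-8") as f:
    for line in f:
        line = line.strip()
        if line:  # skip blank lines, which json.loads would reject
            data.append(json.loads(line))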

# Query loop
while True:
    question = input("\nAsk your question (or type 'exit'): ").strip()
    if not question:  # also catches whitespace-only input
        print("Please enter a valid question")
        continue
    if question.lower() == "exit":
        break

    question_lower = question.lower()
    found = False

    for item in data:
        disease = item.get("disease", "").lower()
        medicine = item.get("medicine", "")
        # Skip records without a disease name: "" is a substring of every string
        if disease and disease in question_lower:
            print("\nAnswer:", ", ".join(medicine) if isinstance(medicine, list) else medicine)
            found = True
            break

    if not found:
        print("Not found in dataset → using LLM...")
        prompt = f"""Rules:
- Give only medicine name
- No explanation
- If not found say: No data available

Question: {question}

Answer:"""
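        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            # distilgpt2 has no pad token, so reuse EOS to avoid a generate() warning
            output = model.generate(**inputs, max_new_tokens=15, do_sample=False,
                                    pad_token_id=tokenizer.eos_token_id)
        # Decode only the newly generated tokens, not the echoed prompt
        new_tokens = output[0][inputs["input_ids"].shape[1]:]
        print("\nAnswer:", tokenizer.decode(new_tokens, skip_special_tokens=True).strip())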
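
One thing to watch in the loop above: plain substring matching also fires inside longer words, so a question about "ulcerative colitis" would match the "ulcer" record. A possible refinement, shown here only as a sketch with a hypothetical `mentions` helper, is to match on word boundaries with a regular expression.

```python
import re

def mentions(disease: str, question: str) -> bool:
    """True if `disease` appears in `question` as a whole word or phrase."""
    if not disease:
        return False
    # \b anchors the match at word boundaries; re.escape keeps any special
    # characters in the disease name literal.
    pattern = r"\b" + re.escape(disease.lower()) + r"\b"
    return re.search(pattern, question.lower()) is not None

print(mentions("ulcer", "What should I take for an ulcer?"))  # True
print(mentions("ulcer", "Tell me about ulcerative colitis"))  # False
print("ulcer" in "tell me about ulcerative colitis")          # True (substring match)
```

The trade-off is that plurals such as "ulcers" no longer match, so whether this is an improvement depends on how the dataset names its diseases.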

🚀 Why This Matters

This approach shows how data engineering and automation can complement AI. Instead of routing every query through a generative model, a structured dataset can return instant, reliable answers for the cases it covers, saving compute, cost, and time.

🔖 Hashtags


#AI #GenAI #Python #MachineLearning #RAG #ArtificialIntelligence #Innovation #Learning #DataEngineering #Automation #TechLearning