⚡ Data‑First vs Heavy AI Models
Real‑Time Use Case in Healthcare Queries
In many scenarios, a data-first approach can outperform heavy AI models. Instead of relying on large language models for every query, structured datasets can deliver faster, more accurate results, especially when the problem is deterministic, meaning each question maps to one known answer.
Flow Diagram
User Input → Text Processing → Dataset Search → Result
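The Text Processing step in the flow could look something like the sketch below. The `normalize` helper is a hypothetical name (the walkthrough further down only lowercases the input), and stripping punctuation is an assumption about what "processing" should cover.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    # Anything that is not a lowercase letter, digit, or whitespace becomes a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # split() with no arguments drops leading, trailing, and repeated whitespace
    return " ".join(text.split())


print(normalize("  What medicine is used for an ULCER?  "))
# prints: what medicine is used for an ulcer
```

Normalizing once, before the dataset search, keeps the matching logic itself simple.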
⚡ Approach
Load dataset (JSON Lines format, one record per line)
Match question with disease name
Return medicine instantly
Fallback logic (LLM only if dataset fails)
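The four steps above can be sketched as one small function. This is a minimal illustration, not the full script: the `answer` function, the in-memory `index` dict, and `stub_fallback` are hypothetical names, and the stub stands in for the real LLM call so the example runs without downloading a model.

```python
from typing import Callable, Dict, List, Union

Medicine = Union[str, List[str]]


def answer(question: str, index: Dict[str, Medicine],
           fallback: Callable[[str], str]) -> str:
    """Return the dataset answer if a disease name appears in the question,
    otherwise delegate to the fallback (the LLM in the real script)."""
    q = question.lower()
    for disease, medicine in index.items():
        # Skip empty keys: "" is a substring of every string
        if disease and disease in q:
            return ", ".join(medicine) if isinstance(medicine, list) else medicine
    return fallback(question)


def stub_fallback(question: str) -> str:
    # Placeholder for the LLM call; assumed behavior for illustration only
    return "No data available"


index = {"ulcer": "Sucralfate"}
print(answer("What should I take for an ulcer?", index, stub_fallback))  # Sucralfate
print(answer("What helps with insomnia?", index, stub_fallback))  # No data available
```

Passing the fallback in as a function means the expensive model is only touched when the dataset has no match.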
✅ Key Improvements
⏱️ Reduced response time (minutes → seconds)
🎯 Improved accuracy (no guessing, direct mapping)
🛠️ Error handling for invalid inputs
⚡ Removed dependency on heavy AI models for known queries (the LLM runs only as a fallback)
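As one way to picture the error-handling point, here is a small input check. The `validate_question` helper, the length limit, and the second and third messages are assumptions made for illustration. Only the "Please enter a valid question" message comes from the script itself.

```python
from typing import Optional

MAX_QUESTION_LENGTH = 300  # hypothetical limit, not from the original script


def validate_question(raw: str) -> Optional[str]:
    """Return an error message for unusable input, or None if it looks fine."""
    text = raw.strip()
    if not text:
        return "Please enter a valid question"
    if len(text) > MAX_QUESTION_LENGTH:
        return "Question is too long"
    if not any(ch.isalpha() for ch in text):
        return "Question must contain at least one letter"
    return None


print(validate_question("   "))    # Please enter a valid question
print(validate_question("ulcer"))  # None
```

Rejecting bad input up front keeps empty or junk strings away from both the dataset search and the fallback model.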
Example
Input: ulcer
Output: Sucralfate
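For context, a record behind that example might look like the first line of the sample below. The `disease` and `medicine` field names match the ones the script reads. The second record is a made-up placeholder showing that `medicine` may also be a list, and the `format_medicine` helper is a hypothetical name.

```python
import json

# Two illustrative JSON Lines records; the second one is placeholder data
sample_jsonl = """\
{"disease": "ulcer", "medicine": "Sucralfate"}
{"disease": "example condition", "medicine": ["Medicine A", "Medicine B"]}
"""

# One JSON object per non-empty line
records = [json.loads(line) for line in sample_jsonl.splitlines() if line.strip()]

# Index by lowercase disease name for direct lookup
index = {r["disease"].lower(): r["medicine"] for r in records}


def format_medicine(medicine):
    """Join list-valued answers; pass single strings through unchanged."""
    return ", ".join(medicine) if isinstance(medicine, list) else medicine


print(format_medicine(index["ulcer"]))              # Sucralfate
print(format_medicine(index["example condition"]))  # Medicine A, Medicine B
```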
🖥️ Code Walkthrough
import json, os, torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load lightweight model (only for fallback)
model_name = "distilgpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Load dataset (JSON Lines: one JSON object per line)
data = []
file_path = os.path.join(os.path.dirname(__file__), "medical_dataset_100.jsonl")
with open(file_path, "r", encoding="utf-8") as f:
    for line in f:
        if line.strip():  # skip blank lines
            data.append(json.loads(line))

# Query loop
while True:
    question = input("\nAsk your question (or type 'exit'): ").strip()
    if not question:
        print("Please enter a valid question")
        continue
    if question.lower() == "exit":
        break

    question_lower = question.lower()
    found = False
    for item in data:
        disease = item.get("disease", "").lower()
        medicine = item.get("medicine", "")
        # Skip records with no disease name: "" is a substring of every string
        if disease and disease in question_lower:
            print("\nAnswer:", ", ".join(medicine) if isinstance(medicine, list) else medicine)
            found = True
            break

    if not found:
        print("Not found in dataset → using LLM...")
        prompt = f"""Rules:
- Give only medicine name
- No explanation
- If not found say: No data available
Question: {question}
Answer:"""
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            # GPT-2 models define no pad token, so reuse EOS for padding
            output = model.generate(**inputs, max_new_tokens=15, do_sample=False,
                                    pad_token_id=tokenizer.eos_token_id)
        # The decoded text includes the prompt, so keep only what follows the last "Answer:"
        answer = tokenizer.decode(output[0], skip_special_tokens=True)
        print("\nAnswer:", answer.split("Answer:")[-1].strip())
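One thing to watch with the substring check `disease in question_lower` is that short names can match inside longer words. A possible refinement, sketched here as an assumption and not part of the script above, is to match on word boundaries. The `mentions` helper and the "flu" example are hypothetical.

```python
import re


def mentions(disease: str, question: str) -> bool:
    """True if the disease name appears in the question as a whole word or phrase."""
    if not disease:
        return False
    # re.escape keeps any special characters in the name literal
    pattern = r"\b" + re.escape(disease.lower()) + r"\b"
    return re.search(pattern, question.lower()) is not None


print(mentions("flu", "what helps with the flu?"))         # True
print(mentions("flu", "what influences blood pressure?"))  # False
```

A plain substring check would return True for both questions, since "influences" contains "flu".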
Why This Matters
This approach shows how data engineering and automation can complement AI. Instead of overloading every query with generative models, structured datasets can deliver instant, reliable answers — saving compute, cost, and time.