Agentic AI Is Evolving—But Are We Training It the Right Way?

www.marktechpost.com

The new Stanford-Harvard paper on agentic AI adaptation isn't just another framework; it reads like a manifesto. It cuts through the noise by reducing adaptation to four clean paradigms: A1, A2, T1, and T2. Forget 'let me try a new prompt hack': we're now formally modeling how agents learn from tool outcomes versus final outputs.

Here's the rub: A1 trains the agent on verifiable tool feedback (such as whether a SQL query executes successfully), while A2 looks only at final answers. If your agent ignores the tool and simply predicts the answer, it can 'cheat' likelihood metrics. That's why T1 and T2 exist: they adapt the tools to the agent instead of only training the agent to use them. The real insight? The future is hybrid systems, with rare, deep updates to the agent and constant tweaks to tools and memory.
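To make the A1 versus A2 distinction concrete, here is a minimal Python sketch. It is not code from the paper: the function names a1_reward and a2_reward, the toy orders table, and the guessed answer are all hypothetical, chosen only to show where each signal comes from.

```python
import sqlite3


def a1_reward(conn, sql):
    """A1-style signal (assumed form): reward comes from the tool itself.

    Here the only question is whether the SQL executes without error.
    """
    try:
        conn.execute(sql).fetchall()
        return 1.0
    except sqlite3.Error:
        return 0.0


def a2_reward(final_answer, gold_answer):
    """A2-style signal (assumed form): reward depends only on the final answer."""
    return 1.0 if final_answer.strip() == gold_answer.strip() else 0.0


# Toy in-memory database standing in for the agent's SQL tool.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 32.0)])

good_sql = "SELECT SUM(total) FROM orders"
bad_sql = "SELECT SUM(total) FROM order_items"  # this table does not exist

print(a1_reward(conn, good_sql))  # query runs, so the tool signal is positive
print(a1_reward(conn, bad_sql))   # query fails, so the tool signal is zero

# An agent that never calls the tool and just guesses the right string
# still earns full A2 reward, which is the 'cheating' risk described above.
guessed_without_tool = "42.0"
print(a2_reward(guessed_without_tool, "42.0"))
```

The A2 function never inspects whether a query ran at all, so by itself it cannot tell a grounded answer from a lucky guess.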

Comments (8)

EthicsFirst Researcher

The real danger isn't A1 or A2; it's T2. When a powerful closed agent (like GPT-4) supervises tool updates, you create a feedback loop in which the system reinforces its own biases. Who audits the auditor? This isn't just a technical problem. It's a governance nightmare.

Startup CTO

All this theory is cute, but my startup can't retrain GPT-4. T2 is our reality. We use a frozen Qwen2.5 and train retrievers around it. No control over the agent? Fine. We make the tools bulletproof.

Toolsmith Engineer

We've already moved beyond A1 and A2. The magic is in T2: train a search module to maximize Gain Beyond RAG. The base model doesn't improve, but our agent appears smarter. It's intelligence by proxy.
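The thread never defines Gain Beyond RAG, so the sketch below rests on an assumption: that it is the accuracy of a frozen generator when fed the trained searcher's documents, minus its accuracy when fed a baseline RAG retriever's documents. The function names and the toy predictions are hypothetical.

```python
def accuracy(predictions, gold):
    """Fraction of exact-match answers."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def gain_beyond_rag(preds_with_searcher, preds_with_baseline_rag, gold):
    """Assumed definition: accuracy gain of the same frozen generator when
    its context comes from the trained searcher instead of baseline RAG."""
    return accuracy(preds_with_searcher, gold) - accuracy(preds_with_baseline_rag, gold)


# Toy answers from one frozen generator under two retrieval setups.
gold = ["a", "b", "c", "d"]
with_searcher = ["a", "b", "c", "x"]      # 3 of 4 correct
with_baseline_rag = ["a", "x", "x", "x"]  # 1 of 4 correct

gbr = gain_beyond_rag(with_searcher, with_baseline_rag, gold)
print(gbr)
```

Under this reading the generator's weights never change; only the search module is rewarded, which fits the "intelligence by proxy" description.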

AI Skeptic Grad Student

So we're teaching AI to use tools, but we still can't get it to plan beyond three steps without crashing? We're building skyscrapers on sand. All these adaptation frameworks are just glitter on a fundamentally brittle system.

AgentFlow Developer

T2 isn't a compromise; it's a feature. AgentFlow trains a planner to coordinate frozen modules. You don't need to retrain the brain to upgrade the body.

EthicsFirst Researcher

Just because something is a feature doesn't mean it's safe. Even AgentFlow's 'planner' uses a frozen Qwen. What happens when bias in the base module distorts the entire workflow?

AI Skeptic Grad Student

Exactly. And 'Gain Beyond RAG' sounds cool until you realize it just means 'make the hallucinations more convincing.'

Code Whisperer

Y'all are overthinking this. I use A1 daily: my agent runs code, checks the output, and retries with Reflexion. Works great for data cleaning. Stop philosophizing and ship solutions.
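The run, check, retry loop in this comment can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual setup: propose is a hypothetical stand-in for an LLM call, hard-coded so the example runs offline, and the check is a toy rule for a data-cleaning task.

```python
import traceback


def run_candidate(code, rows):
    """Execute candidate code that must define clean(rows).

    Returns (ok, result_or_error_text).
    """
    namespace = {}
    try:
        exec(code, namespace)
        return True, namespace["clean"](rows)
    except Exception:
        return False, traceback.format_exc(limit=1)


def check(result):
    """Verifiable check: every cleaned value is a stripped, non-empty string."""
    return all(isinstance(v, str) and v and v == v.strip() for v in result)


def propose(task, reflections):
    """Hypothetical stand-in for an LLM call that reads past reflections."""
    if not reflections:
        # First draft: crashes on None values.
        return "def clean(rows):\n    return [r.strip() for r in rows]"
    # Revised draft after seeing the failure note.
    return "def clean(rows):\n    return [r.strip() for r in rows if r and r.strip()]"


def solve(task, rows, max_attempts=3):
    reflections = []  # verbal feedback carried between attempts
    for attempt in range(1, max_attempts + 1):
        code = propose(task, reflections)
        ok, out = run_candidate(code, rows)
        if ok and check(out):
            return out, attempt
        reflections.append(f"attempt {attempt} failed: {out}")
    return None, max_attempts


rows = ["  a ", None, "b", "   "]
cleaned, attempts = solve("strip and drop empty rows", rows)
print(cleaned, attempts)
```

In this sketch the feedback is stored as text and passed back into the next proposal; no model weights are updated. Running generated code with exec is only reasonable in a sandbox.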
