AI Tech Observer

https://ifnodoraemon.github.io/
Focusing on AI foundation models and tech insights
en
Wed, 22 Apr 2026 06:27:45 GMT

Reject Benchmark Hacking: How to Build an LLM Evaluation System for Your Business (LLM-as-a-Judge)
https://ifnodoraemon.github.io/en/articles/llm-evaluation-guide/
Wed, 22 Apr 2026 00:00:00 GMT
Cease the obsession with writing more code; shift focus to deep evaluation thinking. We deconstruct LLM-as-a-Judge biases and the mathematics behind metrics, and show how to reshape CI/CD defenses for probabilistic systems.

LLM Quantization Hands-On Guide: Four Routes from Zero to Production
https://ifnodoraemon.github.io/en/articles/quantization-hands-on-guide/
Wed, 22 Apr 2026 00:00:00 GMT
Stop theorizing, start quantizing. From downloading pre-quantized models, to hands-on weight compression with AWQ/GPTQ/GGUF, to vLLM FP8 zero-calibration production deployment and QLoRA fine-tuning: four routes, each with complete copy-paste code.

The Critical Crossroads in AI History: Why Was *That One* Chosen Every Time?
https://ifnodoraemon.github.io/en/articles/ai-history-choices/
Wed, 15 Apr 2026 00:00:00 GMT
A retrospective of six pivotal technology crossroads in AI's seventy-year history, dissecting the compute constraints, data dividends, and scalability logic behind each historical choice.

vLLM Online Inference in Production: From Architecture to Token Billing
https://ifnodoraemon.github.io/en/articles/vllm-serving-guide/
Tue, 14 Apr 2026 00:00:00 GMT
A deep dive into vLLM's core architecture (PagedAttention, continuous batching, automatic prefix caching (APC), speculative decoding) for online serving. Covers OpenAI-compatible API setup, performance tuning, token billing systems, and complete Docker deployment with Prometheus monitoring.

Mapping the NVIDIA GPU Driver Stack: From Kernel Modules to Container Runtimes
https://ifnodoraemon.github.io/en/articles/nvidia-gpu-package-architecture/
Sun, 05 Apr 2026 00:00:00 GMT
A deep dive into the complex structure of NVIDIA GPU driver packages on Linux. Understand the 5-layer architecture bridging nvidia-dkms, libnvidia, nvidia-utils, and driver metapackages. Plus, discover enterprise best practices and troubleshooting guides for 4 core deployment scenarios, including Docker model servers and DGX clusters.

LLM Quantization Precision Guide: From FP32 to 1-bit, How Much Quality Do You Actually Lose?
https://ifnodoraemon.github.io/en/articles/quantization-precision-guide/
Tue, 31 Mar 2026 00:00:00 GMT
A comprehensive comparison of FP32, BF16, FP16, FP8, INT8, INT4, NF4, FP4, 1.58-bit, and other major quantization formats, with real benchmark data and an in-depth FP8 vs INT8 technical analysis.

7 Runtime Practices for Building AI Agents
https://ifnodoraemon.github.io/en/articles/agent-runtime-practices/
Sun, 15 Mar 2026 00:00:00 GMT
Based on a real data analysis agent project, this article distills 7 reusable Agent Runtime practices covering state exposure, tool design, context control, guardrails, delegation, and trace-driven iteration.

MCP Deep Dive: The USB-C Port for AI
https://ifnodoraemon.github.io/en/articles/mcp-guide/
Thu, 12 Mar 2026 00:00:00 GMT
From architecture to hands-on development, a complete guide to the Model Context Protocol. Includes a Python SDK tutorial, security mechanisms, and an ecosystem comparison.

Skills Deep Dive: Give Your AI Coding Assistant a Professional Brain
https://ifnodoraemon.github.io/en/articles/skills-guide/
Thu, 12 Mar 2026 00:00:00 GMT
From core principles to cross-platform practice, a complete guide to the AI coding assistant Skills system. Covers SKILL.md mechanics, six-platform comparison, hands-on writing guide, and best practices.

Prompt Engineering Practice Guide
https://ifnodoraemon.github.io/en/articles/prompt-engineering-guide/
Mon, 09 Mar 2026 00:00:00 GMT
An in-depth exploration of designing effective prompts to improve model output quality. Covers core techniques like Few-Shot, Chain-of-Thought, and ReAct with practical examples.

Deep Dive into 6 AI Foundation Model Trends in 2026
https://ifnodoraemon.github.io/en/articles/ai-trends-2026/
Sat, 07 Mar 2026 00:00:00 GMT
From Thinking reasoning modes to Agentic applications, a deep dive into the top 6 trends in AI foundation models for 2026.

Building AI Agent Applications from Scratch
https://ifnodoraemon.github.io/en/articles/build-ai-agent/
Thu, 05 Mar 2026 00:00:00 GMT
A step-by-step guide to building intelligent agent systems using LangChain and the Claude API. Includes complete code and architecture design.

Retrieval-Augmented Generation (RAG) in Practice
https://ifnodoraemon.github.io/en/articles/rag-in-practice/
Tue, 03 Mar 2026 00:00:00 GMT
From vector database selection to Embedding strategies, a complete guide to building an enterprise-grade RAG system. Includes a practical comparison between Pinecone and Weaviate.

2026 Mainstream Foundation Models Comparison: GPT-5.4 vs Claude Opus 4.6 vs Gemini 3.1 Pro
https://ifnodoraemon.github.io/en/articles/model-comparison-2026/
Sun, 01 Mar 2026 00:00:00 GMT
A comprehensive comparison of the top three foundation models in 2026, covering reasoning, coding, context windows, API pricing, and selection strategies.

Multimodal AI Models Starter Guide
https://ifnodoraemon.github.io/en/articles/multimodal-guide/
Sat, 28 Feb 2026 00:00:00 GMT
Explore the vision and text capabilities of multimodal models like GPT-5.4 and Gemini 3.1 Pro, with practical use cases in image and video analysis.

A Comprehensive Guide to LLM Fine-Tuning Workflows
https://ifnodoraemon.github.io/en/articles/fine-tuning-guide/
Wed, 25 Feb 2026 00:00:00 GMT
A comparison of LoRA, QLoRA, and Full Fine-tuning. A complete workflow and best practices from data preparation to model deployment.