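A Living Curriculum
Research and Paper Picks
Each week we select papers from three categories. Our moat against static MOOCs: a curriculum that evolves in step with the field.
Paper Selection Framework
We deliberately balance three categories of papers, so that LLM hype does not hijack the curriculum at the expense of implementation science and regulation.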
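Methods papers
New models, benchmarks, evaluation frameworks
Clinical validation
Real-world deployments, prospective studies, workflow impact
Critical analysis
Bias, data drift, hallucination, failure modes, governance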
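Five Questions for Every Paper
What is the research question?
Where does the data come from? Is there selection bias?
Are the model and the comparison group appropriate?
Are the metrics meaningful for clinical decisions?
Can this go into the teaching material? Can it go into the clinic?
"The most dangerous papers are not the obviously flawed ones; they are the beautifully packaged half-finished ones."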
Featured Papers: March 2026
A clinical environment simulator for dynamic AI evaluation
Luo L, et al. · Nature Medicine · 2026 Mar 12
DOI: 10.1038/s41591-026-04252-6 · PMID: 41820673
Why this paper matters
Medical AI cannot be judged by static benchmark scores alone. This paper proposes a Clinical Environment Simulator (CES) that evaluates LLMs within a digital hospital where each decision changes subsequent patient states — mimicking real clinical path-dependency.
Teaching points
- Why USMLE-style benchmarks are insufficient for clinical AI
- Dynamic vs static evaluation: sequential decisions accumulate errors
- Foundation for FDA/deployment science and post-deployment monitoring
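The point that sequential decisions accumulate errors can be illustrated with a small calculation. The sketch below is not the CES described in the paper. It assumes, purely for illustration, that every decision in a patient trajectory is independently correct with the same probability, so the chance of an error-free trajectory of n steps is p^n. The function name and the example numbers are hypothetical.

```python
def error_free_trajectory_prob(step_accuracy: float, n_steps: int) -> float:
    """Probability that all n_steps sequential decisions are correct.

    Assumption (illustrative only): steps are independent and share the
    same per-step accuracy. Real clinical trajectories are path-dependent,
    which is exactly why a dynamic simulator is needed.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return step_accuracy ** n_steps


if __name__ == "__main__":
    # A model that looks strong on a static, single-question benchmark...
    static_accuracy = 0.95
    # ...can look much weaker once decisions are chained together.
    for n in (1, 5, 10, 20):
        p = error_free_trajectory_prob(static_accuracy, n)
        print(f"{n:>2} steps: {p:.3f}")
```

Under these assumptions, a per-step accuracy of 0.95 leaves roughly a 0.60 chance of an error-free trajectory after 10 decisions and roughly 0.36 after 20. A static benchmark score reports only the first number.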
Track A: Build
Evaluation design, offline benchmark vs dynamic evaluation, task formulation
Track B: Evaluate
Clinical decision support, human-AI collaboration, safety evaluation
Track C: Deploy
Post-deployment monitoring frameworks, regulatory evaluation standards
The role of agentic artificial intelligence in healthcare: a scoping review
Collaco BG, et al. · npj Digital Medicine · 2026 Mar 14
DOI: 10.1038/s41746-026-02517-5 · PMID: 41832341
Why this paper matters
As AI moves from chatbots to autonomous agents, healthcare needs a clear taxonomy. This scoping review maps the landscape of agentic AI — distinguishing copilots, tool-using agents, and multi-agent systems, while noting the field remains early and immature.
Teaching points
- Taxonomy: chatbot vs copilot vs tool-using agent vs multi-agent system
- Mapping exercise: which clinical tasks merit which automation level?
- Risk framing: accountability, tool misuse, hallucination amplification
Track A: Build
From generative AI to agentic AI: planning, tool use, autonomy levels
Track B: Evaluate
Clinical orchestration, documentation, triage, workflow automation
Track C: Deploy
Agent governance, deployment boundary-setting, approval frameworks
Cautious optimism on foundation models in medical imaging: balancing privacy and innovation
Santos R, et al. · npj Digital Medicine · 2026
DOI: 10.1038/s41746-026-02533-5 · PMID: 41833961
Why this paper matters
Foundation models in medical imaging may retain patient-identifiable signals. Retinal imaging re-identification rates reach 94%. This perspective argues for dual-track defense: technical safeguards (DP-SGD, feature disentanglement) plus policy frameworks.
Teaching points
- “Removing names” does not equal anonymization in imaging data
- Privacy leakage mechanisms: demographic/identity signals in embeddings
- Dual defense: PII scrubbing, DP-SGD, homomorphic encryption + policy
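For readers who have not seen DP-SGD before, the core update can be sketched in a few lines. This is a minimal, dependency-free illustration of the standard recipe (clip each example's gradient, sum, add Gaussian noise, average), not the method of the paper above and not a production implementation. The function names, parameters, and toy gradients are hypothetical, and the sketch does no privacy accounting, so it does not by itself tell you what epsilon you have spent.

```python
import math
import random
from typing import List, Sequence


def clip_by_l2_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Scale a single example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(
    params: Sequence[float],
    per_example_grads: Sequence[Sequence[float]],
    lr: float,
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """One DP-SGD update on a flat parameter vector (illustrative sketch)."""
    if not per_example_grads:
        raise ValueError("batch must contain at least one example")
    # 1. Bound each example's influence by clipping its gradient.
    clipped = [clip_by_l2_norm(g, max_norm) for g in per_example_grads]
    batch_size = len(clipped)
    # 2. Sum the clipped gradients coordinate by coordinate.
    summed = [sum(g[i] for g in clipped) for i in range(len(params))]
    # 3. Add Gaussian noise with standard deviation noise_multiplier * max_norm.
    sigma = noise_multiplier * max_norm
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch_size for s in summed]
    # 4. Take an ordinary gradient step with the noisy average.
    return [p - lr * g for p, g in zip(params, noisy_mean)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(dp_sgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

The clipping step is what makes the noise scale meaningful: once no single example can move the sum by more than `max_norm`, noise proportional to `max_norm` masks any one patient's contribution. In practice a maintained library with a privacy accountant is the safer choice.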
Track A: Build
Representation learning, privacy leakage, de-identification limits
Track B: Evaluate
Imaging AI governance, data stewardship, responsible deployment
Track C: Deploy
Institutional privacy policy, vendor due diligence, data agreements
Further Reading
Benchmarks and Evaluation
Holistic evaluation of large language models for medical tasks with MedHELM
Bedi S, et al. · Nature Medicine · 2026
Introduces MedHELM: 5 task categories, 22 subcategories, 121 tasks, 37 evaluations across 9 frontier LLMs. Key finding: no single score represents medical ability — task decomposition matters more than leaderboard rankings.
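The claim that no single score represents medical ability can be made concrete with a toy comparison. The numbers and names below are invented for illustration and are not MedHELM results or MedHELM category names. The sketch simply shows how two hypothetical models can tie on a leaderboard-style average while differing sharply in one task category.

```python
from statistics import mean
from typing import Dict

# Hypothetical per-category scores (0-100). Not real benchmark data.
scores: Dict[str, Dict[str, int]] = {
    "model_x": {"cat_a": 90, "cat_b": 90, "cat_c": 30},
    "model_y": {"cat_a": 70, "cat_b": 70, "cat_c": 70},
}


def summarize(per_category: Dict[str, int]) -> Dict[str, object]:
    """Return the aggregate score alongside the weakest category."""
    worst = min(per_category, key=per_category.get)
    return {
        "mean": mean(per_category.values()),
        "worst_category": worst,
        "worst_score": per_category[worst],
    }


if __name__ == "__main__":
    for name, per_category in scores.items():
        print(name, summarize(per_category))
```

Both hypothetical models average 70, yet one of them scores 30 in a category that might be the only one a given clinical deployment depends on. Reporting the per-category breakdown next to the average keeps that gap visible.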