Living Curriculum

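Curated Research & Papers

Selected every week from three categories of papers. Our moat against static MOOCs: a curriculum that evolves with the field.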

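Paper Selection Framework

We deliberately balance three categories of papers so the curriculum is not hijacked by LLM hype at the expense of implementation science and regulation.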

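Methods Papers

New models, benchmarks, evaluation frameworks

Clinical Validation

Real-world deployment, prospective studies, workflow impact

Critical Analysis

Bias, data drift, hallucination, failure modes, governance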

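Five Questions for Every Paper

What is the research question?

Where do the data come from? Is there selection bias?

Are the model and the comparison group appropriate?

Are the metrics meaningful for clinical decisions?

Is it ready for the curriculum? Is it ready for the clinic?

"The most dangerous paper is not the obviously flawed one. It is the half-finished one in beautiful packaging."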

Featured Papers: March 2026

Evaluation & Methodology

A clinical environment simulator for dynamic AI evaluation

Luo L, et al. · Nature Medicine · 2026 Mar 12

DOI: 10.1038/s41591-026-04252-6 · PMID: 41820673

Why This Paper Matters

Medical AI cannot be judged by static benchmark scores alone. This paper proposes a Clinical Environment Simulator (CES) that evaluates LLMs inside a digital hospital, where each decision changes the patient's subsequent state, mimicking the path dependency of real clinical care.

Teaching Points

Why USMLE-style benchmarks are insufficient for clinical AI
Dynamic vs static evaluation: sequential decisions accumulate errors
Groundwork for FDA review, deployment science, and post-deployment monitoring
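Why do sequential decisions accumulate errors? A back-of-the-envelope model makes the gap between static and dynamic evaluation concrete. This is a minimal sketch under a loud simplifying assumption (not the CES method): each step in an episode is correct independently with the same probability, and one wrong decision cannot be undone, so the whole trajectory succeeds only if every step is right. The function names are hypothetical.

```python
def static_score(per_step_accuracy: float) -> float:
    # Static benchmark (toy view): every question is scored in isolation,
    # so the headline number is simply the per-item accuracy.
    return per_step_accuracy


def trajectory_success(per_step_accuracy: float, n_steps: int) -> float:
    # Dynamic episode (toy assumption): success requires all n sequential
    # decisions to be correct, errors are independent, and a mistake is
    # never recovered from. Success probability therefore compounds.
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be in [0, 1]")
    return per_step_accuracy ** n_steps


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, static_score(0.9), round(trajectory_success(0.9, n), 3))
```

Under these assumptions, a model that scores 90% on a static benchmark completes a 10-step episode without error only about 35% of the time (0.9^10 ≈ 0.349). Real simulators can also model recovery and state-dependent difficulty, which this toy ignores.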

Track A: Build

Evaluation design, offline benchmark vs dynamic evaluation, task formulation

Track B: Evaluate

Clinical decision support, human-AI collaboration, safety evaluation

Track C: Deploy

Post-deployment monitoring frameworks, regulatory evaluation standards

Architecture & Applications

The role of agentic artificial intelligence in healthcare: a scoping review

Collaco BG, et al. · npj Digital Medicine · 2026 Mar 14

DOI: 10.1038/s41746-026-02517-5 · PMID: 41832341

Why This Paper Matters

As AI moves from chatbots to autonomous agents, healthcare needs a clear taxonomy. This scoping review maps the landscape of agentic AI, distinguishing copilots, tool-using agents, and multi-agent systems, and notes that the field is still at an early, immature stage.

Teaching Points

Taxonomy: chatbot vs copilot vs tool-using agent vs multi-agent system
Mapping exercise: which clinical tasks merit which automation level?
Risk framing: accountability, tool misuse, hallucination amplification

Track A: Build

From generative AI to agentic AI: planning, tool use, autonomy levels

Track B: Evaluate

Clinical orchestration, documentation, triage, workflow automation

Track C: Deploy

Agent governance, deployment boundary-setting, approval frameworks

Governance & Privacy

Cautious optimism on foundation models in medical imaging: balancing privacy and innovation

Santos R, et al. · npj Digital Medicine · 2026

DOI: 10.1038/s41746-026-02533-5 · PMID: 41833961

Why This Paper Matters

Foundation models in medical imaging may retain patient-identifiable signals; for retinal images, re-identification rates reach 94%. This perspective argues for a dual-track defense: technical safeguards (DP-SGD, feature disentanglement) plus policy frameworks.
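Re-identification does not need names. If the embeddings of two scans from the same patient sit closer to each other than to anyone else's, a plain nearest-neighbour lookup links them. Below is a minimal sketch of such a linkage attack. It assumes a hypothetical attacker who holds a labelled reference set of embeddings. This is an illustrative setup, not the study design behind the 94% figure, and the function name is made up.

```python
import numpy as np


def top1_reidentification_rate(gallery: np.ndarray, queries: np.ndarray,
                               true_ids: np.ndarray) -> float:
    """Toy linkage attack on "anonymized" image embeddings.

    gallery:  (n_patients, d) embeddings from a labelled reference set;
              row i belongs to patient i.
    queries:  (m, d) embeddings of images with identifiers removed.
    true_ids: (m,) the patient each query actually came from.
    """
    # L2-normalize rows so the dot product equals cosine similarity.
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sims = q @ g.T                     # shape (m, n_patients)
    predicted = sims.argmax(axis=1)    # most similar reference patient
    return float(np.mean(predicted == np.asarray(true_ids)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gallery = rng.normal(size=(100, 64))                      # hypothetical visit 1
    queries = gallery + 0.3 * rng.normal(size=gallery.shape)  # visit 2, names stripped
    print(top1_reidentification_rate(gallery, queries, np.arange(100)))
```

The sketch treats the embedding itself as the identifier. That is why scrubbing metadata alone does not anonymize imaging data.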

Teaching Points

“Removing names” does not equal anonymization in imaging data
Privacy leakage mechanisms: demographic/identity signals in embeddings
Dual defense: PII scrubbing, DP-SGD, homomorphic encryption + policy
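DP-SGD is listed above as a technical safeguard. It bounds how much any single training example can influence the model. Each example's gradient is clipped to a fixed L2 norm, and Gaussian noise scaled to that bound is added before the update. The sketch below shows one step in NumPy. It follows the standard recipe from Abadi et al. (2016) and nothing specific to this perspective. Function and parameter names are illustrative. The sketch omits privacy accounting, the step that converts `noise_multiplier` into an (ε, δ) guarantee.

```python
import numpy as np


def dp_sgd_step(params: np.ndarray, per_example_grads: np.ndarray, lr: float,
                clip_norm: float, noise_multiplier: float,
                rng: np.random.Generator) -> np.ndarray:
    """One differentially private SGD update (sketch, no privacy accounting)."""
    batch_size = per_example_grads.shape[0]          # grads: (batch, n_params)
    # 1) Clip each example's gradient to L2 norm <= clip_norm, so no single
    #    patient can move the model by more than a fixed amount.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # 2) Sum, add Gaussian noise with std = noise_multiplier * clip_norm,
    #    then average over the batch.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_grad = (clipped.sum(axis=0) + noise) / batch_size
    # 3) Plain gradient-descent step on the privatized gradient.
    return params - lr * noisy_grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = np.zeros(2)
    grads = np.array([[3.0, 4.0], [0.3, 0.4]])   # first example gets clipped
    print(dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

Clipping is what makes the noise meaningful. Without a bound on each example's contribution, no finite amount of noise would hide a single outlier patient.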

Track A: Build

Representation learning, privacy leakage, de-identification limits

Track B: Evaluate

Imaging AI governance, data stewardship, responsible deployment

Track C: Deploy

Institutional privacy policy, vendor due diligence, data agreements

Further Reading

Benchmarking & Evaluation

Holistic evaluation of large language models for medical tasks with MedHELM

Bedi S, et al. · Nature Medicine · 2026

Introduces MedHELM, which organizes medical tasks into 5 categories, 22 subcategories, and 121 tasks, and runs 37 evaluations across 9 frontier LLMs. Key finding: no single score represents medical ability; task decomposition matters more than leaderboard rankings.
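Made-up numbers show why no single score captures medical ability. Two models can tie on a leaderboard average while one of them hides a weak category. The category names and scores below are hypothetical and are not MedHELM results. The helper names are likewise made up for illustration.

```python
from statistics import mean

# Hypothetical per-category scores (illustrative only, not MedHELM data).
SCORES = {
    "model_x": {"category_a": 0.90, "category_b": 0.50, "category_c": 0.70},
    "model_y": {"category_a": 0.70, "category_b": 0.70, "category_c": 0.70},
}


def leaderboard_score(per_category: dict[str, float]) -> float:
    # Single headline number: unweighted mean across categories.
    return mean(per_category.values())


def weakest_category(per_category: dict[str, float]) -> tuple[str, float]:
    # Task decomposition: surface the category where the model is worst.
    name = min(per_category, key=per_category.get)
    return name, per_category[name]


if __name__ == "__main__":
    for model, per_category in SCORES.items():
        print(model, round(leaderboard_score(per_category), 2),
              weakest_category(per_category))
```

Both hypothetical models average 0.70. Only the decomposed view reveals that model_x drops to 0.50 in one category, and that category could be the one a given clinical workflow depends on.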

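Join the Discussion

Weekly paper-reading sessions are open to enrolled students. Learn to read, critique, and apply the latest research.

Enroll Now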