I am currently pursuing an M.S. in Computer Science at ShanghaiTech University under the guidance of Prof. Qian Wang (王乾). I also received my Bachelor’s degree in Computer Science from ShanghaiTech University. I have published several papers on medical imaging and computer vision.
My research is driven by the goal of advancing multimodal healthcare in the foundation model era. Below are my key areas of interest:
- Applying pre-trained models to medical imaging scenarios.
- Multimodal deep learning across images, text, and 3D data.
- 3D human body reconstruction and 3D human-object interaction.
🔥 News
- 2024.12: 🎉🎉 One paper accepted by AAAI 2025 and selected for oral presentation.
- 2024.05: 🎉🎉 One paper accepted by IEEE TMI.
- 2024.02: 🎉🎉 Two papers accepted by ISBI 2024, one selected for oral presentation.
📖 Education

ShanghaiTech University, Shanghai, China
Sept. 2022 - Present
M.S. in Computer Science
Supervisor: Prof. Qian Wang

ShanghaiTech University, Shanghai, China
Sept. 2018 - 2022
B.E. in Computer Science
📝 Publications

MITracker: Multi-View Integration for Visual Object Tracking
Mengjie Xu*, Yitao Zhu*, Haotian Jiang, Jiaming Li, Zhenrong Shen, Sheng Wang, Haolin Huang, Xinyu Wang, Han Zhang, Qing Yang, Qian Wang+
- Introduces MVTrack, a large-scale dataset with 234K frames and precise annotations for 27 object categories, providing a benchmark for class-agnostic multi-view object tracking.
- Proposes MITracker, a method leveraging BEV-guided 3D feature volumes and spatial-enhanced attention for robust target recovery in multi-view tracking.
- Demonstrates that MITracker achieves state-of-the-art performance on the MVTrack and GMTD datasets, improving the target recovery rate from 56.7% to 79.2%.

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction
Yitao Zhu*, Sheng Wang*, Mengjie Xu, Zixu Zhuang, Zhixin Wang, Kaidong Wang, Han Zhang, Qian Wang+
- Introduces a technique for accurately reconstructing 3D human poses and shapes from images captured by uncalibrated cameras.
- Uses pre-trained monocular models to estimate camera positions and employs a distance distribution optimization strategy for precise joint fusion, addressing self-occlusion.
- Employs a model that reweights the human body surface for accurate body shape estimation.

ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs
Zihao Zhao*, Sheng Wang*, Jinchen Gu*, Yitao Zhu*, Lanzhuju Mei, Zixu Zhuang, Zhiming Cui, Qian Wang, Dinggang Shen+
- Integrates medical imaging and a professional knowledge base to enhance the reliability of large language models (LLMs) in healthcare.
- Trains CLIP models on various medical imaging modalities for disease classification and designs an efficient mechanism to retrieve relevant medical expertise based on user statements.
- Uses the retrieved information to provide references, improving the trustworthiness of LLM outputs.

MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis
Yitao Zhu, Zhenrong Shen, Zihao Zhao, Sheng Wang, Xin Wang, Xiangyu Zhao, Dinggang Shen, Qian Wang+
- Transfers natural image pre-trained models to medical image diagnostic tasks using just 0.17% trainable parameters, achieving performance comparable to full model fine-tuning across various medical imaging modalities.
- Enables rapid task switching and reduces memory usage in clinical deployment scenarios.

DoctorGLM: Fine-tuning Your Chinese Doctor is not a Herculean Task
Honglin Xiong*, Sheng Wang*, Yitao Zhu*, Zihao Zhao*, Yuxiao Liu, Linlin Huang, Qian Wang, Dinggang Shen+
- Develops the first Chinese medical dialogue model, trained on a subset of Chinese medical dialogues and supplemented with translated high-quality English medical data and Q&A pairs generated from Chinese medical textbooks.
- Employs fine-tuning techniques such as LoRA and P-tuning to optimize the training strategy, supported by an active open-source community and enriched by over 40,000 pieces of user feedback.
- Inter-slice Super-resolution of Magnetic Resonance Images by Pre-training and Self-supervised Fine-tuning, Xin Wang*, Zhiyun Song*, Yitao Zhu, Sheng Wang, Lichi Zhang, Dinggang Shen, Qian Wang, ISBI 2024
🎖 Activities
Reviewer for:
- Pattern Recognition
- CVPR
💻 Teaching Assistant
- 2023.3 - 2023.7, BME2106 Medical Big-Data and Artificial Intelligence, ShanghaiTech University.