I am currently pursuing an M.S. in Computer Science at ShanghaiTech University under the guidance of Prof. Qian Wang (王乾). I also completed my Bachelor’s degree in Computer Science at ShanghaiTech University. I have published several papers on medical imaging and computer vision.

My research is driven by the goal of advancing multimodal healthcare in the foundation model era. Below are my key areas of interest:

  • Application of pre-trained models in medical imaging scenarios.
  • Multimodal deep learning across images, text, and 3D data.
  • 3D human body reconstruction and 3D interaction between humans and objects.

🔥 News

  • 2024.12:  🎉🎉 One paper accepted by AAAI 2025 and selected for oral presentation.
  • 2024.05:  🎉🎉 One paper accepted by IEEE TMI.
  • 2024.02:  🎉🎉 Two papers accepted by ISBI 2024, one selected for oral presentation.

📖 Education

ShanghaiTech University, Shanghai, China
Sept. 2022 - Present
M.S. in Computer Science
Supervisor: Prof. Qian Wang

ShanghaiTech University, Shanghai, China
Sept. 2018 - 2022
B.E. in Computer Science

📝 Publications

Arxiv

MITracker: Multi-View Integration for Visual Object Tracking

Mengjie Xu*, Yitao Zhu*, Haotian Jiang, Jiaming Li, Zhenrong Shen, Sheng Wang, Haolin Huang, Xinyu Wang, Han Zhang, Qing Yang, Qian Wang+

  • Introduces MVTrack, a large-scale dataset with 234K frames and precise annotations for 27 object categories, providing a benchmark for class-agnostic multi-view object tracking.
  • Proposes MITracker, a method leveraging BEV-guided 3D feature volumes and spatial-enhanced attention for robust target recovery in multi-view tracking.
  • Demonstrates that MITracker achieves state-of-the-art performance, improving recovery rates from 56.7% to 79.2% on MVTrack and GMTD datasets.
AAAI 2025 (oral)

MUC: Mixture of Uncalibrated Cameras for Robust 3D Human Body Reconstruction

Yitao Zhu*, Sheng Wang*, Mengjie Xu, Zixu Zhuang, Zhixin Wang, Kaidong Wang, Han Zhang, Qian Wang+

  • Introduces a technique for accurately reconstructing 3D human poses and shapes from images captured by uncalibrated cameras.
  • Utilizes pre-trained monocular models to estimate camera positions and employs a distance distribution optimization strategy for precise joint fusion, addressing self-occlusion issues.
  • Deploys a model to reweight the human body surface for accurate body shape estimation.
IEEE TMI

ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs

Zihao Zhao*, Sheng Wang*, Jinchen Gu*, Yitao Zhu*, Lanzhuju Mei, Zixu Zhuang, Zhiming Cui, Qian Wang, Dinggang Shen+

  • Integrates medical imaging and a professional knowledge base to enhance the reliability of Large Language Models in healthcare.
  • Trains CLIP models on various medical imaging modalities for disease classification and designs an efficient mechanism to retrieve relevant medical expertise based on user statements.
  • Uses the retrieved information to provide references, improving the trustworthiness of LLM outputs.
ISBI 2024 (oral)

MeLo: Low-rank Adaptation is Better than Fine-tuning for Medical Image Diagnosis

Yitao Zhu, Zhenrong Shen, Zihao Zhao, Sheng Wang, Xin Wang, Xiangyu Zhao, Dinggang Shen, Qian Wang+

  • Transfers natural image pre-trained models to medical image diagnostic tasks using just 0.17% trainable parameters, achieving performance comparable to full model fine-tuning across various medical imaging modalities.
  • Provides rapid task switching and reduced memory usage in clinical deployment scenarios.
Arxiv

DoctorGLM: Fine-tuning Your Chinese Doctor is not a Herculean Task

Honglin Xiong*, Sheng Wang*, Yitao Zhu*, Zihao Zhao*, Yuxiao Liu, Linlin Huang, Qian Wang, Dinggang Shen+

  • Develops the first Chinese medical dialogue model in China using a subset of Chinese medical dialogues, supplemented with translated high-quality English medical data and Q&A responses generated from Chinese medical textbooks.
  • Employs parameter-efficient fine-tuning techniques such as LoRA and p-tuning to optimize training, supported by an active open-source community and enriched by over 40,000 pieces of user feedback.

🎖 Activities

Reviewer for:

  • Pattern Recognition
  • CVPR

💻 Teaching Assistant

  • 2023.3 - 2023.7, BME2106 Medical Big-Data and Artificial Intelligence, ShanghaiTech University.