Curriculum Vitae 👨‍💻
📚 EDUCATION BACKGROUND
University College London (QS 9), 09/2024 - Present
MSc in Computer Graphics, Vision and Imaging
- Compulsory modules: Machine Vision, Image Processing, Computer Graphics, Machine Learning for Visual Computing
- Optional modules: Inverse Problems in Imaging, Acquisition and Processing of 3D Geometry, Numerical Optimisation, Virtual Environments
Hefei University of Technology (Project 211), 09/2020 - 06/2024
BEng in Computer Science and Technology
- Score: 90.10% (3.85/4.0); 5/152
🧪 RESEARCH
Medical Dermatology Research on LLM-Based Question-Answering [paper][github], 12/2024 - Present
Researcher
- Collected more than 200 medical records from real hospitals to date. Each record includes skin images of multiple body regions taken at different times, together with the chief complaint, case characteristics, history of present illness, symptoms, tongue diagnosis, pulse diagnosis, specialist examination reports, Traditional Chinese Medicine (TCM) syndrome differentiation, treatment methods, prescriptions, and Chinese herbal medicines.
Research on Skeleton-Based Micro-Action Recognition [paper][github][huggingface], 10/2024 - Present
Researcher
- Divided skeletal-temporal relationships into four types and applied partition-specific self-attention to each type, capturing fine-grained skeletal-temporal correlations efficiently to improve both the accuracy and the computational efficiency of micro-action recognition.
- Exploring Adapter-based tuning as a replacement for the traditional fine-tuning paradigm, aiming to reduce parameter-update costs and make image-to-video transfer learning more efficient for large-scale models.
Performance Analysis of Traditional VQA Models Under Limited Computational Resources [paper], 06/2024 - 12/2024
First Author
- In real-world applications where computational resources are limited, effectively integrating visual and textual information for Visual Question Answering (VQA) presents significant challenges. This paper investigates the performance of traditional models under computational constraints, focusing on enhancing VQA performance, particularly for numerical and counting questions. We evaluate models based on Bidirectional GRU (BidGRU), GRU, Bidirectional LSTM (BidLSTM), and Convolutional Neural Networks (CNN), analyzing the impact of different vocabulary sizes, fine-tuning strategies, and embedding dimensions. Experimental results show that the BidGRU model with an embedding dimension of 300 and a vocabulary size of 3000 achieves the best overall performance without the computational overhead of larger models. Ablation studies emphasize the importance of attention mechanisms and counting information in handling complex reasoning tasks under resource limitations. Our research provides valuable insights for developing more efficient VQA models suitable for deployment in environments with limited computational capacity.
Enhancing the Baseline Performance of OrienterNet for Visual Localization, 01/2024 - 06/2024
Collaborative Researcher, Yan Da (Advisor)
- Worked on further optimizing OrienterNet, a neural network-based visual localization method that achieves accurate localization using 2D public maps (e.g., planar maps). The original approach matches camera-captured images against public maps, addressing localization in GPS-denied scenarios, especially indoor and complex urban environments.
Sub-reviewer for SIGIR’24, 02/2024
- Served as a sub-reviewer for The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’24), one of the premier conferences in the field of information retrieval.
End-to-End Sign Language Recognition using Transformers, 12/2022 - 10/2023
Researcher, LMC-VUT (Lab)
- Enhanced a Transformer-based model for sign language recognition, achieving improved translation accuracy and reduced word error rate.
Navigation System for Visually Impaired People Based on Visual Ambient Intelligence (Undergraduate Graduation Project) [thesis][github], 06/2021 - 06/2024
Project Leader, Guo Dan (Advisor)
- This thesis proposes a visually aware navigation system to assist visually impaired individuals in outdoor travel. By integrating computer vision, artificial intelligence, and cloud computing technologies, the system captures environmental information using a binocular camera and processes the data through advanced algorithms such as object detection and semantic segmentation. Spatial awareness is enhanced through 3D audio feedback. The hardware configuration includes an OAK-D-PRO camera, a head-mounted headset, a Raspberry Pi 5, and a Quectel RM500Q-GL module, with cloud servers employed for efficient data processing. The research conducted in this thesis aims to deliver a safe, convenient, and efficient navigation solution for visually impaired individuals, while also providing theoretical and methodological support for future studies in the field.
- The primary contributions of this thesis are as follows: (1) the design and implementation of a navigation system based on visual environmental perception; (2) the proposal of a deep intelligent interaction-based outdoor assistance method tailored for visually impaired individuals; (3) the development of an offset warning system leveraging semantic segmentation techniques; (4) the introduction of a collision warning method utilizing image-based object detection and visual depth estimation; (5) the formulation of a route planning method founded on weighted undirected graph principles; and (6) the construction of a road image dataset specifically for the Hefei University of Technology campus.
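Contribution (5) above can be illustrated with a minimal sketch of shortest-route search on a weighted undirected graph. The thesis summary does not say which search algorithm it uses, so Dijkstra's algorithm is an assumption here, and the node names and edge weights are hypothetical placeholders rather than data from the campus dataset.

```python
import heapq


def add_edge(graph, a, b, weight):
    """Add an undirected edge by storing the weight in both directions."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm; edge weights must be non-negative.

    graph maps each node to a dict of {neighbour: edge weight}.
    Returns (total_cost, path), or (inf, []) if goal is unreachable.
    """
    best = {start: 0.0}      # cheapest known cost to reach each node
    previous = {}            # predecessor of each node on its cheapest route
    queue = [(0.0, start)]   # min-heap of (cost so far, node)
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry; a cheaper route was already found
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if goal not in best:
        return float("inf"), []
    # Walk the predecessor links back from goal to start.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return best[goal], path


# Hypothetical walkable segments with lengths in metres.
campus = {}
add_edge(campus, "gate", "library", 120.0)
add_edge(campus, "gate", "canteen", 80.0)
add_edge(campus, "canteen", "library", 30.0)
add_edge(campus, "library", "lab", 60.0)

cost, path = shortest_route(campus, "gate", "lab")
print(cost, path)
```

In a setting like this one, the weights could in principle encode walking difficulty as well as distance, but that is a design choice the summary above does not specify.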
🏅 HONOURS & AWARDS
Honors
🏅 Outstanding Graduate of Hefei University of Technology, Class of 2024
🏅 Outstanding Graduation Thesis (Design), Class of 2024, Hefei University of Technology
🏅 “Three Good Students” at School Level, Academic Years 2022 & 2023
🏅 First-Class Scholarship, Academic Year 2023
🏅 Second-Class Scholarship, Academic Years 2021 & 2022
Patents
📜 Blind Travel Obstacle Avoidance Assistance System V1.0 (Authorized: 2023SR0517944)
📜 Outdoor visual impairment assisting method based on deep intelligent interaction (CN114724053A)
📜 Preferential direction deviation early warning system and method based on semantic segmentation (CN114723946A)
📜 Route planning method for visually impaired people based on weighted undirected graph (CN116448130A)
📜 Collision early warning method based on image target detection and visual depth estimation (CN116403146A)
Competitions
🥇 Gold Award, 9th Internet+ Student Innovation and Entrepreneurship Competition, Hefei University of Technology
🥇 Outstanding at National Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
🥇 Outstanding at School Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
🥈 Second Prize, “Challenge Cup” Extracurricular Academic and Scientific Works Competition (School-level Selection)
🥉 Third Prize, 25th China Robot and Artificial Intelligence Competition, Anhui Division
🥉 Bronze Award, 8th Internet+ School-level Competition, Hefei University of Technology
🥉 Third Prize, Chinese College Student Computer Design Competition (School-level)
🥉 Third Prize, 17th School Programming Contest (ACM Selection)
🥉 Third Prize, School of Computer and Information Science Fun Programming Competition
Certifications
✍️ UCL Pre-sessional English Course (75 S71|R77|L72|W79)
🎓 Huawei Big Data Skills Certification (Analyzing E-commerce Real-time Business Data Using DLI Flink SQL)
💼 PROFESSIONAL EXPERIENCE
Shenzhen Boshengteng Technology Co., Ltd.
Embedded Development Assistant, 07/2023 - 08/2023
- Conducted comprehensive research on AI chip design, focusing on embedded neural network design, RISC-V architecture, Verilog programming, and power optimization.
- Contributed to the design and optimization of an AI chip processing unit, involving architecture selection, instruction set optimization, and module creation using Verilog.
Shenzhen Boshengteng Technology Co., Ltd.
Algorithm Development Assistant, 03/2023 - 06/2023
- Participated in APA and HPP projects for advanced driving technology, contributing to data annotation, model testing, and optimization of parking and object detection algorithms.
- Developed technical documentation for perception algorithms, detailing design, implementation, and evaluation.
Beijing Tuosida Technology Development Co., Ltd.
Python Development Engineer Assistant, 01/2023 - 02/2023
- Developed machine learning models using PyTorch and TensorFlow, including data preprocessing, feature engineering, and model evaluation.
- Documented system requirements, design specifications, and test protocols.
🔧 TECHNICAL SKILLS
- Programming: C, C++, Python, Verilog, Java, JavaScript, HTML
- Embedded Systems: Multi-cycle CPU design, ARM pipelined CPU design, embedded software testing
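📚 EDUCATION BACKGROUND
University College London (QS 9), 09/2024 - Present
MSc in Computer Graphics, Vision and Imaging
- Compulsory modules: Machine Vision, Image Processing, Computer Graphics, Machine Learning for Visual Computing
- Optional modules: Inverse Problems in Imaging, Acquisition and Processing of 3D Geometry, Numerical Optimisation, Virtual Environments
Hefei University of Technology (Project 211), 09/2020 - 06/2024
BEng in Computer Science and Technology
- Score: 90.10% (3.85/4.0); 5/152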
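🧪 RESEARCH
Research on LLM-Based Multimodal Dermatology Question Answering, 12/2024 - Present
Researcher
- We have so far collected more than 200 medical records from real hospitals. For each patient, the data includes skin images of various body parts taken at multiple points in time, chief complaints, case characteristics, history of present illness, presenting symptoms, tongue diagnosis, pulse diagnosis, specialist examination reports, Traditional Chinese Medicine (TCM) syndrome differentiation, treatment methods, prescriptions, and Chinese herbal medicines.
Research on Micro-Action Recognition, 10/2024 - Present
Researcher
- Divided skeletal-temporal relationships into four types and computed partition-specific self-attention for each type, focusing on efficiently capturing fine-grained skeletal-temporal correlations to optimize micro-action recognition algorithms and improve their accuracy and computational efficiency.
- Exploring an Adapter-based model tuning approach to replace the traditional fine-tuning paradigm, in order to reduce parameter update costs and improve the efficiency of transfer learning from images to videos.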
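Performance Analysis of Traditional Visual Question Answering (VQA) Models Under Limited Computational Resources, 06/2024 - 12/2024
First Author
- Studied how to integrate visual and textual information efficiently for visual question answering under the limited computational resources of real-world applications, particularly for numerical and counting questions. The paper examines the performance of models based on Bidirectional GRU (BidGRU), GRU, Bidirectional LSTM (BidLSTM), and Convolutional Neural Networks (CNN) across different vocabulary sizes, fine-tuning strategies, and embedding dimensions. Experiments show that the BidGRU model with 300-dimensional embeddings and a vocabulary size of 3000 performs best without adding extra computational overhead. Ablation studies highlight the importance of attention mechanisms and counting information in handling complex reasoning tasks. The study offers valuable insights for developing more efficient VQA models for environments with limited computational capacity.
Enhancing the Baseline Performance of OrienterNet for Visual Localization, 01/2024 - 06/2024
Collaborative Researcher, Yan Da (Advisor)
- Optimized OrienterNet, a neural network-based visual localization method that uses 2D public maps (e.g., planar maps) for accurate localization and specifically addresses GPS-denied scenarios in indoor and complex urban environments.
Sub-reviewer for SIGIR’24, 02/2024
- Reviewed academic papers as a sub-reviewer for the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’24).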
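End-to-End Sign Language Recognition using Transformers, 12/2022 - 10/2023
Researcher, LMC-VUT (Lab)
- Optimized a Transformer-based sign language recognition model, improving translation accuracy and reducing word error rate (WER).
Navigation System for Visually Impaired People Based on Visual Environmental Perception (Undergraduate Graduation Project), 06/2021 - 06/2024
Project Leader, Guo Dan (Advisor)
- This thesis designs a travel navigation system for visually impaired people based on visual environmental perception. Combining computer vision, artificial intelligence, and cloud computing, the system captures environmental information with a binocular camera, processes the data with advanced algorithms such as object detection and semantic segmentation, and enhances users' spatial awareness through 3D audio feedback. The hardware consists of an OAK-D-PRO camera, a head-mounted headset, a Raspberry Pi 5, and a Quectel RM500Q-GL module, combined with cloud servers for efficient data processing. The work aims to provide a safe, convenient, and efficient travel navigation solution for visually impaired people, and to offer theoretical support and methodological reference for research and design in related fields.
- The main contributions of this thesis are: (1) the design and implementation of a navigation system based on visual environmental perception; (2) an outdoor assistance method for visually impaired people based on deep intelligent interaction; (3) an offset warning system based on semantic segmentation; (4) a collision warning method based on image object detection and visual depth estimation; (5) a route planning method based on a weighted undirected graph; and (6) the construction of a road image dataset for the Hefei University of Technology campus.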
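🏅 HONOURS & AWARDS
- Honors:
- 🏅 Outstanding Graduate of Hefei University of Technology, Class of 2024
- 🏅 Outstanding Graduation Thesis (Design), Class of 2024, Hefei University of Technology
- 🏅 “Three Good Students” at School Level (2022, 2023)
- 🏅 First-Class Scholarship (2023)
- 🏅 Second-Class Scholarship (2021, 2022)
- Patents:
- 📜 Blind Travel Obstacle Avoidance Assistance System V1.0 [2023SR0517944]
- 📜 Outdoor assistance method based on deep intelligent interaction [CN114724053A]
- 📜 Preferential direction deviation early warning system and method based on semantic segmentation [CN114723946A]
- 📜 Route planning method for visually impaired people based on weighted undirected graph [CN116448130A]
- 📜 Collision early warning method based on image target detection and visual depth estimation [CN116403146A]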
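- Competitions:
- 🥇 Gold Award, 9th Internet+ Student Innovation and Entrepreneurship Competition, Hefei University of Technology
- 🥇 Outstanding at National Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
- 🥇 Outstanding at School Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
- 🥈 Second Prize, “Challenge Cup” Extracurricular Academic and Scientific Works Competition (School-level Selection)
- 🥉 Third Prize, 25th China Robot and Artificial Intelligence Competition, Anhui Division
- 🥉 Bronze Award, 8th Internet+ School-level Competition, Hefei University of Technology
- 🥉 Third Prize, Chinese College Student Computer Design Competition (School-level)
- 🥉 Third Prize, 17th School Programming Contest (ACM Selection)
- 🥉 Third Prize, School of Computer and Information Science Fun Programming Competition
- Certifications:
- ✍️ UCL Pre-sessional English Course (75 S71|R77|L72|W79)
- 🎓 Huawei Big Data Skills Certification (Analyzing E-commerce Real-time Business Data Using DLI Flink SQL)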
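💼 PROFESSIONAL EXPERIENCE
Shenzhen Boshengteng Technology Co., Ltd., 07/2023 - 08/2023
Embedded Development Assistant
- Researched AI chip design, focusing on embedded neural network design, RISC-V architecture, Verilog programming, and power optimization.
- Took part in the design and optimization of an AI chip processing unit, including architecture selection, instruction set optimization, and Verilog module implementation.
Shenzhen Boshengteng Technology Co., Ltd., 03/2023 - 06/2023
Algorithm Development Assistant
- Participated in the APA and HPP advanced driving technology projects, handling data annotation, model testing, and optimization of parking and object detection algorithms.
- Wrote technical documentation for perception algorithms, covering design, implementation, and evaluation.
Beijing Tuosida Technology Development Co., Ltd., 01/2023 - 02/2023
Python Development Engineer Assistant
- Developed machine learning models using PyTorch and TensorFlow, including data preprocessing, feature engineering, and model evaluation.
- Wrote documentation of system requirements, design specifications, and test protocols.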
🔧 TECHNICAL SKILLS
- Programming: Python, C, C++, Verilog, Java, JavaScript, HTML
- Embedded Systems: Multi-cycle CPU design, ARM pipelined CPU design, embedded software testing