Curriculum Vitae 👨‍💻
📚 EDUCATION BACKGROUND
University College London (QS 9), 09/2024 - Present
MSc in Computer Graphics, Vision and Imaging
Compulsory modules: Machine Vision, Image Processing, Computer Graphics, Machine Learning for Visual Computing
Optional modules: Inverse Problems in Imaging, Acquisition and Processing of 3D Geometry, Numerical Optimisation, Virtual Environments
Hefei University of Technology (Project 211), 09/2020 - 06/2024
BEng in Computer Science and Technology
- Score: 90.10% (GPA 3.85/4.0); Rank: 5/152
🧪 RESEARCH
Text-guided Anomaly Detection in Videos (Master's Thesis) [github], 03/2025 - Present
Researcher, Kaan Akşit (Advisor)
- This project proposes a new video anomaly detection method that integrates a multimodal autoencoder with a large language model (LLM) to efficiently detect and semantically interpret abnormal events in surveillance videos. The system first uses an off-the-shelf pre-trained variational autoencoder (VAE or VQ-VAE) to extract features from video frames, mapping high-dimensional visual data into a low-dimensional latent space. In parallel, a CLIP model produces text-visual embeddings to capture critical multimodal information. Based on differences in the latent feature distributions, the system identifies anomalous regions and uses the LLM for contextual semantic inference, generating intuitive anomaly descriptions that support decision-making.
Medical Dermatology Research on LLM-Based Question-Answering [paper][github], 12/2024 - Present
Researcher
- To date, we have collected over 200 medical records from real hospitals. Each record includes skin images of various body parts taken at different times, together with the patient's chief complaint, case characteristics, history of present illness, symptoms, tongue diagnosis, pulse diagnosis, specialist examination reports, Traditional Chinese Medicine (TCM) syndrome differentiation, treatment methods, prescriptions, and Chinese herbal medicines.
Skeleton-based Micro-Action Recognition [paper][github], 10/2024 - 04/2025
First Author
- Micro-Actions (MAs) are an important form of non-verbal communication in social interactions, with potential applications in human emotional analysis. However, existing methods in Micro-Action Recognition often overlook the inherent subtle changes in MAs, which limits the accuracy of distinguishing MAs with subtle changes. To address this issue, we present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues to enhance spatial-temporal representation learning. Specifically, we introduce a Motion-guided Skeletal Modulation module (MSM) to inject motion cues at the skeletal level, acting as a control signal to guide spatial representation modeling. In parallel, we design a Motion-guided Temporal Modulation module (MTM) to incorporate motion information at the frame level, facilitating the modeling of holistic motion patterns in micro-actions. Finally, we propose a motion consistency learning strategy to aggregate the motion cues from multi-scale features for micro-action classification. Experimental results on the Micro-Action 52 and iMiGUE datasets demonstrate that MMN achieves state-of-the-art performance in skeleton-based micro-action recognition, underscoring the importance of explicitly modeling subtle motion cues.
Performance Analysis of Traditional VQA Models Under Limited Computational Resources [paper], 06/2024 - 12/2024
First Author
- In real-world applications where computational resources are limited, effectively integrating visual and textual information for Visual Question Answering (VQA) presents significant challenges. This paper investigates the performance of traditional models under computational constraints, focusing on enhancing VQA performance, particularly for numerical and counting questions. We evaluate models based on Bidirectional GRU (BidGRU), GRU, Bidirectional LSTM (BidLSTM), and Convolutional Neural Networks (CNN), analyzing the impact of different vocabulary sizes, fine-tuning strategies, and embedding dimensions. Experimental results show that the BidGRU model with an embedding dimension of 300 and a vocabulary size of 3000 achieves the best overall performance without the computational overhead of larger models. Ablation studies emphasize the importance of attention mechanisms and counting information in handling complex reasoning tasks under resource limitations. Our research provides valuable insights for developing more efficient VQA models suitable for deployment in environments with limited computational capacity.
Enhancing the Baseline Performance of OrienterNet for Visual Localization, 01/2024 - 06/2024
Collaborative Researcher, Yan Da (Advisor)
- This project aims to further optimize OrienterNet, a neural network-based visual localization method that achieves accurate localization using public 2D maps (e.g., planar maps). The original approach matches camera-captured images against public maps, addressing localization challenges in GPS-denied scenarios, especially in indoor and complex urban environments.
End-to-End Sign Language Recognition using Transformers, 12/2022 - 10/2023
Researcher, LMC-VUT (Lab)
- Enhanced a Transformer-based model for sign language recognition, achieving improved translation accuracy and reduced word error rate.
Navigation System for Visually Impaired People Based on Visual Ambient Intelligence (Undergraduate Graduation Project) [thesis][github], 06/2021 - 06/2024
Project Leader, Guo Dan (Advisor)
- This thesis proposes a visually aware navigation system to assist visually impaired individuals in outdoor travel. By integrating computer vision, artificial intelligence, and cloud computing technologies, the system captures environmental information using a binocular camera and processes the data through advanced algorithms such as object detection and semantic segmentation.
- The primary contributions of this thesis are as follows: (1) the design and implementation of a navigation system based on visual environmental perception; (2) the proposal of a deep intelligent interaction-based outdoor assistance method tailored for visually impaired individuals; (3) the development of an offset warning system leveraging semantic segmentation techniques; (4) the introduction of a collision warning method utilizing image-based object detection and visual depth estimation; (5) the formulation of a route planning method founded on weighted undirected graph principles; and (6) the construction of a road image dataset specifically for the Hefei University of Technology campus.
📝 PAPER REVIEW
Conference
- Reviewer for The 2025 International Joint Conference on Neural Networks (IJCNN’25), 03/2025
- Reviewer for The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’24), 02/2024
Journal
- Reviewer for Engineering Applications of Artificial Intelligence (EAAI), 04/2025
- Reviewer for Intelligent Data Analysis (IDA), 04/2025
🏅 HONOURS & AWARDS
- Honours and Awards
Outstanding Graduate of Hefei University of Technology, Class of 2024
Outstanding Graduation Thesis (Design), Class of 2024, Hefei University of Technology
“Three Good Students” at School Level, Academic Years 2022 & 2023
First-Class Scholarship, Academic Year 2023
Second-Class Scholarship, Academic Years 2021 & 2022
- Patents
Blind Travel Obstacle Avoidance Assistance System V1.0 (Authorized: 2023SR0517944)
Outdoor Visual Impairment Assisting Method Based on Deep Intelligent Interaction (CN114724053A)
A Semantic Segmentation-Based Preferential Direction Deviation Early Warning System and Method (CN114723946A)
Route Planning Method for Visually Impaired People Based on Weighted Undirected Graph (CN116448130A)
A Collision Warning Method Based on Image Target Detection and Visual Depth Estimation (CN116403146A)
- Competitions
Gold Award, 9th Internet+ Student Innovation and Entrepreneurship Competition, Hefei University of Technology
Outstanding at National Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
Outstanding at School Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
Second Prize, “Challenge Cup” Extracurricular Academic and Scientific Works Competition (School-level Selection)
Third Prize, 25th China Robot and Artificial Intelligence Competition, Anhui Division
Bronze Award, 8th Internet+ School-level Competition, Hefei University of Technology
Third Prize, Chinese College Student Computer Design Competition (School-level)
Third Prize, 17th School Programming Contest (ACM Selection)
Third Prize, School of Computer and Information Science Fun Programming Competition
- Certifications
UCL Pre-sessional English Course (Overall 75; Speaking 71, Reading 77, Listening 72, Writing 79)
Huawei Big Data Skills Certification (Analyzing E-commerce Real-time Business Data Using DLI Flink SQL)
💼 PROFESSIONAL EXPERIENCE
Shenzhen Boshengteng Technology Co., Ltd.
Embedded Development Assistant, 07/2023 - 08/2023
- Conducted comprehensive research on AI chip design, focusing on embedded neural network design, RISC-V architecture, Verilog programming, and power optimization.
- Contributed to the design and optimization of an AI chip processing unit, involving architecture selection, instruction set optimization, and module creation using Verilog.
Shenzhen Boshengteng Technology Co., Ltd.
Algorithm Development Assistant, 03/2023 - 06/2023
- Participated in APA and HPP projects for advanced driving technology, contributing to data annotation, model testing, and optimization of parking and object detection algorithms.
- Developed technical documentation for perception algorithms, detailing design, implementation, and evaluation.
Beijing Tuosida Technology Development Co., Ltd.
Python Development Engineer Assistant, 01/2023 - 02/2023
- Developed machine learning models using PyTorch and TensorFlow, including data preprocessing, feature engineering, and model evaluation.
- Documented system requirements, design specifications, and test protocols.
🔧 TECHNICAL SKILLS
- Programming: C, C++, Python, Verilog, Java, JavaScript, HTML
- Embedded Systems: Multi-cycle CPU design, ARM pipelined CPU design, embedded software testing