Curriculum Vitae 👨‍💻

📚 EDUCATION BACKGROUND

University College London (QS 9), 09/2024 - Present

MSc in Computer Graphics, Vision and Imaging

Compulsory modules: Machine Vision, Image Processing, Computer Graphics, Machine Learning for Visual Computing
Optional modules: Inverse Problems in Imaging, Acquisition and Processing of 3D Geometry, Numerical Optimisation, Virtual Environments

Hefei University of Technology (Project 211), 09/2020 - 06/2024

BEng in Computer Science and Technology

Score: 90.10% (GPA 3.85/4.0); Rank: 5/152

🧪 RESEARCH

Text-guided Anomaly Detection in Videos (Master's thesis) [github], 03/2025 - Present

Researcher, Kaan Akşit (Advisor)

This project proposes a video anomaly detection method that combines a multimodal autoencoder with a large language model (LLM) to efficiently detect abnormal events in surveillance videos and interpret them semantically. The system first uses a pre-trained, off-the-shelf variational autoencoder (VAE or VQ-VAE) to extract features from video frames, mapping high-dimensional visual data into a low-dimensional latent space. In parallel, a CLIP model produces text-visual embeddings that capture critical multimodal information. The system identifies anomalous regions from differences in the latent feature distributions, then uses the LLM for contextual semantic inference, generating intuitive anomaly descriptions that support decision-making.

Medical Dermatology Research on LLM-Based Question-Answering [paper][github], 12/2024 - Present

Researcher

To date, we have collected more than 200 real-world medical records from hospitals. Each record includes skin images of different body parts taken at different times, together with the patient's chief complaint, case characteristics, history of present illness, symptoms, tongue diagnosis, pulse diagnosis, specialist examination reports, Traditional Chinese Medicine (TCM) syndrome differentiation, treatment methods, prescriptions, and Chinese herbal medicines.

Skeleton-based Micro-Action Recognition [paper][github], 10/2024 - 04/2025

First Author

Micro-Actions (MAs) are an important form of non-verbal communication in social interactions, with potential applications in human emotional analysis. However, existing Micro-Action Recognition methods often overlook the subtle changes inherent in MAs, which limits their accuracy in distinguishing MAs that differ only subtly. To address this issue, we present a novel Motion-guided Modulation Network (MMN) that implicitly captures and modulates subtle motion cues to enhance spatial-temporal representation learning. Specifically, we introduce a Motion-guided Skeletal Modulation module (MSM) to inject motion cues at the skeletal level, acting as a control signal to guide spatial representation modeling. In parallel, we design a Motion-guided Temporal Modulation module (MTM) to incorporate motion information at the frame level, facilitating the modeling of holistic motion patterns in micro-actions. Finally, we propose a motion consistency learning strategy to aggregate the motion cues from multi-scale features for micro-action classification. Experimental results on the Micro-Action-52 and iMiGUE datasets demonstrate that MMN achieves state-of-the-art performance in skeleton-based micro-action recognition, underscoring the importance of explicitly modeling subtle motion cues.

Performance Analysis of Traditional VQA Models Under Limited Computational Resources [paper], 06/2024 - 12/2024

First Author

In real-world applications where computational resources are limited, effectively integrating visual and textual information for Visual Question Answering (VQA) presents significant challenges. This paper investigates the performance of traditional models under computational constraints, focusing on enhancing VQA performance, particularly for numerical and counting questions. We evaluate models based on Bidirectional GRU (BidGRU), GRU, Bidirectional LSTM (BidLSTM), and Convolutional Neural Networks (CNN), analyzing the impact of different vocabulary sizes, fine-tuning strategies, and embedding dimensions. Experimental results show that the BidGRU model with an embedding dimension of 300 and a vocabulary size of 3000 achieves the best overall performance without the computational overhead of larger models. Ablation studies emphasize the importance of attention mechanisms and counting information in handling complex reasoning tasks under resource limitations. Our research provides valuable insights for developing more efficient VQA models suitable for deployment in environments with limited computational capacity.

Enhancing the Baseline Performance of OrienterNet for Visual Localization, 01/2024 - 06/2024

Collaborative Researcher, Yan Da (Advisor)

This project aims to improve OrienterNet, a neural network-based visual localization method designed to achieve accurate localization using public 2D maps (e.g., planar maps). The original approach matches camera-captured images against public maps, addressing localization challenges in GPS-denied scenarios, especially in indoor and complex urban environments.

End-to-End Sign Language Recognition using Transformers, 12/2022 - 10/2023

Researcher, LMC-VUT (Lab)

Enhanced a Transformer-based model for sign language recognition, achieving improved translation accuracy and reduced word error rate.

Navigation System for Visually Impaired People Based on Visual Ambient Intelligence (Undergraduate Graduation Project) [thesis][github], 06/2021 - 06/2024

Project Leader, Guo Dan (Advisor)

This thesis proposes a visually aware navigation system to assist visually impaired individuals in outdoor travel. By integrating computer vision, artificial intelligence, and cloud computing technologies, the system captures environmental information using a binocular camera and processes the data through advanced algorithms such as object detection and semantic segmentation.
The primary contributions of this thesis are as follows: (1) the design and implementation of a navigation system based on visual environmental perception; (2) the proposal of a deep intelligent interaction-based outdoor assistance method tailored for visually impaired individuals; (3) the development of an offset warning system leveraging semantic segmentation techniques; (4) the introduction of a collision warning method utilizing image-based object detection and visual depth estimation; (5) the formulation of a route planning method founded on weighted undirected graph principles; and (6) the construction of a road image dataset specifically for the Hefei University of Technology campus.

📝 PAPER REVIEW

Conference

Reviewer for The 2025 International Joint Conference on Neural Networks (IJCNN’25), 03/2025
Reviewer for The 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’24), 02/2024

Journal

Reviewer for Engineering Applications of Artificial Intelligence (EAAI), 04/2025
Reviewer for Intelligent Data Analysis (IDA), 04/2025

🏅 HONOURS & AWARDS

Honours and Awards
- Outstanding Graduate of Hefei University of Technology, Class of 2024
- Outstanding Graduation Thesis (Design), Class of 2024, Hefei University of Technology
- “Three Good Students” at School Level, Academic Years 2022 & 2023
- First-Class Scholarship, Academic Year 2023
- Second-Class Scholarship, Academic Years 2021 & 2022
Patents
- Blind Travel Obstacle Avoidance Assistance System V1.0 (Authorized: 2023SR0517944)
- Outdoor Visual Impairment Assisting Method Based on Deep Intelligent Interaction (CN114724053A)
- A Semantic Segmentation-Based Preferential Direction Deviation Early Warning System and Method (CN114723946A)
- Route Planning Method for Visually Impaired People Based on Weighted Undirected Graph (CN116448130A)
- A Collision Warning Method Based on Image Target Detection and Visual Depth Estimation (CN116403146A)
Competitions
- Gold Award, 9th Internet+ Student Innovation and Entrepreneurship Competition, Hefei University of Technology
- Outstanding at National Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
- Outstanding at School Level, Student Innovation and Entrepreneurship Program, Hefei University of Technology
- Second Prize, “Challenge Cup” Extracurricular Academic and Scientific Works Competition (School-level Selection)
- Third Prize, 25th China Robot and Artificial Intelligence Competition, Anhui Division
- Bronze Award, 8th Internet+ School-level Competition, Hefei University of Technology
- Third Prize, Chinese College Student Computer Design Competition (School-level)
- Third Prize, 17th School Programming Contest (ACM Selection)
- Third Prize, School of Computer and Information Science Fun Programming Competition
Certifications
- UCL Pre-sessional English Course (Overall 75; Speaking 71, Reading 77, Listening 72, Writing 79)
- Huawei Big Data Skills Certification (Analyzing E-commerce Real-time Business Data Using DLI Flink SQL)

💼 PROFESSIONAL EXPERIENCE

Shenzhen Boshengteng Technology Co., Ltd.
Embedded Development Assistant, 07/2023 - 08/2023
- Conducted comprehensive research on AI chip design, focusing on embedded neural network design, RISC-V architecture, Verilog programming, and power optimization.
- Contributed to the design and optimization of an AI chip processing unit, involving architecture selection, instruction set optimization, and module creation using Verilog.
Shenzhen Boshengteng Technology Co., Ltd.
Algorithm Development Assistant, 03/2023 - 06/2023
- Participated in APA and HPP projects for advanced driving technology, contributing to data annotation, model testing, and optimization of parking and object detection algorithms.
- Developed technical documentation for perception algorithms, detailing design, implementation, and evaluation.
Beijing Tuosida Technology Development Co., Ltd.
Python Development Engineer Assistant, 01/2023 - 02/2023
- Developed machine learning models using PyTorch and TensorFlow, including data preprocessing, feature engineering, and model evaluation.
- Documented system requirements, design specifications, and test protocols.

🔧 TECHNICAL SKILLS

Programming: C, C++, Python, Verilog, Java, JavaScript, HTML
Embedded Systems: Multi-cycle CPU design, ARM pipelined CPU design, embedded software testing

谷纪豪 Geo Gu