I am currently a Research Assistant at the Computational Light Laboratory, University College London, working under the supervision of Assoc. Prof. Kaan Akşit. I am also applying for PhD programs in Computer Vision.

I recently completed my MSc in Computer Graphics, Vision and Imaging at University College London. Under the supervision of Assoc. Prof. Kaan Akşit, my research centers on text-guided video anomaly detection (VAD) based on Large Vision-Language Models (LVLMs), aiming to enable fine-grained, interpretable, and human-centered video understanding.

I received my BEng in Computer Science and Technology from Hefei University of Technology, School of Computer Science and Information Engineering (School of Artificial Intelligence), where I was supervised by Prof. Dan Guo. During my undergraduate studies, I conducted research on visual perception systems for assisting visually impaired individuals, exploring multimodal sensing, intelligent interaction, and navigation technologies.

My research interests span computer vision, multimodal learning, and vision-language understanding, with an emphasis on perception, reasoning, and generation in visual intelligence. I am particularly interested in building systems that connect human motion, emotion, and cognition through multimodal signals, with additional interests in autonomous driving and environmental perception.

If you are seeking any form of academic collaboration, please feel free to contact me.

🔥 News

  • 2025.07: 🎉🎉🎉 Our paper was accepted to ACM MM 2025.
  • 2025.05: 🏆🏆🏆 Our work won the Micro-gesture Classification sub-challenge at MiGA@IJCAI 2025.
  • 2024.06: 🏆🏆🏆 My undergraduate thesis received the Best Thesis Award at Hefei University of Technology.

📝 Publications

CVPR 2026

MA-Bench: Towards Fine-grained Micro-Action Understanding (under review)

[CVPR’26] [Paper] [Project]

Kun Li, Jihao Gu, Fei Wang, Zhiliang Wu, Hehe Fan, Dan Guo

Eurographics 2026

Text-guided Fine-Grained Video Anomaly Detection (under review)

[Eurographics’26] [Paper] [Project]

Jihao Gu, Kun Li, He Wang, Kaan Akşit

ACM MM 2025

Motion Matters: Motion-guided Modulation Network for Skeleton-based Micro-Action Recognition

[ACM MM’25] [Paper] [Project]

Jihao Gu, Kun Li, Fei Wang, Yanyan Wei, Zhiliang Wu, Hehe Fan, Meng Wang

IJCAI 2025

MM-Gesture: Towards Precise Micro-Gesture Recognition through Multimodal Fusion

[IJCAI’25 Workshop] [Paper] [Project]

Jihao Gu, Fei Wang, Kun Li, Yanyan Wei, Zhiliang Wu, Dan Guo

🏆 Champion of the Micro-gesture Classification sub-challenge at MiGA@IJCAI 2025.

PRML 2025

Performance Analysis of Traditional VQA Models Under Limited Computational Resources

[IEEE PRML’25] [Paper]

Jihao Gu

🔬 Projects

  • 2025.09 - 2025.11: MA-Bench: Towards Fine-grained Micro-Action Understanding

    Jihao Gu, Kun Li

    • Proposed MA-Bench, a comprehensive benchmark comprising 1,000 videos and a three-tier evaluation architecture that progressively examines micro-action perception, relational comprehension, and interpretive reasoning
    • Constructed MA-Bench-Train, a large-scale training corpus of 20K videos annotated with detailed motion patterns, and fine-tuned Qwen3-VL-8B on it, achieving consistent gains across micro-action reasoning tasks
  • 2025.03 - 2025.09: Text-guided Fine-Grained Video Anomaly Detection

    [Postgraduate Project] [Thesis]

    Jihao Gu, Kaan Akşit (Supervisor), He Wang (Co-supervisor)

    • Proposed Text-guided Fine-Grained Video Anomaly Detection (T-VAD), a framework built upon a Large Vision-Language Model (LVLM)
    • Introduced an Anomaly Heatmap Decoder (AHD) that performs pixel-wise visual-textual feature alignment to generate fine-grained anomaly heatmaps
    • Designed a Region-aware Anomaly Encoder (RAE) that transforms the heatmaps into learnable textual embeddings, guiding the LVLM to accurately identify and localize anomalous events in videos
    • Achieved state-of-the-art performance on the UBnormal dataset, with 94.8% micro-AUC and 67.8% / 76.7% anomaly heatmap accuracy (RBDC / TBDC)
  • 2024.01 - 2024.06: Enhancing the Baseline Performance of OrienterNet for Visual Localization

    Jihao Gu, Yan Da (Supervisor)

    This project aims to further optimize OrienterNet, a neural network-based visual localization method that achieves accurate localization using 2D public maps (e.g., planar maps). The original approach matches camera-captured images against public maps, addressing localization challenges in GPS-denied scenarios, especially in indoor and complex urban environments.

  • 2022.12 - 2023.10: End-to-End Sign Language Recognition using Transformers

    Jihao Gu, Shengeng Tang

    • Studied two papers in detail to understand how the Transformer architecture is applied to sign language recognition and how the dual-block module is designed
    • Reproduced the paper's model in PyTorch, achieving similar BLEU and WER scores on the validation and test sets
    • Integrated the dual-block module into the reproduced model and improved its BLEU and WER scores on the validation and test sets through parameter tuning and multiple rounds of training
  • 2021.06 - 2024.06: Navigation System for Visually Impaired People Based on Visual Ambient Intelligence

    [Undergraduate Project] [Thesis] [Project]

    Jihao Gu, Dan Guo (Supervisor), Meng Wang (Co-supervisor)

    • Designed and implemented a navigation system based on visual environmental perception, with accuracy increased by approximately 15% compared with traditional navigation systems
    • Proposed a deep intelligent interaction-based outdoor assistance method tailored for visually impaired individuals
    • Developed an offset warning system leveraging semantic segmentation techniques
    • Introduced a collision warning method utilizing image-based object detection and visual depth estimation
    • Formulated a route planning method founded on weighted undirected graph principles
    • Constructed a road image dataset of over 12,000 images covering the Hefei University of Technology campus

🧾 Patents

  • Blind Travel Obstacle Avoidance Assistance System V1.0 [2023SR0517944]
  • Outdoor Visual Impairment Assisting Method based on Deep Intelligent Interaction [CN114724053A]
  • Semantic Segmentation-Based Preferential Direction Deviation Early Warning System [CN114723946A]
  • Route Planning Method for Visually Impaired People [CN116448130A]
  • Collision Warning Method based on Image Target Detection and Depth Estimation [CN116403146A]

🎖 Honors and Awards

  • 2024.06: Outstanding Graduate of Hefei University of Technology
  • 2024.06: Outstanding Graduation Thesis (Design)
  • 2023.09: First-Class Scholarship
  • 2023.09: “Three Good Students” Award
  • 2022.09: Second-Class Scholarship
  • 2022.09: “Three Good Students” Award
  • 2021.09: Second-Class Scholarship

💼 Internships

  • 2025.09 - now: Research Assistant, Computational Light Laboratory, University College London, London, UK
  • 2023.07 - 2023.08: Embedded Development Assistant, Shenzhen Boshengteng Technology Co., Ltd., Shenzhen, China
  • 2023.03 - 2023.06: Algorithm Development Assistant, Shenzhen Boshengteng Technology Co., Ltd., Shenzhen, China
  • 2023.01 - 2023.02: Python Development Engineer Assistant, Beijing Tuosida Technology Development Co., Ltd., Beijing, China

🤝 Services

  • 2025.12: Data Chair for the 3rd Micro-Action Analysis Grand Challenge (ACM MM’26)
  • 2025.11: Reviewer for IEEE Transactions on Multimedia (TMM)
  • 2025.11: Reviewer for ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)
  • 2025.08: Volunteer for International Joint Conference on Artificial Intelligence (IJCAI’25, Guangzhou)
  • 2025.08: Reviewer for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
  • 2025.04: Reviewer for Engineering Applications of Artificial Intelligence (EAAI×4)
  • 2025.04: Reviewer for Intelligent Data Analysis (IDA×2)
  • 2025.03: Reviewer for the 2025 International Joint Conference on Neural Networks (IJCNN’25)
  • 2024.02: Reviewer for the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’24)

📖 Education

  • 09/2024 - 09/2025: Postgraduate, Computer Graphics, Vision and Imaging, University College London, UK

    Modules: Machine Vision, Image Processing, Computer Graphics, Machine Learning for Visual Computing

    Optional: Inverse Problems in Imaging, Acquisition and Processing of 3D Geometry, Numerical Optimisation, Virtual Environments

    Score: Pass with Distinction (78.30/100)

  • 09/2020 - 06/2024: Undergraduate, Computer Science and Technology, Hefei University of Technology, China

    GPA: 90.10% (3.85/4.0), Rank: 5/152