S3T-Former: A Purely Spike-Driven State-Space Topology Transformer for Skeleton Action Recognition

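S3T-Former is presented by its authors as, to the best of their knowledge, the first purely spike-driven Transformer architecture designed for energy-efficient skeleton action recognition, targeting the power consumption that limits conventional Artificial Neural Networks (ANNs) on resource-constrained edge devices.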

Computer Science > Computer Vision and Pattern Recognition

Abstract: Skeleton-based action recognition is crucial for multimedia applications but heavily relies on power-hungry Artificial Neural Networks (ANNs), limiting their deployment on resource-constrained edge devices. Spiking Neural Networks (SNNs) provide an energy-efficient alternative; however, existing spiking models for skeleton data often compromise the intrinsic sparsity of SNNs by resorting to dense matrix aggregations, heavy multimodal fusion modules, or non-sparse frequency domain transformations. Furthermore, they severely suffer from the short-term amnesia of spiking neurons. In this paper, we propose the Spiking State-Space Topology Transformer (S3T-Former), which, to the best of our knowledge, is the first purely spike-driven Transformer architecture specifically designed for energy-efficient skeleton action recognition. Rather than relying on heavy fusion overhead, we formulate a Multi-Stream Anatomical Spiking Embedding (M-ASE) that acts as a generalized kinematic differential operator, elegantly transforming multimodal skeleton features into heterogeneous, highly sparse event streams. To achieve true topological and temporal sparsity, we introduce Lateral Spiking Topology Routing (LSTR) for on-demand conditional spike propagation, and a Spiking State-Space (S3) Engine to systematically capture long-range temporal dynamics without non-sparse spectral workarounds. Extensive experiments on multiple large-scale datasets demonstrate that S3T-Former achieves highly competitive accuracy while theoretically reducing energy consumption compared to classic ANNs, establishing a new state-of-the-art for energy-efficient neuromorphic action recognition.
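The abstract's two recurring ideas, sparse event streams derived from kinematic differences and the "short-term amnesia" of spiking neurons, can be illustrated with a toy example. The sketch below is not the authors' M-ASE or neuron model: it assumes a standard leaky integrate-and-fire (LIF) neuron, a first-order temporal difference as the simplest possible kinematic differential, and hypothetical `decay` and `threshold` values chosen only for illustration.

```python
def lif_spikes(inputs, decay=0.5, threshold=1.0):
    """Leaky integrate-and-fire neuron over a 1-D input sequence.

    decay and threshold are illustrative values, not taken from the paper.
    Returns a list of 0/1 spikes, one per time step.
    """
    v = 0.0  # membrane potential
    spikes = []
    for x in inputs:
        v = decay * v + x  # leak the old potential, then integrate the input
        if v >= threshold:
            spikes.append(1)
            v = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


def temporal_difference(seq):
    """First-order temporal difference (motion) of a coordinate sequence."""
    return [b - a for a, b in zip(seq, seq[1:])]


# Toy 1-D joint coordinate over time: still, a quick move, then still again.
joint_x = [0.0, 0.0, 0.0, 1.2, 2.4, 2.4, 2.4, 2.4]
motion = temporal_difference(joint_x)
spikes = lif_spikes([abs(m) for m in motion])
print(spikes)  # spikes occur only while the joint is moving

# Forgetting: two sub-threshold inputs fire when adjacent,
# but not when separated by silent steps, because the potential leaks away.
print(lif_spikes([0.7, 0.7]))
print(lif_spikes([0.7, 0.0, 0.0, 0.7]))
```

In this toy setting the neuron stays silent while the joint is at rest, which is the kind of sparsity the abstract refers to, and the leak erases evidence that arrives too far apart in time, which is one plausible reading of why a dedicated mechanism for long-range temporal dynamics is needed.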

Source: arXiv cs.AI Recent
