This paper presents a new superpixel-based hand gesture recognition system that combines a Kinect depth camera with a novel superpixel earth mover's distance metric. The depth and skeleton information provided by the Kinect is used to extract the hand without markers. The hand shape, together with its texture and depth, is represented as a set of superpixels, which retain the overall shape and color of the gesture to be recognized. Based on this representation, a novel distance metric, the superpixel earth mover's distance (SP-EMD), is proposed to measure the dissimilarity between hand gestures. This measure is robust to distortion and articulation and, with proper preprocessing, also invariant to scaling, translation and rotation. The effectiveness of the proposed distance metric and recognition algorithm is illustrated by extensive experiments on our own gesture dataset and on two public datasets. Experimental results show that the proposed system achieves high mean accuracy and fast recognition speed. Its superiority is further demonstrated by comparisons with other conventional techniques and by two real-life applications.
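SP-EMD builds on the earth mover's distance, which compares two weighted signatures by finding the cheapest way to move the mass of one onto the other. The sketch below computes the standard EMD (minimum total transport cost divided by the total flow) with a simple min-cost-flow solver. It is illustrative only. The function name `emd`, the signature format (a list of `(feature, weight)` pairs, e.g. one pair per superpixel) and the `ground_distance` callback are hypothetical, and the specific superpixel features and ground distance used by SP-EMD are not described in this abstract.

```python
import math


def emd(sig_p, sig_q, ground_distance, eps=1e-12):
    """Standard EMD between two signatures of (feature, weight) pairs.

    Hypothetical helper: solves the transportation problem as a
    min-cost flow using successive shortest paths (Bellman-Ford).
    """
    n, m = len(sig_p), len(sig_q)
    src, snk = n + m, n + m + 1
    num_nodes = n + m + 2
    # Each edge is [to, residual_capacity, cost, index_of_reverse_edge].
    graph = [[] for _ in range(num_nodes)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, (_, w) in enumerate(sig_p):
        add_edge(src, i, w, 0.0)          # supply of element i of P
    for j, (_, w) in enumerate(sig_q):
        add_edge(n + j, snk, w, 0.0)      # demand of element j of Q
    for i, (fp, _) in enumerate(sig_p):
        for j, (fq, _) in enumerate(sig_q):
            add_edge(i, n + j, math.inf, ground_distance(fp, fq))

    # EMD moves as much mass as the lighter signature holds.
    target = min(sum(w for _, w in sig_p), sum(w for _, w in sig_q))
    flow, cost = 0.0, 0.0
    while flow < target - eps:
        # Shortest path from src to snk in the residual graph.
        dist = [math.inf] * num_nodes
        prev = [None] * num_nodes
        dist[src] = 0.0
        for _ in range(num_nodes - 1):
            updated = False
            for u in range(num_nodes):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > eps and dist[u] + c < dist[v] - eps:
                        dist[v] = dist[u] + c
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if dist[snk] == math.inf:
            break
        # Push the bottleneck amount along the path.
        push = target - flow
        v = snk
        while v != src:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = snk
        while v != src:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        cost += push * dist[snk]
    return cost / flow


def euclid(a, b):
    return math.dist(a, b)


# Toy signatures: one point mass vs. two half masses on a line.
p = [((0.0,), 1.0)]
q = [((0.0,), 0.5), ((2.0,), 0.5)]
print(emd(p, q, euclid))
```

In a superpixel setting, each feature could be, for instance, a superpixel centroid with its depth and the weight its area; because the metric compares whole distributions rather than pixel-to-pixel positions, small local deformations change the distance only gradually, which is consistent with the robustness to distortion claimed above.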