A promising direction in deep learning research is to learn representations and simultaneously discover cluster structure in unlabeled data by optimizing a discriminative loss function. Contrary to supervised deep learning, this line of research is in its infancy and the design and optimization of a suitable loss function with the aim of training deep neural networks for clustering is still an open challenge. In this paper, we propose to leverage the discriminative power of information theoretic divergence measures, which have experienced success in traditional clustering, to develop a new deep clustering network. Our proposed loss function incorporates explicitly the geometry of the output space, and facilitates fully unsupervised training end-to-end. Experiments on real datasets show that the proposed algorithm achieves competitive performance with respect to other state-of-the-art methods.