In general, face alignment technologies that are implemented with cascades of Convolutional Neural Networks (CNNs) suffer from at least the following drawbacks: lack of end-to-end training, hand-crafted feature extraction, and slow training speed. Specifically, without end-to-end training, the CNNs cannot be optimized jointly, which leads to a sub-optimal solution. In addition, this type of face alignment technology often relies on simple hand-crafted feature extraction methods, which do not take into account various facial factors such as pose and expression. Moreover, these cascades of CNNs typically have shallow frameworks, which are unable to extract deeper features by building upon the features extracted by earlier-stage CNNs. Furthermore, training these CNNs is usually time-consuming, both because each CNN is trained independently and sequentially and because hand-crafted feature extraction is required between every two consecutive CNNs.