Deep neural networks have proven to be efficient models for learning complex mapping functions. It is well known that adding more layers improves performance, but at the cost of more parameters, which in turn require more labeled training data. Pre-training techniques have been shown to help train deeper networks by exploiting unlabeled data. In practice, however, pre-training requires considerable effort to tune the additional hyper-parameters it introduces. In this talk, we present a regularization scheme that alleviates this problem, and we extend our approach to structured output problems.
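
To make the setting concrete, here is a minimal sketch, not the scheme presented in the talk, of one common way to use unlabeled data as a regularizer: instead of a separate greedy pre-training phase with its own hyper-parameters, an auxiliary reconstruction loss on unlabeled inputs is added to the supervised loss and both are optimized jointly. All names (`RegularizedMLP`, `training_step`, the trade-off weight `lam`) are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only -- NOT the authors' method. It shows joint training
# with a supervised loss on labeled data plus an autoencoder-style
# reconstruction loss on unlabeled data acting as a regularizer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegularizedMLP(nn.Module):
    def __init__(self, in_dim=784, hid_dim=256, n_classes=10):
        super().__init__()
        self.encoder = nn.Linear(in_dim, hid_dim)        # shared hidden layer
        self.decoder = nn.Linear(hid_dim, in_dim)        # reconstruction head (regularizer)
        self.classifier = nn.Linear(hid_dim, n_classes)  # supervised head

    def forward(self, x):
        h = torch.relu(self.encoder(x))
        return self.classifier(h), self.decoder(h)

def training_step(model, x_labeled, y, x_unlabeled, lam=0.1):
    # Supervised loss on the labeled batch.
    logits, _ = model(x_labeled)
    sup_loss = F.cross_entropy(logits, y)
    # Unsupervised reconstruction loss on the unlabeled batch.
    _, recon = model(x_unlabeled)
    unsup_loss = F.mse_loss(recon, x_unlabeled)
    # lam is a hypothetical trade-off weight between the two terms.
    return sup_loss + lam * unsup_loss

# Usage with random tensors (shapes only, for illustration):
model = RegularizedMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x_l, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
x_u = torch.randn(128, 784)
opt.zero_grad()
loss = training_step(model, x_l, y, x_u)
loss.backward()
opt.step()
```

The single weight `lam` replaces the per-layer schedules of a separate pre-training stage; whether and how the talk's scheme sets such a trade-off is not specified in this abstract.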