A Robust Experimental Evaluation of Automated Multi-Label Classification Methods — arXiv2