Interest in applying deep learning techniques to the automatic generation of software has grown in recent years. Reaching this ambitious objective requires meeting a number of smaller research goals, one of which is the automatic categorization of software, which is used in numerous software intelligence tasks. We present an approach to this problem based on a set of low-level features derived from lexical analysis of source code. We compare different feature sets for categorizing software and apply several supervised machine learning algorithms to the classification task. The representation also allows us to identify the most relevant libraries used in each class, which we do using the best-performing classifier. We evaluate the approach by categorizing popular Python projects from GitHub.
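As an illustration of what a low-level lexical feature might look like, the sketch below counts the top-level libraries named in the import statements of a Python source string, using only the token stream from the standard `tokenize` module. The abstract does not specify the exact feature sets, so this is an assumption about one plausible feature rather than the method actually used; the function name `imported_libraries` and the sample source are hypothetical.

```python
import io
import tokenize
from collections import Counter


def imported_libraries(source: str) -> Counter:
    """Count top-level library names in import statements (illustrative only)."""
    counts = Counter()
    mode = None         # "import" or "from" while inside such a statement
    want_name = False   # True when the next NAME token is a library name
    line_start = True   # True when the next token begins a statement

    skip = (tokenize.NL, tokenize.COMMENT, tokenize.INDENT, tokenize.DEDENT)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # A logical newline or a semicolon ends the current statement.
        if tok.type == tokenize.NEWLINE or (
            tok.type == tokenize.OP and tok.string == ";"
        ):
            mode, want_name, line_start = None, False, True
            continue
        if tok.type in skip:
            continue
        if line_start:
            line_start = False
            if tok.type == tokenize.NAME and tok.string in ("import", "from"):
                mode, want_name = tok.string, True
            continue
        if mode is None:
            continue
        if want_name:
            want_name = False
            # A leading "." marks a relative import, which is not counted.
            if tok.type == tokenize.NAME:
                counts[tok.string] += 1
            if mode == "from":
                mode = None  # names after "from X import" are not libraries
        elif tok.type == tokenize.OP and tok.string == ",":
            want_name = True  # next name in "import a, b"
    return counts


sample = (
    "import os, sys\n"
    "from collections import Counter\n"
    "import numpy.linalg as la\n"
    "from . import sibling\n"
    "text = 'import fake'\n"
)
print(sorted(imported_libraries(sample)))
# prints ['collections', 'numpy', 'os', 'sys']
```

Counts of this kind, aggregated over all files in a project, could serve as one input vector for a supervised classifier. Working on tokens rather than raw text keeps the word `import` inside string literals and comments from being miscounted.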