An algorithm for reducing dimension and size of sample for data exploration procedures
Authors
Kulczycki, Piotr
Łukasik, Szymon
Published in
International Journal of Applied Mathematics and Computer Science
Volume and issue
vol. 24, no. 1
Pages
133-149
Publication date
2014
Publisher
De Gruyter Open
Language
English
DOI
10.2478/amcs-2014-0011
Abstract
The paper addresses the problem of reducing the dimension and size of a data set (random sample) for use in exploratory data analysis procedures. The algorithm investigated here is based on a linear transformation to a lower-dimensional space that preserves, as closely as possible, the distances between individual elements. The elements of the transformation matrix are computed using the parallel fast simulated annealing metaheuristic. In addition, data set elements whose location changes significantly relative to the others are eliminated or given reduced importance. The presented method can be applied universally across a wide range of data exploration problems, offering flexible customization, usability in dynamic data environments, and performance comparable to or better than Principal Component Analysis (PCA). Its advantages were verified in detail on the fundamental tasks of the field: clustering, classification, and detection of atypical elements (outliers).
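The abstract describes the procedure only at a high level. As a rough illustration, the Python sketch below makes several assumptions that are not taken from the paper: a raw-stress criterion (the sum of squared differences between original and reduced pairwise distances), a sequential fast simulated annealing loop (Cauchy-distributed steps applied to one matrix element at a time, with temperature T_k = T_0 / k), and a hypothetical weighting rule in which an element's weight falls linearly with the relative change in its distances and reaches zero at a chosen `cutoff`. The function names `anneal_transform` and `element_weights` are illustrative only; the authors' parallel variant, exact cost function, and elimination criterion are not reproduced here.

```python
import math
import random


def pairwise_distances(points):
    """Euclidean distance matrix for a list of equal-length tuples."""
    return [[math.dist(p, q) for q in points] for p in points]


def project(points, A):
    """Apply the linear map y = A x (A has m rows, n columns) to every point."""
    return [tuple(sum(a * x for a, x in zip(row, p)) for row in A) for p in points]


def raw_stress(D, d):
    """Sum of squared differences between original and reduced distances (i < j).
    Assumption: this objective is illustrative, not necessarily the paper's."""
    n = len(D)
    return sum((D[i][j] - d[i][j]) ** 2 for i in range(n) for j in range(i + 1, n))


def anneal_transform(points, m, iters=2000, t0=1.0, seed=0):
    """Search for an m x n matrix A that keeps pairwise distances, using a
    sequential (not parallel) fast-annealing loop: Cauchy steps, T_k = t0 / k.
    Returns the best matrix found and the best-so-far stress per iteration."""
    rng = random.Random(seed)
    n = len(points[0])
    D = pairwise_distances(points)
    A = [[rng.gauss(0.0, 1.0 / math.sqrt(m)) for _ in range(n)] for _ in range(m)]
    cost = raw_stress(D, pairwise_distances(project(points, A)))
    best_A, history = [row[:] for row in A], [cost]
    for k in range(1, iters + 1):
        T = t0 / k  # fast annealing cooling schedule
        r, c = rng.randrange(m), rng.randrange(n)
        old = A[r][c]
        # Cauchy-distributed step on a single element, scaled by the temperature
        A[r][c] = old + T * math.tan(math.pi * (rng.random() - 0.5))
        new_cost = raw_stress(D, pairwise_distances(project(points, A)))
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            cost = new_cost  # Metropolis acceptance of the move
        else:
            A[r][c] = old  # reject and restore the previous element
        if cost < history[-1]:
            best_A = [row[:] for row in A]
        history.append(min(cost, history[-1]))
    return best_A, history


def element_weights(points, A, cutoff=0.5):
    """Hypothetical rule: weight each element by how much its distances to the
    others changed; a relative change >= cutoff gives weight 0 (elimination)."""
    D = pairwise_distances(points)
    d = pairwise_distances(project(points, A))
    weights = []
    for i in range(len(points)):
        total = sum(D[i])
        change = sum(abs(D[i][j] - d[i][j]) for j in range(len(points)))
        rel = change / total if total > 0 else 0.0
        weights.append(max(0.0, 1.0 - rel / cutoff))
    return weights


if __name__ == "__main__":
    rng = random.Random(1)
    # 15 points on the plane z = 0 embedded in R^3, so a 2-D map can be exact
    data = [(rng.uniform(0, 10), rng.uniform(0, 10), 0.0) for _ in range(15)]
    A, history = anneal_transform(data, m=2)
    print("stress: initial %.2f, final %.2f" % (history[0], history[-1]))
    print("weights:", [round(w, 2) for w in element_weights(data, A)])
```

Because the example data lie on a plane, a two-row matrix can in principle preserve all distances, so the printed final stress should be lower than the initial one. With real data, the weights returned by `element_weights` could be used to drop or down-weight poorly represented elements before clustering, classification, or outlier detection.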