Date of Award
2015
Document Type
Thesis
Degree Name
Master of Science
College
College of Arts and Sciences
Program
Engineering & Computer Science, MS
First Advisor
Roy Villafane
Abstract
Problem
In everyday life we compare situations in order to select the best solution. This study analyzes data (variables), a task also known as data mining. In some situations it is not enough to compare variables with one another at a single moment; it is also necessary to compare how the variables behave in different time periods in order to select the best arrangement for a given situation.
Method
To find correlations among variables, traffic at a set of intersections was simulated and the intersections were compared using correlation coefficient matrices; because these matrices are normalized, matrices computed for different time periods can be compared directly. Comparing each entry of the first matrix with the corresponding entry of the second matrix reveals the intersections that are busier and that differ most from the others. Two formulas were also developed to identify the most interesting correlations; one of them is a modification of the harmonic mean formula that balances two important quantities, the average and the difference.
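The comparison step can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes each period is given as a table of observations (rows are samples such as hourly vehicle counts, columns are intersections), and the random Poisson data, the four intersections, and the function name `compare_periods` are hypothetical. It builds one correlation coefficient matrix per period with NumPy's `corrcoef`, then compares corresponding entries, reporting for each intersection pair the arithmetic mean and the absolute difference of the two coefficients. Those two definitions of "average" and "difference" are assumptions.

```python
import numpy as np

def compare_periods(period_a: np.ndarray, period_b: np.ndarray):
    """Compare two periods of observations (rows = samples, columns = intersections).

    Returns (pairs, averages, differences) for every unordered pair (i, j), i < j.
    The mean / absolute-difference definitions are illustrative assumptions.
    """
    # Each column is one variable (intersection), so rowvar=False.
    corr_a = np.corrcoef(period_a, rowvar=False)
    corr_b = np.corrcoef(period_b, rowvar=False)
    n = corr_a.shape[0]
    # Upper triangle without the diagonal: each pair once, no self-correlations.
    i_idx, j_idx = np.triu_indices(n, k=1)
    a = corr_a[i_idx, j_idx]
    b = corr_b[i_idx, j_idx]
    averages = (a + b) / 2.0         # in [-1, 1]
    differences = np.abs(a - b)      # in [0, 2]
    pairs = list(zip(i_idx.tolist(), j_idx.tolist()))
    return pairs, averages, differences

# Hypothetical data: 24 hourly counts for 4 intersections in two periods.
rng = np.random.default_rng(0)
period_a = rng.poisson(lam=30, size=(24, 4)).astype(float)
period_b = rng.poisson(lam=30, size=(24, 4)).astype(float)

pairs, avg, diff = compare_periods(period_a, period_b)
for (i, j), m, d in zip(pairs, avg, diff):
    print(f"intersections {i}-{j}: average={m:+.3f} difference={d:.3f}")
```

Because both matrices are normalized to [-1, 1], their entries can be subtracted and averaged directly, regardless of how heavy the traffic was in each period.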
Results
With these two formulas, the most interesting relationships between variables can be found: those that are the most or least popular (average value) and those that are very different from or very similar to each other (difference value) across time periods. “Rank 1” is the balance between the average and the difference, with values ranging from 0 to 0.6. A value of 0 means that the intersections have very low averages and differences, and 0.6 means the opposite. “Rank 2” assigns weights to the average and the difference. Its values range from 0 to 1, where 0 means that the average or the difference (whichever was given more weight) is low, and 1 means the opposite. The choice of weights depends on the needs of the specific situation.
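The two ranking ideas can be illustrated with a hedged sketch. The thesis's exact formulas are not given in this abstract, so both functions here are stand-ins: `harmonic_balance` is the plain, unmodified harmonic mean (its range is 0 to 1, not the 0 to 0.6 reported for Rank 1), and `weighted_rank` is a simple weighted sum in the spirit of Rank 2. Both assume the average and difference scores have already been scaled into [0, 1]; the pair labels and scores below are hypothetical.

```python
def harmonic_balance(avg: float, diff: float) -> float:
    """Plain harmonic mean of two scores in [0, 1]; 0 if either score is 0.

    Stand-in for 'Rank 1': the thesis uses a *modified* harmonic mean whose
    exact form is not reproduced here.
    """
    if avg <= 0.0 or diff <= 0.0:
        return 0.0
    return 2.0 * avg * diff / (avg + diff)

def weighted_rank(avg: float, diff: float, w_avg: float = 0.5) -> float:
    """Weighted combination in [0, 1], in the spirit of 'Rank 2'.

    w_avg is the weight on the average; (1 - w_avg) goes to the difference.
    """
    if not 0.0 <= w_avg <= 1.0:
        raise ValueError("w_avg must lie in [0, 1]")
    return w_avg * avg + (1.0 - w_avg) * diff

# Hypothetical (average, difference) scores for three intersection pairs.
scores = {"A-B": (0.9, 0.8), "A-C": (0.2, 0.1), "B-C": (0.9, 0.05)}

by_balance = sorted(
    ((pair, harmonic_balance(a, d)) for pair, (a, d) in scores.items()),
    key=lambda item: item[1], reverse=True)
by_weight = sorted(
    ((pair, weighted_rank(a, d, w_avg=0.7)) for pair, (a, d) in scores.items()),
    key=lambda item: item[1], reverse=True)

print("balance:", [(p, round(s, 3)) for p, s in by_balance])
print("weighted:", [(p, round(s, 3)) for p, s in by_weight])
```

The contrast shows the design choice: pair B-C has a high average but almost no change between periods, so the harmonic balance pushes it to the bottom, whereas a weighting that favors the average (`w_avg=0.7`) keeps it in second place.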
Conclusions
Because correlation coefficient matrices are already normalized, comparing two of them, computed from any type of data in different time periods, reveals useful information in any situation where one needs to know how popular variables are and how much they differ. The most interesting relationships can then be identified by calculating the average of, or the difference between, the values for each pair of variables. As an example, these formulas were used to compare traffic intersections, producing a ranking from the most popular to the least important intersections that confirmed previously observed traffic patterns.
Subject Area
Data mining; Correlation (Statistics); Matrices; Variables (Mathematics)
Recommended Citation
Bravo Gonzalez, Yesica Daniela, "Ranking Interesting Changes in Correlation Coefficient Matrix Results from Varying Data Partitions in Causal Graphic Modeling" (2015). Master's Theses. 66.
https://dx.doi.org/10.32597/theses/66/
https://digitalcommons.andrews.edu/theses/66
Creative Commons License
This work is licensed under a Creative Commons Attribution-No Derivative Works 4.0 International License.