Big Data Visualization
About the project:
This Java application automates the process of exploring and analyzing data. The user imports the spreadsheets to be examined, and the application returns the graph that best captures the variables in the spreadsheet, chosen by machine-learning algorithms. The possible graph types are line graph, bar graph, pie chart, and scatter plot. The labels placed on these graphs to point out key data values are generated by a scoring system that decides how “important” each category (including outliers, skewness, correlations, number of observations, and other summary statistics) is to bring to the user's attention.
The program also returns the summary statistics that the user selects with the checkboxes in the menu. These include correlations between variables in the spreadsheets, the measures of central tendency (mean, median, and mode), the interquartile range, outliers in the data, the standard deviation and variance of the variables, and more.
The program aims to make it easier for the user to identify which variables matter in a large data set, without having to learn the commands of an existing statistical tool like SAS, Stata, or R, and to give them a quick way to understand how the variables in the spreadsheet are related through the visualization component. One of the central problems of “big data” is that data will continue to be created at an accelerating rate. The faster people can analyze data with the help of an automated tool like this one, the better businesses and other people working with data will be able to make informed decisions without neglecting essential data.
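As a rough illustration of the summary statistics described above, the following Java sketch computes a mean, a sample variance, quartiles, outliers, and a correlation for numeric columns. It is not the project's actual code: the class name SummaryStatsSketch, the method names, the linear-interpolation quantile, the 1.5 × IQR outlier rule, and the use of Pearson's correlation coefficient are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; names and statistical conventions are assumptions,
// not taken from the application itself.
public class SummaryStatsSketch {

    /** Arithmetic mean of a column. */
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample variance (divides by n - 1); the standard deviation is its square root. */
    static double variance(double[] values) {
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / (values.length - 1);
    }

    /** Quantile for q in [0, 1], interpolating linearly between sorted neighbours. */
    static double quantile(double[] values, double q) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        return sorted[lower] + (pos - lower) * (sorted[upper] - sorted[lower]);
    }

    /** Values lying more than 1.5 * IQR below the first or above the third quartile. */
    static List<Double> outliers(double[] values) {
        double q1 = quantile(values, 0.25);
        double q3 = quantile(values, 0.75);
        double iqr = q3 - q1;
        double low = q1 - 1.5 * iqr;
        double high = q3 + 1.5 * iqr;
        List<Double> result = new ArrayList<>();
        for (double v : values) {
            if (v < low || v > high) {
                result.add(v);
            }
        }
        return result;
    }

    /** Pearson correlation coefficient between two columns of equal length. */
    static double correlation(double[] x, double[] y) {
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] column = {2, 4, 4, 5, 7, 9, 40}; // made-up sample data
        System.out.println("mean     = " + mean(column));
        System.out.println("median   = " + quantile(column, 0.5));
        System.out.println("std dev  = " + Math.sqrt(variance(column)));
        System.out.println("outliers = " + outliers(column));
    }
}
```

Statistical packages differ in how they define quartiles, so the interquartile range and the flagged outliers can vary slightly with the convention chosen.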
Bio:
Peter Mattoon and Sameer Rau are graduating seniors studying Computer Science at The George Washington University.