Data exploration is the first step in data analysis and typically involves summarizing the main characteristics of a dataset. It is commonly conducted using visual analytics tools, but can also be done in more advanced statistical software, such as R.
Before a formal data analysis can be conducted, the analyst must know how many cases are in the dataset, what variables are included, how many missing observations there are and what general hypotheses the data is likely to support. An initial exploration of the dataset helps answer these questions by familiarizing analysts with the data they are working with.
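As a rough illustration of these first checks, the sketch below counts the cases, lists the variables and tallies missing observations per variable using only Python's standard library. The sample data, the column names and the `summarize` helper are hypothetical, and treating an empty field as a missing observation is an assumption; real datasets often mark missing values differently (for example, with `NA`).

```python
import csv
import io

# Hypothetical sample data; empty fields stand in for missing observations.
SAMPLE_CSV = """age,income,city
34,52000,Boston
,48000,Denver
29,,Austin
41,61000,
"""


def summarize(csv_text):
    """Return the case count, variable names and missing-value counts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    variables = list(reader.fieldnames or [])
    missing = {name: 0 for name in variables}
    n_cases = 0
    for row in reader:
        n_cases += 1
        for name in variables:
            value = row.get(name)
            # Assumption: an absent or blank field counts as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return {"cases": n_cases, "variables": variables, "missing": missing}


summary = summarize(SAMPLE_CSV)
print("Cases:", summary["cases"])
print("Variables:", summary["variables"])
print("Missing per variable:", summary["missing"])
```

The same questions can be answered in R or in a visual analytics tool; the point is only that each one maps to a simple count over the rows and columns.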
Analysts commonly use data visualization software for data exploration because it lets them quickly view most of the relevant features of their dataset. From this step, users can identify variables that are likely to have interesting observations. By displaying data graphically (for example, through scatter plots or bar charts), users can see whether two or more variables correlate and determine whether they are good candidates for further in-depth analysis.
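A pattern spotted in a scatter plot can be cross-checked numerically. The sketch below computes a Pearson correlation coefficient by hand for two variables; the `pearson` helper and the `hours` and `score` values are made up for illustration, and Pearson's coefficient is only one possible measure that captures linear relationships, not any relationship a plot might reveal.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observations for two variables.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson(hours, score)
print(f"Pearson r = {r:.3f}")
```

A value near 1 or -1 suggests a strong linear relationship worth examining further, while a value near 0 suggests little linear association, although a nonlinear pattern may still be visible in the plot.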