data sampling

Data sampling is a statistical analysis technique used to select, manipulate and analyze a representative subset of data points in order to identify patterns and trends in the larger data set being examined.
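To make the idea of a "representative subset" concrete, here is a minimal sketch that draws a simple random sample from a large data set and compares the sample's mean with the mean of the full data. The data are synthetic (normally distributed purchase amounts generated with a fixed seed), and the variable names are hypothetical; simple random sampling is used only because it is the most basic probability method.

```python
import math
import random
import statistics

# Hypothetical full data set: 200,000 synthetic purchase amounts.
# In a real project this would be the data you actually want to study.
rng = random.Random(42)  # fixed seed so the run is reproducible
full_data = [rng.gauss(50.0, 15.0) for _ in range(200_000)]

# Simple random sample: 1,000 points chosen without replacement.
sample = rng.sample(full_data, 1_000)

full_mean = statistics.fmean(full_data)
sample_mean = statistics.fmean(sample)
# Standard error of the sample mean: how far the estimate typically strays.
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

print(f"mean of full data: {full_mean:.2f}")
print(f"mean of sample:    {sample_mean:.2f} (+/- {std_error:.2f} standard error)")
```

With 0.5% of the points, the sample mean typically lands within a fraction of a unit of the full-data mean, which is the trade-off sampling relies on.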

Sampling allows data scientists, predictive modelers and other data analysts to work with a small, manageable amount of data, so they can build and run analytical models more quickly while still producing accurate findings. Sampling can be particularly useful with data sets that are too large to analyze efficiently in full, such as those in big data analytics applications. An important consideration, though, is the size of the required data sample. In some cases, a very small sample can reveal the most important information about a data set. In others, a larger sample increases the likelihood of accurately representing the data as a whole, even though a bigger sample is harder to manipulate and interpret. Either way, samples are best drawn from data sets that are as large and as close to complete as possible.
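How large a sample must be depends on how precise the answer needs to be. One common rule of thumb for estimating a proportion, assumed here as an illustration rather than taken from this definition, is Cochran's formula n0 = z^2 * p * (1 - p) / e^2, where e is the acceptable margin of error, z the z-score for the confidence level and p the expected proportion. For a finite data set of N points, it is often adjusted with the finite population correction n = n0 / (1 + (n0 - 1) / N). The function name below is hypothetical.

```python
import math
from typing import Optional


def required_sample_size(margin_of_error: float,
                         z: float = 1.96,
                         p: float = 0.5,
                         population_size: Optional[int] = None) -> int:
    """Cochran's sample-size formula for estimating a proportion.

    margin_of_error: acceptable error, e.g. 0.05 for +/- 5 percentage points
    z: z-score for the confidence level (1.96 is roughly 95% confidence)
    p: expected proportion; 0.5 is the most conservative (largest n) choice
    population_size: if given, apply the finite population correction
    """
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population_size is not None:
        # Smaller data sets need slightly fewer points for the same precision.
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)  # round up: a sample size must be a whole number


for e in (0.05, 0.02, 0.01):
    print(f"margin of error {e:.2f}: n = {required_sample_size(e)}")
```

Because n0 scales with 1/e^2, halving the margin of error roughly quadruples the sample needed, which is the size-versus-manageability trade-off described in this section.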

There are many different methods for drawing samples from data, and the ideal one depends on the data set and situation. Sampling can be based on probability, an approach that uses random numbers that correspond to points in the data set. Because chance, rather than the analyst's judgment, decides which points are chosen, this approach guards against systematic bias in the selection. Common variations of probability sampling include simple random sampling, stratified random sampling, systematic random sampling and multi-stage cluster sampling. Once generated, a sample can be used for predictive analytics. For example, a retail business might use data sampling to uncover patterns in customer behavior and then apply predictive modeling to create more effective sales strategies.
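The probability sampling methods named above can be sketched with Python's standard library alone. This is an illustrative sketch, not a production implementation: the customer records, their fields (`region`, `store`, `spend`) and all function names are hypothetical, and the cluster function shows a two-stage version of multi-stage cluster sampling.

```python
import random
from collections import defaultdict


def simple_random_sample(data, n, rng):
    """Every point has the same chance of selection (no replacement)."""
    return rng.sample(data, n)


def systematic_sample(data, n, rng):
    """Take every k-th point after a random start, where k = len(data) // n."""
    if not 1 <= n <= len(data):
        raise ValueError("n must be between 1 and len(data)")
    k = len(data) // n
    start = rng.randrange(k)  # random starting offset in [0, k)
    return data[start::k][:n]


def stratified_sample(data, key, fraction, rng):
    """Group points into strata by key(), then sample the same fraction of each."""
    strata = defaultdict(list)
    for row in data:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample


def two_stage_cluster_sample(data, cluster_key, n_clusters, n_per_cluster, rng):
    """Stage 1: pick whole clusters at random. Stage 2: sample points inside each."""
    clusters = defaultdict(list)
    for row in data:
        clusters[cluster_key(row)].append(row)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for c in chosen:
        members = clusters[c]
        sample.extend(rng.sample(members, min(n_per_cluster, len(members))))
    return sample


# Hypothetical retail data: 10,000 customers spread over 4 regions and 20 stores.
rng = random.Random(7)
regions = ["north", "south", "east", "west"]
customers = [
    {"id": i, "region": regions[i % 4], "store": i % 20,
     "spend": round(rng.uniform(5, 500), 2)}
    for i in range(10_000)
]

srs = simple_random_sample(customers, 100, rng)
sys_s = systematic_sample(customers, 100, rng)
strat = stratified_sample(customers, lambda c: c["region"], 0.01, rng)
clus = two_stage_cluster_sample(customers, lambda c: c["store"], 5, 10, rng)

for name, s in [("simple", srs), ("systematic", sys_s),
                ("stratified", strat), ("cluster", clus)]:
    mean_spend = sum(c["spend"] for c in s) / len(s)
    print(f"{name:<11} n={len(s):>3}  mean spend={mean_spend:.2f}")
```

Stratified sampling guarantees that each region appears in proportion to its size, while cluster sampling trades some precision for convenience when the data are naturally grouped (here, by store) and visiting every group would be costly.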

See also: R programming language

This was first published in July 2014
