Insights into buying big data analytics tools

Before companies purchase a big data analytics tool, they must identify their specific needs and match them to available products.

Big data analytics tools enable users to analyze a wide variety of information -- from structured transaction data to social media posts, Web server log files and other forms of unstructured and semi-structured data. Once your organization has decided to buy a big data analytics tool, the next step is to create a process for evaluating the available products and then find the one that best fits your needs and requirements.

Let's examine the must-have features and specific attributes that can be used to assess how well the available big data analytics tools will meet your organization's needs. You can then develop a request for proposal (RFP) that maps those needs to what each tool provides.

Evaluation criteria

Breadth and depth of modeling techniques. Vendors have invested different levels of effort and, as a result, offer analytics capabilities of varying sophistication. The breadth of a tool's analytical modeling refers to the range of techniques it supports. Examples include regression techniques, time series models that predict variable values based on an analysis of past trends, classification and regression trees (also known as CART), and neural networks.

The depth of the modeling techniques characterizes two aspects of the approaches employed: the algorithmic sophistication that provides greater accuracy and precision of the developed models, and the flexibility of the modeling techniques. In other words, what level of expertise in data mining and predictive analytics is necessary to understand what kinds of models can be developed, and how can they be built with a particular tool?

Less experienced data analysts may be interested in vendor products that provide a broad swath of analytics capabilities, whereas more expert analysts and statisticians may prefer tools with greater depth in specific types of analytic models.

Integration and accessibility. Big data analytics applications often rely on a growing number of internal and external data sources, containing both structured and unstructured data. This drives a need for functionality supporting data accessibility and systems integration. Features to consider in this area include the following:

  • Unstructured data utilization. Verify that the product is able to ingest the different types of unstructured data (documents, emails, images, videos, presentations, streams from social media channels, etc.) and can parse and make sense of the incoming information.
  • Big data accessibility. Compare how the vendors' tools connect to big data architectures, including distributed data stored in Hadoop, as well as files managed within other types of scale-out storage (for example, NoSQL databases such as MongoDB or Apache Cassandra).
  • Interoperability with existing platform components. This is crucial when there's an expectation of blending more traditional data management and BI practices with analytics methodologies. For example, many analytics tools allow analytical models to be invoked through traditional SQL queries. This form of interoperability allows the results of the predictive models to drive the kind of querying and reporting with which more traditional data analysts are typically comfortable.
  • Connectivity. It's important to assess connectivity, or how well the product can access other systems as well as act as a source to feed results to established platforms for reporting and analysis.
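
To make the interoperability point concrete, here is a minimal sketch of a predictive model being invoked from an ordinary SQL query. It uses Python's built-in sqlite3 module purely as a stand-in for a production database; the table, the column names, the churn_score function and its hand-set coefficients are all hypothetical, and a commercial tool would expose its models through its own mechanism.

```python
import math
import sqlite3

# Hypothetical, hand-set coefficients standing in for a trained model.
INTERCEPT = -3.0
COEF_MONTHS_INACTIVE = 0.8
COEF_SUPPORT_TICKETS = 0.5


def churn_score(months_inactive, support_tickets):
    """Return a churn probability from a toy logistic model."""
    z = (INTERCEPT
         + COEF_MONTHS_INACTIVE * months_inactive
         + COEF_SUPPORT_TICKETS * support_tickets)
    return 1.0 / (1.0 + math.exp(-z))


def build_demo_connection():
    """Create an in-memory database with made-up customer rows."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL statements can call it by name.
    conn.create_function("churn_score", 2, churn_score)
    conn.execute(
        "CREATE TABLE customers ("
        "id INTEGER PRIMARY KEY, months_inactive REAL, support_tickets REAL)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(1, 0, 1), (2, 6, 2), (3, 2, 0)],
    )
    return conn


def high_risk_customers(conn, threshold=0.5):
    """Use plain SQL to list customers whose model score exceeds the threshold."""
    return conn.execute(
        "SELECT id, churn_score(months_inactive, support_tickets) AS score "
        "FROM customers "
        "WHERE churn_score(months_inactive, support_tickets) > ? "
        "ORDER BY score DESC",
        (threshold,),
    ).fetchall()
```

An analyst who knows only SQL could run the query in high_risk_customers and feed the result into an existing report, without needing to know how the score is calculated.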

Ease of use. Some big data analytics products have been built from the ground up by vendors, while others are based on the open source statistical language R. In either scenario, this evaluation category focuses on how easy the product is to use for analyzing data, developing models and determining the efficacy and accuracy of the models. Evaluation criteria should include the following:

  • Usability for business analysts. Can business analysts without a statistical background easily develop analyses and applications? Check if the product provides visual methods that facilitate development and analytics uses.
  • Flexibility in deployment for different business use cases. As suggested in a previous article, the same algorithmic methods can be applied in many different business scenarios for different industries. If the kinds of analyses your organization plans to do are somewhat limited and are centered on more general use cases (such as customer lifetime value analysis, fraud analysis or customer retention), you may be able to tolerate less flexible techniques. However, if your organization wants a broader and less constrained approach to analytics, look for a greater degree of modeling flexibility.
  • Model scoring. This includes additional tools that help analysts automatically compare the accuracy, efficacy and predictive value of different predictive models intended for similar business scenarios.
  • Collaboration. Isolated analysis and development can lead to replicated efforts and uncoordinated results. Providing a means of integrating collaboration capabilities and sharing analytical models as part of the big data analytics platform enables analysts to work together to refine their applications and subsequently reuse the same models, thereby lowering development costs while increasing consistency.
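
As a rough illustration of what model scoring involves, the sketch below compares two candidate models for the same business scenario on the same holdout data and ranks them by accuracy. The holdout rows and both threshold "models" are made up for the example; real tools typically report several additional measures beyond simple accuracy.

```python
def accuracy(model, holdout):
    """Fraction of holdout rows where the model's prediction matches the label."""
    correct = sum(1 for features, label in holdout if model(features) == label)
    return correct / len(holdout)


def rank_models(models, holdout):
    """Return (name, accuracy) pairs sorted from most to least accurate."""
    scores = [(name, accuracy(model, holdout)) for name, model in models.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up holdout data: (transaction amount, was it actually fraud?)
HOLDOUT = [
    (20, False),
    (950, True),
    (400, True),
    (1200, True),
    (700, True),
    (50, False),
]

# Two hypothetical candidate models for the same fraud-detection scenario.
MODELS = {
    "threshold_500": lambda amount: amount > 500,
    "threshold_1000": lambda amount: amount > 1000,
}

if __name__ == "__main__":
    for name, score in rank_models(MODELS, HOLDOUT):
        print(f"{name}: {score:.2f}")
```

The key design point is that every candidate is measured against the same holdout data, so the resulting numbers are directly comparable.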

System management capabilities of big data analytics tools

The practical aspects of integrating a new technology into the organization must also be considered. Evaluating the simplicity of administration and configuration includes understanding any system requirements and dependencies for installation, configuration and ongoing management. For example, the big data analytics tools that take advantage of the statistical models in R require that the R environment be acquired and installed at the same time the products are installed. The evaluation should also identify the platforms on which the product can be installed, as well as the platforms that can embed the developed models and applications.

Other considerations include security associated with the designation of roles and access rights for both the analytics process and the incorporation of developed models into business applications. Explore what options the products provide for authentication, authorization and access control.

Performance

Most high-end Hadoop platforms and specialty appliances are engineered to provide multiple compute nodes for parallel processing and distributed computing. If a high level of execution performance is a requirement, it's critical that the products you evaluate take advantage of massively parallel processing (MPP) system configurations.

Using an MPP platform introduces a need for the selected tool to efficiently use the platform's performance optimizations, including:

  • Parallelism and data distribution. Parallel systems work best when parallel processes execute independently on data sets that are distributed in a way that minimizes network traffic and maximizes data locality. Review how well the product's parallelization dovetails with the data distribution strategy.
  • The product's push-down capability. This enables the analysis algorithms to take advantage of the inherent capabilities of the other components of the system stack. An example would be if a database management system provides parameterized modeling utilities as part of its tool suite, and those utilities have been natively optimized to take advantage of the architecture features in the DBMS. In this case, it's wise for the analytics tool to use the native capability rather than attempt to replicate it.
  • Scalability and elasticity. As data volumes expand and data management platforms are scaled out, assess whether each analytics product is designed to scale linearly with the increased processing and storage capacity.
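
The interplay between parallelism and data distribution can be sketched in a few lines. In this simplified example, which is not how any particular product works, records are distributed by key so that each partition can be aggregated independently, and only the small per-partition results are merged at the end. Threads stand in for the compute nodes of an MPP system, and the function names and partition count are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition_by_key(records, num_partitions):
    """Distribute records so that all rows for one key land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def local_totals(partition):
    """Aggregate one partition on its own; no rows from other partitions are needed."""
    totals = Counter()
    for key, value in partition:
        totals[key] += value
    return totals


def parallel_totals(records, num_partitions=4):
    """Aggregate the partitions concurrently, then merge the small partial results."""
    partitions = partition_by_key(records, num_partitions)
    # Threads are only a stand-in here for separate compute nodes.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(local_totals, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values for matching keys
    return dict(merged)
```

Because all rows for a given key sit in one partition, the workers never need to exchange raw data. That is the property to look for when checking whether a product's parallelization matches your data distribution strategy.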

The cost of big data analytics tools

With most big data technologies, product prices understandably influence the buying decision. Some big data analytics tools are costly, while others are low-cost or, in some cases, free. In addition, a vendor may offer different features, capabilities or freedom from constraints (such as limits on analyzed data volumes) at different price points.

Another consideration is the need for special services. For each of the products to be evaluated, assess whether it's necessary to engage the software vendor or external experts to help with installation and training or to provide specialty development services.

Also, be sure to consider the long-term total cost of ownership (TCO) of the tools you're evaluating. TCO calculations should include annual maintenance fees and the allocated costs of the system stack supporting the product, as well as an allocation for operations and maintenance staff, data center space, cooling and other utilities.
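
A simple way to compare candidates is to add up one-time and recurring costs over a fixed horizon. The sketch below does that arithmetic; the three-year horizon, the cost categories chosen as parameters and every figure in the example are assumptions for illustration, not real prices.

```python
def total_cost_of_ownership(license_fee, annual_maintenance, annual_infrastructure,
                            annual_staff, services=0.0, years=3):
    """Sum one-time and recurring costs over the evaluation horizon."""
    # One-time costs: purchase price plus installation, training and other services.
    one_time = license_fee + services
    # Recurring costs: maintenance fees, allocated system stack, data center and staff.
    recurring = (annual_maintenance + annual_infrastructure + annual_staff) * years
    return one_time + recurring


if __name__ == "__main__":
    # Made-up figures: a licensed product versus a free tool that needs more support.
    licensed = total_cost_of_ownership(
        license_fee=100_000, annual_maintenance=20_000,
        annual_infrastructure=15_000, annual_staff=60_000, services=10_000)
    free_tool = total_cost_of_ownership(
        license_fee=0, annual_maintenance=0,
        annual_infrastructure=25_000, annual_staff=120_000, services=40_000)
    print(f"Licensed product, 3 years: {licensed:,}")
    print(f"Free tool, 3 years: {free_tool:,}")
```

With these invented numbers the free tool comes out more expensive over three years (475,000 versus 395,000), which is why the purchase price alone is a poor basis for comparison.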

Developing your RFP

Narrow down your set of candidate vendors by filtering out those whose products don't address your organization's specific use cases. Examine how your organization's requirements map to the evaluation categories described above and create an RFP that, aside from the standard set of questions about integration, interoperability and corporate details, focuses on quantifying conformance with your expectations for factors such as analytical modeling, data volumes, necessary levels of expertise and data accessibility requirements.

Determine the most critical differentiating factors, such as the ability of the product to scale and perform well as data volumes grow, its ability to consume unstructured data, and the breadth and depth of its modeling capabilities. At the same time, develop questions that reflect the needs of your user community, especially if there are analysts with different levels of expertise or there's a need for enterprise collaboration. In addition, a tool's initial price, its staffing requirements and its total cost of ownership are key factors in the selection, so include questions about cost and budget in the evaluation.

Articulating and prioritizing the business needs and then specifying the expectations from the pool of vendor products will enable your acquisition team to map the business needs to the categories for evaluation. Configure your RFP by reviewing the list above, defining the questions that need to be asked, and specifying the acceptable responses to determine the degree to which any specific product meets your needs.
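
One common way to turn RFP responses into a comparable result is a weighted scoring matrix. The sketch below is one possible approach rather than a prescribed method: the category names mirror the evaluation categories in this article, but the weights, the 1-to-5 rating scale and the vendor ratings are all hypothetical.

```python
# Hypothetical weights for the evaluation categories; they should sum to 1.0.
WEIGHTS = {
    "modeling": 0.25,
    "integration": 0.20,
    "ease_of_use": 0.15,
    "system_management": 0.10,
    "performance": 0.20,
    "cost": 0.10,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Combine per-category ratings (for example 1 to 5) into one weighted score."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(weights[category] * ratings[category] for category in weights)


def rank_vendors(vendor_ratings, weights=WEIGHTS):
    """Return (vendor, score) pairs sorted from best to worst fit."""
    scored = [(vendor, weighted_score(ratings, weights))
              for vendor, ratings in vendor_ratings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up ratings that an acquisition team might assign after reading RFP responses.
EXAMPLE_RATINGS = {
    "vendor_a": {"modeling": 5, "integration": 3, "ease_of_use": 2,
                 "system_management": 3, "performance": 4, "cost": 2},
    "vendor_b": {"modeling": 3, "integration": 4, "ease_of_use": 5,
                 "system_management": 4, "performance": 3, "cost": 4},
}

if __name__ == "__main__":
    for vendor, score in rank_vendors(EXAMPLE_RATINGS):
        print(f"{vendor}: {score:.2f}")
```

Setting the weights before reading any responses forces the team to agree on priorities up front, and changing the weights shows how sensitive the ranking is to those priorities.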

This was last published in July 2015

What are your organization's most critical differentiating factors when it comes to buying a big data analytics tool?
When my clients are deciding on a big data analytics tool, performance and quick processing are huge factors for them. The ability to export results in the best formats also matters. Other considerations are the existing integrations, the potential for more integrations with other tools, and cloud-based workload management. Of late, the kind of data security offered has played a significant role in decision-making.
Brilliant article, David. You covered almost every parameter one can think of before buying a big data analytics tool. However, if I could add one thought, it would be data security. These days, security is beginning to take significant priority for everyone. Any tool we select should reasonably protect the data from being exposed to unintended audiences and should have sufficient security controls so that data can't be accessed without permission. To some extent, data should be encrypted too.