
Big Data

The world is one big data problem.

- Andrew McAfee, co-director of the MIT Initiative on the Digital Economy

Big data is a term that describes large, hard-to-manage volumes of data – both structured and unstructured – that inundate businesses on a day-to-day basis.

Examples

The following are a few examples of big data sources, just to give you an idea of how big this can get:

  • The New York Stock Exchange generates about one terabyte (10^12 bytes) of new trade data per day.

  • A single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time. With many thousands of flights per day, the data generated adds up to many petabytes.
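As a rough back-of-the-envelope illustration (the flight count is an assumption, not a sourced figure): at 10 terabytes per 30 minutes, a single hour-long flight produces about 20 terabytes, so 10,000 such flights in a day would generate roughly 200,000 terabytes, or about 200 petabytes.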


Why is Big Data Important?

The importance of big data doesn’t simply revolve around how much data you have. The value lies in how you use it. By taking data from any source and analyzing it, you can find answers that:

  1. streamline resource management
  2. improve operational efficiencies
  3. optimize product development
  4. drive new revenue and growth opportunities
  5. enable smart decision making.

When you combine big data with high-performance analytics provided by Google Cloud services, you can accomplish business-related tasks such as:

  • Determining root causes of failures, issues and defects in near-real time.
  • Spotting anomalies faster and more accurately than the human eye.
  • Improving patient outcomes by rapidly converting medical image data into insights.
  • Recalculating entire risk portfolios in minutes.
  • Sharpening deep learning models’ ability to accurately classify and react to changing variables.
  • Detecting fraudulent behavior before it affects your organization.

How do Google Cloud services help?

Google Cloud Platform provides a wide range of services that cover the most common needs of data and Big Data applications.

We will discuss two key services here: BigQuery and Bigtable.

1 - BigQuery

Analyse your Big Data with fast and reliable BigQuery Analytics.

BigQuery’s serverless infrastructure lets you focus on your data instead of resource management. BigQuery combines a cloud-based data warehouse and powerful analytic tools.

BigQuery storage

  • BigQuery stores data using a columnar storage format that is optimized for analytical queries.

  • BigQuery presents data in tables, rows, and columns and provides full support for database transaction semantics (ACID).

  • BigQuery storage is automatically replicated across multiple locations to provide high availability.
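As a minimal sketch of how tables and columns are defined, the snippet below uses the google-cloud-bigquery Python client; the project, dataset, and table names are placeholders, and the schema is purely illustrative:

```python
# Minimal sketch: create a BigQuery table with an explicit schema.
# "my-project.trading.trades" is a placeholder table ID.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

schema = [
    bigquery.SchemaField("trade_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("symbol", "STRING"),
    bigquery.SchemaField("price", "NUMERIC"),
    bigquery.SchemaField("traded_at", "TIMESTAMP"),
]

table = bigquery.Table("my-project.trading.trades", schema=schema)
table = client.create_table(table)  # one API call; storage is columnar under the hood
print(f"Created {table.full_table_id}")
```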

BigQuery analytics

  • Descriptive and prescriptive analysis uses include business intelligence, ad hoc analysis, geospatial analytics, and machine learning.

  • You can query data stored in BigQuery, or run queries on data where it lives by using external tables or federated queries against sources such as Cloud Storage, Bigtable, Spanner, or Google Sheets stored in Google Drive.
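For example, an ad hoc query takes only a few lines with the google-cloud-bigquery Python client; the query below targets a well-known public sample dataset, and in practice you would swap in your own tables:

```python
# Minimal sketch: run an ad hoc SQL query against a public dataset.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 10
"""

query_job = client.query(query)   # starts the query job
for row in query_job.result():    # waits for completion, then iterates rows
    print(row["name"], row["total"])
```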

Learn

Learn more from the official documentation.

2 - BigTable

Store your Big Data in the fast and highly scalable Bigtable storage service.

Cloud Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key. Bigtable is ideal for storing very large amounts of single-keyed data with very low latency. It supports high read and write throughput at low latency, and it is an ideal data source for MapReduce operations.
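As a minimal sketch of this single-key access pattern, using the google-cloud-bigtable Python client (the project, instance, table, and column-family names are placeholders):

```python
# Minimal sketch: write one row keyed by its row key, then read it back.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("sensor-readings")  # assumes a "metrics" column family exists

# Write a single row; the row key is the only indexed value.
row_key = b"engine42#20240101T0930"
row = table.direct_row(row_key)
row.set_cell("metrics", "temperature", b"741.2")
row.commit()

# Low-latency point lookup by the same key.
result = table.read_row(row_key)
print(result.cells["metrics"][b"temperature"][0].value)
```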

Bigtable is exposed to applications through multiple client libraries, including a supported extension to the Apache HBase library for Java. As a result, it integrates with the existing Apache ecosystem of open-source Big Data software.

Bigtable’s powerful back-end servers offer several key advantages over a self-managed HBase installation:

  • Incredible scalability: Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits performance after a certain threshold is reached. Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and writes.
  • Simple administration: Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, simply add a second cluster to your instance, and replication starts automatically. No more managing replicas or regions; just design your table schemas, and Bigtable will handle the rest for you.
  • Cluster resizing without downtime: You can increase the size of a Bigtable cluster for a few hours to handle a large load, then reduce the cluster’s size again, all without any downtime (a brief sketch follows this list). After you change a cluster’s size, it typically takes just a few minutes under load for Bigtable to balance performance across all of the nodes in your cluster.
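A minimal sketch of such a resize, using the Python admin client (the project, instance, and cluster IDs are placeholders, and the node counts are arbitrary):

```python
# Minimal sketch: scale a Bigtable cluster up, then back down, without downtime.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")

cluster = instance.cluster("my-cluster")
cluster.reload()            # fetch the current configuration
cluster.serve_nodes = 10    # scale up ahead of a heavy load
cluster.update()

# ...later, shrink it again the same way.
cluster.serve_nodes = 3
cluster.update()
```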

Learn

Learn more from the official documentation.