Cloud Platform

Run Apache Hadoop easily on Google Cloud Platform

Apache Hadoop running on Google Cloud Platform benefits from the quality of Google’s compute, storage and network infrastructure. In addition, Google has provided native connectors for Hadoop so that code running in Hadoop (via Java MapReduce, Hadoop Streaming, Pig, Hive, etc.) can directly access data stored in Google Cloud Storage, Google BigQuery and Google Cloud Datastore.
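The connectors make Cloud Storage look like another Hadoop file system, so code written for Hadoop Streaming does not have to change to use it. The sketch below is a minimal word-count mapper and reducer in Python. It assumes the Cloud Storage connector is already installed and configured on the cluster, and the script name `wordcount.py` and the bucket `gs://example-bucket` are hypothetical. The script only reads stdin and writes stdout, so the job's `-input` and `-output` paths alone decide where the data lives.

```python
"""Word-count mapper and reducer for Hadoop Streaming (illustrative sketch).

Hadoop Streaming feeds each input split to the mapper on stdin and expects
tab-separated key/value lines on stdout. The framework sorts mapper output by
key before the reducer reads it. This script never opens a file itself, so it
works the same whether the job's input is on HDFS or on a gs:// path.
"""
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit "word\t1" for every whitespace-separated token (lower-cased).
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input is sorted by key, so all lines for the same word are adjacent.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def main(stage: str) -> None:
    # "map" runs the mapper; anything else runs the reducer.
    step = mapper if stage == "map" else reducer
    for out in step(sys.stdin):
        print(out)


# Only read stdin when a stage argument ("map" or "reduce") is given.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

A job could then be submitted with a command like the one below. The streaming JAR's location varies by distribution, so it is left as a placeholder:

    hadoop jar /path/to/hadoop-streaming.jar -files wordcount.py -mapper "python3 wordcount.py map" -reducer "python3 wordcount.py reduce" -input gs://example-bucket/input -output gs://example-bucket/output

You can try the same pipeline locally with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`.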

Benefits of Hadoop on Google Cloud Platform:

  • Easy to get started

    Use the best tools from the open-source ecosystem. With one command you can start a cluster running Hadoop, Hive, Pig, Spark or Shark and get up and running quickly, without configuration hassles.

  • Scale and value

    Per-minute billing lets you optimize for scale and speed. Sustained-use discounts automatically reward you for long-running clusters.

  • Quick startup times

    Compute Engine virtual machines start in seconds.

  • Shared storage

    Read, write and analyze data in Google Cloud Storage (an object store service), Datastore (a fully managed NoSQL database) or BigQuery (a columnar database that runs large-scale queries in seconds). Because your data lives in these services rather than on the cluster, you can shut down a cluster without losing any data and access the same data from any of your Hadoop deployments.

  • Price/Performance

    An Accenture total-cost-of-ownership study found that using the Cloud Storage connector for Hadoop is typically cheaper than on-premises solutions, thanks to reduced operational overhead, data accessibility between clusters and raw performance.
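Per-minute billing matters for Hadoop because spreading a job across more machines makes it finish sooner while the number of billed VM-minutes stays roughly the same. The sketch below compares billed VM-minutes under per-minute and hourly rounding. The function name, the example run times, and the assumption that the job speeds up linearly from 10 to 100 VMs are all hypothetical. The optional `minimum_min` parameter stands in for any minimum charge per VM, which you should check against current Compute Engine pricing.

```python
import math


def billed_vm_minutes(vms, runtime_min, increment_min=1, minimum_min=0):
    """Return the VM-minutes charged for `vms` machines running `runtime_min` minutes.

    Each VM's runtime is first raised to `minimum_min` (a hypothetical minimum
    charge), then rounded up to a whole number of billing increments:
    1 for per-minute billing, 60 for hourly billing.
    """
    billed = max(runtime_min, minimum_min)
    billed = math.ceil(billed / increment_min) * increment_min
    return vms * billed


# Hypothetical job with 650 VM-minutes of work, assumed to scale linearly.
for vms, minutes in [(10, 65), (100, 6.5)]:
    per_minute = billed_vm_minutes(vms, minutes)
    hourly = billed_vm_minutes(vms, minutes, increment_min=60)
    print(f"{vms:>3} VMs x {minutes:>4} min: per-minute={per_minute}, hourly={hourly}")
```

With hourly rounding, scaling out from 10 to 100 VMs raises the bill fivefold (1,200 to 6,000 VM-minutes). With per-minute rounding it stays about the same (650 to 700 VM-minutes), and the job finishes about ten times sooner.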

How to Get Started

Use Apache Hadoop

Follow our instructions to learn how to configure and use a Hadoop cluster. No prior Hadoop experience required! By default, your cluster is set up to read data from and write data to Google Cloud Storage.

Get some help

Contact sales

Let our sales team help you determine the best way to begin using Hadoop on Google Cloud Platform.


Work with a partner

MapR crushed the MinuteSort record using Hadoop and Compute Engine, and they can help you too.


Managed Hadoop


Try Qubole's Hadoop-as-a-Service on Google Cloud Platform. Run a Hive query within minutes with an easy-to-use GUI, built-in connectors and data pipeline/workflow tools.


Customers using Hadoop on Cloud Platform

YouVersion


“Hadoop on Google Compute Engine has improved the reliability of our MapReduce jobs by over 50% relative to previous solutions. We've also been able to eliminate significant network bandwidth costs since we don't need to shuffle several terabytes of data outside of a single vendor's network, and have reduced our general operational costs due to the high performance, low overhead, and overall resilience of Google Compute Engine infrastructure.”

Josh Turmel, Architect, YouVersion


Qubole

"Google File System and Google MapReduce inspired the development of Hadoop. Now, we’re coming full circle with Hadoop available on GCE. We believe that Qubole delivers one of the most solid foundations for cloud-based Big Data processing and are pleased that we can contribute to its performance, ease of use and low cost."

Shrikanth Shankar, VP Engineering, Qubole

MapR Technologies


MapR used Google Compute Engine to set a new world record for MinuteSort, sorting 15 billion 100-byte records (a total of 1.5 trillion bytes) in 60 seconds. The benchmark, often referred to as the World Cup of data sorting, measures how much data can be sorted in one minute, starting and ending on disk. The previous MinuteSort record was achieved with custom hardware. MapR set the new record using Google Compute Engine, Hadoop MapReduce and the MapR Distribution for Apache Hadoop.
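The record's figures agree with each other, and they imply an average rate that is useful when comparing sort results. A quick check in Python, using decimal units (1 GB = 10^9 bytes):

```python
RECORDS = 15_000_000_000   # 15 billion records
RECORD_BYTES = 100         # bytes per record
SECONDS = 60               # MinuteSort time limit

total_bytes = RECORDS * RECORD_BYTES
bytes_per_second = total_bytes // SECONDS  # exact: the total divides evenly by 60

print(f"total: {total_bytes:,} bytes")                     # total: 1,500,000,000,000 bytes
print(f"average rate: {bytes_per_second / 1e9:.0f} GB/s")  # average rate: 25 GB/s
```

This rate is simply the dataset size divided by the time limit. It does not account for the extra disk and network traffic that a sort generates along the way.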