Easily and effectively unlock the power of the Apache Hadoop framework on Google Cloud Platform.
Apache Hadoop running on Google Cloud Platform benefits from the quality of Google’s compute, storage and network infrastructure. In addition, Google has provided native connectors for Hadoop so that code running in Hadoop (via Java MapReduce, Hadoop Streaming, Pig, Hive, etc.) can directly access data stored in Google Cloud Storage, Google BigQuery and Google Cloud Datastore.
Use the best tools from the open source ecosystem. With one command you can start a cluster running Hadoop, Hive, Pig, Spark, or Shark, getting up and running quickly without configuration hassles.
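As a rough sketch, launching a cluster with the open-source bdutil tool looks like the following; the bucket name and worker count are placeholders, and exact flags may differ between bdutil versions:

```shell
# Sketch: launching a Hadoop cluster with the bdutil launcher.
# Bucket name and worker count below are illustrative placeholders.

# Fetch Google's open-source cluster-deployment tool.
git clone https://github.com/GoogleCloudPlatform/bdutil.git
cd bdutil

# Deploy a cluster: -b names a Cloud Storage bucket used by the cluster,
# -n sets the number of worker VMs.
./bdutil -b my-hadoop-bucket -n 4 deploy

# SSH into the master node to run jobs interactively.
./bdutil shell

# Tear the cluster down when finished; data kept in Cloud Storage persists.
./bdutil delete
```

Extension environment files shipped with bdutil layer engines such as Spark on top of the base Hadoop install, which is how a single deploy command covers the tools listed above.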
Per-minute billing lets you optimize for scale and speed. Sustained-use discounts automatically reward you for long-running clusters.
Compute Engine virtual machines start in seconds.
Read, write and analyze data from Google Cloud Storage (an object store service), Datastore (a fully-managed NoSQL database) or BigQuery (a columnar database that runs large-scale queries in seconds). Using these storage services allows you to turn down your cluster without losing any of your data, and to access that data from any of your Hadoop deployments.
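Because the Cloud Storage connector registers a gs:// filesystem scheme with Hadoop, standard Hadoop tooling can address buckets directly. A minimal sketch, with bucket, path, and jar names as placeholders:

```shell
# Sketch: using Cloud Storage as a Hadoop filesystem via the gs:// scheme.
# Bucket, path, and jar names are illustrative placeholders.

# Browse and stage data with the ordinary Hadoop filesystem shell.
hadoop fs -ls gs://my-bucket/
hadoop fs -put access.log gs://my-bucket/logs/access.log

# Point a MapReduce job's input and output at Cloud Storage instead of
# HDFS; nothing needs to be copied into the cluster, and the output
# survives cluster teardown.
hadoop jar hadoop-examples.jar wordcount \
    gs://my-bucket/logs gs://my-bucket/wordcount-output
```

The same gs:// paths work from Hive, Pig, and Hadoop Streaming jobs, which is what makes clusters safely disposable.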
An Accenture total-cost-of-ownership study found that using the Cloud Storage connector for Hadoop is typically cheaper than on-premises solutions because of reduced operational overhead, data accessibility between clusters and raw performance.
Follow our instructions to learn how to configure and use a Hadoop cluster. No prior Hadoop experience required! By default, your cluster will be set up to read and write data from Google Cloud Storage.
Let our sales team help you determine the best way to begin using Hadoop on Google Cloud Platform.
MapR crushed the MinuteSort record using Hadoop and Compute Engine and they can help you too.
Try Qubole's Hadoop-as-a-Service on Google Cloud Platform. Run a Hive query within minutes with an easy-to-use GUI, built-in connectors and data pipeline/workflow tools.
“Hadoop on Google Compute Engine has improved the reliability of our MapReduce jobs by over 50% relative to previous solutions. We've also been able to eliminate significant network bandwidth costs since we don't need to shuffle several terabytes of data outside of a single vendor's network, and have reduced our general operational costs due to the high performance, low overhead, and overall resilience of Google Compute Engine infrastructure.”
YouVersion Architect
"Google File System and Google MapReduce inspired the development of Hadoop. Now, we’re coming full circle with Hadoop available on GCE. We believe that Qubole delivers one of the most solid foundations for cloud-based Big Data processing and are pleased that we can contribute to its performance, ease of use and low cost."
VP Engineering, Qubole
MapR used Google Compute Engine to set a new world record for MinuteSort, sorting 15 billion 100-byte records (a total of 1.5 trillion bytes) in 60 seconds. The benchmark, often referred to as the World Cup of data sorting, demonstrated how quickly data can be sorted starting and ending on disks. While the previous MinuteSort record was achieved with custom hardware, MapR set the record using Google Compute Engine, Hadoop MapReduce and the MapR Distribution for Apache Hadoop.