March 7, 2019
On Thursday, February 21, 2019, Jamie Kim and Melissa Schiff of Levvel attended Google Cloud OnBoard, a full-day event that covered the capabilities of the Google Cloud Platform, including big data processing and machine learning. This OnBoard event was held in Midtown, New York City.
The most valuable asset on earth is no longer oil—it’s data.
While this has become a common saying in the tech industry, it has lost none of its shock value. Today, everyone has a data shadow, cast by everything from consumer and financial data to social media data. Of course, this means data has become a tremendous resource, and it is being monetized and used to drive value across many sectors. This vital new commodity has also led to the demand for something else: technology tools that enable businesses to properly manage and optimize that data.
One such tool is the Google Cloud Platform (GCP), which offers high-level services and advanced technologies like machine learning and artificial intelligence to derive value from the huge amounts of data collected today. To teach the market more about its platform, Google is hosting Google Cloud OnBoard, a series of free, full-day training events held throughout the world (including London, San Francisco, Mumbai, and other large cities) that cover various aspects of GCP.
Last week’s session was focused on data analysis in the cloud, data processing architecture, and machine learning capabilities. The Midtown Hilton Ballroom was filled with approximately 300 developers, data scientists, business analysts, and others interested in big data and ML, who immersed themselves in engaging presentations and demos of Google’s tools, such as Dataproc, BigQuery, TensorFlow, and Cloud Pub/Sub.
Han Wang, a Google Cloud customer engineer, kicked off the day with a broad overview of machine learning. She discussed the various use cases for ML across all industries, from quality assurance to search prediction. Given the rapid rate of development, ML adoption rates are predicted to increase dramatically in the coming years. For example, according to data sourced from PwC, the manufacturing sector’s adoption of ML and advanced analytics—used to improve predictive maintenance—will jump from 28% to 66% in the next 5 years.
According to Wang, this means machine learning is no longer a fringe concept; it is becoming mainstream. It is the next transformation in technology, and it is upending the programming paradigm.
However, Wang asserted, ML is not something that can be leveraged quickly with an easy plug-in tool. Successful implementation requires sophisticated data management, beginning with data collection and analysis. GCP enables such management through Google's vast network, resources, and access to data.
To give the audience more insight into how GCP accomplishes this, Doug Rehnstrom, Director of Learning Solutions at ROI Training, dove into GCP’s products with in-depth explanations of the platform’s capabilities. The sessions were bookended with product demos and Q&A, all supported by Rehnstrom’s compelling energy and cheery banter.
Rehnstrom first covered the fundamentals of GCP and the basics of cloud storage, including creating buckets and uploading data. When it comes to managing resources on the cloud, Rehnstrom demonstrated how Google Cloud Shell makes storage and data analysis seamless, as it comes pre-installed with tools and libraries that work naturally with GCP. We also saw that GCP users do not need to pre-allocate storage space, so they pay only for what they use.
To properly set up architectures for ML capabilities and data analysis on the cloud, users must work with databases and manipulate large data sets quickly and efficiently. Rehnstrom showed how GCP enables this with its fully managed database service, Cloud SQL, which lets users create relational databases.
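Because Cloud SQL exposes standard relational engines (MySQL and PostgreSQL), ordinary SQL works unchanged. The sketch below uses Python's built-in sqlite3 module purely as a local stand-in for illustration; the table and column names are hypothetical, not from the event.

```python
import sqlite3

# Local stand-in: sqlite3 illustrates the kind of relational schema a
# Cloud SQL (MySQL/PostgreSQL) instance would host. All names are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,
        action TEXT NOT NULL,
        ts TEXT NOT NULL
    )
""")

rows = [
    ("u1", "login", "2019-02-21T09:00:00"),
    ("u1", "view", "2019-02-21T09:05:00"),
    ("u2", "login", "2019-02-21T09:10:00"),
]
cur.executemany(
    "INSERT INTO events (user_id, action, ts) VALUES (?, ?, ?)", rows
)
conn.commit()

# A simple relational query: which users logged in, in order?
logins = cur.execute(
    "SELECT user_id FROM events WHERE action = 'login' ORDER BY ts"
).fetchall()
print(logins)  # [('u1',), ('u2',)]
```

Against a real Cloud SQL instance the same statements would be issued over a normal MySQL or PostgreSQL connection rather than sqlite3.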
Throughout the day, Rehnstrom also noted how Google's sliding cost scales keep customers from paying for unused space. For example, Cloud SQL automatically increases drive space as users approach capacity. We learned that GCP's Dataproc reduces the complexity of running Apache Spark and Apache Hadoop big data clusters, ultimately driving down processing costs.
A session called "Scaling Data Analysis" emphasized the need to organize huge data sets for scalable operations. GCP's BigQuery is a data warehouse in which users can add an unlimited number of tables, each with unlimited fields, to every data set they create. Rehnstrom showed how BigQuery allows ad-hoc SQL queries on massive volumes of data to surface valuable insights.
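The shape of such an ad-hoc query is plain Standard SQL. As a minimal local sketch, the example below runs an aggregate query with sqlite3 as a stand-in; in BigQuery the same kind of GROUP BY statement would target a `project.dataset.table` reference and run over terabytes. The pageviews table and its contents are invented for illustration.

```python
import sqlite3

# Stand-in for an ad-hoc warehouse query. BigQuery would execute equivalent
# Standard SQL at massive scale; the tiny in-memory table here only shows
# the pattern. Table name and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (page TEXT, country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO pageviews VALUES (?, ?, ?)",
    [
        ("/home", "US", 120),
        ("/home", "GB", 80),
        ("/docs", "US", 45),
        ("/docs", "GB", 30),
    ],
)

# Ad-hoc analysis: total views per page, largest first.
top_pages = conn.execute(
    """
    SELECT page, SUM(views) AS total
    FROM pageviews
    GROUP BY page
    ORDER BY total DESC
    """
).fetchall()
print(top_pages)  # [('/home', 200), ('/docs', 75)]
```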
Once users can manage their tremendous data sets, they can move toward machine learning. For developers who do not want to start from scratch, or cannot afford to, we learned that Google's Cloud Datalab offers trained models and other interactive tools for exploring, analyzing, and transforming data and building ML models.
For those who want to build their own models, Rehnstrom and other Google engineers introduced an especially valuable resource called TensorFlow, an open-source library that underlies many Google products and applies deep learning. The library also enables collaboration and communication among researchers looking to push ML performance further.
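To make concrete the kind of computation a library like TensorFlow automates, here is a toy gradient-descent fit of a one-parameter linear model written in pure Python. TensorFlow would express this as a computation graph and derive the gradient automatically; here the gradient is written out by hand, and the data and hyperparameters are invented for illustration.

```python
# Toy sketch of gradient-based training: fit y = w * x by gradient descent
# on mean squared error. A deep-learning library automates exactly this
# loop (and its gradients) for models with millions of parameters.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated from the "true" weight w = 2

w = 0.0    # initial guess
lr = 0.01  # learning rate
for _ in range(500):
    # d/dw of mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # converges close to 2.0
```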
The final session covered data processing architecture, including reference architectures for real-time and batch data processing. Rehnstrom asserted that a well-designed cloud architecture for coordinating the various applications involved is key to successful data processing and analysis. Google's Cloud Pub/Sub is a real-time messaging service that allows users to send and receive messages between independent applications. It automatically ingests event streams and delivers them to Cloud Dataflow for processing and BigQuery for analysis. At the conclusion of the event, we observed demos of GCP's ready-to-use APIs, which use pre-trained models for detecting image characteristics, translating text, and more.
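The core idea behind Pub/Sub is that publishers and subscribers never know about each other; they share only a topic. The sketch below simulates that pattern locally with Python's queue and threading modules. It is not the Cloud Pub/Sub API (which is used through the google-cloud-pubsub client against a managed, global service); the event names are invented.

```python
import queue
import threading

# Local simulation of the publish/subscribe pattern that Cloud Pub/Sub
# provides as a managed service: a publisher pushes events to a topic and
# a subscriber consumes them independently. All names are hypothetical.
topic = queue.Queue()
received = []

def subscriber():
    while True:
        message = topic.get()
        if message is None:  # sentinel: no more events
            break
        received.append(message)

worker = threading.Thread(target=subscriber)
worker.start()

# The publisher knows nothing about the subscriber; it only emits events.
for event in ["signup", "click", "purchase"]:
    topic.put(event)
topic.put(None)

worker.join()
print(received)  # ['signup', 'click', 'purchase']
```

In the real service, the queue in the middle is durable and globally available, which is what lets Pub/Sub feed pipelines like Cloud Dataflow and BigQuery without coupling producers to consumers.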
After this in-depth look at GCP's capabilities, it is clear that the platform lessens the need for a deep understanding of the mathematics and computation behind machine learning in order to complete advanced ML and AI operations. Like other cloud services, GCP removes the burden of managing complex data-center infrastructure and the accompanying on-premises operational challenges. It is also worth noting that Google's cloud network is the largest in the world and runs on its own fiber-optic cables. GCP offers most of its resources either globally or regionally, unlike other services that are limited to zones or regions. This gives users flexibility when setting up their platforms, all while maintaining security and performance.
In combination with Google's vast resources and access to data, the cloud services available on GCP let users create pipelines that can scale almost without limit. GCP matches its competitors in that its many services can be easily stitched together, opening up flexible capabilities depending on what users want to do with their data. Google Cloud is striving for a data-driven environment that is hassle-free and cost-effective, so developers can simply build their applications and innovate with ease.
Google continues to impress with its data processing capabilities and automated cloud services through GCP. The company earns its place as an industry titan, and Levvel looks forward to its continued innovation.
Research Content Specialist
Jamie Kim is a Research Content Specialist for Levvel Research based in New York City. She develops and writes research-based content, including data-driven reports, whitepapers, and case studies, as well as market insights within various digital transformation spaces. Jamie’s research focus is on business automation processes, including Procure-to-Pay, as well as DevOps, design practices, and cloud platforms. In addition to her research skills and content creation, Jamie has expertise in design and front-end development. She came to Levvel with a research and technical writing background at an IT consulting company focused on upcoming AI and machine learning technologies, as well as academic book editorial experience at Oxford University Press working on its music list.