Big Data

December 27, 2018

Introduction to the topic

Big Data refers to massive datasets that require specialized computing platforms to process and analyze once they have been collected. Broadly, discussions of big data involve two main concepts: “large datasets” and “the computing strategies and technologies used to handle them”. Keep in mind that a “large dataset” here means one so large that it cannot be processed or stored on a single computer.

Big data is not a fantasy. We are just at the beginning of a revolution that will touch every business, every machine and every life on this planet. Many of us still treat the concept as something that can be ignored, but we are about to be run over by the steamroller that is big data.

Here are some examples that should convince anyone that big data deserves their attention:

  • Data volumes are exploding: more data has been created in the last two years than in the entire previous history of humankind.
  • It is estimated that Google alone contributed 54 billion dollars to the US economy in 2009.
  • Data is growing faster than ever before. By the year 2020, about 1.7 MB of new information will be created every second for every person on the planet, and our accumulated digital universe of data will grow from 4.4 ZB today to around 44 ZB, or 44 trillion GB.
  • In Aug 2015, over a billion people used Facebook in a single day.
  • Distributed computing is becoming more real. Google uses about 1,000 computers to answer a single search query, which takes no more than 0.2 seconds to complete.
  • Every minute, up to 300 hours of video are uploaded to YouTube alone, and in 2015 a trillion photos were expected to be taken, with billions of them shared online.
  • It is estimated that there will be 6.1 billion smartphones by 2020 and 50 billion smart connected devices in the world that will be able to collect, analyze and share all kinds of data.

Brief history and development

The story of big data did not start recently; it began many years before the current buzz. According to the Oxford English Dictionary, the term “information explosion” was first used in 1941. The following are the major milestones in the history of big data.


Year Event
1941 The term “information explosion” is first used, according to the Oxford English Dictionary.
1961 Derek Price publishes “Science Since Babylon”, in which he charts the growth of scientific knowledge by looking at the growth in the number of scientific journals and papers.
1967 B. A. Marron and P. A. D. deMaine publish “Automatic data compression” in the Communications of the ACM, stating that the ‘information explosion’ noted in recent years makes it essential that storage requirements for all information be kept to a minimum.
1983 Ithiel de Sola Pool publishes “Tracking the Flow of Information” in Science. Looking at growth trends in 17 major communications media from 1960 to 1977, he measures the growth in “words made available to Americans” through these media.
1986 Hal B. Becker publishes “Can users really absorb data at today’s rates? Tomorrow’s?” in Data Communications. Becker estimates that “the recording density achieved by Gutenberg was approximately 500 symbols (characters) per cubic inch—500 times the density of [4,000 B.C. Sumerian] clay tablets”.
1990 Peter J. Denning publishes “Saving All the Bits”, which observes that machines can also pore through existing databases, looking for patterns and forming class descriptions for the bits that we have already saved.
1998 John R. Mashey, Chief Scientist at SGI, presents a paper at a USENIX meeting titled “Big Data… and the Next Wave of Infrastress.”
2000 Peter Lyman and Hal R. Varian at UC Berkeley publish “How Much Information?”, the first comprehensive study to quantify, in computer storage terms, the amount of information produced in the world.
2005 Tim O’Reilly publishes “What is Web 2.0”, in which he asserts that “data is the next Intel Inside” and that “SQL is the new HTML.”
2010 Kenneth Cukier publishes a special report in The Economist titled “Data, data everywhere”, which explains how the effect of big data is being felt everywhere.
2011 Martin Hilbert and Priscila Lopez publish “The World’s Technological Capacity to Store, Communicate, and Compute Information”, which estimates the world’s information storage capacity.
2012 The International Journal of Communications publishes a Special Section titled “Info Capacity” on the methodologies and findings of various studies measuring the volume of information.

Business Impact (as supported by literature)

In the global marketplace, businesses, suppliers and customers are creating and consuming vast amounts of information. According to IDC, the world’s volume of data doubles every 18 months, and this flood of data is often referred to as “information overload”, “data deluge” or “big data”. Handling such a huge volume of data has always been a challenge, yet business executives have been found to have an insatiable appetite for more data. Most industries regard data as a major asset that helps them determine a business strategy to lead the market. Data helps them improve business forecasts, reduce uncertainty in decision-making and strengthen their competitive position. Across all industries, including government, healthcare, media, social media and energy, data is becoming central to business operations.
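To make the IDC doubling claim concrete, the following Python sketch computes what “doubles every 18 months” implies. It assumes perfectly steady exponential growth, V(t) = V0 · 2^(t/18) with t in months, which is a simplification; the function name `growth_factor` is hypothetical and the results are unitless multiples, not data volumes from the source.

```python
def growth_factor(months: float, doubling_period_months: float = 18.0) -> float:
    """Multiplicative growth after `months`, assuming a fixed doubling period."""
    return 2 ** (months / doubling_period_months)

# Implied growth over one year and over five years.
annual = growth_factor(12)
five_years = growth_factor(60)
print(f"Per year:     x{annual:.2f}")      # x1.59, since 2^(12/18) is about 1.587
print(f"Over 5 years: x{five_years:.1f}")  # x10.1, since 2^(60/18) is about 10.08
```

Under this assumption, data volume grows by roughly 59% a year and about tenfold every five years.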

Executives recognize the opportunity to grow their top line by harnessing customer information, and this focus is driving technology investments in CRM systems. As data grows, so does the need for data security: 67% of executives have invested or are considering investing in CRM, and 78% of companies are investing in data security solutions.

Despite the increasing volume of data, the pressure to keep up with customer expectations and the focus on technology investments, today’s companies still struggle to see big data as a driver of real business value. Organizations therefore need to leverage their data to create new revenue streams and new lines of business. To do this, companies need to focus on the following stages of the data lifecycle:


  1. Identify: Understand where the data is coming from, who is creating it and where the content lives.
  2. Filter: Determine whether information is relevant and useful, and provide tools and data management policies to filter information, categorize the data and establish processes.
  3. Distribute: Use an automated, intelligent distribution mechanism to segregate data for different levels, locations and business units.
  4. Apply: Apply the right data in the right case and look for the opportunities to monetize data in new ways and create competitive differentiation.
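The four lifecycle stages can be sketched as a small Python pipeline. This is an illustrative sketch only: the `Record` fields, the stage function names, the sample records and the relevance policy (discard negative values as invalid readings) are all hypothetical assumptions, not a prescribed implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Record:
    source: str         # Identify: where the data came from (hypothetical field)
    creator: str        # Identify: who created it
    business_unit: str  # Distribute: which unit should receive it
    value: float        # Apply: the measure used for decisions

def identify(records):
    """Stage 1: catalogue the (source, creator) pairs feeding the organization."""
    return {(r.source, r.creator) for r in records}

def filter_relevant(records, is_relevant):
    """Stage 2: keep only records that pass a data management policy."""
    return [r for r in records if is_relevant(r)]

def distribute(records):
    """Stage 3: segregate records by business unit."""
    by_unit = defaultdict(list)
    for r in records:
        by_unit[r.business_unit].append(r)
    return dict(by_unit)

def apply_insight(by_unit):
    """Stage 4: derive a simple per-unit total for decision-making."""
    return {unit: sum(r.value for r in rs) for unit, rs in by_unit.items()}

records = [
    Record("web", "customer", "sales", 120.0),
    Record("pos", "store", "sales", 80.0),
    Record("sensor", "warehouse", "logistics", -1.0),  # invalid reading
    Record("crm", "agent", "support", 15.0),
]
sources = identify(records)
clean = filter_relevant(records, lambda r: r.value >= 0)  # assumed policy
routed = distribute(clean)
print(f"{len(sources)} distinct (source, creator) pairs")  # 4 distinct ...
print(apply_insight(routed))  # {'sales': 200.0, 'support': 15.0}
```

In practice each stage would be backed by real data catalogues, governance rules and routing infrastructure; the point here is only that each stage consumes the previous stage’s output.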

Data scientists, now armed with superior computing power and vast amounts of omnichannel data, can deliver business insights at lightning speed. However, in many organizations data analytics is still pursued on an ad hoc basis and lacks a well-defined structure. It is high time for most enterprises to think in terms of data inclusion, bringing data science and big data into the daily business workflow. Because data-enabled decision-making promises clarity and transparency while reducing risk and ambiguity, the impact of Big Data on business keeps growing. This impact can be described in three ways:

It has revolutionized old-school industries:
Big data has revolutionized the way existing traditional industries work. Edwin Miller, CEO of 9Lenses, says: “Big Data has had a tremendous impact on businesses from customer relations to supply chain operations and will continue to do so.”

Eg: Walmart is one such company. It is a well-known user of Big Data analytics today, but as early as the 1990s it transformed the retail industry by recording every product as data through a system called Retail Link. The system gave suppliers a way to manage their own products by letting them monitor their data, including sales and inventory volume, in-stock percentage, gross margin and inventory turnover. As a result, they could keep inventory risk and the associated costs low. According to a 2001 McKinsey Global Productivity Report, Walmart’s significantly low costs and high efficiency were major factors driving productivity in the merchandise retail sector from 1995 to 2000.

It has given birth to a new industry:
Historically, the use of data in business and industry was very specific and limited. For example, retailers recorded sales for accounting, and manufacturers recorded raw-material data for quality management. But as demand for Big Data analytics emerged, data no longer served only its initial purpose. Companies with access to huge volumes of data and the ability to analyze and present it have created a new industry.


Eg: IBM and Twitter have partnered to sell analytical information to corporate clients. IBM analyzes Twitter data combined with other public and business sources, “helping businesses tap into billions of real-time conversations to make smarter decisions,” according to Glenn Finch, Global Leader, Data & Analytics, GBS. The partnership has helped the two companies leverage their respective areas of expertise: IBM its analytical skills and Twitter its data.


It improves business regardless of company size
It is obvious that companies generating huge amounts of data, such as Amazon (270 million active users) and Google (approx. 12 trillion monthly searches), have advantages over smaller ones. But that is not the end of the story, as Big Data helps smaller companies grow as well. Companies with limited IT budgets can still store data effectively, and if that is not possible, or if there is not enough data available in-house, they can cheaply lease data from third-party “data intermediaries”. Companies can also hire outside data analytics firms at affordable rates. This is commonly seen in the US healthcare industry, where insurance payers hire data analytics companies such as Verscend Technologies to process and analyze their data for insurance claims management. Verscend Technologies has performed data analysis for large companies such as Aetna, Lockton and Cigna as well as smaller companies such as CBIZ and Consumer Medical.

Big Data applications are limited only by human imagination and creativity. Car manufacturers can improve their operational efficiency, hospitals can improve patient services and fast-food companies can better manage their food deliveries. Companies that use Big Data to create a competitive advantage are highly valued by investors, who value not only profit growth but also intangible assets such as data volume and analytical skills.

Success story and failure story of the companies using Big Data


Failure Story: Amazon.com, Inc. is an American e-commerce and cloud computing giant and the world’s largest Internet retailer by total sales and market capitalization. In 2013, the Australian third-party retailer Solid Gold Bomb partnered with Amazon to print and sell t-shirts with slogans that were automatically generated by a computer script running against hundreds of thousands of dictionary words. The plan was to create humorous slogans by modifying the World War II slogan “Keep Calm and Carry On”. Unfortunately, the t-shirts carried phrases like “keep calm and knife her”, “keep calm and grope a lot” and “keep calm and rape her”. This is what happens when we do not pay attention to our code and its results; in big data terms, it went awfully wrong!

Amazon was, of course, forced to take the shirts down, and Solid Gold Bomb, the seller, blamed poor programming and analytics. The results were downright offensive, and the bigger blunder was that nobody checked them.

Analytics is bound to give answers, but in this case they were obviously not the right ones.


Success Story:

Uber Big Data Startup Success Stories

Uber Technologies Inc. is an American technology company that develops, markets and operates the Uber ride-hailing and food delivery mobile apps, whose drivers generally use their own cars. To Uber you are not just a passenger or a fare but a big data goldmine that powers its analytics. Uber knows where you work, where you eat, where you live, where you travel and when you do all these things. Uber, a start-up in analytics, is using this treasure trove of information to offer personalized services and to generate massive ROI by selling this data to its customer base.

Uber is finding new ways to make money by selling the transactional data it has accumulated from its rides. Uber recently partnered with Starwood Hotels and Resorts, launching a service that lets users connect their Uber accounts to their Starwood Preferred Guest accounts. The benefit to customers is that they earn Starwood reward points when they ride with Uber. When they sign up, customers give Uber full rights to share all the information about their rides with Starwood.

Starwood now has access to all of the Uber rider’s information, which it can use for analytics. A Starwood marketer can step in immediately on noticing that you chose to stay at a non-Starwood property, because Uber knows where you went. You will then be flooded with offers from Starwood to ensure that your next holiday stay is at a Starwood property, helping Starwood raise revenue at the expense of its competitors.

Uber uses regression analysis to gauge the size of demand in each neighborhood, which in turn helps it identify the busiest neighborhoods on Friday nights so that it can add a surcharge to customers’ bills. Uber also collects ratings from its drivers and riders and analyzes this data to measure customer satisfaction and loyalty.
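The source does not describe Uber’s actual model, so the following is only a generic illustration of the idea: a single-variable ordinary least-squares regression in Python that predicts Friday-night ride requests from neighborhood size, followed by a threshold surcharge rule. The training numbers, neighborhood names, threshold (400 requests) and 1.5x multiplier are all made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def surge_multiplier(predicted_requests, threshold=400.0, multiplier=1.5):
    """Hypothetical pricing rule: surcharge when predicted demand exceeds a threshold."""
    return multiplier if predicted_requests > threshold else 1.0

# Made-up training data: residents (thousands) vs. Friday-night ride requests.
residents = [10, 20, 30, 40, 50]
requests = [120, 210, 330, 410, 520]
intercept, slope = fit_line(residents, requests)
print(f"requests ~ {intercept:.0f} + {slope:.0f} * residents")  # requests ~ 18 + 10 * residents

# Score hypothetical neighborhoods that were not in the training data.
for name, pop in {"Riverside": 25, "Downtown": 45}.items():
    predicted = intercept + slope * pop
    print(name, round(predicted), surge_multiplier(predicted))
# Riverside 268 1.0
# Downtown 468 1.5
```

A real pricing system would use many more predictors (time, weather, events, driver supply) and live data, but the regression step follows the same least-squares principle.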

In the near future, Uber is even considering partnerships with luxury brands, retailers and restaurants to collect data about the shopping malls you visit, the clubs you go to and the places you dine. It plans to share this information with its customers so that they can use it for targeted marketing.


The potential of Big Data at Verscend Technologies


Verscend Technologies Pvt. Ltd. is a data analytics company that primarily works on US healthcare insurance data. As a company, we receive a huge volume of medical claims and pharmacy claims data, along with data on eligible members and their demographics. Some data is received as structured text files, some as unstructured data, and some as HIPAA-standard EDI (Electronic Data Interchange) data, which needs to be parsed before it can be read into the database. Once received, the data is loaded into database systems and transformed into a standard format by applying the appropriate business logic. The data is then further processed and normalized so that it is ready for the front-end reporting application, which provides clients with various analytical features, fraudulent-claim detection modules, predictive analytics reports, normalized industry-standard values, etc.
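The EDI parsing step can be sketched in Python. This is a minimal sketch, not Verscend’s parser: it assumes the commonly used X12 delimiters `~` (segment terminator) and `*` (element separator), whereas real interchanges declare their delimiters in the ISA header. The sample interchange is hypothetical and heavily trimmed; real CLM segments carry many more elements. In an 837 claim transaction, CLM01 is the claim submitter’s identifier and CLM02 the total claim charge amount, and the sketch maps only those two into a standard record.

```python
def parse_x12(raw: str, segment_sep: str = "~", element_sep: str = "*"):
    """Split an X12 string into segments, each a list of elements.

    Assumes fixed '~' and '*' delimiters; production code should read them from ISA.
    """
    segments = []
    for chunk in raw.split(segment_sep):
        chunk = chunk.strip()  # tolerate line breaks between segments
        if chunk:
            segments.append(chunk.split(element_sep))
    return segments

def extract_claims(segments):
    """Transform CLM segments into a flat standard record (CLM01, CLM02)."""
    claims = []
    for seg in segments:
        if seg[0] == "CLM" and len(seg) > 2:
            claims.append({"claim_id": seg[1], "total_charge": float(seg[2])})
    return claims

# Hypothetical, trimmed 837 transaction set (ST ... SE with 4 segments).
sample = "ST*837*0001~CLM*A100*250.00~CLM*A101*75.5~SE*4*0001~"
claims = extract_claims(parse_x12(sample))
print(claims)
# [{'claim_id': 'A100', 'total_charge': 250.0}, {'claim_id': 'A101', 'total_charge': 75.5}]
```

Records in this flat shape could then be bulk-loaded into the database and normalized alongside the structured text files.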

The US healthcare industry is an ever-growing and very important industry in the US economy. With the volume of data increasing day by day and the need to expand the business to become the market leader, the technologies currently in use will soon be insufficient to handle this data. One setback Verscend Technologies has recently faced is its inability to process data quickly and deliver reports to clients. It normally takes 60 days to implement a new client and provide the complete reports they want, whereas for existing clients it takes approximately 30 days. Because the turnaround in both scenarios is long, more and more clients are looking elsewhere in the market for options that deliver reports faster. This is where Verscend needs to think seriously about scalability and about taking the lead in the market by adopting Big Data technology, using the Hadoop framework and advanced analytics in R.
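Hadoop’s core processing model is MapReduce: a map step emits key/value pairs, the framework groups (shuffles) them by key, and a reduce step aggregates each group, with all three spread across many machines. The single-process Python sketch below only illustrates that programming model; it is not Hadoop’s API. The comma-separated claim-line format and the sample lines are hypothetical. Mapper and reducer logic of this shape could in principle be adapted to Hadoop Streaming, which runs external scripts as mappers and reducers.

```python
from collections import defaultdict

def map_phase(line: str):
    """Emit (member_id, paid_amount) from a hypothetical 'claim_id,member_id,paid' line."""
    _claim_id, member_id, paid = line.split(",")
    yield member_id, float(paid)

def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Aggregate all values for one key (here: total paid per member)."""
    return key, sum(values)

lines = ["C1,M1,100.0", "C2,M2,40.0", "C3,M1,60.0"]
mapped = (pair for line in lines for pair in map_phase(line))
totals = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
print(totals)  # {'M1': 160.0, 'M2': 40.0}
```

Because each map call touches one line and each reduce call touches one key, both phases can run in parallel across a cluster, which is what makes the model suitable for claim volumes too large for a single machine.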

Opportunities with Big Data for Verscend Technologies

  1. Predictive Analytics
  2. Population Health Analytics
  3. Provider/Physician Management
  4. Care Management / Disease Management / Risk Management
  5. Reduction in pharmacy costs


Recommendation and conclusion

1. You’ll Manage Data Better

Many of today’s data processing platforms allow data scientists to collect, analyze and sift through various types of data. While some technical knowledge is needed to describe how the data is collected and stored, many of today’s big data and business intelligence tools let users sit in the driver’s seat and work with data without going through too many complicated technical steps.


2. You’ll Benefit from Speed, Capacity, and Scalability of Cloud Storage

Organizations that want to use very large data sets should consider third-party cloud service providers, which can provide both the storage and the computing power needed to crunch data for a specific period of time. Cloud storage lets companies analyze massive data sets without the significant capital investment in hardware that hosting the data internally would require.


3. Your End Users Can Visualize Data

A big data initiative will require next-level data visualization tools that present BI data in easy-to-read charts, graphs and slideshows. Because of the vast amounts of data being examined, these applications must offer processing engines that let end users query and manipulate information quickly, in some cases even in real time. Applications will also need connectors to external sources for additional data sets.


4. Your Company Can Find New Business Opportunities

More users are realizing the competitive advantage of being a data-driven enterprise. Campaign managers in both the Democratic and Republican parties in the US saw the need for information on voters and their specific interests; using this information to address an issue through a customized email or flyer meant the potential to win a vote.


5. Your Data Analysis Methods, Capabilities Will Evolve

Data is no longer simply numbers in a database. Text, audio and video files can also provide valuable insight; the right tools can even recognize specific patterns based on predefined standards. Much of this happens through natural language processing tools, which can prove vital to text mining, sentiment analysis, clinical language processing and named entity recognition efforts.
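As a simple illustration of sentiment analysis, the Python sketch below uses the most basic lexicon-based approach: count positive and negative words and compare. The word lists, function names and sample sentences are hypothetical; production NLP tools typically rely on curated lexicons or trained models and handle negation, context and domain vocabulary that this sketch ignores.

```python
import re

# Hypothetical, tiny sentiment lexicon for illustration only.
POSITIVE = {"good", "great", "helpful", "fast", "satisfied"}
NEGATIVE = {"bad", "slow", "denied", "rude", "frustrated"}

def sentiment_score(text: str) -> int:
    """Positive minus negative lexicon hits among lowercase word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def label(text: str) -> str:
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for review in [
    "The claims team was helpful and fast.",
    "My claim was denied and support was rude.",
    "I called about my bill.",
]:
    print(label(review), "|", review)
# positive | The claims team was helpful and fast.
# negative | My claim was denied and support was rude.
# neutral | I called about my bill.
```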