Executive Summary
Figure 1. Internal and External Spending on Web Data Extraction
Source: Opimas Analysis
The Internet is the ultimate dataset. A rapidly increasing amount of data on the web, ever more connected devices, and growing social media usage are creating more sources from which valuable insights can be gleaned. Web data make up a valuable portion of the alternative datasets that are revolutionizing corporate decision-making.
Extracting these data from the Internet is a complex endeavor: the information required for an analysis is typically spread across multiple sites in different, often unstructured, formats.
The range of use cases for web data extraction is rapidly increasing, and with it the necessary investment. While spending in this area amounted to about US$2.5bn in 2017, we expect the market to reach almost US$7bn by 2020. The bulk of this spending is currently weighted towards internal, home-grown systems. However, given the complexity of creating and maintaining web page extractors, spending is increasingly shifting to specialized external technology and service providers. External spending is set to increase from about US$400mn in 2017 to over US$2bn in 2020 (see Figure 1).
While a number of technical impediments (anti-web-scraping defenses) and legal impediments exist, these appear to be mostly manageable.
Over the coming years, making sense of big data and gleaning value from it will be a priority. We will see rapid growth both in spending on external tools and in the value extracted from the web with them, especially in investment decision-making, e-commerce, and manufacturing.