What is the deep web?

Deep web (hidden web)
The "deep web" (also called the invisible web or hidden web) is distinct from the "dark web". The dark web is the encrypted network that exists between Tor servers and their clients, whereas the deep web is simply the content of databases and other web services that, for one reason or another, cannot be indexed by conventional search engines.
The deep web includes many very common services, such as webmail and online banking, as well as paid services behind a paywall, such as video on demand.
Computer scientist Mike Bergman is credited with coining the term "deep web" in 2000 as a search-indexing term.
The first conflation of the terms "deep web" and "dark web" came about in 2009, when deep web search terminology was discussed alongside illegal activities taking place on the Freenet darknet.
Since then, and especially after the term's use in media reporting on the Silk Road, many people and media outlets have taken to using "deep web" synonymously with the dark web or darknet. BrightPlanet rejects this comparison as inaccurate, and it has become an ongoing source of confusion. Wired reporters Kim Zetter and Andy Greenberg recommend that the two terms be kept distinct.

Size
It is impossible to measure, and hard to estimate, the size of the deep web because the majority of the information is hidden or locked inside databases. Early estimates suggested that the deep web is 400 to 550 times larger than the surface web. Since new information and sites are constantly being added, the deep web is often assumed to be growing exponentially, although the rate cannot be quantified.
Extrapolations from a 2001 study at the University of California, Berkeley suggest that the deep web consists of about 7.5 petabytes. More accurate estimates are available for the number of resources in the deep web: research by He et al. detected around 300,000 deep web sites across the entire web in 2004, and, according to Shestakov, around 14,000 deep web sites existed in the Russian part of the Web in 2006.

Content types
  1. Contextual Web: pages whose content varies with the access context (e.g., ranges of client IP addresses or previous navigation sequences).
  2. Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
  3. Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or the no-store directive, which prevent search engines from browsing them and creating cached copies).
  4. Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
  5. Private Web: sites that require registration and login (password-protected resources).
  6. Scripted content: pages that are only accessible through links produced by JavaScript, as well as content dynamically downloaded from Web servers via Flash or Ajax solutions.
  7. Software: certain content is intentionally hidden from the regular Internet, accessible only with special software, such as Tor, I2P, or other darknet software. For example, Tor allows users to access websites using the .onion host suffix anonymously, hiding their IP address.
  8. Unlinked content: pages that are not linked to by other pages, which may prevent web crawlers from reaching them. Such pages are said to have no backlinks (also known as inlinks). In addition, search engines do not always detect all backlinks from the pages they crawl.
  9. Web archives: Web archival services such as the Wayback Machine let users see archived versions of web pages across time, including websites that have since become inaccessible; these archived copies are not indexed by search engines such as Google.
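
To make the "limited access content" and "unlinked content" cases more concrete, here is a minimal sketch of a toy crawler in Python. It is only an illustration under stated assumptions: the site, its URLs under `example.com`, the `SITE_LINKS` map, and the `crawl` function are all hypothetical, and real search engine crawlers are far more elaborate. The crawler starts from one seed page, follows only links, and checks each URL against a robots.txt rule using the standard library's `urllib.robotparser`. As a result, a page that robots.txt disallows and a page that nothing links to both stay out of its index.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical toy site: each URL maps to the URLs it links to.
# "/private/report" is linked but disallowed by robots.txt;
# "/orphan" exists but nothing links to it (no backlinks).
SITE_LINKS = {
    "https://example.com/": [
        "https://example.com/about",
        "https://example.com/private/report",
    ],
    "https://example.com/about": ["https://example.com/"],
    "https://example.com/private/report": [],
    "https://example.com/orphan": [],
}

# Hypothetical robots.txt asking all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(seed, links, robots_lines, agent="toy-crawler"):
    """Breadth-first crawl that only follows links and honors robots.txt."""
    rules = RobotFileParser()
    rules.parse(robots_lines)
    indexed = set()
    seen = {seed}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue  # limited-access content: the crawler skips it
        indexed.add(url)
        # Only pages reachable through links are ever queued.
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return indexed


indexed = crawl("https://example.com/", SITE_LINKS, ROBOTS_TXT.splitlines())
hidden = set(SITE_LINKS) - indexed
print(sorted(indexed))  # ['https://example.com/', 'https://example.com/about']
print(sorted(hidden))   # ['https://example.com/orphan', 'https://example.com/private/report']
```

Dropping the `Disallow` rule would let this toy crawler index `/private/report`, but `/orphan` would still be missed, because a link-following crawler has no way to discover a page without backlinks.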
