Web Pages and their Complexity

Blogged under General by Administrator on Wednesday 30 January 2008 at 12:27 pm

HTML’s forgiving nature is responsible for dramatic increases in the expressive power of web pages and in the overall usability of the Internet at large. As the saying goes, though, there’s no such thing as a free lunch. The flexibility of the HTML standard encourages a loose relationship between the HTML tag structure (the DOM) of a web page and its visualization (rendering) on the screen. This makes it harder for search engines to find relevant matches between user queries and web pages.

Today’s web pages can be described as a hodgepodge of many (and quite often unrelated) elements: articles, comments, posts, ads, banners, tables of contents, hyperlinks, etc. Printed newspapers (which still constitute a serious challenge for OCR systems) are dwarfed in complexity by web pages. Of course, the HTML DOM is of some help, but its hierarchical structure is no match for the lateral links between web page elements and their dynamic rendering.

Let’s consider the following page: http://richlabonte.net/exonews/xtra/whales_dying.htm

[Image: web page example]

This page is one of the top-10 Google Search results for the query “whales warning of tsunami”. The page consists of nine separate articles discussing different subjects, including an article about tsunami warning systems (#3) and an article on the preservation of whales (#1), but not what was asked: the rather unusual topic of whales warning of tsunamis. As this example shows, the relevancy of a page to a query is not necessarily well defined.

Various means have been employed to deal with web page complexity. The most widely accepted one is to use the distance between keyword occurrences as a measure of relevancy; this is often referred to as a proximity measure. This approach works extremely well for the classic Information Retrieval problem of matching keywords to texts. However, on a web page, close proximity of two words does not necessarily mean that they are related. They can be very close on a page while belonging to completely different contexts. Think, for instance, of a newspaper layout where two unrelated topics appear on either side of the line separating two columns. The problem is exacerbated by dynamic rendering of web pages, where ads and links to outside pages are inserted inside an article.
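To make the idea of a proximity measure concrete, here is a minimal sketch in Python. It is only an illustration: the function names, the simple tokenizer, and the sample text are assumptions made for this example, not how any particular search engine computes proximity. It measures the smallest distance, in tokens, between two keywords, and shows how two unrelated neighbouring snippets can still score as "close".

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def min_distance(tokens, word_a, word_b):
    """Return the smallest gap in token positions between word_a and word_b.

    Assumes word_a and word_b are different words.
    Returns None if either word does not occur.
    """
    best = None
    last_a = last_b = None
    for i, tok in enumerate(tokens):
        if tok == word_a:
            last_a = i
        elif tok == word_b:
            last_b = i
        else:
            continue
        # Both words have been seen at least once: compare their latest positions.
        if last_a is not None and last_b is not None:
            gap = abs(last_a - last_b)
            if best is None or gap < best:
                best = gap
    return best

# Hypothetical page text: two unrelated items that happen to be adjacent.
page = "Volunteers rescued the stranded whales. Tsunami sirens were tested on Friday."
print(min_distance(tokenize(page), "whales", "tsunami"))  # prints 1
```

A distance of 1 is the best possible score, yet the two sentences have nothing to do with each other, which is exactly the weakness described above.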

All this leads to a reinterpretation of the concept of a relevant page. It is no longer a relevant page but rather a relevant context that we are after. Once we can split a page into contexts, the methods that are applicable to pure text, such as NLP, semantic analysis, and proximity measures, become much more reliable.
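The difference between matching a page and matching a context can be sketched as follows. This is a toy illustration under stated assumptions: the contexts are split by hand (the hard part, which this sketch does not attempt), the sample texts are invented, and the helper names `terms`, `page_matches`, and `matching_contexts` are hypothetical.

```python
import re

def terms(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def page_matches(contexts, query):
    """True if the page as a whole (all contexts joined) contains every query term."""
    return terms(query) <= terms(" ".join(contexts))

def matching_contexts(contexts, query):
    """Return the indices of contexts that contain every query term on their own."""
    wanted = terms(query)
    return [i for i, context in enumerate(contexts) if wanted <= terms(context)]

# Hypothetical page, already split by hand into two independent contexts.
contexts = [
    "Conservation groups call for better protection of whales.",
    "Tsunami warning systems rely on a network of ocean buoys.",
]
query = "whales warning of tsunami"

print(page_matches(contexts, query))       # prints True
print(matching_contexts(contexts, query))  # prints []
```

Treated as a single bag of words, the page matches the query. Treated as separate contexts, nothing on it does, which is the more useful answer for a user who wants to read about whales warning of tsunamis.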

Glendor Search is built on this principle. We developed a patent-pending technology to analyze arbitrary web pages and divide them into independent contexts. On average, the analysis and indexing of a page requires less than one second. This translates into the ability to crawl and index the entire surface web (~20 billion pages) once a quarter using 1,000 standard PCs.
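As a rough back-of-the-envelope check of these figures, the sketch below computes the time budget per page that a quarterly crawl implies. It assumes a 91-day quarter, pages spread evenly across machines, and each PC handling one page at a time around the clock; none of these assumptions is stated in the post, and the function name is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def per_page_budget_seconds(pages, machines, days):
    """Seconds each machine can spend per page to finish all pages within the given days.

    Assumes an even split of pages and strictly sequential processing per machine.
    """
    pages_per_machine = pages / machines
    seconds_available = days * SECONDS_PER_DAY
    return seconds_available / pages_per_machine

budget = per_page_budget_seconds(
    pages=20_000_000_000,  # ~20 billion pages, from the post
    machines=1_000,        # 1,000 standard PCs, from the post
    days=91,               # assumed length of a quarter
)
print(round(budget, 3))  # prints 0.393
```

Under these assumptions each PC has roughly 0.39 seconds per page, so the average processing time would need to sit well under the one-second bound, or each PC would need to process several pages in parallel.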
