Web Pages and their Complexity

Blogged under General by Administrator on Wednesday 30 January 2008 at 12:27 pm

HTML’s forgiving nature is responsible for dramatic increases in the expressive power of web pages and in the overall usability of the Internet at large. As the saying goes, though, there’s no such thing as a free lunch. The flexibility of the HTML standard encourages a loose relationship between the HTML tag structure (DOM) of a web page and its visualization (rendering) on the screen. This makes it harder for search engines to find relevant matches between user queries and web pages.

Today’s web pages can be described as a hodgepodge of many (and quite often unrelated) elements – articles, comments, posts, ads, banners, tables of contents, hyperlinks, etc. Printed newspapers (which still constitute a serious challenge for OCR systems) are dwarfed in complexity by web pages. Of course, the HTML DOM is of some help, but its hierarchical structure is no match for the lateral links between web page elements and their dynamic rendering.

Let’s consider the following page: http://richlabonte.net/exonews/xtra/whales_dying.htm

web page example

This page is one of the top-10 Google Search results for the query whales warning of tsunami. The page consists of 9 separate articles discussing different subjects, including an article about tsunami warning systems (#3) and an article on preservation of whales (#1), but not what was asked – the rather unusual topic of whales warning of tsunamis. As can be seen from this example, the relevancy of a page to a query is not necessarily well defined.

Various means have been employed to deal with web page complexity. The most widely accepted one is to use the distance between keyword occurrences as a measure of relevancy – this is often referred to as a proximity measure. This approach works extremely well for the classic Information Retrieval problem of matching keywords to texts. However, on a web page, close proximity of two words does not necessarily mean that they are related. They can be very close on a page while belonging to completely different contexts: think of a newspaper layout where two unrelated topics appear on either side of the line separating two columns. The problem is exacerbated by dynamic rendering of web pages, where ads and links to outside pages are inserted inside an article.
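To make the proximity measure concrete, here is a minimal Python sketch. It is not any search engine's actual scoring; the function name `min_keyword_distance` and the sample text are invented for illustration. It computes the smallest token distance between two keywords, and the example shows the failure mode described above: two words end up adjacent in the flattened text although they come from different columns.

```python
import re

def min_keyword_distance(text, word_a, word_b):
    """Smallest number of token positions between word_a and word_b, or None."""
    tokens = re.findall(r"\w+", text.lower())
    best = None
    last_a = last_b = None
    for i, tok in enumerate(tokens):
        if tok == word_a:
            last_a = i
        if tok == word_b:
            last_b = i
        if last_a is not None and last_b is not None:
            d = abs(last_a - last_b)
            if d > 0 and (best is None or d < best):
                best = d
    return best

# Two unrelated newspaper columns (hypothetical text), flattened left to right.
column_left = "Rescuers returned the stranded whales"
column_right = "Tsunami warning buoys were installed offshore"
flattened = column_left + " " + column_right

# The keywords are neighbours in the flattened text, yet the columns are unrelated.
distance = min_keyword_distance(flattened, "whales", "tsunami")
```

A pure proximity score would rate this text as a strong match for "whales tsunami", which is exactly the false positive the newspaper analogy describes.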

All this leads to a reinterpretation of the concept of a relevant page. It is no longer a relevant page but rather a relevant context that we are after. Once we can split a page into contexts, the methods that are applicable to pure texts, such as NLP, semantic analysis, and proximity measures, become much more reliable.
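The difference between page-level and context-level matching can be sketched in a few lines of Python. The sketch assumes the page has already been split into contexts, which is the hard part and is not shown; representing contexts as a list of strings, the function name `contexts_containing_all`, and the sample text are all assumptions made for illustration.

```python
import re

def contexts_containing_all(contexts, keywords):
    """Return the indices of contexts whose text contains every keyword."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for i, text in enumerate(contexts):
        tokens = set(re.findall(r"\w+", text.lower()))
        if wanted <= tokens:
            hits.append(i)
    return hits

# Hypothetical contexts taken from one page with several unrelated articles.
page_contexts = [
    "Volunteers work to protect whales along the coast",
    "A new tsunami warning system was tested this week",
    "Readers comment on whales and tourism",
]
query = ["whales", "warning", "tsunami"]

# Treating the page as one text: every keyword is present, so the page matches.
page_matches = contexts_containing_all([" ".join(page_contexts)], query)

# Treating each context separately: no single context covers the whole query.
context_matches = contexts_containing_all(page_contexts, query)
```

Here the whole page matches the query while no individual context does, mirroring the whales and tsunami example earlier in the post.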

Glendor Search is built on this principle. We developed a patent-pending technology to analyze arbitrary web pages and divide them into independent contexts. On average, the analysis and indexing of a page requires less than one second. This translates into the ability to crawl and index the entire surface web (~20 billion pages) once a quarter using 1,000 standard PCs.
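A back-of-the-envelope check of the crawl estimate, as a short Python sketch. It assumes a 90-day quarter and that each PC processes pages strictly one at a time; neither assumption comes from the figures above.

```python
PAGES = 20_000_000_000   # surface web size quoted above
MACHINES = 1_000         # standard PCs quoted above
QUARTER_DAYS = 90        # assumption: length of a quarter

seconds_per_quarter = QUARTER_DAYS * 24 * 60 * 60   # 7,776,000 seconds
pages_per_machine = PAGES / MACHINES                 # 20,000,000 pages
budget_per_page = seconds_per_quarter / pages_per_machine

print(f"time budget per page: {budget_per_page:.3f} s")
```

Under these assumptions each machine has roughly 0.39 seconds per page, so the estimate holds if the average per-page time is comfortably below one second or if each PC works on several pages in parallel.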

Welcome to Glendor Search

Blogged under General by Administrator on Friday 11 January 2008 at 4:29 pm

We are throwing our hat into the ring and launching our own web search service. What makes us unique? We cut to the chase and let the users instantly see query-relevant information and quickly decide which of the top ten search results will be most interesting.

Curious? Register for a Private Beta at www.glendor.com. We are excited to know what you think.

3, 2, 1 LAUNCH

Recent Coverage

Blogged under General by Jeff Clavier on Wednesday 7 September 2005 at 4:48 am

Glenbrook Networks got some interesting coverage recently:

I look forward to when Glenbrook or Google will help us find information from these previously unavailable sources. It will mean billions more pages of relevant information available to the world.

 


Trawling the Deep Web

Blogged under General by Jeff Clavier on Sunday 21 August 2005 at 2:04 pm

The majority of web pages one can access through search engines were collected by crawling the so-called Static or Surface Web. This is the smaller portion of the Internet, reportedly containing between 8 and 20 billion pages (Google vs. Yahoo index sizes). Though this number is already very large, the total number of pages available on the Web is estimated at 500 billion. The part that lies beyond the Surface Web is often referred to as the Deep Web, Dynamic Web, or Invisible Web. All these names reflect some of the features of this gigantic source of information: it is stored deep down in databases, rendered through DHTML, and not accessible to standard crawlers. Pages in the Deep Web often have no standard URL and cannot be addressed in a standard fashion. In many cases, they do not even exist until a user asks a question by filling in the fields of a form and a response (page) is generated. Typical examples of deep web applications are airline reservation systems, online dictionaries, etc.

It is supposedly quite easy for a human to navigate through the Deep Web. One just needs to fill in a form by choosing among several options, such as destinations and dates on a travel site, or by entering a word to look up its meaning or translation. It is much more difficult for a machine to do so automatically and generically. Because the Deep Web contains a lot of factual information, it can be seen metaphorically as an ocean with a lot of fish. That is why we call the system that navigates the Deep Web a trawler.

There are two major problems with navigating the Deep Web automatically. First, the trawler needs to understand what questions to ask through the aforementioned forms, and ask them exhaustively. Second, the trawler cannot easily navigate from one page to another, since pages do not have set URLs or might not even exist. That’s why the trawler needs to remember where it came from and return to the surface (like a whale) before “diving” again to ask the next question.
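Both problems can be illustrated with a small Python sketch. It covers only the simplest case, a form whose fields each offer a finite list of options, and it is an illustration rather than the actual trawler: the names `enumerate_queries`, `trawl`, and `fake_submit`, the example URL, and the field values are all hypothetical. Each query is issued from the form page (the "surface"), and results are keyed by the query that produced them because a result page may have no stable URL.

```python
from itertools import product

def enumerate_queries(form_fields):
    """Yield every combination of option values as a dict, one per submission."""
    names = sorted(form_fields)
    for values in product(*(form_fields[n] for n in names)):
        yield dict(zip(names, values))

def trawl(door_url, form_fields, submit):
    """Ask every question, always starting again from the door (form) page.

    `submit(door_url, query)` is supplied by the caller and returns the
    generated page for that query.
    """
    results = {}
    for query in enumerate_queries(form_fields):
        key = tuple(sorted(query.items()))   # remember which question was asked
        results[key] = submit(door_url, query)
    return results

# Toy stand-in for a travel site; no network access is involved.
fields = {"from": ["SFO", "JFK"], "to": ["LHR", "CDG", "NRT"]}

def fake_submit(url, query):
    return f"{url}?{query['from']}->{query['to']}"

pages = trawl("https://example.com/search", fields, fake_submit)
```

With two origins and three destinations the sketch issues six queries. Free-text fields have no finite option list, so they need the semantic understanding discussed in the rest of this post.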

If the number of sites is relatively small, say a few thousand, each set of forms could be described manually through a templating system. The major limitations of that approach are poor scalability and a lack of resilience to changes in page formats.

There is a third problem that is related to the size of the Deep Web. It is so big that one needs to focus on a particular subset (vertical) to have a chance of trawling it with some level of success, especially if high precision is an important factor. Since the task of determining what questions to ask includes understanding semantics and context, the focus on a vertical comes in handy.

Glenbrook’s approach to building a trawler is based on mimicking the behavior of a (human) user. It is a useful approach since the “doors” opening the Deep Web were built with a human in mind and reflect the standards (no matter how loose) that humans use to navigate the Web.

The Trawler consists of five layers:

  1. Discoverer - locates prospective target home pages on the Surface Web
  2. Scout - navigates the Surface Web part of a web site and finds the “doors” - DHTML pages that contain forms leading to the Deep Web part of the site
  3. Locksmith - fills in the forms with various requests and collects the responses
  4. Assessor - analyses the responses and decides whether to use this door as a candidate for querying the Deep Web part of the site or to move elsewhere
  5. Harvester - collects all relevant pages from the Surface and Deep Web parts of the web site
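The way the five layers hand work to one another can be sketched in Python. This is a toy skeleton only: the post does not describe the layers' interfaces, so every function signature, the in-memory `site` structure, and the probe values below are assumptions made for illustration.

```python
def discoverer(seeds):
    # Toy stand-in: treat every seed as a prospective target home page.
    return list(seeds)

def scout(site):
    # Find the "doors": pages that contain a form.
    return [page for page in site["pages"] if page.get("form")]

def locksmith(door, probes):
    # Fill in the form with a few probe requests and collect the responses.
    return [door["form"](probe) for probe in probes]

def assessor(responses, min_hits=1):
    # Keep the door only if enough probes returned a non-empty response.
    return sum(1 for r in responses if r) >= min_hits

def harvester(site, door, queries):
    # Collect the surface pages plus the deep pages generated through the door.
    surface = [page["url"] for page in site["pages"]]
    deep = [r for r in (door["form"](q) for q in queries) if r]
    return surface + deep

# Hypothetical job site: one plain page and one page with a search form.
jobs_db = {"engineer": "deep:engineer-listing", "editor": "deep:editor-listing"}
site = {"pages": [
    {"url": "/home"},
    {"url": "/search", "form": lambda q: jobs_db.get(q)},
]}

harvested = []
for home in discoverer([site]):
    for door in scout(home):
        if assessor(locksmith(door, ["engineer", "zzz"])):
            harvested = harvester(home, door, ["engineer", "editor", "zzz"])
```

In this toy run the Scout finds one door, the Locksmith's probes return one hit, the Assessor accepts the door, and the Harvester collects two surface pages and two deep pages.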

After all potentially relevant pages are harvested, the Extractor takes over. The Extractor is a hybrid system that applies Pattern Recognition, Natural Language Processing and other AI techniques to extract facts, combine them, and populate a database that is used to provide factual answers to search queries.

The Extractor will be the subject of another post.

Cross-posted from Software Only


Glenbrook Networks in the San Jose Mercury News

Blogged under General by Jeff Clavier on Tuesday 16 August 2005 at 8:07 am

SiliconBeat’s Michael Bazeley featured Glenbrook Networks co-founders Julia and Edward Komissarchik, and the Glendor showcase, in a great piece about “Deep Web” search and information extraction. Michael summarized it quite well:

Komissarchik and her father, Edward Komissarchik, say they have figured out how to analyze the forms on Web pages and understand the type of information the sites are looking for. Then, Glenbrook’s Web crawlers use artificial intelligence to walk themselves through sometimes complex Web forms, answering questions, such as the location of their desired job, in the same way a human would.

Julia Komissarchik likens the process to cracking a safe.

“The way to think of it is, you case the joint,” she said. “The scout goes through the form and tries a few options to see what the results will be. Then you have a mastermind or safecracker who gets all this information from the scout and devises a method to open the forms.”

Finally, she said, the “harvesters” spring into action to gather up all the information.

Just to clarify: the “safe” analogy does not imply that the company is breaking passwords and accessing private information. It refers to getting a machine to generically access information stored behind interactive forms.

We announced the launch of the Glendor showcase a couple of months ago. It features the first (and, I guess, still the only) mashup placing job listings on Google Maps.

Longer post about the concept of “web trawling” implemented by the company on its way.

Thanks to all of you who have emailed us since this morning. We are grateful for the reports of issues with different browser/OS combinations; sorry, we are not hiring at this time; and yes, we can build large-scale custom search and data aggregation solutions. And we are delighted that you like this showcase.

Update: Gary Price, who was also quoted by Michael, posted an analysis on Search Engine Watch that I wanted to comment on briefly. First, Glenbrook’s technology does not (and cannot) extract information directly from corporate databases; it goes through the public, manual interface that companies have set up to access that data. The innovation lies in a suite of algorithms that automatically figure out the parameters to be used to extract that data, without requiring any templating of the targeted sites.

On server load, queries are paced sensibly, based on response times and similar signals, to avoid overloading servers. Data can be refreshed daily, or perhaps several times a day if the dataset is small enough. But extracting and caching data that changes too frequently would not be appropriate.
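One simple way to pace queries by response time is sketched below in Python. The post only says that response times are taken into account; the proportional policy, the function name `next_delay`, and all constants are assumptions chosen for illustration, not Glenbrook's actual settings.

```python
def next_delay(response_seconds, factor=10.0, min_delay=1.0, max_delay=60.0):
    """Pause before the next request, longer when the server answered slowly.

    The pause is `factor` times the last response time, clamped to the
    range [min_delay, max_delay] seconds.
    """
    return max(min_delay, min(max_delay, response_seconds * factor))

fast = next_delay(0.05)   # quick server: the minimum pause applies
slow = next_delay(2.0)    # slow server: back off proportionally
stuck = next_delay(30.0)  # very slow server: capped at the maximum pause
```

A crawler would sleep for the returned number of seconds between consecutive requests to the same server, so a server that slows down under load is automatically queried less often.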

On usability and searchability of the data, this is actually where the aggregation of structured data delivers its value: being able to search by position or location across a wide range of sources (in this case, job listings across companies).

Delighted to show you the technology at your convenience, Gary…

Cross-posted from Software Only.

Investments flowing into job search engines

Blogged under General by Jeff Clavier on Tuesday 9 August 2005 at 8:18 pm

Congratulations to SimplyHired for raising a $3M Series B from a great group of angel investors last Thursday, and to Indeed for following suit on Monday, scoring $5M from Union Square Ventures, the NY Times Company and Allen & Company. Fred Wilson shared interesting insights about the deal on his blog.

The consolidation in jobs vertical search begins: Jobster acquires WorkZoo

Blogged under General by Jeff Clavier on Tuesday 12 July 2005 at 4:45 am

I was reading Charlene Li’s excellent account of the launch of HotJobs’ crawling capability when I spotted that Jobster is buying WorkZoo. According to Charlene:

I spoke with Jobster CEO Jason Goldberg on Monday, and he described their vision of how WorkZoo will allow users to expand their search beyond their network of jobs on Jobster proper and see “every” job. WorkZoo has its cut out for them – in previous testing, they lagged significantly in their parsing ability compared to Indeed.com and Simply Hired. But this combination of Jobster and WorkZoo makes sense as a combined service – it’s also is similar to the partnership that currently exists between professional social networking service LinkedIn and SimplyHired.

The consolidation has already begun. Interesting.

Yahoo HotJobs is also a jobs search engine

Blogged under General by Jeff Clavier on Tuesday 12 July 2005 at 3:27 am

John Battelle said it best in “A Good Idea, Indeed. You’re Simply Hired”: Yahoo HotJobs is entering the jobs search arena.

“Yahoo seems to be taking a cue from Indeed and Simply Hired. Ouch. (Thanks, Richard)”

Joel Cheesman actually posted on the topic before John, and there is an interesting discussion in the comments of his post.

Let’s see what Monster.com’s and CareerBuilder’s next moves are in this new segment.

Update: SiliconBeat added their take on the news

Bay Area zip codes

Blogged under General by Jeff Clavier on Wednesday 6 July 2005 at 1:42 am

Francois Gossieaux over at Emergence Marketing very rightly pointed out that our readers, and showcase testers, might not be familiar with our zip codes. Apologies for that.

I should mention that leaving the “Location” field empty uses San Carlos as the reference point for searches (it is sort of in the center of Silicon Valley). And here are a few Bay Area zip codes: 94301 for Palo Alto, 94111 for San Francisco and 95113 for San Jose.

Glendor.com is a mashup

Blogged under General by Jeff Clavier on Tuesday 5 July 2005 at 7:04 pm

Om Malik pointed this morning to a few applications using Google Maps to geolocate “stuff”, stuff being wireless-enabled cafes, wireless hot-spots in cities, and the now famous Craigslist-meets-Google-Maps site that started the whole movement.

Michael Bazeley then pointed to Redfin, which combines satellite maps and MLS homes data for the Seattle area.

The O’Reilly Radar also referred to the Google Maps + Yahoo Traffic mashup that was taken down, and then brought back up.

So Glendor.com is a mashup as well then!

Finally, I found Google Maps Mania in our referrer logs: an unofficial Google Maps blog tracking the websites, ideas and tools being influenced by Google Maps.

Mapping job listings

Blogged under General by Jeff Clavier on Tuesday 5 July 2005 at 2:57 am

Glendor Showcase

And this was developed before the Google Maps API was released! Which means that we might not have used all the capabilities now available.
Also, make sure to zoom in on the map to display the different companies with less overlap.

A few search examples

Blogged under General by Jeff Clavier on Tuesday 5 July 2005 at 2:37 am

The following searches will give you an idea of what can be accessed on Glendor.com:

  • Development jobs available 25 miles around Palo Alto, CA:  search map rss
  • Software jobs listed on company websites that include the keywords (kernel, networking, file system): search map rss
  • Contract or temporary admin jobs published in the last 7 days, within 10 miles of San Francisco, CA: search map rss

Don’t be surprised if some jobs are outside of the Bay Area: we are restricting the sources to companies having operations, or their headquarters, in the Bay Area, but the jobs themselves might be anywhere in the US, or even abroad.

Also, the precision of the mapping is at the level of the city since only rarely is the actual address of the company mentioned in the job listing. That’s why multiple jobs may overlap on one city, and clicking on one character does not display all jobs available for that city in the “bubble”.
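For readers curious how a city-level radius search like the ones above might work, here is a minimal sketch using the haversine great-circle distance. It is only an illustration, not the Glendor code: the coordinates are approximate city centers, and the `CITY_CENTERS` table, the function names, and the job records are all assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Approximate city-center coordinates (latitude, longitude), assumed for illustration.
CITY_CENTERS = {
    "Palo Alto": (37.4419, -122.1430),
    "San Carlos": (37.5072, -122.2605),
    "San Francisco": (37.7749, -122.4194),
    "San Jose": (37.3382, -121.8863),
}


def miles_between(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def jobs_within(jobs, center_city, radius_miles):
    """Keep the jobs whose city center lies within radius_miles of center_city.

    Each job is a dict with a "city" key (a hypothetical record shape).
    Jobs in cities missing from CITY_CENTERS are skipped.
    """
    center = CITY_CENTERS[center_city]
    return [
        job for job in jobs
        if job["city"] in CITY_CENTERS
        and miles_between(center, CITY_CENTERS[job["city"]]) <= radius_miles
    ]
```

Because every job is placed at its city's center, all listings for one city share the same point, which is exactly why they overlap on the map. With these approximate coordinates, San Francisco falls just outside a 25-mile radius of Palo Alto, while San Jose and San Carlos fall inside it.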

A word about this blog

Blogged under General by Jeff Clavier on Monday 4 July 2005 at 1:48 am

Besides keeping you up to date on the developments of Glenbrook Networks, and the Glendor showcase, this blog will also talk about vertical search in general, and some of the technology issues that we had to solve when building our vertical search and information extraction platform.
Please tune in to the RSS feed.

Welcome to the Glendor Showcase

Blogged under General by Jeff Clavier on Monday 4 July 2005 at 1:10 am

Glendor.com is the showcase of Glenbrook Networks, the search and information extraction platform provider.

We have chosen jobs as a vertical for this showcase because extracting listings from company web sites exercises all aspects of our technology to produce quality, structured results: surface and dynamic web crawling, layout recognition, natural language processing,…
We have also integrated a few additional features, like mapping job listings onto Google Maps, and the ability to subscribe to search results via RSS feeds and to syndicate searches on blogs or other web sites.

The showcase is providing job listings extracted from a few hundred Bay Area company web sites, and one large job board. Using it is pretty straightforward, but check out the Help section for typical queries.
