Archive for the ‘Module 4’ Category

Evaluating the Web

Thursday, August 28th, 2008

Annotation
“Basic Search Tips and Advanced Boolean Explained” is a guide - presumably for University of California, Berkeley, students - on conducting effective web searches. The resource, a PDF file, was found on the aforementioned university’s library website, and references a “course”. The author is a university librarian and may be considered a very credible source of information. The guide is relevant to my purpose, since it provides objective information on how to use conditional operators in Web searches.

The guide does not detail all aspects of Web searching, eschewing information on advanced search engine options and the commonly implemented search operators (such as “define”, “cache”, “site” and so on). But as the document only claims to cover basic searching and advanced Boolean, its credibility is not diminished by this omission.

The guide unfortunately does not specify a publication date. One may assume that searching techniques remain relatively constant, but this is not a given, so it is worth looking for additional, dated material to confirm that the advice is still current.

Google reports that 12 sites link directly to this document, including other educational institutions. This is a good indication of the guide’s perceived quality.

Discussion
I enjoyed conducting this evaluation and believe the Evaluating Web Sites tutorial to be a particularly helpful framework for considering the validity of published material.

In general, people - including me - find it easy to defer to Google, whose premise is that the more links point to a webpage, the more authority it carries (there are other factors, too, but it is generally thought that a link counts as a vote for relevance). However, Google’s calculation is automated, and there are many reasons a page may receive many links and a consequently high placing on Search Engine Results Pages (SERPs), so a good heuristic for evaluating Web-sourced information is important, particularly in “mission-critical” applications.

Depending on the type of resource, I think that in most cases people would be happier to click on a link surrounded by good contextual data than to read my annotation. It would take less time to click the link and scan the first part of the document than to read through my thoughts on the matter.

Additional Resource:

Evaluating Web Sites

Searching the Web

Thursday, August 28th, 2008

The Deep Web

The concept of the Deep Web is not surprising to me: I’ve had a bunch of pages up on the Web that I haven’t bothered or wanted to have indexed by crawler-based search engines, and I’ve even used robots.txt and Sitemap files to control which of my content gets indexed. But I had never actually thought about the value of “hidden” content, and I didn’t realise that Sitemaps were a sneaky tool (well, the Sitemap generators are!) for accessing more of the Deep Web, because they inform search engines of more URLs than their crawlers would be able to find.
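As a rough illustration of how a robots.txt file keeps pages out of a crawler’s reach while a Sitemap line advertises URLs, here is a minimal sketch using Python’s standard `urllib.robotparser`. The robots.txt content and the example.com URLs are made up for the example, and `site_maps()` needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: hide /private/ from all crawlers, advertise a Sitemap.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_text):
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A well-behaved crawler would skip the first URL and may fetch the second.
print(parser.can_fetch("*", "https://example.com/private/notes.html"))
print(parser.can_fetch("*", "https://example.com/blog/post.html"))

# Sitemap URLs listed in the file (Python 3.8+).
print(parser.site_maps())
```

Pages under /private/ stay out of crawler-based indexes only as long as crawlers honour the file, which is part of why such content ends up in the Deep Web.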

Now, I’ll point out that as a Linux user, I didn’t have access to Copernic or Sherlock, so I used Web-based metasearch tools instead.

When I first read about the Deep Web in this course, I wondered how the content was accessible to anyone other than the owner. In my research, I came across Turbo10, a Deep Net search tool that returns Web content as well as other hidden Internet resources. Turbo10 is different to many other search engines because it performs the relevance ranking, topic clustering and result merging in the client browser rather than on the server (this is done in the interests of speed and is achieved by way of asynchronous data transfer). It turns out that Turbo10 may use a technique called federated searching, whereby programmed “adapters” automatically connect to topical Deep Web search engines, searching and extracting results from the Deep Net. Turbo10 has provided a great paper that explains the mechanics of searching the Deep Net.
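To make the idea of federated searching a little more concrete, here is a toy sketch: each “adapter” searches one source, and the results are merged and ranked in a single list. This is not Turbo10’s implementation; the sources, the titles and the word-count scoring are all invented for the example.

```python
# Hypothetical "deep" sources, each reachable only through its own adapter.
SOURCES = {
    "library_catalogue": [
        "Project management handbook",
        "Agile methods in practice",
    ],
    "journal_database": [
        "Project management methodologies compared",
        "Soil chemistry review",
    ],
}


def make_adapter(name, documents):
    """Return a function that searches one source and scores each hit."""
    def adapter(query):
        terms = query.lower().split()
        hits = []
        for title in documents:
            words = title.lower().split()
            # Score = number of query terms that appear in the title.
            score = sum(1 for term in terms if term in words)
            if score:
                hits.append((title, score, name))
        return hits
    return adapter


def federated_search(query, adapters):
    """Query every adapter, then merge and rank the combined hits."""
    merged = []
    for adapter in adapters:
        merged.extend(adapter(query))
    # Highest score first; ties broken alphabetically by title.
    merged.sort(key=lambda hit: (-hit[1], hit[0]))
    return merged


adapters = [make_adapter(name, docs) for name, docs in SOURCES.items()]
results = federated_search("project management methodologies", adapters)
for title, score, source in results:
    print(score, title, "(" + source + ")")
```

In a real tool each adapter would talk to a remote search form rather than a list in memory, but the merge-and-rank step would look much the same.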

To the actual task:
Google Search for “project management methodologies”: top result - Project Management Methodologies, about 3,600,000 results in total.

Turbo10 Search for “project management methodologies”: top result - Project Management Methodologies, 20 results in total.

I personally don’t think either search produced great results: a project management body would have carried more authority than an individual project manager and would hold more relevance than a book review. Turbo10 was quite disappointing, actually, in the number of results it returned.

Another great paper on searching the Deep Web is Using the Deep Web: A How-To Guide For IT Professionals.

Boolean Searching
To get the biggest result set, use OR between keywords. Note that this returns every page that matches any one of the keywords, in other words the combined result sets of all the keywords joined by the operator.

Google Search for “project management OR methodologies”: top result - Project Management - Wikipedia.

If you need all keywords in your results, you’re better off using the AND operator or no operator at all (since AND is the default operator). I think the most useful searching technique is the minus (-) operator, which removes results that include the words or phrases you specify (eg project management methodologies -agile). Another good technique is to quote a phrase if you wish the exact phrase to appear in the result set (eg “I have of late, but wherefore I know not”).
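The operators above behave like set operations, which a few lines of Python can show. This is only a sketch over three made-up pages, not how a search engine actually evaluates a query: OR is a union, AND is an intersection and minus is a difference.

```python
# Three hypothetical pages and their text.
PAGES = {
    "page1": "project management methodologies overview",
    "page2": "agile project management",
    "page3": "research methodologies in biology",
}


def pages_with(term):
    """Return the set of page names whose text contains the given word."""
    return {name for name, text in PAGES.items() if term in text.split()}


# management OR methodologies: pages matching either keyword.
either = pages_with("management") | pages_with("methodologies")

# management AND methodologies: pages matching both keywords.
both = pages_with("management") & pages_with("methodologies")

# management -agile: pages matching "management" but not "agile".
no_agile = pages_with("management") - pages_with("agile")

print(sorted(either))
print(sorted(both))
print(sorted(no_agile))
```

OR can only grow the result set, while AND and minus can only shrink it.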

To obtain result sets originating only from university sources, the best approach is to use the Advanced Search option. Yahoo allows you to set specific Top Level Domains to search within: simply enter the search phrase, then limit the search to .edu domains.
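The same restriction can also be typed straight into the search box with the “site” operator mentioned in the annotation above. Here is a small sketch that assembles such a query and a search URL; the `build_query` helper is my own invention for the example, and the Google URL pattern is an assumption rather than a documented interface.

```python
from urllib.parse import urlencode


def build_query(phrase, exclude=(), domain=None):
    """Combine an exact phrase, excluded words and a domain limit."""
    parts = ['"' + phrase + '"']                    # quoted exact phrase
    parts.extend("-" + word for word in exclude)    # minus operator
    if domain:
        parts.append("site:" + domain)              # limit to a domain
    return " ".join(parts)


query = build_query(
    "project management methodologies",
    exclude=["agile"],
    domain=".edu",
)
# Assumed URL pattern: the query goes in the "q" parameter.
url = "https://www.google.com/search?" + urlencode({"q": query})

print(query)
print(url)
```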

Further references:
Librarian Search Guide
Great Boolean search cheat sheet
Dogpile
Ask

Organising Search Information

I used OpenOffice Calc (a spreadsheet tool) and The Gimp to record my search information. These are great open source tools that I use regularly. I built a similar system to the one you see in the screenshot below for recording website bugs for CybaSumo.com.

Downloading Internet Tools

Wednesday, August 27th, 2008

Here’s where being a Linux user has some disadvantages: standard plugins and programs are a little behind the main game. The Flash Player, which is essentially necessary for browsing the Web, is a bit buggy, or at least not as fully featured as the Windows version. This means I sometimes encounter problems, for instance in pausing YouTube videos or doing various other things. In fact, this is the only real problem I face as a result of choosing Linux over the commercial operating systems.

Many Linux users are supporters of Free Software (“free as in speech, not beer”: “because the user is free”). Many advocate only using free software, which would preclude the use of the Flash plugin, Adobe Reader and other proprietary software, including Opera (my favourite browser). As such, you’ll find free software (usually released under the GNU General Public Licence) as substitutes: eg Gnash for Flash Player and Evince for Adobe Reader. I actually think it’s practical to use the proprietary standard versions in many cases.

For this task, I downloaded and installed RealPlayer, which is based on the open source Helix Player. Since then, I’ve enjoyed the relaxing sounds of ABC’s Dig Radio.