AnyBook4Less.com - ISBN: 0134638379 - Information Retrieval: Data Structures and Algorithms by William B. Frakes

Information Retrieval: Data Structures and Algorithms

Title: Information Retrieval: Data Structures and Algorithms
by William B. Frakes, Ricardo Baeza-Yates
ISBN: 0-13-463837-9
Publisher: Prentice Hall PTR
Pub. Date: 12 June, 1992
Format: Paperback
Volumes: 1
List Price(USD): $69.67

Average Customer Rating: 4 (4 reviews)

Customer Reviews

Rating: 3
Summary: Covers Basics with Varying Depth and Quality
Comment: Rather than a coherent textbook about information retrieval, this book contains 18 papers by individual authors which vary wildly in depth, quality and relevance today. The basic issues are each covered in their own chapter: inverted files, vector comparison techniques, stoplists, stemming, thesauri, string searching, relevance feedback, boolean operations, ranking, clustering and hashing.

The introduction covers hashing and automata for string matching in detail, but doesn't mention vector-based techniques other than Hamming distance (!) and in one paragraph provides the only mention of edit distance (aka Levenshtein distance) in the book. The chapter on PAT trees and the one on optical disks seem out of place due to their depth and obscurity. On the other hand, there's no mention of caching anywhere. The chapter on lexical analysis and stoplists by Fox has a nice introduction, but then devolves into page after page of C code. Ditto for Frakes' chapter on stemming -- good introduction, but we didn't need ten pages of code. Same for the thesaurus chapter -- a few pages of introduction, and then 40+ pages of code for some kind of hierarchical clustering. Baeza-Yates' chapter on string searching covers Knuth-Morris-Pratt and Boyer-Moore briefly and even contains some interesting empirical data, but again, we didn't really need the C code. Harman's chapter on relevance feedback (query modification) stands out as being entirely sensible, high level and informative, but is a decade behind the times. The chapter on boolean operations provides a few pages of info and then mysteriously spends 10 pages on bit vector code and then another handful on hashing. Then the following chapter on hashing has 40 pages of C code for perfect hashing! Harman's later chapter on ranking algorithms is a useful overview of scoring (though very high level). Rasmussen's chapter on clustering is also thoughtful, but rather non-standard -- you don't even get k-means, everyone's favorite clustering algorithm, and it also recaps the definitions of many of the other chapters.
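The review contrasts two string distances, Hamming distance and edit (Levenshtein) distance. Here is a minimal Python sketch of each, written for illustration and not taken from the book. The function names `hamming` and `levenshtein` are hypothetical. Hamming distance compares two equal-length strings position by position. Levenshtein distance also allows insertions and deletions, and it is computed here with the standard dynamic-programming recurrence, keeping one row of the table at a time.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (one DP row kept at a time)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(hamming("karolin", "kathrin"))     # 3
print(levenshtein("kitten", "sitting"))  # 3
```

`hamming("kitten", "sitting")` would raise an error, because the two strings differ in length. `levenshtein` handles the pair by counting the inserted "g" as one edit.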

Unfortunately, you don't get any higher-order graph analysis techniques that power web search engines like Google. You won't get any kind of help for load balancing servers or databases, which is critical. You also don't get any dimensionality reducing and smoothing techniques like latent semantic analysis or principal components analysis. There's also no analysis from a user's perspective on usability and the different kinds of tasks that people might be using information retrieval for. And of course, there's no discussion of natural language understanding techniques or cross-lingual or multilingual retrieval techniques. Finally, it's all text based and you won't get any information on retrieving audio or images.

If you're serious about information retrieval, this book lacks the depth and recency to leave you feeling like an expert. The statistical language processing book by Manning and Schütze contains an excellent introduction to information retrieval algorithms, as well as reams of background on statistical language processing you'll want to understand before getting into information retrieval. For more details on information retrieval itself, check out the collection of primary source papers edited by Karen Sparck Jones: Readings in Information Retrieval.

Rating: 5
Summary: Useful reference book
Comment: I bought this book while working on an information retrieval project, and it turned out to be a useful reference: it explains terminology, suggests efficient data structures, and offers good references for further reading.

However, the book proved even more useful during my M.A. studies (in CS), when I had to write a paper on "Suffix Trees" and "Suffix Arrays": I found that Gonnet, Baeza-Yates and Snider describe equivalent ideas, which they call "PAT trees" and "PAT arrays".
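The reviewer says suffix arrays and PAT arrays are equivalent. A suffix array is the list of start positions of all suffixes of a text, sorted lexicographically. All occurrences of a pattern then form one contiguous block of that list, and binary search finds the block. The minimal Python sketch below shows the idea. It uses a naive sort-based construction chosen for brevity, not the construction described in the book. The names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order.
    Naive construction (sorts full suffixes); for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in text, found by two binary
    searches over the suffix array (lower and upper bound of the block
    of suffixes whose first len(pattern) characters equal pattern)."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(sa)                                       # [5, 3, 1, 0, 4, 2]
print(find_occurrences("banana", sa, "ana"))    # [1, 3]
```

Truncating each sorted suffix to the pattern length keeps the prefixes in sorted order. That property is why the binary search is valid.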

I also found this book useful for projects related to computational linguistics.

In short, I like keeping this book within reach as a reference, though I found it not so friendly as an introduction to the subject ("Managing Gigabytes" might be a more welcoming one).

Rating: 4
Summary: Good coverage and treatment of algorithms for I.R.
Comment: I adopted this book as the primary textbook for my course on information retrieval. It covers a substantial part of the core topics in IR: models of information retrieval systems (boolean and best-match systems); implementations (inverted files, tries, signature files, hashing); indexing and retrieval algorithms (lexical analysis, stemming, ranking, relevance feedback, boolean operations); and somewhat more advanced topics such as clustering and automatic thesaurus construction. These topics are treated with varying levels of detail. Some come with C code examples that are rather useful to students, while others are less well detailed (e.g. relevance feedback and probabilistic models). Overall, the topics are presented with sufficient clarity and reasonable concision. Some shortcomings are: (i) the weak treatment of the probabilistic models (I would have liked a deeper analysis of the underlying principles and how they lead to certain kinds of systems), and the consequences of some techniques are discussed in insufficient depth; (ii) in my view, too much attention is devoted to low-level string processing, such as chapter 10 on string searching algorithms, which is not relevant to the main topic of the book; (iii) unfortunately, other important topics are not covered at all. These include almost everything that falls under user-centered information retrieval and user interfaces. Another missing topic is "passage retrieval".
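The review lists inverted files and boolean operations among the core topics. The minimal Python sketch below shows how they fit together: the inverted file maps each term to a sorted list of document ids, and an AND query merges those lists. It assumes whitespace tokenization with lowercasing and applies no stoplist or stemming. The function names are hypothetical, and this is not the book's C implementation.

```python
def build_inverted_file(docs: list[str]) -> dict[str, list[int]]:
    """Map each lowercased term to the ascending ids of the docs containing it."""
    index: dict[str, list[int]] = {}
    for doc_id, text in enumerate(docs):
        # set() so each doc id is appended at most once per term;
        # ids arrive in increasing order, so postings stay sorted.
        for term in set(text.lower().split()):
            index.setdefault(term, []).append(doc_id)
    return index


def intersect(p1: list[int], p2: list[int]) -> list[int]:
    """Merge-style intersection of two sorted postings lists."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out


def boolean_and(index: dict[str, list[int]], terms: list[str]) -> list[int]:
    """Doc ids containing every term; shortest postings list processed first."""
    postings = sorted((index.get(t.lower(), []) for t in terms), key=len)
    if not postings:
        return []
    result = postings[0]
    for p in postings[1:]:
        result = intersect(result, p)
    return result


docs = [
    "inverted files and hashing",
    "string searching with automata",
    "hashing for inverted files",
    "relevance feedback",
]
index = build_inverted_file(docs)
print(boolean_and(index, ["inverted", "hashing"]))  # [0, 2]
```

Starting with the shortest postings list is a common heuristic. It keeps the intermediate results no larger than the rarest term's list.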

Similar Books:

	Title: Modern Information Retrieval by Ricardo Baeza-Yates, Berthier Ribeiro-Neto ISBN: 020139829X Publisher: Addison-Wesley Pub Co Pub. Date: 15 May, 1999 List Price(USD): $50.00
	Title: Mining the Web: Analysis of Hypertext and Semi Structured Data by Soumen Chakrabarti ISBN: 1558607544 Publisher: Morgan Kaufmann Pub. Date: 15 August, 2002 List Price(USD): $54.95
	Title: Foundations of Statistical Natural Language Processing by Christopher D. Manning, Hinrich Schütze ISBN: 0262133601 Publisher: MIT Press Pub. Date: 18 June, 1999 List Price(USD): $75.00
	Title: Managing Gigabytes: Compressing and Indexing Documents and Images by Ian H. Witten, Alistair Moffat, Timothy C. Bell ISBN: 1558605703 Publisher: Morgan Kaufmann Pub. Date: 15 May, 1999 List Price(USD): $62.95
	Title: Ontology Learning for the Semantic Web (The Kluwer International Series in Engineering and Computer Science, Volume 665) by Alexander Maedche ISBN: 0792376560 Publisher: Kluwer Academic Publishers Pub. Date: 01 February, 2002 List Price(USD): $105.00