18.470 indexing local (and other) machines

From: Humanist Discussion Group (by way of Willard McCarty <willard.mccarty_at_kcl.ac.uk>)
Date: Fri, 7 Jan 2005 10:39:59 +0000

               Humanist Discussion Group, Vol. 18, No. 470.
       Centre for Computing in the Humanities, King's College London
                   www.kcl.ac.uk/humanities/cch/humanist/
                        www.princeton.edu/humanist/
                     Submit to: humanist_at_princeton.edu

   [1] From: Erik Hatcher <esh6h_at_virginia.edu> (42)
         Subject: Re: 18.463 indexing local machines

   [2] From: "Okyere, Emmanuel II" <chief_at_okyere.org> (96)
         Subject: RE: 18.463 indexing local machines

   [3] From: "Patrik Svensson" <patrik.svensson_at_engelska.umu.se> (41)
         Subject: RE: 18.463 indexing local machines

   [4] From: "Stephen Woodruff" <s.woodruff_at_arts.gla.ac.uk> (4)
         Subject: RE: 18.463 indexing local machines

--[1]------------------------------------------------------------------
         Date: Fri, 07 Jan 2005 10:12:57 +0000
         From: Erik Hatcher <esh6h_at_virginia.edu>
         Subject: Re: 18.463 indexing local machines

On Jan 6, 2005, at 2:27 AM, Humanist Discussion Group (by way of
Willard McCarty <willard.mccarty_at_kcl.ac.uk>) wrote:
>It took only about a month before I deleted the highly structured
>collection in favour of the unstructured one.

Structure is so overrated!

>Seriously, in the life of an interdisciplinary computing humanist nearly
>every intellectual object falls under so many distinct categories,
>whatever the scheme, that I cannot see any such thing working. Except,
>perhaps, for those who devote themselves to the scheme rather than to
>what it schematizes.

Consider Folksonomies :)

          http://www.adammathes.com/academic/computer-mediated-communication/folksonomies.html

>Automatic indexing then became a priority. Eventually I gave up on
>Windows XP's native indexing -- the finding mechanism is too slow and
>clumsy. A visiting lecturer (may his tribe increase) drew my attention
>to X1 (www.x1.com/), which I tried out, then purchased.

What was your experience with X1?

>What have others done? What's been the experience?

I come to this party with a heavy Lucene bias. Check out my newly
launched site at http://www.lucenebook.com: a "search inside the book"
feature combined with a blog. I'm actively evolving it (TODO items
include integrating errata into book-section search matches, and so
on).

There are several Lucene-based desktop search options, though
admittedly I have little experience with them first hand. But here are
some things to try:

          * Searchblox - http://www.searchblox.com (they contributed a case
            study to the Lucene book, so I know the most about this one)

          * Aduna AutoFocus - http://aduna.biz/products/autofocus/index.html

          * Zilverline - http://www.zilverline.org

The free source code download for our Lucene book comes with a simple
text file indexer (it crawls a directory tree, indexing .txt files only)
which could be adapted to index other types of content. The tricks would
be to enhance it to check date stamps, re-index new or changed content,
remove documents that no longer exist, and integrate document parsers
for HTML, Word, PDF and other types.
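A minimal sketch of the incremental bookkeeping described above, assuming a toy in-memory inverted index in Python rather than Lucene itself (the book's download is Java, and its code is not reproduced here). The class IncrementalIndexer and its methods are hypothetical. It walks a directory tree, indexes only .txt files, compares each file's modification time with the one recorded on the previous run, re-indexes new or changed files, and forgets files that have disappeared. Parsers for HTML, Word or PDF are deliberately left out.

```python
import os
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


class IncrementalIndexer:
    """Toy inverted index over .txt files (hypothetical stand-in for Lucene)."""

    def __init__(self):
        self.mtimes = {}                  # path -> mtime seen at last indexing
        self.postings = defaultdict(set)  # term -> paths containing the term
        self.terms_of = {}                # path -> terms indexed for that path

    def _remove(self, path):
        # Drop a document's postings and its recorded timestamp.
        for term in self.terms_of.pop(path, ()):
            self.postings[term].discard(path)
            if not self.postings[term]:
                del self.postings[term]
        self.mtimes.pop(path, None)

    def _add(self, path, mtime):
        # Tokenize the file and record which terms it contains.
        with open(path, encoding="utf-8", errors="replace") as f:
            terms = {t.lower() for t in TOKEN.findall(f.read())}
        for term in terms:
            self.postings[term].add(path)
        self.terms_of[path] = terms
        self.mtimes[path] = mtime

    def update(self, root):
        """Crawl root; (re)index new or changed .txt files, forget deleted ones."""
        seen = set()
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if not name.lower().endswith(".txt"):
                    continue  # only plain-text files, as in the book's indexer
                path = os.path.join(dirpath, name)
                mtime = os.path.getmtime(path)
                seen.add(path)
                if self.mtimes.get(path) != mtime:  # new or date stamp changed
                    self._remove(path)
                    self._add(path, mtime)
        for path in set(self.mtimes) - seen:  # indexed before, now gone
            self._remove(path)

    def search(self, word):
        return sorted(self.postings.get(word.lower(), ()))
```

Calling update() periodically keeps the index in step with the disk. A Lucene-based version would presumably follow the same delete-then-re-add pattern, keyed on the file path.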

          Erik

--[2]------------------------------------------------------------------
         Date: Fri, 07 Jan 2005 10:16:34 +0000
         From: "Okyere, Emmanuel II" <chief_at_okyere.org>
         Subject: RE: 18.463 indexing local machines

Willard,

I have yet to try X1; I'm looking forward to giving it a go when Yahoo
finally releases it (Yahoo has licensed X1 and will release a free version
early this year:
http://www.usatoday.com/tech/techinvestor/corporatenews/2004-12-10-yahoo-desktop-search_x.htm?csp=34).

I have tried the MSN toolbar suite (http://beta.toolbar.msn.com/), Google
Desktop and Copernic Desktop Search (http://www.copernic.com/). I like
Copernic best: it gives me a lot of flexibility in what I index, it plugs
a search bar into the taskbar that makes it easy to search at any time,
and it has a really nice UI. It is also free.

There's a nice roundup of things here: http://slate.msn.com/id/2111643/

Cheers,
- eokyere

---
Emmanuel OKYERE II
CTO - AKUABA, LLC
Phone/Fax:  703.815.4702
PGP Key ID: 0xA7FD6168
MSN: compubandit
AIM: compubndit
http://www.okyere.org/
| -----Original Message-----
| From: Humanist Discussion Group [mailto:humanist_at_Princeton.EDU] On Behalf
| Of Humanist Discussion Group (by way of Willard McCarty
| <willard.mccarty_at_kcl.ac.uk>)
| Sent: Thursday, January 06, 2005 2:28 AM
| To: humanist_at_Princeton.EDU
|
|                Humanist Discussion Group, Vol. 18, No. 463.
|        Centre for Computing in the Humanities, King's College London
|                    www.kcl.ac.uk/humanities/cch/humanist/
|                         www.princeton.edu/humanist/
|                      Submit to: humanist_at_princeton.edu
|
|
|
|          Date: Thu, 06 Jan 2005 07:22:05 +0000
|          From: Willard McCarty <willard.mccarty_at_kcl.ac.uk>
|          Subject: indexing local machines
|
| Recently I have tried out two programs for indexing the text- and
| email-files on my local machines and one for cataloguing my images. This
| is, in effect, a query about such programs, with a long preamble on my
| experience so far.
|
| Like most others here, I suppose, I've accumulated sufficient amounts of
| texts and images to make finding what I need sometimes quite difficult.
| During 2003-4 I started a systematic and large-scale effort to accumulate
| Web-pages, PDFs and other forms of text to support my research. (The
| collection now stands at ca. 1/2GB -- it's small because I actually read
| the stuff.) At first I evolved a reasonably complex directory structure
| for these files, but soon I realised that I was spending significant
| amounts of time deciding in which of the sub-sub-subdirectories to put
| a newcomer and looking through the many such sub-sub-subdirectories for
| one I had judiciously placed somewhere not too long before. So I set up
| a parallel unstructured bit-bucket in which I put an identical copy of
| everything, with the idea of seeing which way my wind was blowing. I
| also adopted the practice of putting as many copies of newcomers in as
| many places in the highly structured collection as I thought they
| belonged.
|
| It took only about a month before I deleted the highly structured
| collection in favour of the unstructured one. Perhaps, if I had been able
| to replicate myself and my equipment a number of times, I might have
| assigned some of these imaginary selves to a cataloguing, metadata-writing
| party, but under the circumstances I could only find that notion amusing.
| Seriously, in the life of an interdisciplinary computing humanist nearly
| every intellectual object falls under so many distinct categories,
| whatever the scheme, that I cannot see any such thing working. Except,
| perhaps, for those who devote themselves to the scheme rather than to
| what it schematizes.
|
| Automatic indexing then became a priority. Eventually I gave up on
| Windows XP's native indexing -- the finding mechanism is too slow and
| clumsy. A visiting lecturer (may his tribe increase) drew my attention
| to X1 (www.x1.com/), which I tried out, then purchased. A friend then
| told me about Google's free Desktop Search (desktop.google.com/), which
| I tried, then discarded: what works for the Web at large does not, in
| my experience, work well for one's private collection.
|
| Meanwhile I picked up Google's Picasa Photo Organizer
| (www.google.com/downloads/), which is as good as anything I've seen.
|
| What have others done? What's been the experience?
|
| Yours,
| WM
|
| [NB: If you do not receive a reply within 24 hours please resend]
| Dr Willard McCarty | Senior Lecturer | Centre for Computing in the
| Humanities | King's College London | Kay House, 7 Arundel Street | London
| WC2R 3DX | U.K. | +44 (0)20 7848-2784 fax: -2980 ||
| willard.mccarty_at_kcl.ac.uk www.kcl.ac.uk/humanities/cch/wlm/

--[3]------------------------------------------------------------------
         Date: Fri, 07 Jan 2005 10:18:08 +0000
         From: "Patrik Svensson" <patrik.svensson_at_engelska.umu.se>
         Subject: RE: 18.463 indexing local machines

Dear Willard,

This is a most interesting issue! Like you, I find that an unstructured
approach works relatively well. I have tried several programs for indexing
and searching local data. There is so much of my life in my email program
that decent search facilities are vital. I have used X1 for quite some
time now and I love it. I have about 90,000 email messages stored
(basically all incoming and outgoing messages from 1996 onwards) and a
great many documents and other data (including 45 versions of my Ph.D.
thesis). X1 makes it very easy to find information. I especially like the
narrowing-down-as-you-type design of the program and the blazing speed
(there is no perceivable lag). X1 also searches everything and can
display many file formats directly. Other programs I have tried, including
80-20, do not search headers when you do free-text searches. That is a
problem, as people do not always include their own name in the body of
their emails, and name is a primary search category. Of course these
indexing programs use some resources while indexing, but on my computers
this is hardly noticeable.

My fascination with these tools is partly because I think they make a
qualitative difference. I do things now that I could not do before, and I
also find myself using the search program to find email that has just
arrived. It is a different kind of interface to email (and other data)
and, in my mind, a significant step away from the mailbox paradigm, in
multiple ways I think.
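
To make the narrowing-down-as-you-type behaviour and the header-search point concrete, here is a small sketch. It is not a description of how X1 works internally; Message and narrow_as_you_type are hypothetical names. Because each keystroke only extends the query, the new matches must be a subset of the previous ones, so the sketch filters the previous result list instead of rescanning the whole collection. Sender and subject are searched along with the body, so a name found only in the header still matches.

```python
from dataclasses import dataclass


@dataclass
class Message:
    sender: str
    subject: str
    body: str

    def text(self) -> str:
        # Include header fields so a search for a person's name finds mail
        # whose body never mentions it.
        return f"{self.sender}\n{self.subject}\n{self.body}".lower()


def narrow_as_you_type(messages, keystrokes):
    """Yield (query, matches) after each keystroke.

    An extended query can only match a subset of what the shorter query
    matched, so each step filters the previous candidates.
    """
    candidates = list(messages)
    query = ""
    for ch in keystrokes:
        query += ch.lower()
        candidates = [m for m in candidates if query in m.text()]
        yield query, candidates
```

Backspacing breaks the subset property, so a fuller version would keep the result list for each prefix and step back to it.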

It is not always easy to draw the distinction between local and non-local
data. I myself tend to go for software that allows me to distribute data.
For instance, I use Biblioscape to handle references, and it allows me
(and others, if I let them) to view, edit and search my bibliographic data
from any connected computer. I often take notes in my blog, which is also
distributed (and searchable). For instant messaging I use Trillian, which
allows me to store and search IM conversations (logging is of course not
totally unproblematic here). X1 allows me to search network drives (lab
resources, for instance) as well as local drives. I use del.icio.us
(http://del.icio.us/) to keep track of and search my bookmarks (from any
connected computer). This one is rather interesting as it allows
"unplanned" tagging, and you can see how categories develop in your own
material (rather than having decided on an ontology to start with).
Moreover, you can explore how your own emergent tagging scheme coincides
with those of other users. I also find the tagging process rewarding in
itself. It helps me associate, connect and remember.

Patrik Svensson
HUMlab, Umeå University, Sweden
http://www.humlab.umu.se/patrik

--[4]------------------------------------------------------------------
         Date: Fri, 07 Jan 2005 10:19:04 +0000
         From: "Stephen Woodruff" <s.woodruff_at_arts.gla.ac.uk>
         Subject: RE: 18.463 indexing local machines

I notice Yahoo have bought X1 (www.x1.com/) and apparently intend to
offer its technology (or maybe a subset?) free early this year, so now
is not a good time to buy.

Stephen Woodruff
Received on Fri Jan 07 2005 - 05:49:05 EST
