I'm trying to convince Cocoon's built-in search engine, Lucene, to index a site of some 3000 documents. Not a particularly challenging task, but unfortunately it keeps blowing up:
org.apache.cocoon.ProcessingException: Exception in numDocs()!:
java.io.FileNotFoundException: /usr/local/jakarta-tomcat/work/Standalone/some.site/_/cocoon-files/index/_a.tis (Too many open files)
This had been a continual problem for me, up until recently when I refactored mercilessly, chopping several hundred lines of code and streamlining the site. Two days ago, it was indexing just fine.
I know people are using this stuff on sites with 250,000+ documents, so 3000 should not be a problem. Hours of reading docs, mailing lists, Google and source code have not enlightened me.
(Well, actually, that's a lie - I now know about store-fields, merge-factor, exclude and other fine things. Unfortunately, none of this has helped solve the problem.)
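For reference, the "(Too many open files)" in the trace is the operating system's message for a process that has reached its file descriptor limit. Here is a rough sketch for checking how close a process is to that limit. It assumes a Linux host with /proc and a Python interpreter, neither of which is part of the Cocoon setup. The function names are made up for illustration, and the limits reported are those of the Python process itself, which may differ from the ones Tomcat's JVM runs under.

```python
import os
import resource


def open_descriptor_count(pid="self"):
    """Count open file descriptors for a process by listing /proc/<pid>/fd.

    Linux only. Returns None if the directory cannot be read
    (no /proc, no such process, or insufficient permissions).
    """
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


def descriptor_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = descriptor_limits()
    print(f"open: {open_descriptor_count()}, soft limit: {soft}, hard limit: {hard}")
```

Passing the servlet container's process ID as `pid` (for example `open_descriptor_count(1234)`, where 1234 is a placeholder) while an index run is in progress would show whether the count climbs towards the soft limit.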
Update: it works fine in my old copy of Cocoon 2.0.4. Which is great. Except I need two features of Cocoon 2.0.5-dev (the bugfix tree for 2.0.*): the PaginatorTransformer, and Sylvain and Jeremy's backport of the new Lucene features, which lets me specify which fields get output in search results (i.e., rather than listing just web addresses, it can give meaningful results like page titles).
Time to find out what broke in 2.0.5 ... (time for another cup of tea)
Posted by savs at August 29, 2003 1:54 AM