The Apache Software Foundation

Apache Nutch


Apache Nutch is an open source web-search software project. Stemming from Apache Lucene, it now builds on Apache Solr, adding web-specific features such as a crawler, a link-graph database, and parsing support (handled by Apache Tika) for HTML and an array of other document formats. Apache Nutch can run on a single machine, but gains much of its strength from running in a Hadoop cluster. The system can be enhanced (e.g. so that other document formats can be parsed) using a highly flexible, easily extensible and thoroughly maintained plugin infrastructure.
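The plugin infrastructure described above lets support for new document formats be added without changing the core system. As a rough illustration of that general pattern only, the Java sketch below dispatches fetched content to a parser plugin chosen by MIME type. The names `ParserRegistrySketch` and `ParserPlugin` are hypothetical and are not part of the Nutch API; real Nutch plugins are wired in through Nutch's own extension-point mechanism, and real HTML parsing is delegated to proper parsers such as Apache Tika rather than a regex.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of MIME-type-dispatched parser plugins.
// None of these names come from the Nutch API.
public class ParserRegistrySketch {

    /** A parser plugin turns raw content of one MIME type into plain text. */
    public interface ParserPlugin {
        String parse(byte[] content);
    }

    private final Map<String, ParserPlugin> pluginsByMimeType = new HashMap<>();

    /** Registers a plugin for one MIME type, replacing any earlier one. */
    public void register(String mimeType, ParserPlugin plugin) {
        pluginsByMimeType.put(mimeType, plugin);
    }

    /** Parses content with the plugin registered for its MIME type, if any. */
    public Optional<String> parse(String mimeType, byte[] content) {
        ParserPlugin plugin = pluginsByMimeType.get(mimeType);
        return plugin == null ? Optional.empty() : Optional.of(plugin.parse(content));
    }

    public static void main(String[] args) {
        ParserRegistrySketch registry = new ParserRegistrySketch();
        // Toy "HTML" plugin: strips tags with a regex (illustration only, not a real HTML parser).
        registry.register("text/html",
                bytes -> new String(bytes, StandardCharsets.UTF_8).replaceAll("<[^>]*>", ""));
        // Plain-text plugin: just decodes the bytes as UTF-8.
        registry.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] page = "<p>Hello, Nutch</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.parse("text/html", page).orElse("(no parser)"));
        // No plugin is registered for PDF, so the fallback string is printed.
        System.out.println(registry.parse("application/pdf", page).orElse("(no parser)"));
    }
}
```

Running `main` prints `Hello, Nutch` for the `text/html` input and `(no parser)` for the unregistered `application/pdf` type. Supporting a new format in this sketch means registering one more plugin; the registry itself never changes, which is the property a plugin architecture is meant to provide.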

Programming Languages: Java
Categories: web-framework
Mailing Lists: http://nutch.apache.org/mailing_lists.html
Bug/Issue Tracker: http://issues.apache.org/jira/browse/NUTCH
License: Apache License Version 2.0
Project Website: http://nutch.apache.org
PMC: Apache Nutch

Project Release Information

Releases can be downloaded from http://www.apache.org/dyn/closer.cgi/nutch/

Most recent releases:

Release             Version   Date
Apache Nutch 1.3    1.3       2011-06-07
nutch-1.0           1.0       2009-03-23
nutch-0.9           0.9       2007-04-01
nutch-0.8.1         0.8.1     2006-09-24
nutch-0.8           0.8       2006-06-25
nutch-0.7.2         0.7.2     2006-03-31

Access to the source code:

Copyright 1999-2012, The Apache Software Foundation

Licensed under the Apache License, Version 2.0.

Generated Wed, 16 May 2012 18:30:40 GMT