Release Notes: This release supports materializing query-based combinations of old search indexes (crawl mixes) as new indexes. This should make query performance on crawl mixes much better. Cache pages of search results now have a new history UI that lets you search cache pages across all the indexes you have, much as the Internet Archive does. Yioop now supports spelling corrections on searches after they have been performed, and it has an API for transliterating between Roman and other scripts. Query performance has been improved over previous versions, and many minor bugs have been fixed.
Release Notes: This release adds an activity to manage search media sources. For now, one can add Video and RSS sources. When configured, RSS feeds are downloaded hourly and integrated into search results. Also new is a command-line tool for configuring Yioop in VPS settings. An Italian stemmer has been added, as well as more translations. This version implements some important bug fixes in robot handling, together with unit tests for them. Yioop! now works in PHP 5.4 as well as PHP 5.3, and works better with more recent versions of XAMPP on Windows.
Release Notes: This release adds initial support for word suggestions as a user types in queries. The bigramming used to speed up common two-word queries now works with n-word grams. N-word gram filter files can now be created using Wikipedia raw page count dumps. This version adds support for * and $ in allowed- and disallowed-to-crawl sites. Using this, the user can crawl sites to a fixed depth. Robots.txt processing now supports * and $ in robots.txt paths. Support for NOSNIPPET, NOARCHIVE, and X-Robots-Tag HTTP headers has also been implemented. A tool for editing search summaries after a crawl has also been added.
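As an illustration of what * and $ mean in a robots.txt path, here is a minimal Python sketch. It is not Yioop's code; the function names are hypothetical, and it assumes the common convention that * matches any run of characters, a trailing $ anchors the match at the end of the path, and a pattern without $ is treated as a prefix.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed convention: '*' matches any sequence of characters and a
    trailing '$' anchors the match at the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def path_matches(pattern: str, path: str) -> bool:
    """Return True if the URL path is covered by the pattern."""
    return robots_pattern_to_regex(pattern).match(path) is not None


if __name__ == "__main__":
    print(path_matches("/private/*.pdf$", "/private/a/b.pdf"))
    print(path_matches("/private/*.pdf$", "/private/a.pdf?x=1"))
    print(path_matches("/tmp", "/tmp/file"))
```

Without the $, the pattern behaves as a prefix, so "/tmp" also covers "/tmp/file"; with it, "/private/*.pdf$" covers only paths that end in ".pdf".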
Release Notes: This release improves scalability by allowing multiple machines to maintain portions of the "to crawl next" queue. Query processing can also be split amongst machines, with different machines being responsible for documents of a given hash. Yioop! now supports mirroring of machines. Two-word phrases, as determined by an XML file such as a Wikipedia URL dump, can now be treated as a logical unit. The Yioop! model-view-controller framework has been made easier to extend, and documentation for it has been added to the website.
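The idea of making each machine responsible for documents of a given hash can be sketched as follows. This is only an illustration in Python: the hash function (MD5), the document key (a URL), and the name machine_for_document are assumptions, not a description of how Yioop! partitions documents.

```python
import hashlib


def machine_for_document(doc_key: str, num_machines: int) -> int:
    """Pick the machine index responsible for a document.

    Assumption: the document is identified by a string key (here a URL)
    and machines are numbered 0 .. num_machines - 1.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    digest = hashlib.md5(doc_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_machines


if __name__ == "__main__":
    for url in ("http://example.com/a", "http://example.com/b"):
        print(url, "->", machine_for_document(url, 4))
```

Because the assignment depends only on the key, every machine can work out which machine holds a given document without any coordination.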
Release Notes: This version supports starting, stopping, and viewing log files of the queue server and fetchers from a Web interface. One can now inject new URLs into an active crawl via a Web interface. This version of Yioop! supports re-crawling of pages after a fixed number of days. Also, the file extensions that are crawled, the number of bytes downloaded per page, and how Yioop! weighs different page components can now all be controlled through a Web interface rather than just the config.php file. Improvements have also been made to how the HTML processor extracts text to index.
Release Notes: Character n-grams are now supported for many languages that did not have a stemmer. Language detection was improved and better UTF-8 preparation was provided for downloads. Yioop!'s ability to follow redirects, including bit.ly redirects, was improved. Proximity scoring of text in documents has also been enhanced.
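For readers unfamiliar with character n-grams, the following Python sketch shows the general technique of splitting a word into overlapping runs of n characters. The function name, the default n of 3, and the handling of short words are assumptions made for illustration; they are not taken from Yioop!.

```python
from typing import List


def char_ngrams(word: str, n: int = 3) -> List[str]:
    """Split a word into overlapping character n-grams.

    Words no longer than n characters are returned whole (an assumption
    made here so that short words still produce an index term).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("search"))
```

Indexing these fragments instead of whole words lets related word forms share terms even when no stemmer exists for the language.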
Release Notes: This version adds a function API to get search results out of Yioop! It also improves the Open RSS Responses that Yioop! generates and allows them to contain images. The online documentation has been enhanced to describe in more detail how to incorporate Yioop! into a site. This release also simplifies the arc_tool syntax and adds the ability within arc_tool to reindex a corrupt IndexArchiveBundle dictionary. This version fixes a bug in output buffering of the dictionary that could cause the dictionary to become corrupted on large indexes.
Release Notes: This release adds support for if: conditions in crawl mixes and for general searching. It improves the crawl status UI. It improves the performance of negation in queries. It adds support for Open Search RSS output of search results. There are many bugfixes.
Release Notes: This version adds support for crawling xlsx, pptx, and epub. The HTML processor now supports the base tag. The code for the front-end (not crawler) has been changed to work in cPanel hosting environments with only PHP 5.2. Filecache support, in addition to memcached support, has been added. Yioop! can now filter out hosts from search results, if needed, after indexing has been done. Improvements have been made to the start and stop of the crawling process. A command-line query tool has been added.
Release Notes: This version moves Yioop! from using a bag-of-words index to a positional index. Proximity scores are calculated when a multi-word query is done. The host of an inlink is now also incorporated into the ranking. An indexing plugin architecture has been created to make it easier to customize the crawl process; an example recipe plugin is included. An SEO cache view and improved test query statistics have been added. Finally, this version includes bug fixes for memcached and BMP handling.
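To show why a positional index makes proximity scoring possible, here is a small Python sketch. It is not Yioop!'s scoring formula: the whitespace tokenizer, the function names, and the score of 1 divided by the smallest distance between two query terms are all assumptions chosen to keep the example short.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def build_positional_index(docs: Dict[int, str]) -> Dict[str, Dict[int, List[int]]]:
    """Map each term to {doc_id: [positions]} using whitespace tokens."""
    index: Dict[str, Dict[int, List[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


def min_distance(pos_a: List[int], pos_b: List[int]) -> Optional[int]:
    """Smallest gap between two sorted position lists, or None if either is empty."""
    i = j = 0
    best = None
    while i < len(pos_a) and j < len(pos_b):
        gap = abs(pos_a[i] - pos_b[j])
        if best is None or gap < best:
            best = gap
        # Advance the pointer that sits at the smaller position.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_score(index, doc_id: int, term_a: str, term_b: str) -> float:
    """Illustrative score: 1 / (smallest distance), 0.0 if a term is absent."""
    pos_a = index.get(term_a, {}).get(doc_id, [])
    pos_b = index.get(term_b, {}).get(doc_id, [])
    gap = min_distance(pos_a, pos_b)
    return 0.0 if gap is None else 1.0 / max(gap, 1)


if __name__ == "__main__":
    docs = {1: "open source search engine", 2: "search for an open source engine"}
    idx = build_positional_index(docs)
    for doc_id in docs:
        print(doc_id, proximity_score(idx, doc_id, "search", "engine"))
```

A bag-of-words index records only that both terms occur in a document, so it could not tell these two documents apart for the query "search engine"; the stored positions are what allow the first document to score higher.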