Sally

Sally is a tool for mapping a set of strings to a set of vectors. This mapping, referred to as embedding, allows machine learning and data mining techniques to be applied to string data such as text documents, DNA sequences, or log files. Sally uses the vector space model, or bag-of-words model: each string is characterized by a set of features, and each feature is associated with one dimension of the vector space. The value in each dimension is the number of occurrences of the feature in the string; alternatively, binary or TF-IDF values can be computed. Vectors can be output in plain text, LibSVM, or Matlab format.
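
The embedding described above can be illustrated with a minimal Python sketch. This is not Sally's implementation or interface. The function names (`ngrams`, `embed`, `to_libsvm`), the use of character n-grams as features, the particular TF-IDF weighting, and the placeholder label `0` are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(s, n=2):
    """Return all character n-grams of s (assumed feature type)."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def embed(strings, n=2, mode="count"):
    """Map each string to a sparse vector, stored as {feature: value}."""
    counts = [Counter(ngrams(s, n)) for s in strings]
    if mode == "count":
        # Value = number of occurrences of the feature in the string.
        return [dict(c) for c in counts]
    if mode == "binary":
        # Value = 1 if the feature occurs at all.
        return [{f: 1 for f in c} for c in counts]
    if mode == "tfidf":
        # One common TF-IDF variant (an assumption): tf * log(N / df).
        df = Counter(f for c in counts for f in c)
        total = len(strings)
        return [
            {f: tf * math.log(total / df[f]) for f, tf in c.items()}
            for c in counts
        ]
    raise ValueError(f"unknown mode: {mode}")


def to_libsvm(vec, index, label=0):
    """Format one sparse vector as a LibSVM line: 'label idx:val ...'."""
    items = sorted((index[f], v) for f, v in vec.items())
    return f"{label} " + " ".join(f"{i}:{v:g}" for i, v in items)


strings = ["abab", "abc"]
vectors = embed(strings, n=2, mode="count")

# Assign each feature one dimension; LibSVM indices start at 1.
features = sorted({f for v in vectors for f in v})
index = {f: i + 1 for i, f in enumerate(features)}

lines = [to_libsvm(v, index) for v in vectors]
for line in lines:
    print(line)
```

Here each distinct bigram becomes one dimension, so the two strings share the dimension for `ab` and differ in the others. A sparse dictionary per string is used because most features do not occur in most strings.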

Recent releases

  •  25 Dec 2013 13:08

    Release Notes: A major bug in the processing of strings has been fixed. Support for the new version of libarchive has been added. Several minor bugs have been fixed.

  •  27 Dec 2012 21:46

    Release Notes: Support for positional n-grams with varying shift has been added. Several minor bugs have been fixed.

  •  29 Aug 2012 22:25

    Release Notes: Support for stop words and frequency thresholding has been added. The configuration has been simplified and is more transparent. Several bugs have been fixed.

  •  18 May 2012 18:52

    Release Notes: The configuration and manual have been improved.

  •  13 May 2012 20:18

    Release Notes: Support for signed embedding of strings has been added. Several minor bugs have been fixed.
