ARFFUtil

java.lang.Object
- modnlp.util.PrintUtil
- - modnlp.tc.util.ARFFUtil

```
public class ARFFUtil
extends PrintUtil
```
Manipulate strings for/from ARFF files to be used with the WEKA toolkit.

See Also:
The WEKA toolkit for data mining.

Constructor Summary

Constructors
Constructor and Description
`ARFFUtil()`

Method Summary

Methods
Modifier and Type	Method and Description
`static void`	`printBooleanARFF(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.lang.String category, java.io.PrintStream out)` Converts a `TCInvertedIndex` into an ARFF file for `category` (a single category, or `null` for all categories) and prints the ARFF file to an output stream.
`static void`	`printDebug(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.io.PrintStream out)` Prints debug information and all ARFF representations this class handles.
`static void`	`printOccurARFF(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.lang.String category, java.io.PrintStream out)` Converts a `TCInvertedIndex` into an ARFF file for `category` (a single category, or `null` for all categories) and prints the ARFF file to an output stream.
`static void`	`printPWeightARFF(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.lang.String category, java.io.PrintStream out)` Converts a `TCInvertedIndex` into an ARFF file for `category` (a single category, or `null` for all categories) and prints the ARFF file to an output stream.
`static void`	`printTermByDocMatrixCSV(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.io.PrintStream out)`
`static void`	`printTermCoOccurARFF(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.io.PrintStream out)`
`static void`	`printTermCoOccurCSV(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.io.PrintStream out)`
`static void`	`printTFIDFARFF(TCInvertedIndex ii, WordFrequencyPair[] wfp, java.lang.String category, java.io.PrintStream out)` Converts a `TCInvertedIndex` into an ARFF file for `category` (a single category, or `null` for all categories) and prints the ARFF file to an output stream.

Methods inherited from class modnlp.util.PrintUtil
donePrinting, printNoMove, resetCounter, toString, toString, toString, toString, toString, toString, toString, toString, toString

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ARFFUtil
```
public ARFFUtil()
```
- Method Detail
  - printOccurARFF
```
public static void printOccurARFF(TCInvertedIndex ii,
                  WordFrequencyPair[] wfp,
                  java.lang.String category,
                  java.io.PrintStream out)
```
    Converts a TCInvertedIndex into an ARFF file for category (a single category, or null for all categories) and prints the ARFF file to an output stream. The WordFrequencyPair array restricts the entries of this ARFF file to those terms that occur in wfp (i.e. the terms selected by term set reduction). Documents are represented as vectors of integers whose elements indicate the number of occurrences of each term in a document.
    
    Parameters:
    ii - a TCInvertedIndex value
    wfp - a WordFrequencyPair[] value
    category - a String representing a category or null representing all categories.
    out - a PrintStream value
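    The occurrence representation described above can be illustrated with a minimal, self-contained sketch. It does not use the modnlp classes: the class OccurArffSketch, its methods, and the plain string lists standing in for TCInvertedIndex and WordFrequencyPair[] are all hypothetical. The ARFF layout shown (one numeric attribute per term, no category attribute) is an assumption about a plausible output, not the layout this method is documented to emit.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in; not part of modnlp.
class OccurArffSketch {

    // Count how often each selected term occurs in a tokenised document.
    // Tokens that are not in the selected term set are ignored.
    static int[] occurrences(List<String> selectedTerms, List<String> docTokens) {
        int[] counts = new int[selectedTerms.size()];
        for (String token : docTokens) {
            int idx = selectedTerms.indexOf(token);
            if (idx >= 0) {
                counts[idx]++;
            }
        }
        return counts;
    }

    // Print a minimal ARFF relation: one numeric attribute per selected term
    // and one comma-separated data row per document (assumed layout).
    static void printArff(List<String> selectedTerms, List<List<String>> docs, PrintStream out) {
        out.println("@relation occurrences");
        for (String term : selectedTerms) {
            out.println("@attribute " + term + " numeric");
        }
        out.println("@data");
        for (List<String> doc : docs) {
            int[] v = occurrences(selectedTerms, doc);
            StringBuilder row = new StringBuilder();
            for (int i = 0; i < v.length; i++) {
                if (i > 0) {
                    row.append(',');
                }
                row.append(v[i]);
            }
            out.println(row.toString());
        }
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("oil", "price");
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("oil", "price", "oil", "rise"),
                Arrays.asList("bank", "price"));
        printArff(terms, docs, System.out);
    }
}
```

    With the two sample documents, the data rows would hold the counts of "oil" and "price" per document; terms outside the selected set ("rise", "bank") contribute nothing.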
  - printBooleanARFF
```
public static void printBooleanARFF(TCInvertedIndex ii,
                    WordFrequencyPair[] wfp,
                    java.lang.String category,
                    java.io.PrintStream out)
```
    Converts a TCInvertedIndex into an ARFF file for category (a single category, or null for all categories) and prints the ARFF file to an output stream. The WordFrequencyPair array restricts the entries of this ARFF file to those terms that occur in wfp (i.e. the terms selected by term set reduction). Documents are represented as vectors of Boolean values whose elements indicate the occurrence or non-occurrence of each term in the document.
    
    Parameters:
    ii - a TCInvertedIndex value
    wfp - a WordFrequencyPair[] value
    category - a String representing a category or null representing all categories.
    out - a PrintStream value
    See Also:
    NewsItemAsOccurVector.getBooleanTextArray()
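    As a rough illustration of the Boolean representation, the sketch below marks each selected term as present or absent in a tokenised document. BooleanVectorSketch and its method are hypothetical helpers operating on plain string lists; they are not the modnlp implementation and make no claim about how NewsItemAsOccurVector.getBooleanTextArray() works internally.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in; not part of modnlp.
class BooleanVectorSketch {

    // true at position i if the i-th selected term occurs at least once in the document.
    static boolean[] booleanVector(List<String> selectedTerms, List<String> docTokens) {
        Set<String> present = new HashSet<>(docTokens);
        boolean[] v = new boolean[selectedTerms.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = present.contains(selectedTerms.get(i));
        }
        return v;
    }

    public static void main(String[] args) {
        boolean[] v = booleanVector(
                Arrays.asList("oil", "price", "bank"),
                Arrays.asList("oil", "price", "oil", "rise"));
        System.out.println(Arrays.toString(v)); // prints [true, true, false]
    }
}
```

    Unlike the occurrence representation, repeated occurrences of a term ("oil" appears twice here) do not change the result.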
  - printTFIDFARFF
```
public static void printTFIDFARFF(TCInvertedIndex ii,
                  WordFrequencyPair[] wfp,
                  java.lang.String category,
                  java.io.PrintStream out)
```
    Converts a TCInvertedIndex into an ARFF file for category (a single category, or null for all categories) and prints the ARFF file to an output stream. The WordFrequencyPair array restricts the entries of this ARFF file to those terms that occur in wfp (i.e. the terms selected by term set reduction). Documents are represented as vectors of TFIDF values calculated as follows:
```
                                                        size_of_corpus
    tfidf = no_of_occurrences_of_t_in_d * log -----------------------------------
                                              size_of_subcorpus_in_which_t_occurs
```
    Parameters:
    ii - a TCInvertedIndex value
    wfp - a WordFrequencyPair[] value
    category - a String representing a category or null representing all categories.
    out - a PrintStream value
    See Also:
    NewsItemAsOccurVector.getTFIDFVector(WordFrequencyPair[],int)
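    The TFIDF formula above can be worked through with a small sketch. TfidfSketch is a hypothetical helper, not the modnlp implementation. Two assumptions are made that the formula does not settle: the logarithm is taken as the natural logarithm, and size_of_subcorpus_in_which_t_occurs is read as the number of documents containing t. Returning 0 for a term that occurs in no document is also an assumption, added to avoid a division by zero.

```java
// Hypothetical stand-in; not part of modnlp.
class TfidfSketch {

    // tfidf = occurrences of t in d * log(corpus size / documents containing t).
    // Assumes natural logarithm; returns 0 when no document contains the term.
    static double tfidf(int occurrencesInDoc, int corpusSize, int docsContainingTerm) {
        if (docsContainingTerm <= 0) {
            return 0.0;
        }
        return occurrencesInDoc * Math.log((double) corpusSize / docsContainingTerm);
    }

    public static void main(String[] args) {
        // A term seen twice in a document and found in 10 of 100 documents.
        System.out.println(tfidf(2, 100, 10));
        // A term found in every document carries no weight: log(1) = 0.
        System.out.println(tfidf(3, 100, 100)); // prints 0.0
    }
}
```

    The second call shows why the inverse document frequency factor matters: a term that appears in every document gets weight 0 regardless of how often it occurs.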
  - printPWeightARFF
```
public static void printPWeightARFF(TCInvertedIndex ii,
                    WordFrequencyPair[] wfp,
                    java.lang.String category,
                    java.io.PrintStream out)
```
    Converts a TCInvertedIndex into an ARFF file for category (a single category, or null for all categories) and prints the ARFF file to an output stream. The WordFrequencyPair array restricts the entries of this ARFF file to those terms that occur in wfp (i.e. the terms selected by term set reduction). Documents are represented as vectors of proportional term weight values calculated as follows:
```
                            1 + log no_occurs_term_i_in_j
    pweight = round ( 10 x ------------------------------- )
                            1 + log no_terms_in_j
```
    if no_terms_in_j > 0. Otherwise pweight = 0.
    Parameters:
    ii - a TCInvertedIndex value
    wfp - a WordFrequencyPair[] value
    category - a String representing a category or null representing all categories.
    out - a PrintStream value
    See Also:
    NewsItemAsOccurVector.getPWEIGHTVector(int)
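    A minimal sketch of the proportional weight formula follows. PWeightSketch is a hypothetical helper, not the modnlp implementation. It assumes the natural logarithm, and it additionally returns 0 when the term does not occur in the document; the formula above only states the no_terms_in_j > 0 condition, so that second guard is an assumption added because log 0 is undefined.

```java
// Hypothetical stand-in; not part of modnlp.
class PWeightSketch {

    // pweight = round(10 * (1 + log occurrences of term i in j) / (1 + log terms in j)).
    // Returns 0 if the document is empty (per the formula's stated condition)
    // or if the term does not occur in it (an assumption of this sketch).
    static long pweight(int occursOfTermInDoc, int termsInDoc) {
        if (termsInDoc <= 0 || occursOfTermInDoc <= 0) {
            return 0;
        }
        double ratio = (1 + Math.log(occursOfTermInDoc)) / (1 + Math.log(termsInDoc));
        return Math.round(10.0 * ratio);
    }

    public static void main(String[] args) {
        System.out.println(pweight(1, 1));   // prints 10
        System.out.println(pweight(1, 100)); // prints 2
        System.out.println(pweight(3, 0));   // prints 0
    }
}
```

    Under these assumptions the weight ranges from 0 to 10, reaching 10 when the term accounts for every term occurrence in the document.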
  - printTermCoOccurARFF
```
public static void printTermCoOccurARFF(TCInvertedIndex ii,
                        WordFrequencyPair[] wfp,
                        java.io.PrintStream out)
```
  - printTermCoOccurCSV
```
public static void printTermCoOccurCSV(TCInvertedIndex ii,
                       WordFrequencyPair[] wfp,
                       java.io.PrintStream out)
```
  - printTermByDocMatrixCSV
```
public static void printTermByDocMatrixCSV(TCInvertedIndex ii,
                           WordFrequencyPair[] wfp,
                           java.io.PrintStream out)
```
  - printDebug
```
public static void printDebug(TCInvertedIndex ii,
              WordFrequencyPair[] wfp,
              java.io.PrintStream out)
```
    Prints debug information and all ARFF representations this class handles.
