NAME - Check to see if files are likely to be SPAM


The following options are supported:

Add Header To Files (-a, --add_header_level)

Add a header to a file if its spam probability is greater than or equal to this value. The default value is 2, which means that a header is never added, because the smallest probability is 0 and the greatest is 1. This lets you add a header only to files that are at or above a chosen threshold.

The following example will only add a header if the spam probability is at least 0.90.

example: perl -w -s ./*.msg -p prob.dat -a 0.90
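The threshold rule above can be sketched as follows. This is an illustration in Python, not the script's own Perl code; the function name should_add_header is hypothetical, and only the comparison and the default of 2 come from the description above.

```python
def should_add_header(spam_probability: float, add_header_level: float = 2.0) -> bool:
    """Return True when a header would be added under the rule described above."""
    # A header is added when the probability is greater than or equal to the level.
    return spam_probability >= add_header_level


# With the default level of 2, no probability in [0, 1] can qualify.
print(should_add_header(1.0))         # False
# With -a 0.90, only files at or above 0.90 qualify.
print(should_add_header(0.93, 0.90))  # True
print(should_add_header(0.85, 0.90))  # False
```

Because the comparison is "greater than or equal", a file whose probability is exactly 0.90 would also receive a header in this sketch.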

Case Sensitive Tokens (--case or -c)

Tokens are not considered case sensitive by default. If you desire that the tokens Hello and hello be considered different, turn case sensitive tokens on.

example: perl -w -c -s ./*.msg -p prob.dat
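To illustrate what case-insensitive tokens mean for counting, here is a small Python sketch. The tokenizer (runs of letters, digits, and apostrophes) and the name count_tokens are assumptions made for this example; the script's real tokenizing rules are not described here.

```python
import re
from collections import Counter


def count_tokens(text: str, case_sensitive: bool = False) -> Counter:
    """Count tokens, folding case unless case_sensitive is set."""
    # Assumed token definition for illustration only.
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    if not case_sensitive:
        # Default behaviour: Hello and hello become the same token.
        tokens = [token.lower() for token in tokens]
    return Counter(tokens)


print(count_tokens("Hello hello HELLO")["hello"])                       # 3
print(count_tokens("Hello hello HELLO", case_sensitive=True)["hello"])  # 1
```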

Case Sensitive File Search (--file_case or -f)

Files are specified using "file specifications" (file specs). By default, file specs are assumed to be case insensitive. On UNIX systems, where file names are case sensitive, this can make a difference, so you can turn case sensitivity on with this option.

example: perl -w -fc -s ./*.msg -p prob.dat
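The difference between the two matching modes can be sketched in Python as shown below. This assumes that file specs behave like shell-style wildcards; the helper name matches_spec is hypothetical and the script's actual matching code may differ.

```python
from fnmatch import fnmatchcase


def matches_spec(filename: str, spec: str, case_sensitive: bool = False) -> bool:
    """Match a file name against a wildcard spec, optionally ignoring case."""
    if not case_sensitive:
        # Fold both sides so that *.msg also matches NOTE.MSG.
        filename, spec = filename.lower(), spec.lower()
    # fnmatchcase never applies platform-specific case folding on its own.
    return fnmatchcase(filename, spec)


print(matches_spec("NOTE.MSG", "*.msg"))                       # True
print(matches_spec("NOTE.MSG", "*.msg", case_sensitive=True))  # False
```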

Help (-h or -?)

Print usage instructions.

example: perl -w -h

Log File Name (-l or --log)

If a log file is specified, it is used as the log file name. By default, the log file tokenize_file.log is created.

Log Configuration Files (--log_cfg)

You can create a configuration file for your logger and then configure your log object by telling it to read that configuration file. To create an initial configuration file, write a Perl script that creates a logger, configures it, and then calls the write_to_file('log_cfg.dat') method.

This provides complete control over how the logger is configured. You can set screen and file output levels, for example.

example: perl -w -c -s ./*.msg -p prob.dat --log_cfg ~andy/logs/default_log.dat

Log File Directory (--log_dir)

This specifies the directory that contains the log file.

example: perl -w -c -s ./*.msg -p prob.dat --log_dir ~andy/logs

Probability Token File (-p or --prob)

This specifies the name of the probability token data file.

example: perl -w -c -s ./*.msg -p prob.dat

Recurse Directories (-r or --recurse)

This causes all directories under the specified directory to be searched for the given file spec.

example: perl -w -r -s ./*.msg -p prob.dat
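A rough picture of what recursion adds, sketched in Python. The function find_files and its parameters are hypothetical, and the matching is case insensitive here to mirror the default described for file specs; this is not the script's implementation.

```python
import os
from fnmatch import fnmatchcase


def find_files(top: str, pattern: str, recurse: bool = False) -> list:
    """Collect files under top whose names match pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            # Case-insensitive comparison, as in the default file spec mode.
            if fnmatchcase(name.lower(), pattern.lower()):
                matches.append(os.path.join(dirpath, name))
        if not recurse:
            # Without recursion, only the top directory is examined.
            break
    return sorted(matches)
```

Calling find_files(".", "*.msg") looks only in the current directory, while find_files(".", "*.msg", recurse=True) also visits every directory beneath it.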

Setting the SPAM Limit (Sensitivity) (-sl or --spam_limit)

By default, a file is considered SPAM if the probability is greater than 0.90. You can make this less or more sensitive by changing this value.

To make the check stricter, for example, raise the limit to 0.95.

example: perl -w -r -s ./*.msg -p prob.dat -sl 0.95
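The effect of changing the limit can be shown with a short Python sketch. The function name is_spam and the sample probabilities are made up for illustration; only the "greater than" comparison and the 0.90 default come from the text above.

```python
def is_spam(probability: float, spam_limit: float = 0.90) -> bool:
    """A file counts as SPAM only when its probability exceeds the limit."""
    return probability > spam_limit


# Hypothetical probabilities for three files.
scores = {"a.msg": 0.97, "b.msg": 0.92, "c.msg": 0.40}

flagged_default = [name for name, p in scores.items() if is_spam(p)]
flagged_strict = [name for name, p in scores.items() if is_spam(p, 0.95)]

print(flagged_default)  # ['a.msg', 'b.msg']
print(flagged_strict)   # ['a.msg']
```

Note that the comparison here is strict, so a probability of exactly 0.90 is not treated as SPAM under the default limit.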

File Specs (-s or --spec)

This specifies the file specs to search. If you desire to have three sets of file specs, then include the spec parameter three times.

example: perl -w -r -s ./*.msg -p prob.dat -s *.MES -s ~andy/*.msg
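Repeating an option to build up a list is a common command-line pattern. The following Python sketch uses the standard argparse module to show how three -s values could accumulate; it is an analogy only, since the script itself is written in Perl and its option parsing is not shown here. The function name parse_specs is hypothetical.

```python
import argparse


def parse_specs(argv: list) -> list:
    """Collect every -s/--spec value given on the command line, in order."""
    parser = argparse.ArgumentParser()
    # action="append" adds each occurrence to a list instead of overwriting.
    parser.add_argument("-s", "--spec", action="append", default=[])
    return parser.parse_args(argv).spec


print(parse_specs(["-s", "./*.msg", "-s", "*.MES", "-s", "~andy/*.msg"]))
# ['./*.msg', '*.MES', '~andy/*.msg']
```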


Copyright 1998-2002, Andrew Pitonyak

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Modification History

September 10, 2002

Version 1.00 First release