JClass Elements


24 String Tokenizer

Features of JCStringTokenizer  Classes  Methods  Examples


24.1 Features of JCStringTokenizer

JCStringTokenizer provides simple linear tokenization of a String. The set of delimiters, which defaults to common whitespace characters, can be specified either during creation or on a per-token basis. It is similar to java.util.StringTokenizer, but delimiters can be included as literals by preceding them with an escape character (a backslash by default). It also differs in one useful way: if one delimiter immediately follows another, a null String is returned as the token rather than the empty position being skipped.
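For contrast, the following minimal sketch shows how the standard java.util.StringTokenizer treats adjacent delimiters. It uses only the JDK class; the class name StdTokenizerContrast and the helper tokens() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class StdTokenizerContrast {
    /** Collects every token java.util.StringTokenizer reports for the given delimiters. */
    static List<String> tokens(String input, String delimiters) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input, delimiters);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // The two adjacent commas are collapsed: nothing is reported between them.
        System.out.println(tokens("a,,b", ",")); // prints [a, b]
    }
}
```

With the standard class, the empty position between the two commas is silently dropped, which makes it awkward to parse delimited records that have optional fields.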

JCStringTokenizer has these capabilities:


24.2 Classes

This utility consists of a single class called JCStringTokenizer.

Pass the String to be tokenized to the constructor:

String s = "Hello my friend";
JCStringTokenizer st = new JCStringTokenizer(s);

Process the tokens in the String tokenizer with methods hasMoreTokens() and nextToken().


24.3 Methods

These are the methods of JCStringTokenizer:

countTokens()

Returns the number of tokens remaining in the String, using the delimiter you specify.

getEscapeChar()

Gets the escape character (default: \).

getPosition()

Returns the current scan position within the String.

hasMoreTokens()

Used with nextToken(). Returns true if more tokens exist in the String tokenizer.

nextToken()

Gets the next token from the delimited String, using the delimiter you pass in (for example, nextToken(',')). If required, the delimiter can be "escaped" by a backslash character so that it is treated as a literal. To include a backslash character, precede it by another backslash character.

nextToken()

Gets the next whitespace-delimited token.

parse()

Given a String, a delimiter, and an optional escape character, this method parses the String using the specified delimiter and returns the values in an array of Strings. To use an escape character other than the default backslash, use the form of the method that takes an escape character.

setEscapeChar()

Sets the escape character (default: \). If 0, no escape character is used.
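The escape rules described for nextToken(), parse(), and setEscapeChar() can be illustrated with a small stand-alone sketch. This is not the JCStringTokenizer implementation: the class EscapeSplitSketch and its split() method are hypothetical, and the handling of a trailing delimiter (reported here as a final null) is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class EscapeSplitSketch {
    /**
     * Splits input on delim. The escape character makes the following character
     * literal; an escape value of 0 disables escaping. Empty tokens become null.
     */
    static List<String> split(String input, char delim, char escape) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (escape != 0 && c == escape && i + 1 < input.length()) {
                // Keep the escaped character literally, even if it is the delimiter.
                current.append(input.charAt(++i));
            } else if (c == delim) {
                tokens.add(current.length() == 0 ? null : current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        // Assumption: whatever follows the last delimiter is also a token.
        tokens.add(current.length() == 0 ? null : current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // Escaped comma stays inside the first token; the empty field becomes null.
        System.out.println(split("a\\,b,,c", ',', '\\')); // prints [a,b, null, c]
        // With escape 0 the backslash is an ordinary character.
        System.out.println(split("a\\,b", ',', (char) 0)); // prints [a\, b]
    }
}
```

Note that in Java source a literal backslash must itself be written as "\\", so the input "a\\,b,,c" contains a single backslash before the first comma.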


24.4 Examples

In this example, the String to be split into tokens contains two adjacent commas. The delimiter for tokenization is a comma, so a null is returned as the token between them. Upon encountering it, println() outputs the word "null" as part of the print stream. Note that leading spaces are not stripped from the tokens.

String token, s = "this, is, a,, test";
JCStringTokenizer st = new JCStringTokenizer(s);
while (st.hasMoreTokens()) {
    token = st.nextToken(',');
    System.out.println(token);
}

This prints the following to the console:

this
 is
 a
null
 test

You can remove the leading spaces by passing each token in turn to another String tokenizer whose delimiter is a space.
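As an alternative to a second tokenizer, the standard String.trim() method also removes the leading spaces. The sketch below is self-contained and does not use JCStringTokenizer; the class TrimTokens and the helper clean() are hypothetical, and the array simply reproduces the tokens that the comma example yields. The null check matters because a null token would otherwise cause a NullPointerException.

```java
final class TrimTokens {
    /** Returns the token without surrounding whitespace, passing null through unchanged. */
    static String clean(String token) {
        return token == null ? null : token.trim();
    }

    public static void main(String[] args) {
        // Tokens as the comma example produces them, leading spaces included.
        String[] tokens = {"this", " is", " a", null, " test"};
        for (String token : tokens) {
            System.out.println(clean(token));
        }
    }
}
```

Unlike tokenizing on spaces, trim() removes whitespace only at the ends of a token, so a token that contains internal spaces stays in one piece.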

In the next example, a slightly longer String is tokenized using the space character as the delimiter. As in the previous example, adjacent delimiters (here, side-by-side spaces) are interpreted as having a null token between them.

import com.klg.jclass.util.JCStringTokenizer;

public class StringTokenizerExample {

    public static void main(String[] args) {
        // Runs of spaces are intentional: each extra space yields a null token.
        String token, s = "this   is a  test of the string "
            + "tokenizer called JCStringTokenizer. "
            + "\nThe whitespace between the repeated words is a tab tab. ";
        System.out.println("First, the string: " + s);
        JCStringTokenizer st = new JCStringTokenizer(s);
        while (st.hasMoreTokens()) {
            // Split on single space characters only.
            token = st.nextToken(' ');
            System.out.println(token);
        }
    }
}

This time, the output is:

First, the string: this   is a  test of the string tokenizer called JCStringTokenizer.
The whitespace between the repeated words is a tab tab.
this
null
null
is
a
null
test
of
the
string
tokenizer
called
JCStringTokenizer.

The
whitespace
between
the
repeated
words
is
a
tab
tab.
