
Tokenizer

In order to read line-oriented input files, it is useful to first read the complete line (with getline) and then to parse it. This can look as follows:


assert( inFile.is_open() ) ;
typedef vector<string> Tokens ;
Tokens tokens ;
string aString ;
// getline returns the stream, which converts to false at end-of-file,
// so the loop stops cleanly after the last line has been read.
while ( getline( inFile, aString ) ) {
    if ( !aString.empty() ) { // skip empty lines
        tokenize( aString, tokens ) ;
        for ( Tokens::iterator tt=tokens.begin() ; tt!=tokens.end() ; ++tt ) {
            cout << *tt << "\n" ;
        }
    }
}

As of 2003, there is unfortunately no standard tokenizer in C++. A simple tokenizer, which splits a string at whitespace (blanks and tabs), is the following (adapted from the Linux C++ Programming HOWTO):


#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
...
The tokenize routine is slightly modified compared to the original version in so far as it puts "TRASH" into the zeroth element, so that the counting of tokens starts with one. This has the advantage that the token from the $n$th column of the input line ends up in tokens[n].
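The modified tokenize routine itself is not reproduced here. A minimal sketch of such a whitespace tokenizer, assuming the "TRASH" convention described above (this is an illustration, not the original HOWTO code), could look as follows:

#include <sstream>
#include <string>
#include <vector>
using namespace std ;

// Splits str at whitespace (blanks, tabs) and appends the pieces to tokens.
// Element 0 is filled with the dummy string "TRASH", so that the token from
// the nth column of the input line ends up in tokens[n].
void tokenize( const string& str, vector<string>& tokens ) {
    tokens.clear() ;                // start from scratch for every line
    tokens.push_back( "TRASH" ) ;   // dummy zeroth element
    istringstream buffer( str ) ;   // stream extraction skips whitespace
    string aToken ;
    while ( buffer >> aToken ) {
        tokens.push_back( aToken ) ;
    }
}

With this, a line such as "17 3.5 car" yields tokens[1]=="17", tokens[2]=="3.5", and tokens[3]=="car".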


