PoDoFo 0.9.20
PoDoFo::PdfTokenizer Class Reference

#include <PdfTokenizer.h>

Inheritance diagram for PoDoFo::PdfTokenizer:
PoDoFo::PdfPostScriptTokenizer

Public Member Functions

bool TryReadNextToken (InputStreamDevice &device, std::string_view &token)
 
bool TryPeekNextToken (InputStreamDevice &device, std::string_view &token)
 
int64_t ReadNextNumber (InputStreamDevice &device)
 
void ReadNextVariant (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt &encrypt={ })
 

Static Public Member Functions

static bool IsWhitespace (char ch)
 
static bool IsDelimiter (char ch)
 
static bool IsTokenDelimiter (char ch, PdfTokenType &tokenType)
 
static bool IsRegular (char ch)
 
static bool IsPrintable (char ch)
 

Protected Member Functions

void ReadNextVariant (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant, const PdfStatefulEncrypt &encrypt)
 
void EnqueueToken (const std::string_view &token, PdfTokenType type)
 
void ReadDictionary (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt &encrypt)
 
void ReadArray (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt &encrypt)
 
void ReadString (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt &encrypt)
 
void ReadHexString (InputStreamDevice &device, PdfVariant &variant, const PdfStatefulEncrypt &encrypt)
 
void ReadName (InputStreamDevice &device, PdfVariant &variant)
 
PdfLiteralDataType DetermineDataType (InputStreamDevice &device, const std::string_view &token, PdfTokenType tokenType, PdfVariant &variant)
 

Detailed Description

A simple tokenizer for PDF files and PDF content streams

Member Function Documentation

◆ DetermineDataType()

PdfTokenizer::PdfLiteralDataType PdfTokenizer::DetermineDataType ( InputStreamDevice &  device,
const std::string_view &  token,
PdfTokenType  tokenType,
PdfVariant &  variant 
)
protected

Determine the possible datatype of a token. Numbers, reals, bools and null values are parsed directly by this function and saved to the variant.

Returns
the expected datatype
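As an illustration of the classification step only, here is a minimal stand-alone sketch. The `LiteralKind` enum and `classifyToken()` are hypothetical names that are not part of PoDoFo, and the number grammar follows the PDF reference (optional sign, digits, at most one decimal point, no exponent). Unlike DetermineDataType(), the sketch does not store the parsed value anywhere.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical enum for this sketch; PoDoFo's PdfLiteralDataType has its own values.
enum class LiteralKind { Unknown, Bool, Null, Number, Real };

// Classifies a token that consists of regular characters only.
LiteralKind classifyToken(std::string_view token)
{
    if (token == "true" || token == "false")
        return LiteralKind::Bool;
    if (token == "null")
        return LiteralKind::Null;

    // PDF numbers: optional sign, digits, at most one '.', no exponent.
    std::size_t pos = (!token.empty() && (token[0] == '+' || token[0] == '-')) ? 1 : 0;
    bool sawDigit = false;
    bool sawDot = false;
    for (; pos < token.size(); pos++)
    {
        const char ch = token[pos];
        if (std::isdigit(static_cast<unsigned char>(ch)))
            sawDigit = true;
        else if (ch == '.' && !sawDot)
            sawDot = true;
        else
            return LiteralKind::Unknown;
    }

    if (!sawDigit)
        return LiteralKind::Unknown; // empty token, lone sign or lone '.'
    return sawDot ? LiteralKind::Real : LiteralKind::Number;
}
```

Anything the sketch reports as `Unknown` (keywords such as `obj`, or tokens that start a string, name, array or dictionary) would need further handling by the caller.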

◆ EnqueueToken()

void PdfTokenizer::EnqueueToken ( const std::string_view &  token,
PdfTokenType  type 
)
protected

Add a token to the queue of tokens. TryReadNextToken() will return all enqueued tokens first before reading new tokens from the input device.

Parameters
token: string of the token
type: type of the token
See also
TryReadNextToken
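The "enqueued tokens first" ordering can be pictured with a small toy class. `ToyTokenSource` is a hypothetical name and is not PoDoFo code; a second queue stands in for the input device.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy class, not PoDoFo's implementation: tokens pushed with
// enqueue() are handed out before anything is taken from the input.
class ToyTokenSource
{
public:
    explicit ToyTokenSource(const std::vector<std::string>& input)
        : m_input(input.begin(), input.end()) { }

    // Queue a token so that it is returned before the remaining input.
    void enqueue(std::string token) { m_queue.push_back(std::move(token)); }

    // Returns false when both the queue and the input are exhausted.
    bool tryReadNext(std::string& token)
    {
        std::deque<std::string>& source = m_queue.empty() ? m_input : m_queue;
        if (source.empty())
            return false;
        token = std::move(source.front());
        source.pop_front();
        return true;
    }

private:
    std::deque<std::string> m_queue; // tokens waiting to be re-read
    std::deque<std::string> m_input; // stands in for the input device
};
```

A parser that has read two tokens too many can enqueue them in their original order and then continue reading as if they had never been consumed.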

◆ IsDelimiter()

bool PdfTokenizer::IsDelimiter ( char  ch)
static

Returns true if the given character is a delimiter according to the PDF reference.

◆ IsPrintable()

bool PdfTokenizer::IsPrintable ( char  ch)
static

True if the passed character is within the generally accepted "printable" ASCII range.

◆ IsRegular()

bool PdfTokenizer::IsRegular ( char  ch)
static

True if the passed character is a regular character according to the PDF reference (Section 3.1.1, Character Set); i.e. it is neither a white-space nor a delimiter character.
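The three character classes of the PDF reference can be written down in a few lines. The helpers below are hypothetical stand-alone functions that follow the character set of the PDF reference; they are a sketch and not necessarily how PoDoFo implements IsWhitespace(), IsDelimiter() and IsRegular().

```cpp
#include <cassert>

// Hypothetical stand-alone helpers; they mirror the character classes of the
// PDF reference, not necessarily PoDoFo's internal implementation.

// White-space characters: NUL, HT, LF, FF, CR and SPACE.
constexpr bool isPdfWhitespace(char ch)
{
    return ch == '\0' || ch == '\t' || ch == '\n'
        || ch == '\f' || ch == '\r' || ch == ' ';
}

// Delimiter characters: ( ) < > [ ] { } / %
constexpr bool isPdfDelimiter(char ch)
{
    switch (ch)
    {
        case '(': case ')': case '<': case '>': case '[':
        case ']': case '{': case '}': case '/': case '%':
            return true;
        default:
            return false;
    }
}

// Regular characters are everything else.
constexpr bool isPdfRegular(char ch)
{
    return !isPdfWhitespace(ch) && !isPdfDelimiter(ch);
}

static_assert(isPdfRegular('A'), "letters are regular characters");
static_assert(!isPdfRegular('/'), "the name marker is a delimiter");
```

Because every character falls into exactly one class, a token made of regular characters ends at the first white-space or delimiter character.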

◆ IsTokenDelimiter()

bool PdfTokenizer::IsTokenDelimiter ( char  ch,
PdfTokenType &  tokenType 
)
static

Returns true if the given character is a token delimiter

◆ IsWhitespace()

bool PdfTokenizer::IsWhitespace ( char  ch)
static

Returns true if the given character is a white-space character according to the PDF reference.

Returns
true if it is a white-space character, otherwise false

◆ ReadArray()

void PdfTokenizer::ReadArray ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt &  encrypt 
)
protected

Read an array from the input device and store it into a variant.

Parameters
variant: store the array into this variable
encrypt: an encryption object which is used to decrypt strings during parsing

◆ ReadDictionary()

void PdfTokenizer::ReadDictionary ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt &  encrypt 
)
protected

Read a dictionary from the input device and store it into a variant.

Parameters
variant: store the dictionary into this variable
encrypt: an encryption object which is used to decrypt strings during parsing

◆ ReadHexString()

void PdfTokenizer::ReadHexString ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt &  encrypt 
)
protected

Read a hex string from the input device and store it into a variant.

Parameters
variant: store the hex string into this variable
encrypt: an encryption object which is used to decrypt strings during parsing
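For readers unfamiliar with the hex string syntax, the following sketch decodes the text between `<` and `>` as the PDF reference describes it: pairs of hex digits form bytes, white space is ignored, and a missing final digit counts as 0. `decodeHexBody()` and `hexValue()` are hypothetical helpers, not PoDoFo functions, and decryption is left out.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Value of a hex digit, or -1 if ch is not one.
int hexValue(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Hypothetical helper: decodes the text between '<' and '>'.
// Non-hex characters (such as white space) are skipped in this sketch.
std::string decodeHexBody(std::string_view body)
{
    std::string out;
    int high = -1; // pending high nibble, -1 if none
    for (char ch : body)
    {
        const int value = hexValue(ch);
        if (value < 0)
            continue;
        if (high < 0)
        {
            high = value;
        }
        else
        {
            out.push_back(static_cast<char>(high * 16 + value));
            high = -1;
        }
    }
    // Odd number of digits: the missing last digit is taken to be 0.
    if (high >= 0)
        out.push_back(static_cast<char>(high * 16));
    return out;
}
```

For instance, the body `48656C6C6F` decodes to the five bytes of "Hello", and `901FA` decodes to the bytes 0x90, 0x1F and 0xA0.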

◆ ReadName()

void PdfTokenizer::ReadName ( InputStreamDevice &  device,
PdfVariant &  variant 
)
protected

Read a name from the input device and store it into a variant.

Throws UnexpectedEOF if there is nothing to read.

Parameters
variant: store the name into this variable
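In PDF syntax, a name may contain `#xx` escapes, where `xx` is a two-digit hexadecimal character code. The sketch below decodes such escapes in the text that follows the leading `/`. `decodeNameBody()` and `hexDigit()` are hypothetical helpers for illustration; how PoDoFo stores and unescapes names internally is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Value of a hex digit, or -1 if ch is not one.
int hexDigit(char ch)
{
    if (ch >= '0' && ch <= '9') return ch - '0';
    if (ch >= 'a' && ch <= 'f') return ch - 'a' + 10;
    if (ch >= 'A' && ch <= 'F') return ch - 'A' + 10;
    return -1;
}

// Hypothetical helper: 'raw' is the name without its leading '/'.
// A '#' that is not followed by two hex digits is copied unchanged.
std::string decodeNameBody(std::string_view raw)
{
    std::string out;
    for (std::size_t i = 0; i < raw.size(); i++)
    {
        if (raw[i] == '#' && i + 2 < raw.size())
        {
            const int high = hexDigit(raw[i + 1]);
            const int low = hexDigit(raw[i + 2]);
            if (high >= 0 && low >= 0)
            {
                out.push_back(static_cast<char>(high * 16 + low));
                i += 2; // skip the two digits just consumed
                continue;
            }
        }
        out.push_back(raw[i]);
    }
    return out;
}
```

With this helper, the name `/Lime#20Green` yields the text "Lime Green".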

◆ ReadNextNumber()

int64_t PdfTokenizer::ReadNextNumber ( InputStreamDevice &  device)

Read the next number from the current file position ignoring all comments.

Raises a NoNumber exception if the next token is not a number, and UnexpectedEOF if no token could be read. No token is consumed if NoNumber is thrown.

Returns
a number read from the input device.
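The "no token is consumed on failure" contract can be sketched with a toy reader that only advances after a successful parse. `ToyReader` and `parseInteger()` are hypothetical, and standard exceptions stand in for PoDoFo's NoNumber and UnexpectedEOF errors. Note that `std::from_chars` rejects a leading `+`, which PDF numbers allow, so this is a simplification.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: the whole token must be a base-10 integer.
std::optional<std::int64_t> parseInteger(std::string_view token)
{
    if (token.empty())
        return std::nullopt;
    std::int64_t value = 0;
    const char* first = token.data();
    const char* last = first + token.size();
    const auto result = std::from_chars(first, last, value);
    if (result.ec != std::errc() || result.ptr != last)
        return std::nullopt; // not a number, or trailing characters
    return value;
}

// Hypothetical toy reader over a pre-split token list.
struct ToyReader
{
    std::vector<std::string_view> tokens;
    std::size_t next = 0;

    // Throws std::runtime_error if the next token is not an integer (the
    // position is left unchanged) and std::out_of_range if no token is left.
    std::int64_t readNextNumber()
    {
        if (next >= tokens.size())
            throw std::out_of_range("no token left");
        const auto value = parseInteger(tokens[next]);
        if (!value)
            throw std::runtime_error("next token is not a number");
        next++; // consume only after a successful parse
        return *value;
    }
};
```

Because the position is unchanged after a failed call, the caller can catch the error and read the same token again as something else.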

◆ ReadNextVariant() [1/2]

void PoDoFo::PdfTokenizer::ReadNextVariant ( InputStreamDevice &  device,
const std::string_view &  token,
PdfTokenType  tokenType,
PdfVariant &  variant,
const PdfStatefulEncrypt &  encrypt 
)
protected

Read the next variant from the current file position ignoring all comments.

Raises an exception if there is no variant left in the file.

Parameters
token: a token that has already been read
tokenType: type of the passed token
variant: write the read variant to this value
encrypt: an encryption object which is used to decrypt strings during parsing

◆ ReadNextVariant() [2/2]

void PdfTokenizer::ReadNextVariant ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt &  encrypt = { } 
)

Read the next variant from the current file position ignoring all comments.

Raises an UnexpectedEOF exception if there is no variant left in the file.

Parameters
variant: write the read variant to this value
encrypt: an encryption object which is used to decrypt strings during parsing

◆ ReadString()

void PdfTokenizer::ReadString ( InputStreamDevice &  device,
PdfVariant &  variant,
const PdfStatefulEncrypt &  encrypt 
)
protected

Read a string from the input device and store it into a variant.

Parameters
variant: store the string into this variable
encrypt: an encryption object which is used to decrypt strings during parsing
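As background on what reading a literal string involves, the sketch below decodes the text after an opening `(` following the PDF reference: parentheses may nest when balanced, and a backslash introduces an escape. `decodeLiteralString()` is a hypothetical helper, not a PoDoFo function; it skips decryption and does not handle a backslash at the end of a line (line continuation).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: 'text' starts just after the opening '('.
// Returns the decoded string up to the matching ')'.
std::string decodeLiteralString(std::string_view text)
{
    std::string out;
    int depth = 1; // the opening '(' has already been consumed
    for (std::size_t i = 0; i < text.size(); i++)
    {
        char ch = text[i];
        if (ch == '\\' && i + 1 < text.size())
        {
            ch = text[++i];
            if (ch >= '0' && ch <= '7')
            {
                // Octal escape: one to three octal digits.
                int code = ch - '0';
                for (int n = 1; n < 3 && i + 1 < text.size()
                    && text[i + 1] >= '0' && text[i + 1] <= '7'; n++)
                {
                    code = code * 8 + (text[++i] - '0');
                }
                out.push_back(static_cast<char>(code & 0xFF));
                continue;
            }
            switch (ch)
            {
                case 'n': out.push_back('\n'); break;
                case 'r': out.push_back('\r'); break;
                case 't': out.push_back('\t'); break;
                case 'b': out.push_back('\b'); break;
                case 'f': out.push_back('\f'); break;
                default: out.push_back(ch); break; // \( \) \\ and unknown escapes
            }
            continue;
        }
        if (ch == '(')
            depth++;
        else if (ch == ')' && --depth == 0)
            break; // matching close parenthesis: end of the string
        out.push_back(ch);
    }
    return out;
}
```

Balanced inner parentheses are kept as part of the string, so the input `a(b)c) tail` decodes to `a(b)c`.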

◆ TryPeekNextToken()

bool PoDoFo::PdfTokenizer::TryPeekNextToken ( InputStreamDevice &  device,
std::string_view &  token 
)

Tries to peek at the next token from the current file position, ignoring all comments, without actually consuming it.

Returns
false if EOF

◆ TryReadNextToken()

bool PoDoFo::PdfTokenizer::TryReadNextToken ( InputStreamDevice &  device,
std::string_view &  token 
)

Reads the next token from the current file position ignoring all comments.

Parameters
[out] token: On true return, set to a view of the read token. The view refers to memory owned by PdfTokenizer and must NOT be freed. Its contents are invalidated by the next call to TryReadNextToken() and by the destruction of the PdfTokenizer. Undefined on false return.
[out] tokenType: On true return, if not nullptr, the type of the read token is stored into this parameter. Undefined on false return.
Returns
True if a token was read, false if there are no more tokens to read.
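To make the read and peek semantics concrete, here is a minimal toy tokenizer over an in-memory buffer. `ToyTokenizer` is a hypothetical class and not PoDoFo's implementation: it skips white space and `%` comments, treats every delimiter as a one-character token (real PDF syntax also has the two-character tokens `<<` and `>>`), and implements peeking by restoring the read position.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical toy tokenizer; the returned views point into the caller's buffer.
class ToyTokenizer
{
public:
    explicit ToyTokenizer(std::string_view input) : m_input(input) { }

    // Reads the next token; returns false at the end of the input.
    bool tryReadNext(std::string_view& token)
    {
        skipWhitespaceAndComments();
        if (m_pos >= m_input.size())
            return false;

        const std::size_t start = m_pos;
        if (isDelimiter(m_input[m_pos]))
        {
            m_pos++; // simplification: every delimiter is a one-character token
        }
        else
        {
            while (m_pos < m_input.size() && !isWhitespace(m_input[m_pos])
                && !isDelimiter(m_input[m_pos]))
            {
                m_pos++;
            }
        }
        token = m_input.substr(start, m_pos - start);
        return true;
    }

    // Like tryReadNext(), but the token stays available for the next read.
    bool tryPeekNext(std::string_view& token)
    {
        const std::size_t saved = m_pos;
        const bool found = tryReadNext(token);
        m_pos = saved;
        return found;
    }

private:
    static bool isWhitespace(char ch)
    {
        return ch == '\0' || ch == '\t' || ch == '\n'
            || ch == '\f' || ch == '\r' || ch == ' ';
    }

    static bool isDelimiter(char ch)
    {
        return std::string_view("()<>[]{}/%").find(ch) != std::string_view::npos;
    }

    void skipWhitespaceAndComments()
    {
        while (m_pos < m_input.size())
        {
            const char ch = m_input[m_pos];
            if (isWhitespace(ch))
            {
                m_pos++;
            }
            else if (ch == '%')
            {
                // A comment runs up to the end of the line.
                while (m_pos < m_input.size()
                    && m_input[m_pos] != '\n' && m_input[m_pos] != '\r')
                {
                    m_pos++;
                }
            }
            else
            {
                break;
            }
        }
    }

    std::string_view m_input;
    std::size_t m_pos = 0;
};
```

Feeding it `1 0 obj % comment` followed by a newline and `/Type` yields the tokens `1`, `0`, `obj`, `/` and `Type`; the comment never appears as a token.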