bases.encoding.block

Block base encodings.

Splits the bytestring to encode (resp. string to decode) into blocks, then encodes (resp. decodes) each block individually using an underlying encoding. By default, the underlying encoding is a simple base encoding.

Constructor options:

  • block_size: Union[int, Mapping[int, int]] cf. below

  • sep_char: str an optional separator character for encoded string blocks (default: "")

  • reverse_blocks: bool an optional flag to reverse individual char blocks in the encoded string (default: False)

The block_size option is mandatory and determines the allowed block sizes for encoding and decoding:

  • if block_size is a strictly increasing mapping of positive integers to positive integers, its keys are taken to be the allowed block byte sizes and its values are taken to be the corresponding block char sizes.

  • if block_size is an integer, all block byte sizes in range(1, block_size+1) are allowed, and the corresponding block char sizes are computed by:

char_size = int(math.floor(math.log(256**byte_size, base)))+1
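As an illustration of the integer case, the formula can be evaluated for every allowed byte size to build the byte-size to char-size mapping. This is a minimal sketch, not the library's code: the helper names block_char_size and default_block_sizes are hypothetical, and base is assumed to be the number of characters in the alphabet.

```python
import math
from typing import Dict


def block_char_size(byte_size: int, base: int) -> int:
    """Block char size for a given block byte size, using the formula above."""
    return int(math.floor(math.log(256**byte_size, base))) + 1


def default_block_sizes(block_size: int, base: int) -> Dict[int, int]:
    """Hypothetical helper: mapping of every allowed byte size to its char size."""
    return {n: block_char_size(n, base) for n in range(1, block_size + 1)}


# With base 45 and block_size=2, this gives the same sizes as the explicit
# mapping {1: 2, 2: 3} used for base45 further down.
print(default_block_sizes(2, 45))
```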

The property nbytes2nchars has all valid block byte sizes as keys and the corresponding block char sizes as values. The property nchars2nbytes has all valid block char sizes as keys and the corresponding block byte sizes as values. Each pair of corresponding block byte and char sizes is assessed to ensure that encoding and decoding are unambiguous, using the static methods max_block_nchars and max_block_nbytes from the zeropad base encoding implementation (cf. class ZeropadBaseEncoding).
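The relationship between the two mappings can be sketched as a plain inversion. The capacity check below is only an illustrative condition (every value of nbytes bytes must fit in nchars base digits); it is an assumption about what an unambiguity check could look like, not the logic of max_block_nchars or max_block_nbytes. Both function names are hypothetical.

```python
from typing import Dict, Mapping


def invert_block_sizes(nbytes2nchars: Mapping[int, int]) -> Dict[int, int]:
    """Build the char-size to byte-size mapping from the byte-size to char-size one."""
    nchars2nbytes = {nchars: nbytes for nbytes, nchars in nbytes2nchars.items()}
    if len(nchars2nbytes) != len(nbytes2nchars):
        # Two byte sizes sharing one char size would make decoding ambiguous.
        raise ValueError("block char sizes must be distinct")
    return nchars2nbytes


def block_fits(nbytes: int, nchars: int, base: int) -> bool:
    """Illustrative check: can every nbytes-byte value be written with nchars digits?"""
    return 256**nbytes <= base**nchars


sizes = {1: 2, 2: 3}
print(invert_block_sizes(sizes))
print(all(block_fits(nb, nc, 45) for nb, nc in sizes.items()))
```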

The maximum valid block byte (resp. char) size is used on encoding (resp. decoding) for all blocks except at most the last one: if the number of bytes (resp. chars) in the last block is not valid, the bytestring (resp. string) is not valid overall.
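The splitting rule for encoding can be sketched as follows, assuming the allowed sizes are given as a collection of integers. The function name split_blocks is hypothetical, and ValueError stands in for the library's own error type.

```python
from typing import Collection, List


def split_blocks(data: bytes, allowed_sizes: Collection[int]) -> List[bytes]:
    """Split data into blocks of the maximum allowed size; validate the last block."""
    max_size = max(allowed_sizes)
    blocks = [data[i:i + max_size] for i in range(0, len(data), max_size)]
    # Only the final block may be shorter, and its size must itself be allowed.
    if blocks and len(blocks[-1]) not in allowed_sizes:
        raise ValueError(f"invalid final block size: {len(blocks[-1])}")
    return blocks


print(split_blocks(b"abcde", {1, 2}))
```

The same slicing works for strings on decoding, with block char sizes in place of block byte sizes.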

As a concrete example, the following is the constructor for the base45 encoding:

base45 = BlockBaseEncoding(alphabet.base45, block_size={1: 2, 2: 3})

In this case, encoding uses blocks of 2 bytes, with the final block allowed to be 1 or 2 bytes. Decoding uses blocks of 3 chars, with the final block allowed to be 2 or 3 chars (but not 1 char). Because no encoding was explicitly specified, the encoding used is the simple encoding for the base45 alphabet.

Encoding of a bytestring b:

  1. split b into blocks of size block_nbytes, with the final block allowed to be any size in nbytes2nchars (raise EncodingError if it isn’t)

  2. encode each block individually using the block_encoding

  3. check that no encoded block string exceeds the block char size corresponding to the original block byte size

  4. prepend zero chars to each encoded block string until it reaches the designated block char size

  5. if reverse_blocks, reverse each individual char block

  6. join the blocks into the final encoded string (using the separator character sep_char, if specified)
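The six encoding steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: encode_blocks is a hypothetical function, the block encoding is assumed to be a simple big-endian base conversion, ValueError stands in for EncodingError, and the alphabet string is the base45 alphabet from RFC 9285.

```python
from typing import List, Mapping

# Base45 alphabet as listed in RFC 9285 (45 characters).
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_blocks(b: bytes, alphabet: str, nbytes2nchars: Mapping[int, int],
                  sep_char: str = "", reverse_blocks: bool = False) -> str:
    base = len(alphabet)
    block_nbytes = max(nbytes2nchars)
    # Step 1: split into blocks; the final block must have an allowed size.
    blocks = [b[i:i + block_nbytes] for i in range(0, len(b), block_nbytes)]
    if blocks and len(blocks[-1]) not in nbytes2nchars:
        raise ValueError(f"invalid final block size: {len(blocks[-1])}")
    encoded: List[str] = []
    for block in blocks:
        nchars = nbytes2nchars[len(block)]
        # Step 2: simple base conversion, most significant digit first.
        value = int.from_bytes(block, "big")
        digits = ""
        while value > 0:
            value, rem = divmod(value, base)
            digits = alphabet[rem] + digits
        # Step 3: the encoded block must fit the designated char size.
        if len(digits) > nchars:
            raise ValueError("encoded block exceeds block char size")
        # Step 4: prepend zero chars up to the designated char size.
        digits = digits.rjust(nchars, alphabet[0])
        # Step 5: optionally reverse the char block.
        if reverse_blocks:
            digits = digits[::-1]
        encoded.append(digits)
    # Step 6: join the blocks, using the separator if one is given.
    return sep_char.join(encoded)


sizes = {1: 2, 2: 3}
print(encode_blocks(b"AB", BASE45_ALPHABET, sizes, reverse_blocks=True))  # BB8
print(encode_blocks(b"Hello!!", BASE45_ALPHABET, sizes, reverse_blocks=True))  # %69 VD92EX0
```

The two printed strings match the RFC 9285 examples, which place the least significant character first in each block; this is why reverse_blocks=True is passed in the sketch.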

Decoding of a string s:

  1. split s into blocks of size block_nchars, with the final block allowed to be any size in nchars2nbytes (raise DecodingError if it isn’t)

  2. if reverse_blocks, reverse each individual char block

  3. decode each block individually using the block_encoding

  4. check that no decoded block bytestring exceeds the block byte size corresponding to the original block char size

  5. prepend zero bytes to each decoded block bytestring until it reaches the designated block byte size

  6. join the blocks into the final decoded bytestring
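The decoding steps above can be sketched in the same spirit. Again this is only an illustration: decode_blocks is a hypothetical function, the block encoding is assumed to be a simple big-endian base conversion, ValueError stands in for DecodingError, and separator handling is left out because the steps above do not describe it.

```python
from typing import List, Mapping

# Base45 alphabet as listed in RFC 9285 (45 characters).
BASE45_ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def decode_blocks(s: str, alphabet: str, nchars2nbytes: Mapping[int, int],
                  reverse_blocks: bool = False) -> bytes:
    base = len(alphabet)
    index = {char: i for i, char in enumerate(alphabet)}
    block_nchars = max(nchars2nbytes)
    # Step 1: split into blocks; the final block must have an allowed size.
    blocks = [s[i:i + block_nchars] for i in range(0, len(s), block_nchars)]
    if blocks and len(blocks[-1]) not in nchars2nbytes:
        raise ValueError(f"invalid final block size: {len(blocks[-1])}")
    decoded: List[bytes] = []
    for block in blocks:
        nbytes = nchars2nbytes[len(block)]
        # Step 2: undo the block reversal applied on encoding.
        if reverse_blocks:
            block = block[::-1]
        # Step 3: simple base conversion, most significant digit first.
        value = 0
        for char in block:
            if char not in index:
                raise ValueError(f"invalid character: {char!r}")
            value = value * base + index[char]
        # Step 4: the decoded block must fit the designated byte size.
        if (value.bit_length() + 7) // 8 > nbytes:
            raise ValueError("decoded block exceeds block byte size")
        # Step 5: to_bytes pads with leading zero bytes up to nbytes.
        decoded.append(value.to_bytes(nbytes, "big"))
    # Step 6: join the blocks into the final bytestring.
    return b"".join(decoded)


sizes = {2: 1, 3: 2}
print(decode_blocks("BB8", BASE45_ALPHABET, sizes, reverse_blocks=True))  # b'AB'
```

A 3-char block such as ":::" is rejected at step 4, since its value does not fit in 2 bytes, and a string whose final block has a single char is rejected at step 1.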

BlockBaseEncoding

class BlockBaseEncoding(encoding, *, case_sensitive=None, block_size, sep_char='', reverse_blocks=False)[source]

Bases: BaseEncoding

Block base encodings.

Parameters:
  • encoding (str, range, Alphabet or BaseEncoding) – the alphabet to use for the encoding, or an explicit encoding to use for individual blocks

  • case_sensitive (bool or None, optional) – optional case sensitivity (if None, the one from the alphabet is used)

  • block_size (int or Mapping[int, int]) – allowed block size(s) for encoding/decoding

  • sep_char (str, optional) – an optional separator character for encoded string blocks (default: "")

  • reverse_blocks (bool, optional) – an optional flag to reverse individual char blocks in the encoded string (default: False)

property block_encoding

The encoding used for individual blocks.

Return type:

BaseEncoding

property block_nbytes

Number of bytes in the largest blocks.

Return type:

int

property block_nchars

Number of characters in the largest blocks.

Return type:

int

canonical_bytes(b)[source]

Returns a canonical version of the bytestring b: this is the bytestring obtained by first encoding b and then decoding it.

(This method is overridden by subclasses with more efficient implementations.)

Parameters:

b (BytesLike) – the bytestring

Return type:

bytes

canonical_string(s)[source]

Returns a canonical version of the string s: this is the string obtained by first decoding s and then encoding it.

(This method is overridden by subclasses with more efficient implementations.)

Parameters:

s (str) – the string

Return type:

str

property nbytes2nchars

Mapping of byte block sizes to char block sizes.

Return type:

Mapping[int, int]

property nchars2nbytes

Mapping of char block sizes to byte block sizes.

Return type:

Mapping[int, int]

options(skip_defaults=False)[source]

The options used to construct this particular encoding.

Example usage:

>>> encoding.base32.options()
{'char_nbits': 'auto', 'pad_char': '=', 'padding': 'include'}
>>> encoding.base32.options(skip_defaults=True)
{'pad_char': '=', 'padding': 'include'}

Parameters:

skip_defaults (bool, optional) – if set to True, only options with non-default values are included in the mapping

Return type:

Mapping[str, Any]

property reverse_blocks

Whether individual char blocks should be reversed when encoding, e.g. as done by the base45 spec.

Return type:

bool

property sep_char

Optional block separation character. It is either the empty string, or a string of length 1.

Return type:

str

with_options(**options)[source]

Returns a new encoding with the same kind, alphabet and case sensitivity as this one, but different options.

Parameters:

options – the options to change, given as keyword arguments

Return type:

BlockBaseEncodingSubclass

BlockBaseEncodingSubclass

BlockBaseEncodingSubclass = ~BlockBaseEncodingSubclass

Type variable for subclasses of BlockBaseEncoding.