`StringArrayEncoding`

class biotite.structure.io.pdbx.StringArrayEncoding(strings: ... = None, data_encoding: ... = None, offset_encoding: ... = None)

Bases: Encoding

Encoding that compresses an array of strings into an array of indices that point to the unique strings in that array.

The unique strings themselves are stored as part of the StringArrayEncoding as a single concatenated string. The start index of each unique string within the concatenated string is stored in an offset array.
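The scheme described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the helper name `encode_strings` is hypothetical, unique strings are collected in order of first appearance (an assumption), and the end of the last string is taken to be the length of the concatenated string.

```python
def encode_strings(values):
    """Encode a list of strings as (indices, string_data, offsets).

    Hypothetical helper for illustration only, not part of biotite.
    """
    unique = []    # unique strings, in order of first appearance (assumption)
    index_of = {}  # maps each unique string to its position in `unique`
    indices = []
    for value in values:
        if value not in index_of:
            index_of[value] = len(unique)
            unique.append(value)
        indices.append(index_of[value])

    # Record where each unique string starts in the concatenated string
    offsets = []
    position = 0
    for string in unique:
        offsets.append(position)
        position += len(string)
    string_data = "".join(unique)
    return indices, string_data, offsets


indices, string_data, offsets = encode_strings(
    ["apple", "banana", "cherry", "apple", "banana", "apple"]
)
print(indices)      # [0, 1, 2, 0, 1, 0]
print(string_data)  # applebananacherry
print(offsets)      # [0, 5, 11]
```

Each repeated string costs only one integer in the index array, which is why this layout suits columns with few distinct values.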

Parameters:

strings (ndarray, optional): The unique strings that are used for encoding. If omitted, the unique strings are determined from the data the first time encode() is called.
data_encoding (list of Encoding, optional): The encodings that are applied to the index array. If omitted, the array is directly encoded into bytes without further compression.
offset_encoding (list of Encoding, optional): The encodings that are applied to the offset array. If omitted, the array is directly encoded into bytes without further compression.
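To see why a further encoding of the index array can pay off, consider run-length encoding, which stores each run of equal values as a (value, run length) pair. The following is a pure-Python sketch under that assumption; the function `run_length_encode` is hypothetical and does not use the library's own encoding classes.

```python
def run_length_encode(values):
    """Collapse runs of equal values into flat (value, run length) pairs.

    Hypothetical helper for illustration only, not part of biotite.
    """
    encoded = []
    for value in values:
        # The flat list always has even length: [value, count, value, count, ...]
        if encoded and encoded[-2] == value:
            encoded[-1] += 1
        else:
            encoded.extend([value, 1])
    return encoded


# An index array in which equal strings appear consecutively
indices = [0, 0, 0, 0, 1, 1, 2, 2, 2]
print(run_length_encode(indices))  # [0, 4, 1, 2, 2, 3]
```

Nine indices shrink to six numbers here; the gain grows with the length of the runs and vanishes when neighboring indices rarely repeat.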

Attributes:

strings (ndarray)
data_encoding (list of Encoding)
offset_encoding (list of Encoding)

Examples

>>> import numpy as np
>>> from biotite.structure.io.pdbx import StringArrayEncoding
>>> data = np.array(["apple", "banana", "cherry", "apple", "banana", "apple"])
>>> print(data)
['apple' 'banana' 'cherry' 'apple' 'banana' 'apple']
>>> # By default, the indices would be encoded directly into bytes.
>>> # To print the indices themselves, pass data_encoding=[].
>>> encoding = StringArrayEncoding(data_encoding=[])
>>> encoded = encoding.encode(data)
>>> print(encoding.strings)
['apple' 'banana' 'cherry']
>>> print(encoded)
[0 1 2 0 1 0]

decode(data)

Apply the inverse of this encoding to the given data.

Parameters:

data (ndarray or bytes): The data to be decoded.

Returns:

decoded_data (ndarray): The decoded data.

Warning

When overriding this method, do not disable bounds checks with @cython.boundscheck(False) or @cython.wraparound(False), since the file content may be invalid or malicious.
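The kind of validation this warning is about can be sketched in plain Python. The function `decode_strings` below is hypothetical and only illustrates the checks under simplified assumptions (a list of start offsets, with the end of the last string taken as the length of the concatenated string); it is not the library's decoder.

```python
def decode_strings(indices, string_data, offsets):
    """Decode indices back into strings, validating offsets and indices.

    Hypothetical helper for illustration only, not part of biotite.
    """
    # Each string spans from its start offset to the next start offset;
    # the last string ends at the end of the concatenated string
    boundaries = list(offsets) + [len(string_data)]
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        if not 0 <= start <= end <= len(string_data):
            raise ValueError(f"Invalid offset range {start}:{end}")
    unique = [
        string_data[start:end]
        for start, end in zip(boundaries[:-1], boundaries[1:])
    ]

    decoded = []
    for index in indices:
        # Reject negative indices explicitly, as they would silently wrap around
        if not 0 <= index < len(unique):
            raise ValueError(f"Index {index} is out of bounds")
        decoded.append(unique[index])
    return decoded


print(decode_strings([0, 1, 0], "applebanana", [0, 5]))
# ['apple', 'banana', 'apple']
try:
    decode_strings([0, 7], "applebanana", [0, 5])
except ValueError as error:
    print(error)  # Index 7 is out of bounds
```

Without such checks, a crafted file could make compiled code read outside the string table, whereas here malformed input ends in a `ValueError`.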

static deserialize(content)

Create this component by deserializing the given content.

Parameters:

content (str or dict): The content to be deserialized. The type of this parameter depends on the file format. In case of CIF files, this is the text of the lines that represent this component. In case of BinaryCIF files, this is a dictionary parsed from the MessagePack data.

encode(data)

Apply this encoding to the given data.

Parameters:

data (ndarray): The data to be encoded.

Returns:

encoded_data (ndarray or bytes): The encoded data.

serialize()

Convert this component into a Python object that can be written to a file.

Returns:

content (str or dict): The content to be serialized. The type of this return value depends on the file format. In case of CIF files, this is the text of the lines that represent this component. In case of BinaryCIF files, this is a dictionary that can be encoded into MessagePack.

static subcomponent_class()

Get the class of the components that are stored in this component.

Returns:

subcomponent_class (type): The class of the subcomponent. If this component already represents the lowest level, i.e. it does not contain subcomponents, None is returned.

static supercomponent_class()

Get the class of the component that contains this component.

Returns:

supercomponent_class (type): The class of the supercomponent. If this component already represents the highest level, i.e. it is not contained in another component, None is returned.
