StringArrayEncoding

class biotite.structure.io.pdbx.StringArrayEncoding(strings: ... = None, data_encoding: ... = None, offset_encoding: ... = None)

Bases: Encoding

Encoding that compresses an array of strings into an array of indices that point to the unique strings in that array.

The unique strings themselves are stored as part of the StringArrayEncoding as a single concatenated string. The start index of each unique string within the concatenated string is stored in an offset array.
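The layout described above can be sketched in plain Python. This is an illustration of the technique only, not Biotite's implementation: the function name `encode_strings` is hypothetical, unique strings are collected in order of first appearance, and a final end offset is appended (an assumption, following the BinaryCIF convention) so that each string can be sliced out later.

```python
def encode_strings(data):
    """Hypothetical helper: split strings into indices, string data and offsets."""
    unique = []      # unique strings, in order of first appearance (assumption)
    index_of = {}    # maps each unique string to its position in `unique`
    indices = []     # one index per input element
    for s in data:
        if s not in index_of:
            index_of[s] = len(unique)
            unique.append(s)
        indices.append(index_of[s])

    # All unique strings are stored back to back in one string
    string_data = "".join(unique)

    # offsets[i] is the start of the i-th unique string;
    # the last entry marks the end of the concatenated string (assumption)
    offsets = [0]
    for s in unique:
        offsets.append(offsets[-1] + len(s))
    return indices, string_data, offsets


indices, string_data, offsets = encode_strings(
    ["apple", "banana", "cherry", "apple", "banana", "apple"]
)
print(indices)      # [0, 1, 2, 0, 1, 0]
print(string_data)  # applebananacherry
print(offsets)      # [0, 5, 11, 17]
```

Storing one concatenated string plus integer offsets keeps the repeated values out of the data array, so the index and offset arrays can each be compressed further by integer encodings.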

Parameters:
strings : ndarray, optional

The unique strings that are used for encoding. If omitted, the unique strings are determined from the data the first time encode() is called.

data_encoding : list of Encoding, optional

The encodings that are applied to the index array. If omitted, the array is directly encoded into bytes without further compression.

offset_encoding : list of Encoding, optional

The encodings that are applied to the offset array. If omitted, the array is directly encoded into bytes without further compression.

Examples

>>> import numpy as np
>>> from biotite.structure.io.pdbx import StringArrayEncoding
>>> data = np.array(["apple", "banana", "cherry", "apple", "banana", "apple"])
>>> print(data)
['apple' 'banana' 'cherry' 'apple' 'banana' 'apple']
>>> # By default the indices would directly be encoded into bytes
>>> # However, the indices should be printed here -> data_encoding=[]
>>> encoding = StringArrayEncoding(data_encoding=[])
>>> encoded = encoding.encode(data)
>>> print(encoding.strings)
['apple' 'banana' 'cherry']
>>> print(encoded)
[0 1 2 0 1 0]

Attributes:
strings : ndarray
data_encoding : list of Encoding
offset_encoding : list of Encoding

decode(data)

Apply the inverse of this encoding to the given data.

Parameters:
data : ndarray or bytes

The data to be decoded.

Returns:
decoded_data : ndarray

The decoded data.
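Conceptually, decoding reverses the index lookup: the unique strings are sliced out of the concatenated string using the offsets, and each index is replaced by its string. The following plain-Python sketch illustrates this under stated assumptions. The function `decode_strings` is hypothetical and is not the library's code, and the offset array is assumed to carry a final end offset.

```python
def decode_strings(indices, string_data, offsets):
    """Hypothetical helper: rebuild the original strings from the encoded parts."""
    # Slice each unique string out of the concatenated string;
    # assumes offsets has one more entry than there are unique strings
    unique = [
        string_data[offsets[i]:offsets[i + 1]]
        for i in range(len(offsets) - 1)
    ]
    # Replace every index by the unique string it points to
    return [unique[i] for i in indices]


decoded = decode_strings([0, 1, 2, 0, 1, 0], "applebananacherry", [0, 5, 11, 17])
print(decoded)
# ['apple', 'banana', 'cherry', 'apple', 'banana', 'apple']
```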

static deserialize(content)

Create this component by deserializing the given content.

Parameters:
content : str or dict

The content to be deserialized. The type of this parameter depends on the file format. In case of CIF files, this is the text of the lines that represent this component. In case of BinaryCIF files, this is a dictionary parsed from the MessagePack data.

encode(data)

Apply this encoding to the given data.

Parameters:
data : ndarray

The data to be encoded.

Returns:
encoded_data : ndarray or bytes

The encoded data.

serialize()

Convert this component into a Python object that can be written to a file.

Returns:
content : str or dict

The content to be serialized. The type of this return value depends on the file format. In case of CIF files, this is the text of the lines that represent this component. In case of BinaryCIF files, this is a dictionary that can be encoded into MessagePack.

static subcomponent_class()

Get the class of the components that are stored in this component.

Returns:
subcomponent_class : type

The class of the subcomponent. If this component already represents the lowest level, i.e. it does not contain subcomponents, None is returned.

static supercomponent_class()

Get the class of the component that contains this component.

Returns:
supercomponent_class : type

The class of the supercomponent. If this component already represents the highest level, i.e. it is not contained in another component, None is returned.