Current File : //usr/lib/python3/dist-packages/chardet/__pycache__/universaldetector.cpython-312.pyc

"""
Module containing the UniversalDetector detector class, which is the primary
class a user of ``chardet`` should use.

:author: Mark Pilgrim (initial port to Python)
:author: Shy Shalom (original C code)
:author: Dan Blanchard (major refactoring for 3.0)
:author: Ian Cordasco
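"""

import codecs
import logging
import re
from typing import List, Optional, Union

from .charsetgroupprober import CharSetGroupProber
from .charsetprober import CharSetProber
from .enums import InputState, LanguageFilter, ProbingState
from .escprober import EscCharSetProber
from .latin1prober import Latin1Prober
from .macromanprober import MacRomanProber
from .mbcsgroupprober import MBCSGroupProber
from .resultdict import ResultDict
from .sbcsgroupprober import SBCSGroupProber
from .utf1632prober import UTF1632Prober


class UniversalDetector:
    """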
    The ``UniversalDetector`` class underlies the ``chardet.detect`` function
    and coordinates all of the different charset probers.

    To get a ``dict`` containing an encoding and its confidence, you can simply
    run:

    .. code::

            u = UniversalDetector()
            u.feed(some_bytes)
            u.close()
            detected = u.result

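    """

    MINIMUM_THRESHOLD = 0.20
    HIGH_BYTE_DETECTOR = re.compile(b"[\x80-\xFF]")
    ESC_DETECTOR = re.compile(b"(\033|~{)")
    WIN_BYTE_DETECTOR = re.compile(b"[\x80-\x9F]")
    # ISO-8859-x names and the Windows code pages reported in their place
    # when bytes in the 0x80-0x9F range have been seen.
    ISO_WIN_MAP = {
        "iso-8859-1": "Windows-1252",
        "iso-8859-2": "Windows-1250",
        "iso-8859-5": "Windows-1251",
        "iso-8859-6": "Windows-1256",
        "iso-8859-7": "Windows-1253",
        "iso-8859-8": "Windows-1255",
        "iso-8859-9": "Windows-1254",
        "iso-8859-13": "Windows-1257",
    }
    # Legacy names and their replacements, applied only when
    # ``should_rename_legacy`` is true.
    LEGACY_MAP = {
        "ascii": "Windows-1252",
        "iso-8859-1": "Windows-1252",
        "tis-620": "ISO-8859-11",
        "iso-8859-9": "Windows-1254",
        "gb2312": "GB18030",
        "euc-kr": "CP949",
        "utf-16le": "UTF-16",
    }

    def __init__(
        self,
        lang_filter: LanguageFilter = LanguageFilter.ALL,
        should_rename_legacy: bool = False,
    ) -> None:
        self._esc_charset_prober: Optional[EscCharSetProber] = None
        self._utf1632_prober: Optional[UTF1632Prober] = None
        self._charset_probers: List[CharSetProber] = []
        self.result: ResultDict = {
            "encoding": None,
            "confidence": 0.0,
            "language": None,
        }
        self.done = False
        self._got_data = False
        self._input_state = InputState.PURE_ASCII
        self._last_char = b""
        self.lang_filter = lang_filter
        self.logger = logging.getLogger(__name__)
        self._has_win_bytes = False
        self.should_rename_legacy = should_rename_legacy
        self.reset()

    @property
    def input_state(self) -> int:
        return self._input_state

    @property
    def has_win_bytes(self) -> bool:
        return self._has_win_bytes

    @property
    def charset_probers(self) -> List[CharSetProber]:
        return self._charset_probers

    def reset(self) -> None:
        """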
        Reset the UniversalDetector and all of its probers back to their
        initial states.  This is called by ``__init__``, so you only need to
        call this directly in between analyses of different documents.
        """
        self.result = {"encoding": None, "confidence": 0.0, "language": None}
        self.done = False
        self._got_data = False
        self._has_win_bytes = False
        self._input_state = InputState.PURE_ASCII
        self._last_char = b""
        if self._esc_charset_prober:
            self._esc_charset_prober.reset()
        if self._utf1632_prober:
            self._utf1632_prober.reset()
        for prober in self._charset_probers:
            prober.reset()

    def feed(self, byte_str: Union[bytes, bytearray]) -> None:
        """
        Takes a chunk of a document and feeds it through all of the relevant
        charset probers.

        After calling ``feed``, you can check the value of the ``done``
        attribute to see if you need to continue feeding the
        ``UniversalDetector`` more data, or if it has made a prediction
        (in the ``result`` attribute).

        .. note::
           You should always call ``close`` when you're done feeding in your
           document if ``done`` is not already ``True``.
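        """
        if self.done:
            return

        if not byte_str:
            return

        if not isinstance(byte_str, bytearray):
            byte_str = bytearray(byte_str)

        # On the first chunk, check for a byte-order mark.
        if not self._got_data:
            if byte_str.startswith(codecs.BOM_UTF8):
                self.result = {
                    "encoding": "UTF-8-SIG",
                    "confidence": 1.0,
                    "language": "",
                }
            elif byte_str.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
                self.result = {"encoding": "UTF-32", "confidence": 1.0, "language": ""}
            elif byte_str.startswith(b"\xFE\xFF\x00\x00"):
                self.result = {
                    "encoding": "X-ISO-10646-UCS-4-3412",
                    "confidence": 1.0,
                    "language": "",
                }
            elif byte_str.startswith(b"\x00\x00\xFF\xFE"):
                self.result = {
                    "encoding": "X-ISO-10646-UCS-4-2143",
                    "confidence": 1.0,
                    "language": "",
                }
            elif byte_str.startswith((codecs.BOM_LE, codecs.BOM_BE)):
                self.result = {"encoding": "UTF-16", "confidence": 1.0, "language": ""}

            self._got_data = True
            if self.result["encoding"] is not None:
                self.done = True
                return

        # Track whether the input is still pure ASCII, contains escape
        # sequences, or contains high bytes.
        if self._input_state == InputState.PURE_ASCII:
            if self.HIGH_BYTE_DETECTOR.search(byte_str):
                self._input_state = InputState.HIGH_BYTE
            elif (
                self._input_state == InputState.PURE_ASCII
                and self.ESC_DETECTOR.search(self._last_char + byte_str)
            ):
                self._input_state = InputState.ESC_ASCII

        self._last_char = byte_str[-1:]

        # Check for UTF-16 or UTF-32 without a byte-order mark.
        if not self._utf1632_prober:
            self._utf1632_prober = UTF1632Prober()

        if self._utf1632_prober.state == ProbingState.DETECTING:
            if self._utf1632_prober.feed(byte_str) == ProbingState.FOUND_IT:
                self.result = {
                    "encoding": self._utf1632_prober.charset_name,
                    "confidence": self._utf1632_prober.get_confidence(),
                    "language": "",
                }
                self.done = True
                return

        if self._input_state == InputState.ESC_ASCII:
            # Escape sequences seen: try the escape-based encodings.
            if not self._esc_charset_prober:
                self._esc_charset_prober = EscCharSetProber(self.lang_filter)
            if self._esc_charset_prober.feed(byte_str) == ProbingState.FOUND_IT:
                self.result = {
                    "encoding": self._esc_charset_prober.charset_name,
                    "confidence": self._esc_charset_prober.get_confidence(),
                    "language": self._esc_charset_prober.language,
                }
                self.done = True
        elif self._input_state == InputState.HIGH_BYTE:
            # High bytes seen: run the multi-byte and single-byte probers.
            if not self._charset_probers:
                self._charset_probers = [MBCSGroupProber(self.lang_filter)]
                if self.lang_filter & LanguageFilter.NON_CJK:
                    self._charset_probers.append(SBCSGroupProber())
                self._charset_probers.append(Latin1Prober())
                self._charset_probers.append(MacRomanProber())
            for prober in self._charset_probers:
                if prober.feed(byte_str) == ProbingState.FOUND_IT:
                    self.result = {
                        "encoding": prober.charset_name,
                        "confidence": prober.get_confidence(),
                        "language": prober.language,
                    }
                    self.done = True
                    break
            if self.WIN_BYTE_DETECTOR.search(byte_str):
                self._has_win_bytes = True

    def close(self) -> ResultDict:
        """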
        Stop analyzing the current document and come up with a final
        prediction.

        :returns:  The ``result`` attribute, a ``dict`` with the keys
                   `encoding`, `confidence`, and `language`.
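        """
        if self.done:
            return self.result
        self.done = True

        if not self._got_data:
            self.logger.debug("no data received!")

        # No high bytes and no escape sequences: report plain ASCII.
        elif self._input_state == InputState.PURE_ASCII:
            self.result = {"encoding": "ascii", "confidence": 1.0, "language": ""}

        # High bytes seen: pick the prober with the highest confidence.
        elif self._input_state == InputState.HIGH_BYTE:
            prober_confidence = None
            max_prober_confidence = 0.0
            max_prober = None
            for prober in self._charset_probers:
                if not prober:
                    continue
                prober_confidence = prober.get_confidence()
                if prober_confidence > max_prober_confidence:
                    max_prober_confidence = prober_confidence
                    max_prober = prober
            if max_prober and (max_prober_confidence > self.MINIMUM_THRESHOLD):
                charset_name = max_prober.charset_name
                assert charset_name is not None
                lower_charset_name = charset_name.lower()
                confidence = max_prober.get_confidence()
                # Prefer the Windows code page over ISO-8859-x when bytes
                # in the 0x80-0x9F range were seen.
                if lower_charset_name.startswith("iso-8859"):
                    if self._has_win_bytes:
                        charset_name = self.ISO_WIN_MAP.get(
                            lower_charset_name, charset_name
                        )
                # Optionally rename legacy encodings.
                if self.should_rename_legacy:
                    charset_name = self.LEGACY_MAP.get(
                        (charset_name or "").lower(), charset_name
                    )
                self.result = {
                    "encoding": charset_name,
                    "confidence": confidence,
                    "language": max_prober.language,
                }

        # Log every prober's confidence when debugging and nothing matched.
        if self.logger.getEffectiveLevel() <= logging.DEBUG:
            if self.result["encoding"] is None:
                self.logger.debug("no probers hit minimum threshold")
                for group_prober in self._charset_probers:
                    if not group_prober:
                        continue
                    if isinstance(group_prober, CharSetGroupProber):
                        for prober in group_prober.probers:
                            self.logger.debug(
                                "%s %s confidence = %s",
                                prober.charset_name,
                                prober.language,
                                prober.get_confidence(),
                            )
                    else:
                        self.logger.debug(
                            "%s %s confidence = %s",
                            group_prober.charset_name,
                            group_prober.language,
                            group_prober.get_confidence(),
                        )
        return self.result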
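
The docstrings above describe a feed-until-done workflow: call `feed` with successive chunks, stop once `done` is true, call `close` for the final result, and call `reset` before starting a new document. A minimal sketch of that loop follows. `detect_chunks` and `BomOnlyDetector` are hypothetical names, not part of chardet. The stand-in recognises only a few byte-order marks so that the block runs without chardet installed. It is assumed that a real `UniversalDetector` instance could be passed to `detect_chunks` in its place.

```python
import codecs
from typing import Dict, Iterable


class BomOnlyDetector:
    """Hypothetical stand-in with the reset/feed/close/done/result surface.

    It only looks for a byte-order mark in the first chunk; it is NOT
    chardet's detector and does no statistical probing.
    """

    # Order matters: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    _BOMS = (
        (codecs.BOM_UTF8, "UTF-8-SIG"),
        (codecs.BOM_UTF32_LE, "UTF-32"),
        (codecs.BOM_UTF32_BE, "UTF-32"),
        (codecs.BOM_UTF16_LE, "UTF-16"),
        (codecs.BOM_UTF16_BE, "UTF-16"),
    )

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        # Return to the initial state so the object can be reused.
        self.result: Dict[str, object] = {
            "encoding": None,
            "confidence": 0.0,
            "language": None,
        }
        self.done = False
        self._got_data = False

    def feed(self, byte_str: bytes) -> None:
        if self.done or not byte_str:
            return
        if not self._got_data:
            self._got_data = True
            for bom, name in self._BOMS:
                if byte_str.startswith(bom):
                    self.result = {
                        "encoding": name,
                        "confidence": 1.0,
                        "language": "",
                    }
                    self.done = True
                    return

    def close(self) -> Dict[str, object]:
        self.done = True
        return self.result


def detect_chunks(detector, chunks: Iterable[bytes]) -> Dict[str, object]:
    """Feed chunks until the detector is done, then return its result."""
    detector.reset()  # safe to reuse one detector across documents
    for chunk in chunks:
        detector.feed(chunk)
        if detector.done:  # a prediction was made; stop reading early
            break
    return detector.close()  # finalises the result if not already done
```

Calling `detect_chunks(BomOnlyDetector(), [codecs.BOM_UTF8 + b"abc", b"def"])` stops after the first chunk, because the byte-order mark settles the result immediately. Input with no byte-order mark leaves `encoding` as `None` in this stand-in.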
