Parse all strings of specific length?

I've exported my email archive of 10 years, which is very large.

I want to parse all the text for any string that is 64 characters long, in search of a bitcoin private key.

How can I parse strings of a certain length in characters?

text-processing files wildcards pattern-matching

edited 3 hours ago by terdon
asked 4 hours ago by Philip Kirkbride

  • What format are the emails in? Plain text? Maildir?
    – Sparhawk
    4 hours ago

  • @Sparhawk I'm still downloading the files. I'm hoping they are in txt or something I can cat and parse with a pipe.
    – Philip Kirkbride
    4 hours ago

  • @Sparhawk The file type is mbox; I'm hoping to convert it into txt for easy parsing.
    – Philip Kirkbride
    4 hours ago

  • What do you mean by "parse" here? Do you just want to find all strings of exactly 64 characters and then parse them? And how are strings defined? Can we assume you mean things that are delimited by whitespace, i.e. anything with a space, a tab, a newline, etc. on either side of it? And what operating system are you using? Is it Linux? Can we assume you have access to GNU tools?
    – terdon
    4 hours ago

  • The entire email file is a "string of certain length"; IMHO, the string you're looking for consists of certain characters and is delimited in some way. What can you say about the string besides it being 64 characters of some kind?
    – Jeff Schaller
    1 hour ago

4 Answers

It seems that grep is the right tool to "search" for a string. What is left to do is to define such a string with a regex. The first issue is to define the limits of a word. It is not as simple as "a space": in "a book, a lamp" the comma (,) also acts as a word delimiter, and in the same way many other characters, or even the start or end of a line, can act as word delimiters. GNU grep provides some word delimiters:

  • \< word start.
  • \> word end.
  • \b word boundary.

All of them assume that a word is a sequence of [a-zA-Z0-9_] characters. If that is OK for you, this regex could work:

    grep -o '\<.\{64\}\>' file

If you can use extended regexes, it can be shortened to:

    grep -oE '\<.{64}\>' file

That selects from a "word start" (\<), 64 ({64}) characters (.), to a "word end" (\>) and prints only the matching (-o) parts. Note that . matches any character, so only the first and last of the 64 characters are required to be word characters.

If you want to be stricter about the selection (hex digits only), use:

    grep -oE '\<[0-9a-fA-F]{64}\>' file

That allows hex digits in lowercase or uppercase. But if you really want to be strict, since otherwise some non-ASCII characters might be included, use:

    LC_ALL=C grep -oE '\<[0-9a-fA-F]{64}\>' file

Some implementations of grep (such as grep -P, and BSD grep) do not have "start of word" or "end of word", but do have "word boundary":

    grep -oP '\b[0-9a-fA-F]{64}\b' file

Some regex engines also accept the word boundaries [[:<:]] and [[:>:]]; Perl does not, and PCRE supports them only from version 8.34.

And there are many more flavors of "word boundaries".
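
A small sketch, assuming GNU grep, that feeds a hypothetical 64-character run of the form a;;;…;;;b (a letter, 62 semicolons, a letter) to the \<.{64}\> pattern and to the hex-only pattern. The variable names semis and bogus are made up for the demo. The first pattern prints the run, because \< and \> only constrain its first and last characters; the second prints nothing:

```shell
# Hypothetical test data: 'a', 62 semicolons, 'b' (64 characters in total).
semis=$(printf '%062d' 0 | tr 0 ';')
bogus="a${semis}b"

# Matches: '.' accepts the semicolons between the word start and word end.
printf '%s\n' "$bogus" | grep -oE '\<.{64}\>'

# No match: every one of the 64 characters must be a hex digit.
printf '%s\n' "$bogus" | LC_ALL=C grep -oE '\<[0-9a-fA-F]{64}\>' ||
    echo 'no hex match'
```

Swapping . for \w (a GNU grep extension meaning a word character) would restrict the first pattern to runs made only of [a-zA-Z0-9_].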

  • The first two examples are completely bogus; they will also match strings like a;;; ... 62 semicolons ... ;;;b. And the \< and \> assertions are also supported on BSD.
    – pizdelect
    2 hours ago

  • About BSD grep @pizdelect
    – Isaac
    1 min ago

If you want to find all words of length 64 in /path/to/file, you can use

    tr '[:space:]' '\n' < /path/to/file | grep '^.\{64\}$'

This replaces all whitespace with newlines, so each word is on its own line. Then it filters the result to include only the words of length 64.
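
As a minimal sketch, assuming a POSIX shell with standard tr and grep, the same pipeline can be wrapped in a function. The name words64 and the sample data are hypothetical; tr -s additionally squeezes runs of whitespace so empty lines don't pile up, and grep -x '.\{64\}' is an equivalent way of writing '^.\{64\}$':

```shell
# Hypothetical helper: print every whitespace-delimited word that is
# exactly 64 characters long. Reads standard input.
words64() {
    # -s squeezes runs of whitespace into a single newline;
    # grep -x requires the whole line to match .\{64\}
    tr -s '[:space:]' '\n' | grep -x '.\{64\}'
}

# Sample data: one 64-character word and one 63-character word.
long=$(printf '%064d' 0)
short=$(printf '%063d' 0)

printf 'foo %s bar\n%s\n' "$long" "$short" | words64
```

On a real file this would be words64 < /path/to/file. Punctuation stays attached to a word, so a key followed by a full stop is 65 characters long and is not printed.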

answered 4 hours ago by Fox

  • What about dot (.), comma (,), colon (:), semicolon (;) and many other usual punctuation characters? Shouldn't those also be converted to a newline?
    – Isaac
    2 hours ago

  • @Isaac why? Why are you assuming they can't appear inside the target string?
    – terdon
    1 hour ago

If you mean to search for a 256-bit number in hexadecimal form (64 characters from the ranges 0-9 and A-F, one of the formats in which a bitcoin private key could appear), this should do:

    grep -Earo '\<[A-F0-9]{64}\>' files and dirs ...

Add the -i option, or also include the a-f range, if some of the keys are in lowercase.
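
A minimal sketch of the case-insensitive variant, assuming GNU grep. hex64 is a hypothetical function name, and the key is made-up sample data built by repeating 0123456789abcdef four times; LC_ALL=C keeps the letter range strictly ASCII:

```shell
# Hypothetical helper: print 64-digit hex runs that stand alone as words.
# -a treats binary data as text, -o prints only the match,
# -i accepts a-f as well as A-F. With no file arguments it reads stdin.
hex64() {
    LC_ALL=C grep -aoiE '\<[A-F0-9]{64}\>' "$@"
}

# Made-up 64-digit key: 16 hex digits repeated four times.
k16=0123456789abcdef
key=$k16$k16$k16$k16

printf 'key: %s\n' "$key" | hex64
printf 'key: %s0\n' "$key" | hex64 || echo 'no match: 65 digits'
```

The second call shows why the word anchors matter: a 65-digit run is not reported as a 64-digit key.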

For the general problem of finding runs of characters from the same class with a specified length, you would do better to use PCRE regexps, which GNU grep supports with the -P option. For instance, to find runs of uppercase letters from any script, with a minimum length of 2 and a maximum length of 4, delimited by characters that are not uppercase letters:

    echo ÁRVÍZtűrő tükörFÚRÓgép |
    LC_CTYPE=en_US.UTF-8 grep -Po '(?<!\p{Lu})\p{Lu}{2,4}(?!\p{Lu})'
    FÚRÓ

Replace \p{Lu} with \S for non-spaces, etc.
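
Applying that substitution to the question gives a sketch like the one below, assuming GNU grep built with PCRE support (needed for -P). It prints runs of exactly 64 non-space characters that are neither preceded nor followed by another non-space character; the name nonspace64 and the sample tokens are hypothetical:

```shell
# Hypothetical helper: print runs of exactly 64 non-whitespace characters.
# (?<!\S) and (?!\S) are zero-width, so the surrounding whitespace (or the
# start/end of the line) is required but not printed.
nonspace64() {
    grep -Po '(?<!\S)\S{64}(?!\S)' "$@"
}

# Sample line: two 64-character tokens and one 65-character token.
a=$(printf '%064d' 0)
b=$(printf '%064d' 1)
printf '%s %s x%s\n' "$a" "$b" "$a" | nonspace64
```

The two 64-character tokens are printed, each on its own line; the 65-character token x… is skipped.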

edited 1 hour ago
answered 3 hours ago by pizdelect

If you have GNU grep (the default on Linux), you can do:

    grep -Po '(^|\s)\S{64}(\s|$)' file

The -P enables Perl Compatible Regular Expressions, which give us \b (word boundaries), \S (non-whitespace) and {N} (exactly N repetitions of the preceding item), and -o means "print only the matching part of the line". We then look for stretches of non-whitespace that are exactly 64 characters long, that are either at the beginning of the line (^) or after whitespace (\s), and that end either at the end of the line ($) or with another whitespace character.
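
A quick check of that command, assuming GNU grep with -P support; the variable name token and the sample line are made up. The sed call is only there to make the surrounding spaces visible by wrapping the printed match in brackets:

```shell
# Made-up 64-character token surrounded by other words.
token=$(printf '%064d' 0)

# The (^|\s) and (\s|$) groups consume the neighbouring spaces, so the
# match is " <token> " rather than just the token.
printf 'x %s y\n' "$token" | grep -Po '(^|\s)\S{64}(\s|$)' | sed 's/.*/[&]/'
```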

Note that the result will include any whitespace characters at the beginning and end of the string, so if you want to parse this further, you might want to use this instead:

    grep -Po '(^|\s)\K\S{64}(?=\s|$)' file

That looks for a whitespace character or the beginning of the line (^|\s), then discards it (\K), and then looks for 64 non-whitespace characters followed by either a whitespace character or the end of the line. The (?=foo) construct is called a "lookahead"; whatever it checks is not included in the match.
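
A sketch of this variant wrapped in a function, again assuming GNU grep with -P; the name tokens64 and the sample data are hypothetical. It shows that several matches on the same line are each printed without surrounding whitespace, even when only a single space separates them:

```shell
# Hypothetical helper around the \K/lookahead form. \K drops the leading
# whitespace from the reported match; the lookahead (?=\s|$) checks the
# trailing whitespace without consuming it, so that space can still serve
# as the leading whitespace of the next token.
tokens64() {
    grep -Po '(^|\s)\K\S{64}(?=\s|$)' "$@"
}

a=$(printf '%064d' 0)
b=$(printf '%064d' 1)

# Two 64-character tokens separated by a single space on one line.
printf '%s %s\n' "$a" "$b" | tokens64
```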

edited 1 hour ago
answered 4 hours ago by terdon

            • @Sparhawk it most certainly would, yes. Thanks for pointing it out, answer edited.
              – terdon
              3 hours ago










            • pcre also gives us negative lookahead and lookbehind assertions: grep -Po '(?<!S)S{64}(?!S)' is enough to find runs of 64 non-spaces; but please read my answer for why that's probably not what's intended.
              – pizdelect
              3 hours ago










            • This assumes that the searched strings are on separate lines (possibly with spaces). Other punctuation, besides space (\s), could also delimit a "word".
              – Isaac
              2 hours ago










            • @Isaac I had originally written this with the -w option, and then thought to use \b, but then realized I don't know (and the question doesn't explain) which characters are allowed. So I had no reason to assume that non-word characters like , or % couldn't be part of the string. Since the OP gave no guidance, I went for whitespace, which is the lowest common denominator. I don't get what you mean about separate lines. If multiple strings on the same line match, this will print all of them, so no, it doesn't assume that they're on separate lines.
              – terdon
              1 hour ago












            • @pizdelect yes, I know negative lookbehinds, but I find the \K approach much more readable and elegant.
              – terdon
              1 hour ago











































            It seems that grep is the right tool to search for such a string. What is left to do is to define the string with a regex. The first issue is to define the limits of a word. It is not as simple as "a space": in "a book, a lamp" the comma acts as a word delimiter, and in the same way many other characters, or even the start or end of a line, can act as word delimiters. GNU grep has some word-delimiter assertions:





            • \< word start.
            • \> word end.
            • \b word boundary.
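            To see how these assertions split a line into words, here is a small sketch, assuming GNU grep. The function name words and the sample line are made up for illustration, and LC_ALL=C is set so that only ASCII letters, digits and the underscore count as word characters:

```shell
# Sketch (assumes GNU grep): print each "word" as \< and \> see it.
# LC_ALL=C pins the word characters to ASCII letters, digits and "_".
words() {
  LC_ALL=C grep -o '\<[[:alnum:]_]*\>'
}

# The comma, the space and the hyphen all end a word; "_" and digits do not.
printf '%s\n' 'a book, key_1-abc' | words
```

            This prints a, book, key_1 and abc on separate lines.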


            All of them assume that a word is a sequence of word characters ([a-zA-Z0-9_] in the C locale). If that is acceptable for you, this command could work:



             grep -o '\<.\{64\}\>' file


            If you can use extended regular expressions (-E), it can be shortened to:



             grep -oE '\<.{64}\>' file


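            That selects from a "word start" (\<), 64 ({64}) characters (.), to a "word end" (\>) and prints only the matching (-o) parts. Note that . matches any character, including spaces and punctuation, so the 64 characters in between need not form a single word.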
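            A quick way to see what '\<.{64}\>' actually accepts, assuming GNU grep: because . matches any character, a made-up 64-character string of a letter, 62 semicolons and another letter still matches, while a hex-only class rejects it.

```shell
# Sketch (GNU grep assumed): "." in '\<.{64}\>' matches any character.
# Hypothetical 64-character input: "a", 62 semicolons, "b".
junk="a$(printf '%062d' 0 | tr 0 ';')b"

# Matches: it starts at a word start ("a") and ends at a word end ("b").
printf '%s\n' "$junk" | grep -oE '\<.{64}\>'

# The hex-only class rejects it; -c prints the number of matching lines.
printf '%s\n' "$junk" | grep -cE '\<[0-9a-fA-F]{64}\>' || true   # prints 0
```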



            To be stricter and accept only hex digits, use:



             grep -oE '\<[0-9a-fA-F]{64}\>' file


            This allows hex digits in either lowercase or uppercase. To be really strict, since in some locales a range such as a-f may also match non-ASCII characters, force the C locale:



             LC_ALL=C grep -oE '\<[0-9a-fA-F]{64}\>' file
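            Applied to a whole mail export, the strict hex pattern might be wrapped as below. This is a sketch assuming GNU grep; the function name find_hex64, the sample lines and the 64-digit value are hypothetical, and -h (omit file names) plus sort -u (drop duplicates) are conveniences added here, not part of the command above:

```shell
# Sketch (assumes GNU grep): list each distinct 64-hex-digit word found
# in one or more files, e.g. an exported mbox.
find_hex64() {
  LC_ALL=C grep -ohE '\<[0-9a-fA-F]{64}\>' "$@" | sort -u
}

# Hypothetical sample data: the same 64-digit value three times,
# plus a 65-digit run that must not match.
hex=0123456789abcdef
key="$hex$hex$hex$hex"
tmp=$(mktemp)
printf '%s\n' \
  "Subject: backup" \
  "my key is $key." \
  "again: $key" \
  "hex=$key" \
  "too long: ${key}0" > "$tmp"

result=$(find_hex64 "$tmp")
printf '%s\n' "$result"   # prints the 64-digit value once
rm -f "$tmp"
```

            On a real export you would call something like find_hex64 archive.mbox, where archive.mbox stands for your own file.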


            Some implementations of grep (such as grep -P, and BSD grep) do not have "start of word" or "end of word" assertions, but do have a "word boundary":



            grep -oP '\b[0-9a-fA-F]{64}\b' file
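            As a sanity check, assuming GNU grep with PCRE support, the \b form and the \<...\> form should find the same matches when the class contains only hex digits, since every hex digit is a word character; a 64-digit run glued to _ or g is rejected by both. The sample lines are made up:

```shell
# Sketch: compare the PCRE \b form with the \< \> form (GNU grep assumed).
hex=0123456789abcdef
key="$hex$hex$hex$hex"    # hypothetical 64-digit value

# Hypothetical lines: only "($key)" and the bare "$key" are whole words;
# in "x_$key" and "g$key" the run is glued to another word character.
sample=$(printf '%s\n' "($key)" "x_$key" "g$key" "$key")

pcre=$(printf '%s\n' "$sample" | LC_ALL=C grep -oP '\b[0-9a-fA-F]{64}\b')
ere=$(printf '%s\n' "$sample" | LC_ALL=C grep -oE '\<[0-9a-fA-F]{64}\>')

if [ "$pcre" = "$ere" ]; then
  echo "same matches"
fi
printf '%s\n' "$pcre" | grep -c .   # prints 2
```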


            Some regex flavors accept the POSIX-style word boundaries [[:<:]] and [[:>:]]; Perl does not, and PCRE supports them only from version 8.34.



            And there are many more flavors of "word boundary".

























            • The first two examples are completely bogus; they will also match strings like a;;; ... 62 semicolons ... ;;;b. And \< and \> assertions are also supported on BSD.
              – pizdelect
              2 hours ago












            • About BSD grep @pizdelect
              – Isaac
              1 min ago
























            edited 19 mins ago

























            answered 2 hours ago









            Isaac








