Bash script hangs after some processing on Ubuntu

I have been running the script below on a Red Hat server, where it works fine and finishes the job. The file I feed it contains about half a million lines (approximately 500,000), so to finish faster I added an '&' at the end of the while loop body.

Now I have set up a desktop with 8 GB of RAM running Ubuntu 18.04, and the same code only gets through a few thousand lines and then hangs. I read a bit about it and raised the stack limit to unlimited, but it still hung after 80,000 lines or so. Any suggestions on how I can optimize the code or tune my PC's parameters so that the job always finishes?

while read -r CID60
do
 {
       OLT=$(echo "$CID60" | cut -d"|" -f5)
       ONID=${OLT}:$(echo "$CID60" | cut -d, -f2 | sed 's/ //g ; s/).*|//')
       echo $ONID,$(echo "$CID60" | cut -d"|" -f3) >> $localpath/CID_$logfile.csv
  } &
done < $localpath/$CID7360

Input:

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN46| Unlocked|12-654-0330|Down|202-00_MSRFKH00OL6|P282017856.C881 ( local, R1.S1.LT7.PON8.ONT81.C1.P1 )|
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN52| Unlocked|12-664-1186|Up|202-00_MSRFKH00OL6|P282012623.C2028 ( network, R1.S1.LT7.PON8.ONT75.SERV1 )|

Output:

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

The output I am interested in is the 5th column (columns are separated by the pipe character |), concatenated with part of the last column, followed by the third column.
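To make the requested transformation concrete, here is a minimal Python sketch of the column logic described above. It is only an illustration and not the script from the question: the function name `reformat_line` is made up, and it assumes every line has the six pipe-separated columns shown in the sample, with the wanted value following the comma in the last column.

```python
def reformat_line(line):
    """Return 'col5:path,col3' for one input line, or None if it does not fit."""
    cols = line.rstrip("\n").split("|")
    # Assumption: at least six columns, and a comma inside the sixth one.
    if len(cols) < 6 or "," not in cols[5]:
        return None
    cid = cols[2].strip()   # third column, e.g. 12-654-0330
    olt = cols[4].strip()   # fifth column, e.g. 202-00_MSRFKH00OL6
    # Sixth column looks like 'P282018767.C2028 ( network, R1.S1...SERV1 )';
    # keep what follows the comma and drop the closing parenthesis.
    path = cols[5].split(",", 1)[1].replace(")", "").strip()
    return f"{olt}:{path},{cid}"


sample = ("202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|"
          "202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|")
print(reformat_line(sample))
```

Because the whole file is handled by a single process, nothing has to be forked per line, which is the main cost of the loop in the question.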

edited yesterday

GAD3R

1,523821

asked 2 days ago

Ibraheem

185

1

That's an awful lot of processes to fire off at more or less the same time. You might want to wait after some number of lines, or investigate other strategies to parallelize the job (such as GNU parallel).

– glenn jackman
2 days ago
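The idea in the comment above (cap the number of concurrent workers instead of starting one per line) can be sketched as follows. This is a Python analogue rather than a Bash fix, and the names `batches`, `convert_batch` and `convert_all`, as well as the batch size and worker count, are assumptions chosen for illustration. A thread pool is used only to keep the sketch simple; for real CPU parallelism `ProcessPoolExecutor` from the same module would be the closer match.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(lines, size):
    """Yield consecutive lists of at most `size` items from `lines`."""
    it = iter(lines)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def convert_batch(batch):
    """Convert every well-formed line of one batch to 'col5:path,col3'."""
    out = []
    for line in batch:
        cols = line.rstrip("\n").split("|")
        if len(cols) >= 6 and "," in cols[5]:
            path = cols[5].split(",", 1)[1].replace(")", "").strip()
            out.append(f"{cols[4]}:{path},{cols[2]}")
    return out


def convert_all(lines, batch_size=1000, workers=4):
    """Process the input in batches with at most `workers` concurrent workers."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the batches in input order, so the output stays ordered.
        for converted in pool.map(convert_batch, batches(lines, batch_size)):
            results.extend(converted)
    return results
```

The point is the shape of the loop: work is handed out in chunks and the pool size is fixed, so the number of workers running at once never grows with the size of the input.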

@PerlDuck I have added the input and output of the script. Of course it won't run as it is, since some of the variables are defined outside this code. I am also thinking of trying sed or awk for this job; it might be a lot quicker, but I need to learn how to write such an expression.

– Ibraheem
yesterday

@glennjackman I have been reading about parallel. Can you suggest how I could use it in a loop like the one above?

– Ibraheem
yesterday

Your code seems amenable to a single sed instruction operating on the input file that would run thousands of times faster. awk would also be a solution.

– xenoid
yesterday

@xenoid can you please suggest some sed expression?

– Ibraheem
yesterday


bash text-processing background-process

4 Answers

Perl solution

This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.

#!/usr/bin/env perl

use strict;
use warnings;

# Capture column 3 ($1), column 5 ($2) and the value after the comma ($3).
while ( <> ) {
    if ( /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/ ) {
        print "$2:$3,$1\n";
    }
}

I copied your sample data until I got 1,572,864 lines and then ran it as follows:

me@ubuntu:~> time ./filter.pl < input.txt > output.txt
real    0m3,603s
user    0m3,487s
sys     0m0,100s

me@ubuntu:~> tail -3 output.txt
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

If you prefer one-liners, do:

perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < input.txt > output.txt

answered yesterday

PerlDuck

6,18211334

1

Indeed, this perl solution has been the fastest; it took less than a second to process 300K lines. I am preparing some other lookup-like scripts and will be looking forward to further help. Thanks everyone, all answers were helpful, but @perlduck's solution was fastest. Since my original while loop wasn't producing results in order either, the order won't matter for me anyway.

– Ibraheem
yesterday

@Ibraheem, Yes this perl solution is very good, probably with a great margin fast enough for your purpose. -- But try my tr and cut solution, which is actually faster in my computer (and I think easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think can be the fastest of them all.

– sudodus
yesterday

@sudodus I tried your solution and it was really fast (it took about 0.205 seconds), but the columns are not in the order I want and there is a pipe in the middle.

– Ibraheem
yesterday

@Ibraheem, Is it important to have the format that you want (order of columns and separators between the column)? The reason why my solution is fast is that it does as little as possible, still showing what you need (but in a different order). If you prefer another separator, it is possible, space ' ' would cost no extra time, another separator would cost some extra time for a tr or tr -s command, but not very much.

– sudodus
yesterday

1

I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster in my computer), maybe easier to understand and edit, if you would need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Any of the two solutions should be good for you.

– sudodus
yesterday


A pure sed solution:

sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <in.dat >out.dat

edited yesterday

answered yesterday

xenoid

1,5781416

+1: Nice with a pure sed solution :-) But my cut and sed solution is faster ;-)

– sudodus
yesterday

Yes I know. But mine produces the result in the requested order 🤨🤨

– xenoid
yesterday

That's right, we will see how important it is to get exactly what the OP prescribes. By the way, I think you drop one character, MSRFKH00OL6 --> MSRFKH00OL in your output. I think you can fix that with a minor edit.

– sudodus
yesterday

1

@sudodus Yes, transcription error. Fixed :)

– xenoid
yesterday

I timed your new one-liner and it works well, actually slightly faster than before. I don't know if there was something else happening in my computer, anyway, I edited my answer to show the new result :-)

– sudodus
yesterday


Oneliner

If the order of the items and the separators can be different from what you specify in the question, I thought the following one-liner would do it,

< input tr ' ' '|' | cut -d '|' -f 4,6,10 > output

but in a comment you wrote that you need exactly the specified format.

I added a solution with 'awk', which is approximately on par with PerlDuck's solution with perl. See the end of this answer.

< input awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > output

Test

The test was done in my computer with Lubuntu 18.04.1 LTS, 2*2 processors and 4 GiB RAM.

I made a huge infile by 'doubling 20 times' from your demo input (1572864 lines), which leaves some margin over your 500000 lines.

Oneliner with cut and sed:

$ < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile
$ wc -l infile
1572864 infile
$ wc -l outfile
1572864 outfile
$ tail outfile
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

Timing

One might expect a pure sed solution to be faster, but I think the reordering of the data slows it down, so the cut and sed solution is faster. Both solutions work without any problem on my computer.

Oneliner with cut and sed:

$ time < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile

real    0m8,132s
user    0m8,633s
sys     0m0,617s

A pure sed oneliner by xenoid:

$ time sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <infile > outfile-sed

real    1m8,686s
user    1m8,259s
sys     0m0,344s

A perl oneliner by PerlDuck is faster than the previous oneliners:

$ time perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < infile > outfile.perl

real    0m5,929s
user    0m5,339s
sys     0m0,256s

Oneliner with tr and cut with a tr -s command:

I used tr to convert the spaces in the input file to pipe characters, so that cut could do it all without sed. As you can see, tr is much faster than sed. The tr -s command squeezes repeated pipes in the input into one, which is a good idea, particularly if there can be repeated spaces or pipes in the input file. It does not cost much.

$ time < infile tr ' ' '|' | tr -s '|' '|' | cut -d '|' -f 3,5,9 > outfile-tr-cut

real    0m1,277s
user    0m1,781s
sys     0m0,925s

Oneliner with tr and cut without the tr -s command, fastest so far:

$ time < infile tr ' ' '|' | cut -d '|' -f 4,6,10 > outfile-tr-cut

real    0m1,199s
user    0m1,020s
sys     0m0,618s

$ tail outfile-tr-cut
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1
12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1
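It may not be obvious why the field numbers are 4,6,10 without tr -s but 3,5,9 with it. The following Python sketch mimics the two pipelines on one sample line so the field positions can be inspected. It is an illustration only; the function name `tr_cut` is made up, and `str.replace`, `re.sub` and `split` stand in for tr, tr -s and cut.

```python
import re

SAMPLE = ("202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|"
          "202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|")


def tr_cut(line, squeeze):
    """Mimic: tr ' ' '|' [| tr -s '|' '|'] | cut -d '|' -f ..."""
    piped = line.replace(" ", "|")          # tr ' ' '|'
    if squeeze:
        piped = re.sub(r"\|+", "|", piped)  # tr -s '|' '|'
    fields = piped.split("|")
    # cut counts fields from 1. "| Unlocked" becomes "||Unlocked", which adds
    # one empty field unless the repeated pipe is squeezed away.
    wanted = (3, 5, 9) if squeeze else (4, 6, 10)
    return "|".join(fields[i - 1] for i in wanted)


print(tr_cut(SAMPLE, squeeze=False))
print(tr_cut(SAMPLE, squeeze=True))
```

Both calls select the same three values. The only difference is the empty field created by the space after the first pipe, which shifts every later field number by one when it is not squeezed.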

Oneliner with awk, fast but not the fastest,

< input awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > output

$ time < infile awk '{gsub("\\|"," "); print $5 ":" $9 "," $3}' > outfile.awk

real    0m5,091s
user    0m4,724s
sys     0m0,365s

Speed summary: the 'real' time according to time rounded to 1 decimal

1m 8.7s - sed

   8.1s - cut & sed

   5.9s - perl

   5.1s - awk

   1.2s - tr & cut

Finally, I note that the oneliners with sed, perl and awk create an output file with the prescribed format.

$ tail outfile.awk
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330
202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

edited yesterday

answered yesterday

sudodus

23.9k32874

2

Nice :-) Try perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < input.txt > output.txt. I also repeated the input until I got 1,572,864 lines and it runs in ~3,5 seconds on my machine.

– PerlDuck
yesterday

1

Note that the desired output is not | separated but uses : and ,.

– PerlDuck
yesterday

1

@PerlDuck, 1. Yes, I will time your perl expression :-) 2. I know (and wrote about it in the beginning of my answer), that it is not exactly what the OP wants, but similar enough to be useful, and, I think, faster than if rearranged to the exact specification.

– sudodus
yesterday

Sorry, I missed your introductory sentence about the different separators. Btw., this is an interesting approach using GNU parallel ;-)

– PerlDuck
yesterday

@PerlDuck, If you make an answer with your fast perl oneliner, I will upvote it :-)

– sudodus
yesterday


Python

import sys,re

pattern=re.compile(r'^.+\|.+\|(.+)\|.+\|(.+)\|.+, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))

(works with both Python2 and Python3)

Using a regex with non-greedy matches is 4x faster (avoids backtracking?) and puts python on par with the cut/sed method (python2 being a bit faster than python3)

import sys,re

pattern=re.compile(r'^[^|]+?\|[^|]+?\|([^|]+?)\|[^|]+?\|([^|]+?)\|[^,]+?, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))
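To check the speed difference between the two patterns on your own machine, a small timing harness along these lines could be used. It is a sketch under assumptions: the helper names `groups` and `seconds` and the repeat count are made up, and it times a single sample line repeatedly instead of a real input file, so the ratio you see may differ from the figure quoted above.

```python
import re
import timeit

LINE = ("202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|"
        "202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|\n")

# First pattern from this answer: plain '.+' between the pipes.
DOT_PLUS = re.compile(r'^.+\|.+\|(.+)\|.+\|(.+)\|.+, (.+) \)\|$')
# Second pattern: negated character classes with non-greedy quantifiers.
NEGATED = re.compile(r'^[^|]+?\|[^|]+?\|([^|]+?)\|[^|]+?\|([^|]+?)\|[^,]+?, (.+) \)\|$')


def groups(pattern, line=LINE):
    """Return the captured groups, or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None


def seconds(pattern, repeats=20000):
    """Time `repeats` matches of `pattern` against the sample line."""
    return timeit.timeit(lambda: pattern.match(LINE), number=repeats)


if __name__ == "__main__":
    # Both patterns should capture the same three values.
    print(groups(DOT_PLUS) == groups(NEGATED))
    print("dot-plus:", seconds(DOT_PLUS))
    print("negated :", seconds(NEGATED))
```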

edited yesterday

answered yesterday

xenoid

1,5781416

This one also works fine, as expected, but it is a bit slower than the perl one.

– Ibraheem
14 hours ago


Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "89"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2faskubuntu.com%2fquestions%2f1114510%2fbash-script-hangs-after-some-processing-on-ubuntu%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Perl solution

This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.

#!/usr/bin/env perl



use strict;

use warnings;



while( <> ) {

    if ( /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/ ) {

        print "$2:$3,$1n";

    }

}

I copied your sample data until I got 1,572,864 lines and then ran it as follows:

me@ubuntu:~> time ./filter.pl < input.txt > output.txt

real    0m3,603s

user    0m3,487s

sys     0m0,100s



me@ubuntu:~> tail -3 output.txt

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

If you prefer one-liners, do:

perl -lne 'print "$2:$3,$1" if /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/;' < input.txt > output.txt

answered yesterday

PerlDuck

6,18211334

1

Indeed this perl solution has been fastest, took about less than a second to process 300K lines, I am preparing some other lookups like scripts, I will be looking forward to further help, thanks everyone, all were helpful, but @perlduck's solution was fastest, and as my original while loop wasn't producing results in order, so the order won't matter for me anyway

– Ibraheem
yesterday

@Ibraheem, Yes this perl solution is very good, probably with a great margin fast enough for your purpose. -- But try my tr and cut solution, which is actually faster in my computer (and I think easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think can be the fastest of them all.

– sudodus
yesterday

@sudodus I tried your solution, it was really fast (took about 0.205 seconds), but the columns are not coming as I want them and it has a pipe in the middle,

– Ibraheem
yesterday

@Ibraheem, Is it important to have the format that you want (order of columns and separators between the column)? The reason why my solution is fast is that it does as little as possible, still showing what you need (but in a different order). If you prefer another separator, it is possible, space ' ' would cost no extra time, another separator would cost some extra time for a tr or tr -s command, but not very much.

– sudodus
yesterday

1

I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster in my computer), maybe easier to understand and edit, if you would need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Any of the two solutions should be good for you.

– sudodus
yesterday

|
show 2 more comments

Perl solution

This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.

#!/usr/bin/env perl



use strict;

use warnings;



while( <> ) {

    if ( /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/ ) {

        print "$2:$3,$1n";

    }

}

I copied your sample data until I got 1,572,864 lines and then ran it as follows:

me@ubuntu:~> time ./filter.pl < input.txt > output.txt

real    0m3,603s

user    0m3,487s

sys     0m0,100s



me@ubuntu:~> tail -3 output.txt

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

If you prefer one-liners, do:

perl -lne 'print "$2:$3,$1" if /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/;' < input.txt > output.txt

answered yesterday

PerlDuck

6,18211334

1

Indeed this perl solution has been fastest, took about less than a second to process 300K lines, I am preparing some other lookups like scripts, I will be looking forward to further help, thanks everyone, all were helpful, but @perlduck's solution was fastest, and as my original while loop wasn't producing results in order, so the order won't matter for me anyway

– Ibraheem
yesterday

@Ibraheem, Yes this perl solution is very good, probably with a great margin fast enough for your purpose. -- But try my tr and cut solution, which is actually faster in my computer (and I think easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think can be the fastest of them all.

– sudodus
yesterday

@sudodus I tried your solution, it was really fast (took about 0.205 seconds), but the columns are not coming as I want them and it has a pipe in the middle,

– Ibraheem
yesterday

@Ibraheem, Is it important to have the format that you want (order of columns and separators between the column)? The reason why my solution is fast is that it does as little as possible, still showing what you need (but in a different order). If you prefer another separator, it is possible, space ' ' would cost no extra time, another separator would cost some extra time for a tr or tr -s command, but not very much.

– sudodus
yesterday

1

I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster in my computer), maybe easier to understand and edit, if you would need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Any of the two solutions should be good for you.

– sudodus
yesterday

|
show 2 more comments

Perl solution

This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.

#!/usr/bin/env perl



use strict;

use warnings;



while( <> ) {

    if ( /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/ ) {

        print "$2:$3,$1n";

    }

}

I copied your sample data until I got 1,572,864 lines and then ran it as follows:

me@ubuntu:~> time ./filter.pl < input.txt > output.txt

real    0m3,603s

user    0m3,487s

sys     0m0,100s



me@ubuntu:~> tail -3 output.txt

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

If you prefer one-liners, do:

perl -lne 'print "$2:$3,$1" if /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/;' < input.txt > output.txt

answered yesterday

PerlDuck

6,18211334

Perl solution

This script doesn't do anything in parallel but is quite fast regardless.
Save it as filter.pl (or whatever name you prefer) and make it executable.

#!/usr/bin/env perl



use strict;

use warnings;



while( <> ) {

    if ( /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/ ) {

        print "$2:$3,$1n";

    }

}

I copied your sample data until I got 1,572,864 lines and then ran it as follows:

me@ubuntu:~> time ./filter.pl < input.txt > output.txt

real    0m3,603s

user    0m3,487s

sys     0m0,100s



me@ubuntu:~> tail -3 output.txt

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

If you prefer one-liners, do:

perl -lne 'print "$2:$3,$1" if /^(?:[^|]+|){2}([^|]+)|[^|]+|([^|]+)|[^,]+,s*(S+)/;' < input.txt > output.txt

answered yesterday

PerlDuck

6,18211334

answered yesterday

PerlDuck

6,18211334

answered yesterday

PerlDuck

6,18211334

answered yesterday

PerlDuck

6,18211334

1

Indeed this perl solution has been fastest, took about less than a second to process 300K lines, I am preparing some other lookups like scripts, I will be looking forward to further help, thanks everyone, all were helpful, but @perlduck's solution was fastest, and as my original while loop wasn't producing results in order, so the order won't matter for me anyway

– Ibraheem
yesterday

@Ibraheem, Yes this perl solution is very good, probably with a great margin fast enough for your purpose. -- But try my tr and cut solution, which is actually faster in my computer (and I think easier to understand and modify), and wait for a solution with parallel and perl by PerlDuck, which I think can be the fastest of them all.

– sudodus
yesterday

@sudodus I tried your solution, it was really fast (took about 0.205 seconds), but the columns are not coming as I want them and it has a pipe in the middle,

– Ibraheem
yesterday

@Ibraheem, Is it important to have the format that you want (order of columns and separators between the column)? The reason why my solution is fast is that it does as little as possible, still showing what you need (but in a different order). If you prefer another separator, it is possible, space ' ' would cost no extra time, another separator would cost some extra time for a tr or tr -s command, but not very much.

– sudodus
yesterday

1

I finally made a oneliner with awk, which is on par with the perl oneliner (slightly faster in my computer), maybe easier to understand and edit, if you would need that in the future. The outputs of these two oneliners are exactly the same for the test case. See the end of my answer. Any of the two solutions should be good for you.

– sudodus
yesterday


A pure sed solution:

sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <in.dat >out.dat

edited yesterday

answered yesterday

xenoid

1,5781416

+1: Nice with a pure sed solution :-) But my cut and sed solution is faster ;-)

– sudodus
yesterday

Yes I know. But mine produces the result in the requested order 🤨🤨

– xenoid
yesterday

That's right, we will see how important it is to get exactly what the OP prescribes. By the way, I think you drop one character, MSRFKH00OL6 --> MSRFKH00OL in your output. I think you can fix that with a minor edit.

– sudodus
yesterday

1

@sudodus Yes, transcription error. Fixed :)

– xenoid
yesterday

I timed your new one-liner and it works well, actually slightly faster than before. I don't know if there was something else happening in my computer, anyway, I edited my answer to show the new result :-)

– sudodus
yesterday


Oneliner

If the order of the items and the separators can be different from what you specify in the question, I thought the following one-liner would do it,

< input tr ' ' '|' | cut -d '|' -f 4,6,10 > output

but in a comment you wrote that you need exactly the specified format.
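To see where the field numbers 4, 6 and 10 come from, here is a minimal sketch that runs the one-liner on a single record from the question. The function name pick_fields is only an illustrative wrapper and not part of the original answer.

```shell
# One sample record from the question; the real separator is '|'.
line='202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ASSN45| Unlocked|12-654-0330|Up|202-00_MSRFKH00OL6|P282018767.C2028 ( network, R1.S1.LT7.PON8.ONT81.SERV1 )|'

# After tr every space is a '|' too. Print one field per line, numbered,
# to see which positions hold the wanted values.
printf '%s\n' "$line" | tr ' ' '|' | tr '|' '\n' | nl -ba

# pick_fields (hypothetical name): the one-liner, reading from standard input.
pick_fields() {
    tr ' ' '|' | cut -d '|' -f 4,6,10
}

# Prints: 12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1
printf '%s\n' "$line" | pick_fields
```

The space after the first pipe turns into an extra, empty field, which is why the wanted columns land at positions 4, 6 and 10. Squeezing repeated pipes with tr -s removes that empty field and shifts them to 3, 5 and 9.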

I added a solution with 'awk', which is approximately on par with PerlDuck's solution with perl. See the end of this answer.

< input awk '{gsub("\|"," "); print $5 ":" $9 "," $3}' > output

Test

The test was done on my computer with Lubuntu 18.04.1 LTS, 2*2 processors and 4 GiB RAM.

I made a huge infile by 'doubling 20 times' from your demo input (1572864 lines), which leaves some margin over your 500000 lines.
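The doubling step itself is not shown, so the following is one way it could be done: a sketch that keeps concatenating a file with itself until it holds at least a wanted number of lines. The helper name grow_file and the file names in the example call are hypothetical.

```shell
# grow_file SRC DST MIN_LINES (hypothetical helper):
# copy SRC to DST, then double DST until it has at least MIN_LINES lines.
grow_file() {
    src=$1
    dst=$2
    min_lines=$3

    # Refuse to loop forever on an input without any complete line.
    lines=$(( $(wc -l < "$src") ))
    [ "$lines" -gt 0 ] || return 1

    cp "$src" "$dst" || return 1
    while [ "$lines" -lt "$min_lines" ]; do
        # Write the doubled content to a temporary file, then swap it in.
        cat "$dst" "$dst" > "$dst.tmp" && mv "$dst.tmp" "$dst" || return 1
        lines=$(( lines * 2 ))
    done
}

# Example call (placeholder file names):
# grow_file demo.txt infile 1500000
```

Starting from three lines, for example, a threshold of 1500000 is first reached at 3 × 2^19 = 1572864 lines.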

Oneliner with cut and sed:

$ < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile

$ wc -l infile

1572864 infile

$ wc -l outfile

1572864 outfile

$ tail outfile

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

Timing

Oneliner with cut and sed:

$ time < infile cut -d '|' -f 3,5,6 | sed -e 's/|[A-Z].*, /|/' -e 's/ )$//' > outfile



real    0m8,132s

user    0m8,633s

sys     0m0,617s

A pure sed oneliner by xenoid:

$ time sed -r 's/^[^|]+\|[^|]+\|([^|]+)\|[^|]+\|([^|]+)\|.+\( .+, ([^ ]+).+/\2:\3,\1/' <infile > outfile-sed



real    1m8,686s

user    1m8,259s

sys     0m0,344s

A perl oneliner by PerlDuck is faster than the previous oneliners:

$ time perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < infile > outfile.perl



real    0m5,929s

user    0m5,339s

sys     0m0,256s

Oneliner with tr and cut with a tr -s command:

$ time < infile tr ' ' '|' | tr -s '|' '|' | cut -d '|' -f 3,5,9 > outfile-tr-cut



real    0m1,277s

user    0m1,781s

sys     0m0,925s

Oneliner with tr and cut without the tr -s command, fastest so far:

$ time < infile tr ' ' '|' | cut -d '|' -f 4,6,10 > outfile-tr-cut



real    0m1,199s

user    0m1,020s

sys     0m0,618s





$ tail outfile-tr-cut

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.SERV1

12-654-0330|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT81.C1.P1

12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1

Oneliner with awk, fast but not the fastest,

< input awk '{gsub("\|"," "); print $5 ":" $9 "," $3}' > output



$ time < infile awk '{gsub("\|"," "); print $5 ":" $9 "," $3}' > outfile.awk



real    0m5,091s

user    0m4,724s

sys     0m0,365s

Speed summary: the 'real' time according to time rounded to 1 decimal

1m 8.7s - sed

   8.1s - cut & sed

   5.9s - perl

   5.1s - awk

   1.2s - tr & cut

Finally, I note that the oneliners with sed, perl and awk create an output file with the prescribed format.

$ tail outfile.awk

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.SERV1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT81.C1.P1,12-654-0330

202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
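If the faster tr and cut output is preferred, it could be rearranged into the prescribed format afterwards. The following is only a sketch: the extra awk stage is not part of the timings above, so its cost is unmeasured, and the helper name to_prescribed is hypothetical.

```shell
# to_prescribed (hypothetical helper): reads lines of the form
#   number|OLT|ONT        (the tr & cut output)
# and prints
#   OLT:ONT,number        (the format asked for in the question)
to_prescribed() {
    awk -F '|' '{ print $2 ":" $3 "," $1 }'
}

# Prints: 202-00_MSRFKH00OL6:R1.S1.LT7.PON8.ONT75.SERV1,12-664-1186
printf '%s\n' '12-664-1186|202-00_MSRFKH00OL6|R1.S1.LT7.PON8.ONT75.SERV1' | to_prescribed
```

It would be appended to the pipeline, for example as `< infile tr ' ' '|' | cut -d '|' -f 4,6,10 | to_prescribed > outfile`.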

edited yesterday

answered yesterday

sudodus

23.9k32874

2

Nice :-) Try perl -lne 'print "$2:$3,$1" if /^(?:[^|]+\|){2}([^|]+)\|[^|]+\|([^|]+)\|[^,]+,\s*(\S+)/;' < input.txt > output.txt. I also repeated the input until I got 1,572,864 lines and it runs in ~3,5 seconds on my machine.

– PerlDuck
yesterday

1

Note that the desired output is not | separated but uses : and ,.

– PerlDuck
yesterday

1

@PerlDuck, 1. Yes, I will time your perl expression :-) 2. I know (and wrote about it in the beginning of my answer), that it is not exactly what the OP wants, but similar enough to be useful, and, I think, faster than if rearranged to the exact specification.

– sudodus
yesterday

Sorry, I missed your introductory sentence about the different separators. Btw., this is an interesting approach using GNU parallel ;-)

– PerlDuck
yesterday

@PerlDuck, If you make an answer with your fast perl oneliner, I will upvote it :-)

– sudodus
yesterday


Python

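import sys,re

# group 1 = third column, group 2 = fifth column, group 3 = the part after ", " in the last column
pattern=re.compile(r'^.+\|.+\|(.+)\|.+\|(.+)\|.+, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))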

(works with both Python2 and Python3)

Using a regex with non-greedy matches is 4x faster (avoids backtracking?) and puts python on par with the cut/sed method (python2 being a bit faster than python3)

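import sys,re

# Same groups as above, but each column is matched with a negated class and a non-greedy quantifier
pattern=re.compile(r'^[^|]+?\|[^|]+?\|([^|]+?)\|[^|]+?\|([^|]+?)\|[^,]+?, (.+) \)\|$')

for line in sys.stdin:
    match=pattern.match(line)
    if match:
        print(match.group(2)+':'+match.group(3)+','+match.group(1))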

edited yesterday

answered yesterday

xenoid

1,5781416

This one also works as expected, but it is a bit slower than the perl one.

– Ibraheem
14 hours ago

add a comment |

Python

import sys,re



pattern=re.compile(r'^.+|.+|(.+)|.+|(.+)|.+, (.+) )|$')



for line in sys.stdin:

match=pattern.match(line)

if match:

    print(match.group(2)+':'+match.group(3)+','+match.group(1))

(works with both Python2 and Python3)

Using a regex with non-greedy matches is 4x faster (avoids backtracking?) and puts python on par with the cut/sed method (python2 being a bit faster than python3)

import sys,re



pattern=re.compile(r'^[^|]+?|[^|]+?|([^|]+?)|[^|]+?|([^|]+?)|[^,]+?, (.+) )|$')



for line in sys.stdin:

match=pattern.match(line)

if match:

    print(match.group(2)+':'+match.group(3)+','+match.group(1))

edited yesterday

answered yesterday

xenoid

1,5781416

This one also works fine as expected but a bit slower then the perl one,

– Ibraheem
14 hours ago

add a comment |

Python

import sys,re



pattern=re.compile(r'^.+|.+|(.+)|.+|(.+)|.+, (.+) )|$')



for line in sys.stdin:

match=pattern.match(line)

if match:

    print(match.group(2)+':'+match.group(3)+','+match.group(1))

(works with both Python2 and Python3)

Using a regex with non-greedy matches is 4x faster (avoids backtracking?) and puts python on par with the cut/sed method (python2 being a bit faster than python3)

import sys,re



pattern=re.compile(r'^[^|]+?|[^|]+?|([^|]+?)|[^|]+?|([^|]+?)|[^,]+?, (.+) )|$')



for line in sys.stdin:

match=pattern.match(line)

if match:

    print(match.group(2)+':'+match.group(3)+','+match.group(1))

edited yesterday

answered yesterday

xenoid

1,5781416

Python

import sys,re



pattern=re.compile(r'^.+|.+|(.+)|.+|(.+)|.+, (.+) )|$')



for line in sys.stdin:

match=pattern.match(line)

if match:

    print(match.group(2)+':'+match.group(3)+','+match.group(1))

(works with both Python2 and Python3)

Using a regex with non-greedy matches is 4x faster (avoids backtracking?) and puts python on par with the cut/sed method (python2 being a bit faster than python3)

import sys,re



pattern=re.compile(r'^[^|]+?|[^|]+?|([^|]+?)|[^|]+?|([^|]+?)|[^,]+?, (.+) )|$')



for line in sys.stdin:

match=pattern.match(line)

if match:

    print(match.group(2)+':'+match.group(3)+','+match.group(1))

edited yesterday

answered yesterday

xenoid

1,5781416

edited yesterday

answered yesterday

xenoid

1,5781416

answered yesterday

xenoid

1,5781416

answered yesterday

xenoid

1,5781416
