How to use Pandas to get the count of every combination inclusive

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

I am trying to figure out what combination of clothing customers are buying together. I can figure out the exact combination, but the problem I can't figure out is the count that includes the combination + others.

For example, I have:

Cust_num  Item    Rev

Cust1     Shirt1  $40

Cust1     Shirt2  $40

Cust1     Shorts1 $40

Cust2     Shirt1  $40

Cust2     Shorts1 $40

This should result in:

Combo                  Count

Shirt1,Shirt2,Shorts1    1

Shirt1,Shorts1           2

The best I can do is unique combinations:

Combo                 Count

Shirt1,Shirt2,Shorts1   1

Shirt1,Shorts1          1

I tried:

df = df.pivot(index='Cust_num',columns='Item').sum()

df[df.notnull()] = "x"

df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))

col = df.stack().groupby(level=0).apply(','.join)

df2 = pd.DataFrame(col)

df2.groupby([0]).size().reset_index(name='counts')

But that is just the unique counts.

asked Apr 8 at 3:38

lys_dad

565

New contributor

3

I feel like this is one sort of problem pandas would not be suitable for.

– coldspeed
Apr 8 at 4:31

add a comment |

For example, I have:

Cust_num  Item    Rev

Cust1     Shirt1  $40

Cust1     Shirt2  $40

Cust1     Shorts1 $40

Cust2     Shirt1  $40

Cust2     Shorts1 $40

This should result in:

Combo                  Count

Shirt1,Shirt2,Shorts1    1

Shirt1,Shorts1           2

The best I can do is unique combinations:

Combo                 Count

Shirt1,Shirt2,Shorts1   1

Shirt1,Shorts1          1

I tried:

df = df.pivot(index='Cust_num',columns='Item').sum()

df[df.notnull()] = "x"

df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))

col = df.stack().groupby(level=0).apply(','.join)

df2 = pd.DataFrame(col)

df2.groupby([0]).size().reset_index(name='counts')

But that is just the unique counts.

asked Apr 8 at 3:38

lys_dad

565

New contributor

3

I feel like this is one sort of problem pandas would not be suitable for.

– coldspeed
Apr 8 at 4:31

add a comment |

For example, I have:

Cust_num  Item    Rev

Cust1     Shirt1  $40

Cust1     Shirt2  $40

Cust1     Shorts1 $40

Cust2     Shirt1  $40

Cust2     Shorts1 $40

This should result in:

Combo                  Count

Shirt1,Shirt2,Shorts1    1

Shirt1,Shorts1           2

The best I can do is unique combinations:

Combo                 Count

Shirt1,Shirt2,Shorts1   1

Shirt1,Shorts1          1

I tried:

df = df.pivot(index='Cust_num',columns='Item').sum()

df[df.notnull()] = "x"

df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))

col = df.stack().groupby(level=0).apply(','.join)

df2 = pd.DataFrame(col)

df2.groupby([0]).size().reset_index(name='counts')

But that is just the unique counts.

asked Apr 8 at 3:38

lys_dad

565

New contributor

For example, I have:

Cust_num  Item    Rev

Cust1     Shirt1  $40

Cust1     Shirt2  $40

Cust1     Shorts1 $40

Cust2     Shirt1  $40

Cust2     Shorts1 $40

This should result in:

Combo                  Count

Shirt1,Shirt2,Shorts1    1

Shirt1,Shorts1           2

The best I can do is unique combinations:

Combo                 Count

Shirt1,Shirt2,Shorts1   1

Shirt1,Shorts1          1

I tried:

df = df.pivot(index='Cust_num',columns='Item').sum()

df[df.notnull()] = "x"

df = df.loc[:,"Shirt1":].replace("x", pd.Series(df.columns, df.columns))

col = df.stack().groupby(level=0).apply(','.join)

df2 = pd.DataFrame(col)

df2.groupby([0]).size().reset_index(name='counts')

But that is just the unique counts.

python pandas

asked Apr 8 at 3:38

lys_dad

565

New contributor

asked Apr 8 at 3:38

lys_dad

565

New contributor

asked Apr 8 at 3:38

lys_dad

565

New contributor

asked Apr 8 at 3:38

lys_dad

565

asked Apr 8 at 3:38

lys_dad

565

New contributor

lys_dad is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.

3

I feel like this is one sort of problem pandas would not be suitable for.

– coldspeed
Apr 8 at 4:31

add a comment |

3

I feel like this is one sort of problem pandas would not be suitable for.

– coldspeed
Apr 8 at 4:31

I feel like this is one sort of problem pandas would not be suitable for.

– coldspeed
Apr 8 at 4:31

add a comment |

4 Answers
4

active

oldest

votes

Using pandas.DataFrame.groupby:

grouped_item = df.groupby('Cust_num')['Item']

subsets = grouped_item.apply(lambda x: set(x)).tolist()

Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]

combo = grouped_item.apply(lambda x:','.join(x))

combo = combo.reset_index()

combo['Count']=Count

Output:

  Cust_num                   Item  Count

0    Cust1  Shirt1,Shirt2,Shorts1      1

1    Cust2         Shirt1,Shorts1      2

answered Apr 8 at 4:26

Chris

3,811523

How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

– Kill3rbee
2 days ago

@Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

– Kill3rbee
2 days ago

@LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

– Chris
2 days ago

I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

– Chris
2 days ago

Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

– lys_dad
2 days ago

add a comment |

Late answer, but you can use:

df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})

df['Count'] = df['Count'].str.replace(r'Cust','')

combo                   Count                 

Shirt1,Shirt2,Shorts1     1

Shirt1,Shorts1            2

edited Apr 8 at 5:25

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

add a comment |

I think you need to create a combination of items first.

How to get all possible combinations of a list’s elements?

I used the function from Dan H's answer.

from itertools import chain, combinations

def all_subsets(ss):

    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))



uq_items = df.Item.unique()



list(all_subsets(uq_items))



[(),

 ('Shirt1',),

 ('Shirt2',),

 ('Shorts1',),

 ('Shirt1', 'Shirt2'),

 ('Shirt1', 'Shorts1'),

 ('Shirt2', 'Shorts1'),

 ('Shirt1', 'Shirt2', 'Shorts1')]

And use groupby each customer to get their items combination.

ls = 



for _, d in df.groupby('Cust_num', group_keys=False):

    # Get all possible subset of items

    pi = np.array(list(all_subsets(d.Item)))



    # Fliter only > 1

    ls.append(pi[[len(l) > 1 for l in pi]])

Then convert to Series and use value_counts().

pd.Series(np.concatenate(ls)).value_counts()



(Shirt1, Shorts1)            2

(Shirt2, Shorts1)            1

(Shirt1, Shirt2, Shorts1)    1

(Shirt1, Shirt2)             1

edited 2 days ago

answered Apr 8 at 4:56

ResidentSleeper

48710

This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

– lys_dad
2 days ago

I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

– lys_dad
2 days ago

@lys_dad The accepted answer has already solved your memory problem, right?

– ResidentSleeper
2 days ago

1

It does, yes. But thank you for your elegant solution!

– lys_dad
2 days ago

add a comment |

My version which I believe is easier to understand

new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})



new_df ['count'] = range(1, len(new_df ) + 1)

Output:

                            Item      Rev count

                        <lambda> <lambda>      

Cust_num                                       

Cust1      Shirt1 Shirt2 Shorts1      $40     1

Cust2             Shirt1 Shorts1      $40     2

Since you do not need the Rev column, you can drop it:

new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()



new_df

Output:

  Cust_num                    Item count

                          <lambda>      

0    Cust1   Shirt1 Shirt2 Shorts1     1

1    Cust2          Shirt1 Shorts1     2

This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:

[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]

Then next step finds the subsets:

for s1 in subsets:

    for s2 in subsets:

        if s2.issubset(s1):

            print("{}: {}".format(s2,s2.issubset(s1)))

Output:

{' Shirt2', ' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

You asked me to explain myself and I did. However after thinking about it, I realized your approach was wrong too. As such I was not mocking you, but thanking you for making me think about my solution. Also thanks to @ResidentSleeper for his solution.

edited 2 days ago

answered Apr 8 at 5:16

Kill3rbee

13410

@Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

– Kill3rbee
2 days ago

@Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

– Kill3rbee
2 days ago

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

lys_dad is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55565916%2fhow-to-use-pandas-to-get-the-count-of-every-combination-inclusive%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

Using pandas.DataFrame.groupby:

grouped_item = df.groupby('Cust_num')['Item']

subsets = grouped_item.apply(lambda x: set(x)).tolist()

Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]

combo = grouped_item.apply(lambda x:','.join(x))

combo = combo.reset_index()

combo['Count']=Count

Output:

  Cust_num                   Item  Count

0    Cust1  Shirt1,Shirt2,Shorts1      1

1    Cust2         Shirt1,Shorts1      2

answered Apr 8 at 4:26

Chris

3,811523

How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

– Kill3rbee
2 days ago

@Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

– Kill3rbee
2 days ago

@LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

– Chris
2 days ago

I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

– Chris
2 days ago

Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

– lys_dad
2 days ago

add a comment |

Using pandas.DataFrame.groupby:

grouped_item = df.groupby('Cust_num')['Item']

subsets = grouped_item.apply(lambda x: set(x)).tolist()

Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]

combo = grouped_item.apply(lambda x:','.join(x))

combo = combo.reset_index()

combo['Count']=Count

Output:

  Cust_num                   Item  Count

0    Cust1  Shirt1,Shirt2,Shorts1      1

1    Cust2         Shirt1,Shorts1      2

answered Apr 8 at 4:26

Chris

3,811523

How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

– Kill3rbee
2 days ago

@Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

– Kill3rbee
2 days ago

@LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

– Chris
2 days ago

I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

– Chris
2 days ago

Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

– lys_dad
2 days ago

add a comment |

Using pandas.DataFrame.groupby:

grouped_item = df.groupby('Cust_num')['Item']

subsets = grouped_item.apply(lambda x: set(x)).tolist()

Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]

combo = grouped_item.apply(lambda x:','.join(x))

combo = combo.reset_index()

combo['Count']=Count

Output:

  Cust_num                   Item  Count

0    Cust1  Shirt1,Shirt2,Shorts1      1

1    Cust2         Shirt1,Shorts1      2

answered Apr 8 at 4:26

Chris

3,811523

Using pandas.DataFrame.groupby:

grouped_item = df.groupby('Cust_num')['Item']

subsets = grouped_item.apply(lambda x: set(x)).tolist()

Count = [sum(s2.issubset(s1) for s1 in subsets) for s2 in subsets]

combo = grouped_item.apply(lambda x:','.join(x))

combo = combo.reset_index()

combo['Count']=Count

Output:

  Cust_num                   Item  Count

0    Cust1  Shirt1,Shirt2,Shorts1      1

1    Cust2         Shirt1,Shorts1      2

answered Apr 8 at 4:26

Chris

3,811523

answered Apr 8 at 4:26

Chris

3,811523

answered Apr 8 at 4:26

Chris

3,811523

answered Apr 8 at 4:26

Chris

3,811523

How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

– Kill3rbee
2 days ago

@Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

– Kill3rbee
2 days ago

@LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

– Chris
2 days ago

I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

– Chris
2 days ago

Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

– lys_dad
2 days ago

add a comment |

How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

– Kill3rbee
2 days ago

@Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

– Kill3rbee
2 days ago

@LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

– Chris
2 days ago

I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

– Chris
2 days ago

Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

– lys_dad
2 days ago

How is finding subsets finding inclusive combination of df['Item']? Subsets are not combinations. Below is output of what you doing: {' Shirt2', ' Shorts1', ' Shirt1'}: True followed by {' Shorts1', ' Shirt1'}: True and then {' Shorts1', ' Shirt1'}: True. Then you sum then to get [1,2]. I agree my approach i did is wrong so is yours. I would think @ResidentSleeper has correct answer.

– Kill3rbee
2 days ago

@Chris, I think you need to find combination of Item first which would give you your subsets. You would sum unique. FYI, I did not mock your comment. I was asking a question just like you asked me a question. I did not throw a hissy or downvote you. Thanks for sharing

– Kill3rbee
2 days ago

@LeeMtoti Apologies for a strong language. I've deleted it. BTW, grouped_item contains a combination of each CustN's choice of clothing. lambda x: set(x) was implemented for a issubset comparison. As you pointed out, issubset returns True if and only if a set is contained in other set, which I still believe is what OP wants.

– Chris
2 days ago

I believe the term combination is subtle misleading. My understanding is a set of each customer's choices of Item. Hope this makes my answer and intention clearer.

– Chris
2 days ago

Apologies if the language wasn't correct. I wasn't sure how to describe it. I tried this with the first 1,000 customers, and it worked! I'm running it now with the full list.

– lys_dad
2 days ago

add a comment |

Late answer, but you can use:

df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})

df['Count'] = df['Count'].str.replace(r'Cust','')

combo                   Count                 

Shirt1,Shirt2,Shorts1     1

Shirt1,Shorts1            2

edited Apr 8 at 5:25

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

add a comment |

Late answer, but you can use:

df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})

df['Count'] = df['Count'].str.replace(r'Cust','')

combo                   Count                 

Shirt1,Shirt2,Shorts1     1

Shirt1,Shorts1            2

edited Apr 8 at 5:25

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

add a comment |

Late answer, but you can use:

df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})

df['Count'] = df['Count'].str.replace(r'Cust','')

combo                   Count                 

Shirt1,Shirt2,Shorts1     1

Shirt1,Shorts1            2

edited Apr 8 at 5:25

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

Late answer, but you can use:

df = df.groupby(['Cust_num'], as_index=False).agg(','.join).drop(columns=['Rev']).set_index(['Item']).rename_axis("combo").rename(columns={"Cust_num": "Count"})

df['Count'] = df['Count'].str.replace(r'Cust','')

combo                   Count                 

Shirt1,Shirt2,Shorts1     1

Shirt1,Shorts1            2

edited Apr 8 at 5:25

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

edited Apr 8 at 5:25

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

answered Apr 8 at 4:58

Pedro Lobito

50.6k16138172

add a comment |

I think you need to create a combination of items first.

How to get all possible combinations of a list’s elements?

I used the function from Dan H's answer.

from itertools import chain, combinations

def all_subsets(ss):

    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))



uq_items = df.Item.unique()



list(all_subsets(uq_items))



[(),

 ('Shirt1',),

 ('Shirt2',),

 ('Shorts1',),

 ('Shirt1', 'Shirt2'),

 ('Shirt1', 'Shorts1'),

 ('Shirt2', 'Shorts1'),

 ('Shirt1', 'Shirt2', 'Shorts1')]

And use groupby each customer to get their items combination.

ls = 



for _, d in df.groupby('Cust_num', group_keys=False):

    # Get all possible subset of items

    pi = np.array(list(all_subsets(d.Item)))



    # Fliter only > 1

    ls.append(pi[[len(l) > 1 for l in pi]])

Then convert to Series and use value_counts().

pd.Series(np.concatenate(ls)).value_counts()



(Shirt1, Shorts1)            2

(Shirt2, Shorts1)            1

(Shirt1, Shirt2, Shorts1)    1

(Shirt1, Shirt2)             1

edited 2 days ago

answered Apr 8 at 4:56

ResidentSleeper

48710

This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

– lys_dad
2 days ago

I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

– lys_dad
2 days ago

@lys_dad The accepted answer has already solved your memory problem, right?

– ResidentSleeper
2 days ago

1

It does, yes. But thank you for your elegant solution!

– lys_dad
2 days ago

add a comment |

I think you need to create a combination of items first.

How to get all possible combinations of a list’s elements?

I used the function from Dan H's answer.

from itertools import chain, combinations

def all_subsets(ss):

    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))



uq_items = df.Item.unique()



list(all_subsets(uq_items))



[(),

 ('Shirt1',),

 ('Shirt2',),

 ('Shorts1',),

 ('Shirt1', 'Shirt2'),

 ('Shirt1', 'Shorts1'),

 ('Shirt2', 'Shorts1'),

 ('Shirt1', 'Shirt2', 'Shorts1')]

And use groupby each customer to get their items combination.

ls = 



for _, d in df.groupby('Cust_num', group_keys=False):

    # Get all possible subset of items

    pi = np.array(list(all_subsets(d.Item)))



    # Fliter only > 1

    ls.append(pi[[len(l) > 1 for l in pi]])

Then convert to Series and use value_counts().

pd.Series(np.concatenate(ls)).value_counts()



(Shirt1, Shorts1)            2

(Shirt2, Shorts1)            1

(Shirt1, Shirt2, Shorts1)    1

(Shirt1, Shirt2)             1

edited 2 days ago

answered Apr 8 at 4:56

ResidentSleeper

48710

This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

– lys_dad
2 days ago

I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

– lys_dad
2 days ago

@lys_dad The accepted answer has already solved your memory problem, right?

– ResidentSleeper
2 days ago

1

It does, yes. But thank you for your elegant solution!

– lys_dad
2 days ago

add a comment |

I think you need to create a combination of items first.

How to get all possible combinations of a list’s elements?

I used the function from Dan H's answer.

from itertools import chain, combinations

def all_subsets(ss):

    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))



uq_items = df.Item.unique()



list(all_subsets(uq_items))



[(),

 ('Shirt1',),

 ('Shirt2',),

 ('Shorts1',),

 ('Shirt1', 'Shirt2'),

 ('Shirt1', 'Shorts1'),

 ('Shirt2', 'Shorts1'),

 ('Shirt1', 'Shirt2', 'Shorts1')]

And use groupby each customer to get their items combination.

ls = 



for _, d in df.groupby('Cust_num', group_keys=False):

    # Get all possible subset of items

    pi = np.array(list(all_subsets(d.Item)))



    # Fliter only > 1

    ls.append(pi[[len(l) > 1 for l in pi]])

Then convert to Series and use value_counts().

pd.Series(np.concatenate(ls)).value_counts()



(Shirt1, Shorts1)            2

(Shirt2, Shorts1)            1

(Shirt1, Shirt2, Shorts1)    1

(Shirt1, Shirt2)             1

edited 2 days ago

answered Apr 8 at 4:56

ResidentSleeper

48710

I think you need to create a combination of items first.

How to get all possible combinations of a list’s elements?

I used the function from Dan H's answer.

from itertools import chain, combinations

def all_subsets(ss):

    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))



uq_items = df.Item.unique()



list(all_subsets(uq_items))



[(),

 ('Shirt1',),

 ('Shirt2',),

 ('Shorts1',),

 ('Shirt1', 'Shirt2'),

 ('Shirt1', 'Shorts1'),

 ('Shirt2', 'Shorts1'),

 ('Shirt1', 'Shirt2', 'Shorts1')]

And use groupby each customer to get their items combination.

ls = 



for _, d in df.groupby('Cust_num', group_keys=False):

    # Get all possible subset of items

    pi = np.array(list(all_subsets(d.Item)))



    # Fliter only > 1

    ls.append(pi[[len(l) > 1 for l in pi]])

Then convert to Series and use value_counts().

pd.Series(np.concatenate(ls)).value_counts()



(Shirt1, Shorts1)            2

(Shirt2, Shorts1)            1

(Shirt1, Shirt2, Shorts1)    1

(Shirt1, Shirt2)             1

edited 2 days ago

answered Apr 8 at 4:56

ResidentSleeper

48710

edited 2 days ago

answered Apr 8 at 4:56

ResidentSleeper

48710

answered Apr 8 at 4:56

ResidentSleeper

48710

answered Apr 8 at 4:56

ResidentSleeper

48710

This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

– lys_dad
2 days ago

I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

– lys_dad
2 days ago

@lys_dad The accepted answer has already solved your memory problem, right?

– ResidentSleeper
2 days ago

1

It does, yes. But thank you for your elegant solution!

– lys_dad
2 days ago

add a comment |

This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

– lys_dad
2 days ago

I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

– lys_dad
2 days ago

@lys_dad The accepted answer has already solved your memory problem, right?

– ResidentSleeper
2 days ago

1

It does, yes. But thank you for your elegant solution!

– lys_dad
2 days ago

This probably works, but I keep getting a "Memory Error". I'll have to look for a solution for the memory issue as I can't use my desktop for this project (sensitive data).

– lys_dad
2 days ago

I did the first 1000 customers and it worked! Any suggestion for low memory laptops?

– lys_dad
2 days ago

@lys_dad The accepted answer has already solved your memory problem, right?

– ResidentSleeper
2 days ago

It does, yes. But thank you for your elegant solution!

– lys_dad
2 days ago

add a comment |

My version which I believe is easier to understand

new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})



new_df ['count'] = range(1, len(new_df ) + 1)

Output:

                            Item      Rev count

                        <lambda> <lambda>      

Cust_num                                       

Cust1      Shirt1 Shirt2 Shorts1      $40     1

Cust2             Shirt1 Shorts1      $40     2

Since you do not need the Rev column, you can drop it:

new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()



new_df

Output:

  Cust_num                    Item count

                          <lambda>      

0    Cust1   Shirt1 Shirt2 Shorts1     1

1    Cust2          Shirt1 Shorts1     2

This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:

[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]

Then next step finds the subsets:

for s1 in subsets:

    for s2 in subsets:

        if s2.issubset(s1):

            print("{}: {}".format(s2,s2.issubset(s1)))

Output:

{' Shirt2', ' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

edited 2 days ago

answered Apr 8 at 5:16

Kill3rbee

13410

@Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

– Kill3rbee
2 days ago

@Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

– Kill3rbee
2 days ago

add a comment |

My version which I believe is easier to understand

new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})



new_df ['count'] = range(1, len(new_df ) + 1)

Output:

                            Item      Rev count

                        <lambda> <lambda>      

Cust_num                                       

Cust1      Shirt1 Shirt2 Shorts1      $40     1

Cust2             Shirt1 Shorts1      $40     2

Since you do not need the Rev column, you can drop it:

new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()



new_df

Output:

  Cust_num                    Item count

                          <lambda>      

0    Cust1   Shirt1 Shirt2 Shorts1     1

1    Cust2          Shirt1 Shorts1     2

This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:

[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]

Then next step finds the subsets:

for s1 in subsets:

    for s2 in subsets:

        if s2.issubset(s1):

            print("{}: {}".format(s2,s2.issubset(s1)))

Output:

{' Shirt2', ' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

edited 2 days ago

answered Apr 8 at 5:16

Kill3rbee

13410

@Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

– Kill3rbee
2 days ago

@Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

– Kill3rbee
2 days ago

add a comment |

My version which I believe is easier to understand

new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})



new_df ['count'] = range(1, len(new_df ) + 1)

Output:

                            Item      Rev count

                        <lambda> <lambda>      

Cust_num                                       

Cust1      Shirt1 Shirt2 Shorts1      $40     1

Cust2             Shirt1 Shorts1      $40     2

Since you do not need the Rev column, you can drop it:

new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()



new_df

Output:

  Cust_num                    Item count

                          <lambda>      

0    Cust1   Shirt1 Shirt2 Shorts1     1

1    Cust2          Shirt1 Shorts1     2

This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:

[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]

Then next step finds the subsets:

for s1 in subsets:

    for s2 in subsets:

        if s2.issubset(s1):

            print("{}: {}".format(s2,s2.issubset(s1)))

Output:

{' Shirt2', ' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

edited 2 days ago

answered Apr 8 at 5:16

Kill3rbee

13410

My version which I believe is easier to understand

new_df = df.groupby("Cust_num").agg({lambda x: ''.join(x.unique())})



new_df ['count'] = range(1, len(new_df ) + 1)

Output:

                            Item      Rev count

                        <lambda> <lambda>      

Cust_num                                       

Cust1      Shirt1 Shirt2 Shorts1      $40     1

Cust2             Shirt1 Shorts1      $40     2

Since you do not need the Rev column, you can drop it:

new_df = new_df = new_df.drop(columns=["Rev"]).reset_index()



new_df

Output:

  Cust_num                    Item count

                          <lambda>      

0    Cust1   Shirt1 Shirt2 Shorts1     1

1    Cust2          Shirt1 Shorts1     2

This edit is to respond to @Chris by looking at his approach written using list comprehension. He created an list of sets:

[{' Shirt1', ' Shirt2', ' Shorts1'}, {' Shirt1', ' Shorts1'}]

Then next step finds the subsets:

for s1 in subsets:

    for s2 in subsets:

        if s2.issubset(s1):

            print("{}: {}".format(s2,s2.issubset(s1)))

Output:

{' Shirt2', ' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

{' Shorts1', ' Shirt1'}: True

edited 2 days ago

answered Apr 8 at 5:16

Kill3rbee

13410

edited 2 days ago

answered Apr 8 at 5:16

Kill3rbee

13410

answered Apr 8 at 5:16

Kill3rbee

13410

answered Apr 8 at 5:16

Kill3rbee

13410

@Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

– Kill3rbee
2 days ago

@Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

– Kill3rbee
2 days ago

add a comment |

@Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

– Kill3rbee
2 days ago

@Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

– Kill3rbee
2 days ago

@Chris thank for mocking my comment and down voting my answer. I did not downvote your answer out of the spirit of learning from each other.

– Kill3rbee
2 days ago

@Chris thanks for removing the downvote. Whole point is to learn from each other while helping each other.

– Kill3rbee
2 days ago

add a comment |

lys_dad is a new contributor. Be nice, and check out our Code of Conduct.

draft saved

draft discarded

lys_dad is a new contributor. Be nice, and check out our Code of Conduct.

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Cfxtrjtrk