Pandas Functions: Advanced Groupbys with Grouper, Assign, and Query

Introduction

Pandas groupbys are some of the most useful functions in a data scientist's toolkit. And yet, time and again I have found that colleagues do not realize the flexibility these ubiquitous functions can grant them. In the following post, I will demonstrate some of my favorite uses of groupbys in the hope that it will help others in the future.

The Data

Let's start with a simple example and work our way up in difficulty. We'll start with reading in Covid19 data that Johns Hopkins University aggregated from the World Health Organization. The dataset contains approximately 300,000 observations from different countries and regions of the world on Covid19 cases, recoveries, and deaths.

from pandas as import read_csv, Grouper
from datetime import timedelta

covid_ts = read_csv("covid_19_data.csv").dropna()
covid_ts['ObservationDate'] = pd.to_datetime(covid_ts['ObservationDate']

Now that we've loaded our data, we can take a look.

covid_ts.head()
ObservationDateProvince/StateCountry/RegionLastUpdateConfirmedDeathsRecovered
01/22/2020AnhuiMainland China1/22/202017:001.00.00.0
01/22/2020BeijingMainland China1/22/202017:0014.00.00.0
01/22/2020ChongqingMainland China1/22/202017:006.00.00.0
01/22/2020FujianMainland China1/22/202017:001.00.00.0
01/22/2020GansuMainland China1/22/202017:000.00.00.0

Here we can confirm that we have seven columns of regional and national-level disease data. ObservationDate and Last Update don't differ by much, so for our purposes we'll simply stick with Observation Date, dropping the other. 

Since we'll be leveraging the temporality of our dataset, let's quickly check the range of our data.


start = min(covid_ts['ObservationDate'])
end = max(covid_ts['ObservationDate'])
range_ = end - start + timedelta(days=1) # inclusive range

print(f"Observations range form {str(start.date())} to {str(end.date())}, or {range_.days} days"

Observations range from 2020-01-22 to 2021-05-29, or 494 days.

Okay, with that let's jump into some groupbys.

Advanced Groupbys

Being Covid timeseries data, the first thing we can do is check how the disease has progressed over the weeks. Enter pandas.Grouper, a groupby-specific function that allows users to control how their data will be grouped at a time-based level. All we have to do is invoke Grouper within a typical groupby function, provide an offset alias (e.g. D for daily, W for weekly, Y for yearly), and an aggregation metric (e.g. sum, mean, count) such as is done in the following lines of code:

Multiline functions can be surrounded with () for readability


(
 covid_ts.groupby(Grouper(key="ObservationDate", freq="1W"))
         [['Confirmed']]
         .sum()
         .reset_index()
)

Resulting in the Weekly Aggregate Covid19 Cases table below:

WeeklyAggregate Covid19 Cases

ObservationDateConfirmed
2020-01-266059
2020-02-0261132
2020-02-09212706
2020-02-16398536
2020-02-23530646
2020-03-01555826
2020-03-08571316
2020-03-15611431
2020-03-22796662
2020-03-291258958
2020-04-052431633
2020-04-124016436
2020-04-195548413
2020-04-267071798
2020-05-038550311
2020-05-109924735
2020-05-1713640944
2020-05-2419038630
2020-05-3122987145
2020-06-0730014291
2020-06-1437135729
2020-06-2143814150
2020-06-2849634016
2020-07-0555998840
2020-07-1263062417
2020-07-1971136866
2020-07-2680165769
2020-08-0289766838
2020-08-0999485337
2020-08-16109304057
2020-08-23118832590
2020-08-30128335973
2020-09-06138370846
2020-09-13148141308
2020-09-20158402217
2020-09-27168849411
2020-10-04179041022
2020-10-11189900364
2020-10-18200923793
2020-10-25213166206
2020-11-01227018598
2020-11-08242825744
2020-11-15262675742
2020-11-22283806018
2020-11-29303487746
2020-12-06323398270
2020-12-13344855993
2020-12-20367122719
2020-12-27388942368
2021-01-03409729713
2021-01-10433898614
2021-01-17459369919
2021-01-24481613763
2021-01-31501123103
2021-02-07517491260
2021-02-14530940304
2021-02-21542301462
2021-02-28553580949
2021-03-07564515173
2021-03-14576244907
2021-03-21589155584
2021-03-28604094179
2021-04-04620294702
2021-04-11638289571
2021-04-18660556439
2021-04-25687026232
2021-05-02716197375
2021-05-09745782837
2021-05-16773254529
2021-05-23796128490
2021-05-30697481369

By grouping "confirmed" cases in covid_ts using Grouper, the offset alias 1W, and sum, we have easily aggregated weekly confirmed cases over the date range of our data. If we wanted more granular aggregations we could have easily changed our offsets to D, where is any number of day offets. But this is only step one of what we can achieve with Grouper, and groupby aggregations in general.

The convenience of Grouper is extended by its ability to aggregate subgroups by the offsets it's provided—all that matters is the placement of Grouper relative to other groupby columns. For example, in the groupby snippet below, "Country/Region" is placed in a list before our Grouper function at the 1M (one month) offset, producing the National/Regional Covid Cases By Month table below.


(
   covid_ts
   .groupby(["Country/Region", Grouper(key="ObservationDate", freq="1M")])
   [['Confirmed']]
   .sum()
   .reset_index()
)

National/Regional Covid Cases By Month

Country/RegionObservationDateConfirmed
Australia2020-01-3133
Australia2020-02-29486
Australia2020-03-3135620
Australia2020-04-30187842
Australia2020-05-31217625
Australia2020-06-30222611
Australia2020-07-31361723
Australia2020-08-31711091
Australia2020-09-30800899
Australia2020-10-31848364
Australia2020-11-30832514
Australia2020-12-31872046
Australia2021-01-31889012
Australia2021-02-28809190
Australia2021-03-31903649
Australia2021-04-30885710
Australia2021-05-31869015
Austria2020-02-292
Belgium2020-11-3010509939
Belgium2020-12-3119046283
Belgium2021-01-3120983559
Belgium2021-02-2820733707
Belgium2021-03-3125421637
Belgium2021-04-3028207302
Belgium2021-05-3129831648
Brazil2020-05-314737799
Brazil2020-06-3028128253
Brazil2020-07-3162641865
Brazil2020-08-31102903896
Brazil2020-09-30132059962
Brazil2020-10-31160991539
Brazil2020-11-30176837421
Brazil2020-12-31218249737
Brazil2021-01-31261187142
Brazil2021-02-28276456100
Brazil2021-03-31361059664
Brazil2021-04-30412565115
Brazil2021-05-31451476923
Canada2020-01-3112
Canada2020-02-29234
Canada2020-03-3152012
Canada2020-04-30927235
Canada2020-05-312369720
Canada2020-06-303016433
Canada2020-07-313460185
Canada2020-08-313854051
Canada2020-09-304290890
Canada2020-10-316126216
Canada2020-11-309141329
Canada2020-12-3115031370
Canada2021-01-3121663417
Canada2021-02-2823280680
Canada2021-03-3128667193
Canada2021-04-3033242319
Canada2021-05-3138374241
Chile2020-05-31913924
Chile2020-06-305806627
Chile2020-07-319983781
Chile2020-08-3111936049
Chile2020-09-3013147124
Chile2020-10-3115134758
Chile2020-11-3015945192
Chile2020-12-3117938094
Chile2021-01-3120661911
Chile2021-02-2821754084
Chile2021-03-3128062147
Chile2021-04-3033124646
Chile2021-05-3137172836
China2020-01-310
China2020-02-290
China2020-03-310
Colombia2020-05-31103500
Colombia2020-06-301698303
Colombia2020-07-315598108
Colombia2020-08-3114404679
Colombia2020-09-3021921158
Colombia2020-10-3129361707
Colombia2020-11-3035992828
Colombia2020-12-3145578807
Colombia2021-01-3158374546
Colombia2021-02-2861356188
Colombia2021-03-3171861747
Colombia2021-04-3078567887
Colombia2021-05-3189997493
Denmark2020-03-3112126
Denmark2020-04-305845
Denmark2020-05-316151
Denmark2020-06-306000
Denmark2020-07-316444
Denmark2020-08-3111118
Denmark2020-09-3013369
Denmark2020-10-3115461
Denmark2020-11-3015467
Denmark2020-12-3117396
Denmark2021-01-3120883
Denmark2021-02-2819232
Denmark2021-03-3121422
Denmark2021-04-3020789
Denmark2021-05-3120523
France2020-03-31102660
France2020-04-3034314
France2020-05-3173566
France2020-06-30151652
France2020-07-31317736
France2020-08-31434039
France2020-09-30645690
France2020-10-311021451
France2020-11-301443595
France2020-12-311774323
France2021-01-312013527
France2021-02-282149196
France2021-03-312747350
France2021-04-302986432
France2021-05-313289110
Germany2020-01-3117
Germany2020-05-313228720
Germany2020-06-305670762
Germany2020-07-316265079
Germany2020-08-317024733
Germany2020-09-308023521
Germany2020-10-3111676143
Germany2020-11-3024572075
Germany2020-12-3144141683
Germany2021-01-3162555458
Germany2021-02-2865660655
Germany2021-03-3181277883
Germany2021-04-3093513080
Germany2021-05-31103826264
Hong Kong2020-01-3165
Hong Kong2020-02-291543
Hong Kong2020-03-318018
Hong Kong2020-04-3029443
Hong Kong2020-05-3132692
Hong Kong2020-06-3034000
Hong Kong2020-07-3157318
Hong Kong2020-08-31134221
Hong Kong2020-09-30149065
Hong Kong2020-10-31161825
Hong Kong2020-11-30167377
Hong Kong2020-12-31239977
Hong Kong2021-01-31297899
Hong Kong2021-02-28301335
Hong Kong2021-03-31349435
Hong Kong2021-04-30349045
Hong Kong2021-05-31342677
India2020-06-308775360
India2020-07-3132781165
India2020-08-3182734792
India2020-09-30151651367
India2020-10-31228634268
India2020-11-30265834390
India2020-12-31307985456
India2021-01-31326908857
India2021-02-28305984490
India2021-03-31357487053
India2021-04-30447598593
India2021-05-31704714394
Iraq2020-02-290
Israel2020-02-295
Italy2020-05-314114467
Italy2020-06-307115232
Italy2020-07-317560745
Italy2020-08-317919622
Italy2020-09-308749361
Italy2020-10-3113281216
Italy2020-11-3035324374
Italy2020-12-3158293082
Italy2021-01-3172917277
Italy2021-02-2876429829
Italy2021-03-31101162913
Italy2021-04-30114799462
Italy2021-05-31120086716
Japan2020-05-3166738
Japan2020-06-30525609
Japan2020-07-31771595
Japan2020-08-311701916
Japan2020-09-302297182
Japan2020-10-312856221
Japan2020-11-303650314
Japan2020-12-315880523
Japan2021-01-319902662
Japan2021-02-2811627561
Japan2021-03-3113976951
Japan2021-04-3015835437
Japan2021-05-3119573346
Lebanon2020-02-291
Macau2020-01-3146
Macau2020-02-29283
Macau2020-03-31556
Macau2020-04-301335
Macau2020-05-311395
Macau2020-06-301356
Macau2020-07-311426
Macau2020-08-311426
Macau2020-09-301380
Macau2020-10-311426
Macau2020-11-301380
Macau2020-12-311426
Macau2021-01-311437
Macau2021-02-281340
Macau2021-03-311488
Macau2021-04-301464
Macau2021-05-311441
Mainland China2020-01-3138340
Mainland China2020-02-291631535
Mainland China2020-03-312506812
Mainland China2020-04-302469365
Mainland China2020-05-312571194
Mainland China2020-06-302497323
Mainland China2020-07-312595103
Mainland China2020-08-312628794
Mainland China2020-09-302557006
Mainland China2020-10-312655726
Mainland China2020-11-302590049
Mainland China2020-12-312690556
Mainland China2021-01-312735735
Mainland China2021-02-282513697
Mainland China2021-03-312792080
Mainland China2021-04-302714147
Mainland China2021-05-312635134
Mexico2020-05-31881055
Mexico2020-06-304692440
Mexico2020-07-3110107738
Mexico2020-08-3116108026
Mexico2020-09-3020309152
Mexico2020-10-3126077771
Mexico2020-11-3030342144
Mexico2020-12-3139586956
Mexico2021-01-3150652332
Mexico2021-02-2855678011
Mexico2021-03-3167295623
Mexico2021-04-3068840861
Mexico2021-05-3168985170
Netherlands2020-03-3114701
Netherlands2020-04-304764
Netherlands2020-05-316199
Netherlands2020-06-306217
Netherlands2020-07-31797069
Netherlands2020-08-311987056
Netherlands2020-09-302805482
Netherlands2020-10-317020285
Netherlands2020-11-3013649806
Netherlands2020-12-3120507503
Netherlands2021-01-3128343170
Netherlands2021-02-2829303549
Netherlands2021-03-3136901518
Netherlands2021-04-3042290027
Netherlands2021-05-3146763492
Others2020-02-2910076
Others2020-03-3116152
Pakistan2020-06-303594515
Pakistan2020-07-317910429
Pakistan2020-08-318935125
Pakistan2020-09-309118734
Pakistan2020-10-3110005964
Pakistan2020-11-3010891173
Pakistan2020-12-3113839663
Pakistan2021-01-3116029828
Pakistan2021-02-2815805181
Pakistan2021-03-3119174307
Pakistan2021-04-3022498648
Pakistan2021-05-3125442064
Peru2020-05-31603705
Peru2020-06-306931248
Peru2020-07-3110657640
Peru2020-08-3116414082
Peru2020-09-3022054476
Peru2020-10-3126653391
Peru2020-11-3028025436
Peru2020-12-3130624534
Peru2021-01-3132876154
Peru2021-02-2834491490
Peru2021-03-3144328085
Peru2021-04-3050368511
Peru2021-05-3154460864
Russia2020-06-3016097579
Russia2020-07-3123233041
Russia2020-08-3128508044
Russia2020-09-3032289140
Russia2020-10-3142549555
Russia2020-11-3057925419
Russia2020-12-3184102182
Russia2021-01-31108422409
Russia2021-02-28112712911
Russia2021-03-31135088504
Russia2021-04-30138817103
Russia2021-05-31141404158
Spain2020-05-314219335
Spain2020-06-307331192
Spain2020-07-318134290
Spain2020-08-3111159043
Spain2020-09-3018316260
Spain2020-10-3129494693
Spain2020-11-3043932783
Spain2020-12-3155067758
Spain2021-01-3171180786
Spain2021-02-2885568038
Spain2021-03-3199280629
Spain2021-04-30101966198
Spain2021-05-31104480995
Sweden2020-06-301431896
Sweden2020-07-312360582
Sweden2020-08-312600324
Sweden2020-09-302632351
Sweden2020-10-313240753
Sweden2020-11-305546377
Sweden2020-12-3110695430
Sweden2021-01-3115946013
Sweden2021-02-2817201658
Sweden2021-03-3122569251
Sweden2021-04-3026658585
Sweden2021-05-3129888569
Taiwan2020-01-3152
Taiwan2020-02-29616
Taiwan2020-03-31436
UK2020-03-3123768
UK2020-04-3028370
UK2020-05-3140553
UK2020-06-306103975
UK2020-07-319150130
UK2020-08-319928933
UK2020-09-3011564506
UK2020-10-3122002570
UK2020-11-3040830975
UK2020-12-3161365366
UK2021-01-31102180395
UK2021-02-28113211684
UK2021-03-31132721966
UK2021-04-30131952179
UK2021-05-31129432185
US2020-01-3137
US2020-02-29718
US2020-03-311088640
US2020-04-3019534148
US2020-05-3145256716
US2020-06-3065059848
US2020-07-31111580930
US2020-08-31166786671
US2020-09-30199653685
US2020-10-31252160215
US2020-11-30340017832
US2020-12-31528139775
US2021-01-31730894746
US2021-02-28774254261
US2021-03-31916643837
US2021-04-30944543693
US2021-05-31953529915
Ukraine2020-06-301011298
Ukraine2020-07-311797522
Ukraine2020-08-312952987
Ukraine2020-09-305019534
Ukraine2020-10-319292272
Ukraine2020-11-3016972054
Ukraine2020-12-3129178756
Ukraine2021-01-3136713095
Ukraine2021-02-2837009908
Ukraine2021-03-3147885644
Ukraine2021-04-3058596168
Ukraine2021-05-3163796837

Whereas, by placing "Country/Region" in a list after Grouper, one can get a similar, but slightly different aggregation of the data: Monthly Covid Cases by Country/Region.


(
    covid_ts
    .groupby([
    Grouper(key="ObservationDate", freq="1M"), "Country/Region"])
    [['Confirmed']]
    .sum()
    .reset_index()
)

Monthly Covid Cases by Country/Region

ObservationDateCountry/RegionConfirmed
2020-01-31Australia33
2020-01-31Canada12
2020-01-31China0
2020-01-31Germany17
2020-01-31Hong Kong65
2020-01-31Macau46
2020-01-31Mainland China38340
2020-01-31Taiwan52
2020-01-31US37
2020-02-29Australia486
2020-02-29Austria2
2020-02-29Canada234
2020-02-29China0
2020-02-29Hong Kong1543
2020-02-29Iraq0
2020-02-29Israel5
2020-02-29Lebanon1
2020-02-29Macau283
2020-02-29Mainland China1631535
2020-02-29Others10076
2020-02-29Taiwan616
2020-02-29US718
2020-03-31Australia35620
2020-03-31Canada52012
2020-03-31China0
2020-03-31Denmark12126
2020-03-31France102660
2020-03-31Hong Kong8018
2020-03-31Macau556
2020-03-31Mainland China2506812
2020-03-31Netherlands14701
2020-03-31Others16152
2020-03-31Taiwan436
2020-03-31UK23768
2020-03-31US1088640
2020-04-30Australia187842
2020-04-30Canada927235
2020-04-30Denmark5845
2020-04-30France34314
2020-04-30Hong Kong29443
2020-04-30Macau1335
2020-04-30Mainland China2469365
2020-04-30Netherlands4764
2020-04-30UK28370
2020-04-30US19534148
2020-05-31Australia217625
2020-05-31Brazil4737799
2020-05-31Canada2369720
2020-05-31Chile913924
2020-05-31Colombia103500
2020-05-31Denmark6151
2020-05-31France73566
2020-05-31Germany3228720
2020-05-31Hong Kong32692
2020-05-31Italy4114467
2020-05-31Japan66738
2020-05-31Macau1395
2020-05-31Mainland China2571194
2020-05-31Mexico881055
2020-05-31Netherlands6199
2020-05-31Peru603705
2020-05-31Spain4219335
2020-05-31UK40553
2020-05-31US45256716
2020-06-30Australia222611
2020-06-30Brazil28128253
2020-06-30Canada3016433
2020-06-30Chile5806627
2020-06-30Colombia1698303
2020-06-30Denmark6000
2020-06-30France151652
2020-06-30Germany5670762
2020-06-30Hong Kong34000
2020-06-30India8775360
2020-06-30Italy7115232
2020-06-30Japan525609
2020-06-30Macau1356
2020-06-30Mainland China2497323
2020-06-30Mexico4692440
2020-06-30Netherlands6217
2020-06-30Pakistan3594515
2020-06-30Peru6931248
2020-06-30Russia16097579
2020-06-30Spain7331192
2020-06-30Sweden1431896
2020-06-30UK6103975
2020-06-30US65059848
2020-06-30Ukraine1011298
2020-07-31Australia361723
2020-07-31Brazil62641865
2020-07-31Canada3460185
2020-07-31Chile9983781
2020-07-31Colombia5598108
2020-07-31Denmark6444
2020-07-31France317736
2020-07-31Germany6265079
2020-07-31Hong Kong57318
2020-07-31India32781165
2020-07-31Italy7560745
2020-07-31Japan771595
2020-07-31Macau1426
2020-07-31Mainland China2595103
2020-07-31Mexico10107738
2020-07-31Netherlands797069
2020-07-31Pakistan7910429
2020-07-31Peru10657640
2020-07-31Russia23233041
2020-07-31Spain8134290
2020-07-31Sweden2360582
2020-07-31UK9150130
2020-07-31US111580930
2020-07-31Ukraine1797522
2020-08-31Australia711091
2020-08-31Brazil102903896
2020-08-31Canada3854051
2020-08-31Chile11936049
2020-08-31Colombia14404679
2020-08-31Denmark11118
2020-08-31France434039
2020-08-31Germany7024733
2020-08-31Hong Kong134221
2020-08-31India82734792
2020-08-31Italy7919622
2020-08-31Japan1701916
2020-08-31Macau1426
2020-08-31Mainland China2628794
2020-08-31Mexico16108026
2020-08-31Netherlands1987056
2020-08-31Pakistan8935125
2020-08-31Peru16414082
2020-08-31Russia28508044
2020-08-31Spain11159043
2020-08-31Sweden2600324
2020-08-31UK9928933
2020-08-31US166786671
2020-08-31Ukraine2952987
2020-09-30Australia800899
2020-09-30Brazil132059962
2020-09-30Canada4290890
2020-09-30Chile13147124
2020-09-30Colombia21921158
2020-09-30Denmark13369
2020-09-30France645690
2020-09-30Germany8023521
2020-09-30Hong Kong149065
2020-09-30India151651367
2020-09-30Italy8749361
2020-09-30Japan2297182
2020-09-30Macau1380
2020-09-30Mainland China2557006
2020-09-30Mexico20309152
2020-09-30Netherlands2805482
2020-09-30Pakistan9118734
2020-09-30Peru22054476
2020-09-30Russia32289140
2020-09-30Spain18316260
2020-09-30Sweden2632351
2020-09-30UK11564506
2020-09-30US199653685
2020-09-30Ukraine5019534
2020-10-31Australia848364
2020-10-31Brazil160991539
2020-10-31Canada6126216
2020-10-31Chile15134758
2020-10-31Colombia29361707
2020-10-31Denmark15461
2020-10-31France1021451
2020-10-31Germany11676143
2020-10-31Hong Kong161825
2020-10-31India228634268
2020-10-31Italy13281216
2020-10-31Japan2856221
2020-10-31Macau1426
2020-10-31Mainland China2655726
2020-10-31Mexico26077771
2020-10-31Netherlands7020285
2020-10-31Pakistan10005964
2020-10-31Peru26653391
2020-10-31Russia42549555
2020-10-31Spain29494693
2020-10-31Sweden3240753
2020-10-31UK22002570
2020-10-31US252160215
2020-10-31Ukraine9292272
2020-11-30Australia832514
2020-11-30Belgium10509939
2020-11-30Brazil176837421
2020-11-30Canada9141329
2020-11-30Chile15945192
2020-11-30Colombia35992828
2020-11-30Denmark15467
2020-11-30France1443595
2020-11-30Germany24572075
2020-11-30Hong Kong167377
2020-11-30India265834390
2020-11-30Italy35324374
2020-11-30Japan3650314
2020-11-30Macau1380
2020-11-30Mainland China2590049
2020-11-30Mexico30342144
2020-11-30Netherlands13649806
2020-11-30Pakistan10891173
2020-11-30Peru28025436
2020-11-30Russia57925419
2020-11-30Spain43932783
2020-11-30Sweden5546377
2020-11-30UK40830975
2020-11-30US340017832
2020-11-30Ukraine16972054
2020-12-31Australia872046
2020-12-31Belgium19046283
2020-12-31Brazil218249737
2020-12-31Canada15031370
2020-12-31Chile17938094
2020-12-31Colombia45578807
2020-12-31Denmark17396
2020-12-31France1774323
2020-12-31Germany44141683
2020-12-31Hong Kong239977
2020-12-31India307985456
2020-12-31Italy58293082
2020-12-31Japan5880523
2020-12-31Macau1426
2020-12-31Mainland China2690556
2020-12-31Mexico39586956
2020-12-31Netherlands20507503
2020-12-31Pakistan13839663
2020-12-31Peru30624534
2020-12-31Russia84102182
2020-12-31Spain55067758
2020-12-31Sweden10695430
2020-12-31UK61365366
2020-12-31US528139775
2020-12-31Ukraine29178756
2021-01-31Australia889012
2021-01-31Belgium20983559
2021-01-31Brazil261187142
2021-01-31Canada21663417
2021-01-31Chile20661911
2021-01-31Colombia58374546
2021-01-31Denmark20883
2021-01-31France2013527
2021-01-31Germany62555458
2021-01-31Hong Kong297899
2021-01-31India326908857
2021-01-31Italy72917277
2021-01-31Japan9902662
2021-01-31Macau1437
2021-01-31Mainland China2735735
2021-01-31Mexico50652332
2021-01-31Netherlands28343170
2021-01-31Pakistan16029828
2021-01-31Peru32876154
2021-01-31Russia108422409
2021-01-31Spain71180786
2021-01-31Sweden15946013
2021-01-31UK102180395
2021-01-31US730894746
2021-01-31Ukraine36713095
2021-02-28Australia809190
2021-02-28Belgium20733707
2021-02-28Brazil276456100
2021-02-28Canada23280680
2021-02-28Chile21754084
2021-02-28Colombia61356188
2021-02-28Denmark19232
2021-02-28France2149196
2021-02-28Germany65660655
2021-02-28Hong Kong301335
2021-02-28India305984490
2021-02-28Italy76429829
2021-02-28Japan11627561
2021-02-28Macau1340
2021-02-28Mainland China2513697
2021-02-28Mexico55678011
2021-02-28Netherlands29303549
2021-02-28Pakistan15805181
2021-02-28Peru34491490
2021-02-28Russia112712911
2021-02-28Spain85568038
2021-02-28Sweden17201658
2021-02-28UK113211684
2021-02-28US774254261
2021-02-28Ukraine37009908
2021-03-31Australia903649
2021-03-31Belgium25421637
2021-03-31Brazil361059664
2021-03-31Canada28667193
2021-03-31Chile28062147
2021-03-31Colombia71861747
2021-03-31Denmark21422
2021-03-31France2747350
2021-03-31Germany81277883
2021-03-31Hong Kong349435
2021-03-31India357487053
2021-03-31Italy101162913
2021-03-31Japan13976951
2021-03-31Macau1488
2021-03-31Mainland China2792080
2021-03-31Mexico67295623
2021-03-31Netherlands36901518
2021-03-31Pakistan19174307
2021-03-31Peru44328085
2021-03-31Russia135088504
2021-03-31Spain99280629
2021-03-31Sweden22569251
2021-03-31UK132721966
2021-03-31US916643837
2021-03-31Ukraine47885644
2021-04-30Australia885710
2021-04-30Belgium28207302
2021-04-30Brazil412565115
2021-04-30Canada33242319
2021-04-30Chile33124646
2021-04-30Colombia78567887
2021-04-30Denmark20789
2021-04-30France2986432
2021-04-30Germany93513080
2021-04-30Hong Kong349045
2021-04-30India447598593
2021-04-30Italy114799462
2021-04-30Japan15835437
2021-04-30Macau1464
2021-04-30Mainland China2714147
2021-04-30Mexico68840861
2021-04-30Netherlands42290027
2021-04-30Pakistan22498648
2021-04-30Peru50368511
2021-04-30Russia138817103
2021-04-30Spain101966198
2021-04-30Sweden26658585
2021-04-30UK131952179
2021-04-30US944543693
2021-04-30Ukraine58596168
2021-05-31Australia869015
2021-05-31Belgium29831648
2021-05-31Brazil451476923
2021-05-31Canada38374241
2021-05-31Chile37172836
2021-05-31Colombia89997493
2021-05-31Denmark20523
2021-05-31France3289110
2021-05-31Germany103826264
2021-05-31Hong Kong342677
2021-05-31India704714394
2021-05-31Italy120086716
2021-05-31Japan19573346
2021-05-31Macau1441
2021-05-31Mainland China2635134
2021-05-31Mexico68985170
2021-05-31Netherlands46763492
2021-05-31Pakistan25442064
2021-05-31Peru54460864
2021-05-31Russia141404158
2021-05-31Spain104480995
2021-05-31Sweden29888569
2021-05-31UK129432185
2021-05-31US953529915
2021-05-31Ukraine63796837

Note here that as we progress down our table, we pass the total number of covid cases for each country reporting, whereas previously we would pass through the entire history of reported covid cases per country. As ever, the order our data is listed as is determined by the placement of columns in our groupby. With Grouper, our options are simply extended to aggregations of date-values—and the extensibility does not end there.

Complementary Functions

After grouping our data, we often want to operate on the values we have derived. Luckily, pandas provides us with assign, a function for manipulating newly derived columns in place. To use assign, most often one will also want to be comfortable with lambda expressions, so we'll be sure to implement them here as a reminder. Let's return to our National/Regional Covid Cases By Month for a use-case.

First, we'll rename our columns with some named aggregations for clarity, replacing "Confirmed" with the more accurate "Total_Cases".


(
    covid_ts
    .groupby(["Country/Region",
    .Grouper(key="ObservationDate", freq="1M"), ])
    .agg(Total_Cases=("Confirmed", "sum"), 
            )
    .reset_index()
    .head() 
)
Country/RegionObservationDateTotal_Cases
Australia2020-01-3133
Australia2020-02-29.486
Australia2020-03-3135620
Australia2020-04-30187842
Australia2020-05-31217625

Next, we'll add additional data to our table by inserting a named aggregation for "Total_Deaths" per country per month.


(
    covid_ts
    .groupby(["Country/Region",
    .Grouper(key="ObservationDate", freq="1M"), ])
    .agg(Total_Deaths=("Deaths", "sum"), 
         Total_Cases=("Confirmed", "sum"),
         )
    .reset_index()
    .head()
   )
Country/RegionObservationDateTotal_DeathsTotal_Cases
Australia2020-01-31033
Australia2020-02-290486
Australia2020-03-3119235620
Australia2020-04-301817187842
Australia2020-05-313071217625

Finally, we'll leverage assign by referencing our new "Total_Deaths" and "Total_Cases" in-line, using them to create an entirely new column of data: "Death_Case_Ratio", or "Total_Deaths" divided by "Total_Cases". By multiplying our new ratio by 100 we can derive an informative metric: the percentage of infected individuals who die each month in a given country/region. Finally, we'll rename our columns to more aesthetic titles, as spaces aren't allowed in named aggregations.


(
    covid_ts
    .groupby(["Country/Region",
    pd.Grouper(key="ObservationDate", freq="1M"), ])
    .agg(Total_Deaths=("Deaths", "sum"),
         Total_Cases=("Confirmed", "sum")
        )
    .assign(
Death_Case_Ratio=lambda x: round(x['Total_Deaths']/x['Total_Cases']*100,2)           )
    .reset_index()
    .rename({'Total_Deaths':'Total Deaths',
             'Total_Cases':'Total Cases',
             'Death_Case_Ratio':'Death/Case Ratio (%)'},
             axis=1)
)

And wala we have the following National/Regional Death/Case Ratio By Month table.

National/Regional Death/Case Ratio By Month

Country/RegionObservationDateTotal DeathsTotal CasesDeath/Case Ratio (%)
Australia2020-01-310330.0
Australia2020-02-2904860.0
Australia2020-03-31192356200.54
Australia2020-04-3018171878420.97
Australia2020-05-3130712176251.41
Australia2020-06-3030752226111.38
Australia2020-07-3139963617231.1
Australia2020-08-31130067110911.83
Australia2020-09-30244758008993.06
Australia2020-10-31279498483643.29
Australia2020-11-30272128325143.27
Australia2020-12-31281538720463.23
Australia2021-01-31281798890123.17
Australia2021-02-28254528091903.15
Australia2021-03-31281799036493.12
Australia2021-04-30272898857103.08
Australia2021-05-31263908690153.04
Austria2020-02-29020.0
Belgium2020-11-30292870105099392.79
Belgium2020-12-31565082190462832.97
Belgium2021-01-31631655209835593.01
Belgium2021-02-28606413207337072.92
Belgium2021-03-31698886254216372.75
Belgium2021-04-30709354282073022.51
Belgium2021-05-31715015298316482.4
Brazil2020-05-3129099647377996.14
Brazil2020-06-301353418281282534.81
Brazil2020-07-312370476626418653.78
Brazil2020-08-3133454051029038963.25
Brazil2020-09-3040043521320599623.03
Brazil2020-10-3147327891609915392.94
Brazil2020-11-3049863191768374212.82
Brazil2020-12-3157025482182497372.61
Brazil2021-01-3164836862611871422.48
Brazil2021-02-2867133422764561002.43
Brazil2021-03-3188265153610596642.44
Brazil2021-04-30109524144125651152.65
Brazil2021-05-31125767694514769232.79
Canada2020-01-310120.0
Canada2020-02-2902340.0
Canada2020-03-31777520121.49
Canada2020-04-30540809272355.83
Canada2020-05-3119865223697208.38
Canada2020-06-3025410130164338.42
Canada2020-07-3127634234601857.99
Canada2020-08-3128236038540517.33
Canada2020-09-3027808442908906.48
Canada2020-10-3130504161262164.98
Canada2020-11-3033523991413293.67
Canada2020-12-31432388150313702.88
Canada2021-01-31556738216634172.57
Canada2021-02-28594361232806802.55
Canada2021-03-31697637286671932.43
Canada2021-04-30706012332423192.12
Canada2021-05-31721084383742411.88
Chile2020-05-3194479139241.03
Chile2020-06-3010141858066271.75
Chile2020-07-3123908299837812.39
Chile2020-08-31323785119360492.71
Chile2020-09-30361583131471242.75
Chile2020-10-31419247151347582.77
Chile2020-11-30444645159451922.79
Chile2020-12-31495826179380942.76
Chile2021-01-31540938206619112.62
Chile2021-02-28546762217540842.51
Chile2021-03-31676397280621472.41
Chile2021-04-30744626331246462.25
Chile2021-05-31803908371728362.16
China2020-01-3100nan
China2020-02-2900nan
China2020-03-3100nan
Colombia2020-05-3134951035003.38
Colombia2020-06-305806116983033.42
Colombia2020-07-3119691855981083.52
Colombia2020-08-31466128144046793.24
Colombia2020-09-30696562219211583.18
Colombia2020-10-31889237293617073.03
Colombia2020-11-301023991359928282.84
Colombia2020-12-311233616455788072.71
Colombia2021-01-311501117583745462.57
Colombia2021-02-281608609613561882.62
Colombia2021-03-311906300718617472.65
Colombia2021-04-302035213785678872.59
Colombia2021-05-312343680899974932.6
Denmark2020-03-3142121260.35
Denmark2020-04-30058450.0
Denmark2020-05-31061510.0
Denmark2020-06-30060000.0
Denmark2020-07-31064440.0
Denmark2020-08-310111180.0
Denmark2020-09-300133690.0
Denmark2020-10-310154610.0
Denmark2020-11-300154670.0
Denmark2020-12-310173960.0
Denmark2021-01-3126208830.12
Denmark2021-02-2828192320.15
Denmark2021-03-3131214220.14
Denmark2021-04-3030207890.14
Denmark2021-05-3129205230.14
France2020-03-3129511026602.87
France2020-04-30700343142.04
France2020-05-311437735661.95
France2020-06-3019841516521.31
France2020-07-3131853177361.0
France2020-08-3140574340390.93
France2020-09-3051746456900.8
France2020-10-31843510214510.83
France2020-11-301176114435950.81
France2020-12-311450317743230.82
France2021-01-311589620135270.79
France2021-02-281570221491960.73
France2021-03-312086827473500.76
France2021-04-302413129864320.81
France2021-05-312702532891100.82
Germany2020-01-310170.0
Germany2020-05-3114803732287204.59
Germany2020-06-3026428156707624.66
Germany2020-07-3128146062650794.49
Germany2020-08-3128625070247334.07
Germany2020-09-3028145580235213.51
Germany2020-10-31304666116761432.61
Germany2020-11-30393286245720751.6
Germany2020-12-31763375441416831.73
Germany2021-01-311428068625554582.28
Germany2021-02-281824513656606552.78
Germany2021-03-312289398812778832.82
Germany2021-04-302390995935130802.56
Germany2021-05-3124943311038262642.4
Hong Kong2020-01-310650.0
Hong Kong2020-02-293715432.4
Hong Kong2020-03-3110580181.31
Hong Kong2020-04-30120294430.41
Hong Kong2020-05-31124326920.38
Hong Kong2020-06-30145340000.43
Hong Kong2020-07-31371573180.65
Hong Kong2020-08-3120101342211.5
Hong Kong2020-09-3030151490652.02
Hong Kong2020-10-3132551618252.01
Hong Kong2020-11-3032271673771.93
Hong Kong2020-12-3138622399771.61
Hong Kong2021-01-3150592978991.7
Hong Kong2021-02-2853853013351.79
Hong Kong2021-03-3162913494351.8
Hong Kong2021-04-3062293490451.78
Hong Kong2021-05-3160903426771.78
India2020-06-3027056487753603.08
India2020-07-31811787327811652.48
India2020-08-311582152827347921.91
India2020-09-3024757101516513671.63
India2020-10-3134808952286342681.52
India2020-11-3039096592658343901.47
India2020-12-3144684953079854561.45
India2021-01-3147141133269088571.44
India2021-02-2843621993059844901.43
India2021-03-3149412663574870531.38
India2021-04-3053892244475985931.2
India2021-05-3178499497047143941.11
Iraq2020-02-2900nan
Israel2020-02-29050.0
Italy2020-05-31585807411446714.24
Italy2020-06-301028841711523214.46
Italy2020-07-311084927756074514.35
Italy2020-08-311095146791962213.83
Italy2020-09-301069880874936112.23
Italy2020-10-311137835132812168.57
Italy2020-11-301385115353243743.92
Italy2020-12-312046224582930823.51
Italy2021-01-312531790729172773.47
Italy2021-02-282622056764298293.43
Italy2021-03-3132010001011629133.16
Italy2021-04-3034723821147994623.02
Italy2021-05-3135912211200867162.99
Japan2020-05-313560667385.33
Japan2020-06-30281375256095.35
Japan2020-07-31306097715953.97
Japan2020-08-313487417019162.05
Japan2020-09-304382022971821.91
Japan2020-10-315157128562211.81
Japan2020-11-305689136503141.56
Japan2020-12-318223158805231.4
Japan2021-01-3114075099026621.42
Japan2021-02-28195564116275611.68
Japan2021-03-31267223139769511.91
Japan2021-04-30287341158354371.81
Japan2021-05-31333106195733461.7
Lebanon2020-02-29010.0
Macau2020-01-310460.0
Macau2020-02-2902830.0
Macau2020-03-3105560.0
Macau2020-04-30013350.0
Macau2020-05-31013950.0
Macau2020-06-30013560.0
Macau2020-07-31014260.0
Macau2020-08-31014260.0
Macau2020-09-30013800.0
Macau2020-10-31014260.0
Macau2020-11-30013800.0
Macau2020-12-31014260.0
Macau2021-01-31014370.0
Macau2021-02-28013400.0
Macau2021-03-31014880.0
Macau2021-04-30014640.0
Macau2021-05-31014410.0
Mainland China2020-01-31905383402.36
Mainland China2020-02-294638016315352.84
Mainland China2020-03-319837425068123.92
Mainland China2020-04-3011817024693654.79
Mainland China2020-05-3114363925711945.59
Mainland China2020-06-3013902024973235.57
Mainland China2020-07-3114365425951035.54
Mainland China2020-08-3114365426287945.46
Mainland China2020-09-3013902025570065.44
Mainland China2020-10-3114365426557265.41
Mainland China2020-11-3013902025900495.37
Mainland China2020-12-3114365426905565.34
Mainland China2021-01-3114368027357355.25
Mainland China2021-02-2812980825136975.16
Mainland China2021-03-3114371627920805.15
Mainland China2021-04-3013908027141475.12
Mainland China2021-05-3113444426351345.1
Mexico2020-05-319669488105510.97
Mexico2020-06-30562682469244011.99
Mexico2020-07-3111642381010773811.52
Mexico2020-08-3117492131610802610.86
Mexico2020-09-3021504922030915210.59
Mexico2020-10-3126491322607777110.16
Mexico2020-11-302962546303421449.76
Mexico2020-12-313594393395869569.08
Mexico2021-01-314367196506523328.62
Mexico2021-02-284863060556780118.73
Mexico2021-03-316056567672956239.0
Mexico2021-04-306318085688408619.18
Mexico2021-05-316383863689851709.25
Netherlands2020-03-31456147013.1
Netherlands2020-04-3031947646.7
Netherlands2020-05-3157161999.21
Netherlands2020-06-3057062179.17
Netherlands2020-07-319269579706911.63
Netherlands2020-08-3119227919870569.68
Netherlands2020-09-3018969828054826.76
Netherlands2020-10-3121184670202853.02
Netherlands2020-11-30256715136498061.88
Netherlands2020-12-31322195205075031.57
Netherlands2021-01-31401458283431701.42
Netherlands2021-02-28418927293035491.43
Netherlands2021-03-31503675369015181.36
Netherlands2021-04-30511880422900271.21
Netherlands2021-05-31512773467634921.1
Others2020-02-2935100760.35
Others2020-03-31154161520.95
Pakistan2020-06-307140435945151.99
Pakistan2020-07-3116628479104292.1
Pakistan2020-08-3119081189351252.14
Pakistan2020-09-3019195391187342.11
Pakistan2020-10-31205956100059642.06
Pakistan2020-11-30220356108911732.02
Pakistan2020-12-31282857138396632.04
Pakistan2021-01-31340044160298282.12
Pakistan2021-02-28345787158051812.19
Pakistan2021-03-31423884191743072.21
Pakistan2021-04-30483568224986482.15
Pakistan2021-05-31566161254420642.23
Peru2020-05-31170756037052.83
Peru2020-06-3021020069312483.03
Peru2020-07-31424053106576403.98
Peru2020-08-31759195164140824.63
Peru2020-09-30923911220544764.19
Peru2020-10-311039335266533913.9
Peru2020-11-301056154280254363.77
Peru2020-12-311140356306245343.72
Peru2021-01-311204526328761543.66
Peru2021-02-281222507344914903.54
Peru2021-03-311526553443280853.44
Peru2021-04-301692814503685113.36
Peru2021-05-311902554544608643.49
Russia2020-06-30215268160975791.34
Russia2020-07-31366477232330411.58
Russia2020-08-31484136285080441.7
Russia2020-09-30565674322891401.75
Russia2020-10-31738104425495551.73
Russia2020-11-301000094579254191.73
Russia2020-12-311490416841021821.77
Russia2021-01-3119908411084224091.84
Russia2021-02-2822079431127129111.96
Russia2021-03-3128303011350885042.1
Russia2021-04-3030875311388171032.22
Russia2021-05-3132944441414041582.33
Spain2020-05-31496839421933511.78
Spain2020-06-30828405733119211.3
Spain2020-07-31880773813429010.83
Spain2020-08-31889890111590437.97
Spain2020-09-30907525183162604.95
Spain2020-10-311046133294946933.55
Spain2020-11-301232707439327832.81
Spain2020-12-311497045550677582.72
Spain2021-01-311671602711807862.35
Spain2021-02-281817907855680382.12
Spain2021-03-312253421992806292.27
Spain2021-04-3023047721019661982.26
Spain2021-05-3122973211044809952.2
Sweden2020-06-3012986414318969.07
Sweden2020-07-3117291723605827.33
Sweden2020-08-3117937626003246.9
Sweden2020-09-3017567626323516.67
Sweden2020-10-3118327232407535.66
Sweden2020-11-3018776055463773.39
Sweden2020-12-31240064106954302.24
Sweden2021-01-31316144159460131.98
Sweden2021-02-28347461172016582.02
Sweden2021-03-31408718225692511.81
Sweden2021-04-30412253266585851.55
Sweden2021-05-31413737298885691.38
Taiwan2020-01-310520.0
Taiwan2020-02-29146162.27
Taiwan2020-03-31104362.29
UK2020-03-31820237683.45
UK2020-04-30974283703.43
UK2020-05-312438405536.01
UK2020-06-30854141610397513.99
UK2020-07-311399546915013015.3
UK2020-08-311368183992893313.78
UK2020-09-3012545421156450610.85
UK2020-10-311358294220025706.17
UK2020-11-301573977408309753.85
UK2020-12-312042273613653663.33
UK2021-01-3127597281021803952.7
UK2021-02-2832722311132116842.89
UK2021-03-3138967241327219662.94
UK2021-04-3038227391319521792.9
UK2021-05-3137096531294321852.87
US2020-01-310370.0
US2020-02-2917180.14
US2020-03-311844710886401.69
US2020-04-30971766195341484.97
US2020-05-312706969452567165.98
US2020-06-303499386650598485.38
US2020-07-3143078471115809303.86
US2020-08-3152654111667866713.16
US2020-09-3058871211996536852.95
US2020-10-3167908172521602152.69
US2020-11-3075018233400178322.21
US2020-12-3196769875281397751.83
US2021-01-31124439467308947461.7
US2021-02-28136337067742542611.76
US2021-03-31166496339166438371.82
US2021-04-30169679949445436931.8
US2021-05-31169819089535299151.78
Ukraine2020-06-302797510112982.77
Ukraine2020-07-314511717975222.51
Ukraine2020-08-316586229529872.23
Ukraine2020-09-3010190250195342.03
Ukraine2020-10-3117494792922721.88
Ukraine2020-11-30302110169720541.78
Ukraine2020-12-31503232291787561.72
Ukraine2021-01-31670535367130951.83
Ukraine2021-02-28719876370099081.95
Ukraine2021-03-31947071478856441.98
Ukraine2021-04-301214256585961682.07
Ukraine2021-05-311436619637968372.25

Aggregating Text and Filtering

Groupbys are not simply convenient for aggregating numerical data—they are also useful for summarizing text data too. Let's return again to our National/Regional Death/Case Ratio by Month table. This time, we'll leverage a lambda function within  our agg function, expanding its flexibility to its fullest extent.


(
   covid_ts
   .groupby(["Country/Region",
   pd.Grouper(key="ObservationDate", freq="1M"), ])
   .agg(Total_Deaths=("Deaths", "sum"),
        Total_Cases=("Confirmed", "sum"),
        City_States=('Province/State', lambda x: ', '.join(set(x)))
           )
   .assign(
Death_Case_Ratio=lambda x: round(x['Total_Deaths']/x['Total_Cases']*100,2)           )
   .reset_index()
   .rename({'Total_Deaths':'Total Deaths',
            'Total_Cases':'Total Cases',
            'City_States':'City/States',
            'Death_Case_Ratio':'Death/Case Ratio (%)'},
            axis=1)
   .query(""" <code>Country/Region == 'US'""")
       )

Here we generate a "City_States" column (subsequently renamed to "City/States") in which we aggregate "Province/State" text data to the monthly and country level. By implementing a join function, and filtering redundant data with set, we are instructing pandas to list out which cities and towns comprise our data. To visualize this, a filter was also added in the form of Pandas' powerful query function, which allows us to filter columns in place using boolean expressions. Here we filtered "Country/Region" to only include the United States. The result of our work can be seen in the US Death/Case Ratio by Month table below. 

US Death/Case Ratio by Month

Country/RegionObservationDateTotal DeathsTotal CasesCity/StatesDeath/Case Ratio (%)
US2020-02-291718Lackland TX Boston MA Humboldt County CA Unassigned Location (From Diamond Princess) Lackland TX (From Diamond Princess) Santa Clara CA San Antonio TX Ashland NE Madison WI San Diego County CA Orange CA Travis CA (From Diamond Princess) Seattle WA Snohomish County WA Tempe AZ Los Angeles CA Chicago IL Omaha NE (From Diamond Princess) Sacramento County CA San Benito CA Portland OR Travis CA0.14
US2020-03-31184471088640Delaware Mississippi San Mateo CA Montgomery County MD Hillsborough FL Collin County TX Illinois Maryland Humboldt County CA Colorado West Virginia Norfolk County MA US Fulton County GA Harris County TX North Dakota Yolo County CA St. Louis County MO Guam Placer County CA Johnson County KS Minnesota Ramsey County MN New Hampshire Chatham County NC Indiana Riverside County CA Volusia County FL American Samoa Tempe AZ New Mexico Alabama Recovered Rockingham County NH Douglas County OR Omaha NE (From Diamond Princess) Fayette County KY San Benito CA Alameda County CA Plymouth County MA Bergen County NJ Floyd County GA Kershaw County SC Harford County MD Utah Saratoga County NY Oregon Polk County GA Rockland County NY Montgomery County PA Montana Unknown Location MA Marion County OR Santa Clara CA Wyoming Grand Princess Cruise Ship Idaho Jefferson County WA Washington D.C. Hawaii Washington Spokane County WA Grand Princess Virginia Nebraska Douglas County CO Davidson County TN Douglas County NE District of Columbia Berkeley CA Shelby County TN Iowa Los Angeles CA Manatee County FL Madera County CA Fort Bend County TX Chicago IL Pennsylvania United States Virgin Islands Summit County CO Wisconsin Maricopa County AZ Shasta County CA Maine Arkansas Grafton County NH Clark County WA Cherokee County GA New York City NY Honolulu County HI Rhode Island Louisiana South Carolina Norwell County MA Lackland TX (From Diamond Princess) Jefferson Parish LA San Antonio TX Norfolk County MA Montgomery County TX Arizona Texas Orange County CA Massachusetts San Diego County CA Madison WI Queens County NY Sarasota FL Pierce County WA Ulster County NY Orange CA Delaware County PA Nevada Cook County IL Unassigned Location VT Carver County MN South Dakota Wake County NC Clark County NV Harrison County KY Snohomish County WA Wuhan Evacuee Washoe County NV Pinal County AZ Providence County RI Connecticut El Paso County CO Berkshire County MA Diamond Princess cruise ship Tulsa County OK San Francisco County CA Bennington County VT Santa Rosa County FL Fresno County CA Sacramento County CA Unassigned Location WA Klamath County OR Williamson County TN Denver County CO Broward County FL Santa Cruz County CA New Jersey Suffolk County MA California New York County NY Jefferson County KY Hudson County NJ New York Charlotte County FL Grant County WA Johnson County IA Boston MA Spartanburg County SC Unassigned Location (From Diamond Princess) Westchester County NY Middlesex County MA Wayne County PA Missouri Charleston County SC Ohio Lee County FL Providence RI Vermont Nassau County NY Davis County UT Contra Costa County CA Kittitas County WA Tennessee Virgin Islands Washington County OR Hendricks County IN Travis CA (From Diamond Princess) Fairfield County CT Jackson County OR Marion County IN Cobb County GA Seattle WA North Carolina Kansas Suffolk County NY King County WA Okaloosa County FL Michigan Umatilla OR Puerto Rico Fairfax County VA Florida Sonoma County CA Oklahoma Alaska Santa Clara County CA Georgia Virgin Islands U.S. Kentucky Portland OR Northern Mariana Islands1.69
US2020-04-3097176619534148Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California4.97
US2020-05-31270696945256716Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California5.98
US2020-06-30349938665059848Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California5.38
US2020-07-314307847111580930Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California3.86
US2020-08-315265411166786671Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California3.16
US2020-09-305887121199653685Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California2.95
US2020-10-316790817252160215Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California2.69
US2020-11-307501823340017832Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California2.21
US2020-12-319676987528139775Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California1.83
US2021-01-3112443946730894746Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California1.7
US2021-02-2813633706774254261Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California1.76
US2021-03-3116649633916643837Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California1.82
US2021-04-3016967994944543693Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California1.8
US2021-05-3116981908953529915Delaware Mississippi Illinois Oregon Arkansas Louisiana Maryland New York Rhode Island South Carolina Colorado West Virginia Montana Missouri North Dakota Ohio Arizona Wyoming Idaho Texas Vermont Massachusetts Guam Hawaii Nevada Washington Tennessee Virgin Islands Minnesota Utah New Hampshire South Dakota Grand Princess Nebraska Indiana Virginia District of Columbia Kansas North Carolina Alabama Michigan New Mexico Iowa Connecticut Recovered Diamond Princess cruise ship Puerto Rico Oklahoma Florida Alaska Georgia Pennsylvania Wisconsin New Jersey Kentucky Maine Northern Mariana Islands California1.78

Summary

As we can see, pandas groupbys are far more flexible than they are typically used for. We have seen that when we leverage functions such as Grouper, we are able to aggregate timeseries data using offset aliases. By implementing the agg function with named aggregations, we can reference and manipulate these new columns in place by appending an assign function to our code. Finally, we have seen that even text need not be ignored, thanks to our ability to use lambda functions within agg functions as well. To top it off, we saw that Pandas' powerful query function allows us to filter our data to whatever granularity we'd like using boolean expressions.

Leave a Reply

Your email address will not be published. Required fields are marked *