processing date: 2024-06-21 12:56
reading data/data.csv, df.shape=(240, 59)
duplicated record_ids
5      4ce196bbdff36bf4da47b365a8433a16
10     0ba8ee0a0749ef417117a7bbbfea36ae
14     260b69dec89fd672a0ef4b415cab1e05
22     dd9bafda99effda2f73e9aa54ee5441d
25     79fb5958917a2748a9df5c1128e013df
32     65ef887f715f8e2f5f1743bdf0e22674
41     50132229c6f4a8088cc03113436e9d62
49     90f691d8a80a7a0771f4fc50620cec83
58     cc9d73a375f2bb847dfae3f900a3b356
65     2374b05459a6787a8860a7e9674d7a84
71     2d65c3d011f0e383240fbf710f7b76c2
74     276ffcbcd7c9774d786571571b127e4f
106    75c42d0c8f8d19a87dd7e55a8adb6aaf
119    c903ea2f39cf07d3b0c4b96f2516bf0d
130    a73216979646725244e7faa3db08ea1d
134    1c5c4c703c25611a04a8557579b49ce7
137    51219526fc66f7d3aa94fa75ff83fa17
139    829b032c4f5e0a790da4453267842464
142    9be0659ff7e603150cd841c67d6e418c
144    a0998c640bb39c4d66d74771863b0ab0
187    6a3b0bb0617eeb3637ea43b6a27d4a70
Name: record_id, dtype: object
240 rows read with 219 unique values
219 after removing duplicates
reading data/color.csv
WARNING in ned -- confirming "5+ yrs" for: 91.0
WARNING in ned -- confirming "5+ yrs" for: 82.0
WARNING in ned -- confirming "5+ yrs" for: 68.0
WARNING in ned -- confirming "5+ yrs" for: 115.0
WARNING in ned -- confirming "5+ yrs" for: 121.0
WARNING in ned -- confirming "5+ yrs" for: 92.0
WARNING in ned -- confirming "5+ yrs" for: 94.0
WARNING in ned -- confirming "5+ yrs" for: 67.0
WARNING in ned -- confirming "5+ yrs" for: 81.0
WARNING in ned -- confirming "5+ yrs" for: 77.0
WARNING in ned -- confirming "5+ yrs" for: 76.0
WARNING in ned -- confirming "5+ yrs" for: 68.0
WARNING in ned -- confirming "5+ yrs" for: 68.0
WARNING in ned -- confirming "5+ yrs" for: 317.0
WARNING in ned -- confirming "5+ yrs" for: 125.0
WARNING in ned -- confirming "5+ yrs" for: 88.0
WARNING in ned -- confirming "5+ yrs" for: 168.0
WARNING in ned -- confirming "5+ yrs" for: 147.0
WARNING in ned -- confirming "5+ yrs" for: 311.0
WARNING in ned -- confirming "5+ yrs" for: 102.0
WARNING in ned -- confirming "5+ yrs" for: 126.0
WARNING in ned -- confirming "5+ yrs" for: 110.0
WARNING in ned -- confirming "5+ yrs" for: 66.0
WARNING in ned -- confirming "5+ yrs" for: 73.0
WARNING in ned -- confirming "5+ yrs" for: 80.0
WARNING in ned -- confirming "5+ yrs" for: 168.0
WARNING in ned -- confirming "5+ yrs" for: 117.0
WARNING in ned -- confirming "5+ yrs" for: 121.0
WARNING in ned -- confirming "5+ yrs" for: 119.0
WARNING in ned -- confirming "5+ yrs" for: 82.0
WARNING in ned -- confirming "5+ yrs" for: 79.0
WARNING in ned -- confirming "5+ yrs" for: 88.0
WARNING in ned -- confirming "5+ yrs" for: 74.0
WARNING in ned -- confirming "5+ yrs" for: 64.0
WARNING in ned -- confirming "5+ yrs" for: 92.0
Null values: 57 out of 219
WARNING in age_group -- missing data -- returning None for: nan
Null values: 1 out of 219
WARNING in cs_visit, returning 2 for cancer_stage_visit: Responding
WARNING in cs_visit, returning 2 for cancer_stage_visit: Responding
WARNING in cs_visit, returning 2 for cancer_stage_visit: Local or regional recurrence/relapse
WARNING in cs_visit, returning 2 for cancer_stage_visit: Progressive disease
WARNING in cs_visit, returning 2 for cancer_stage_visit: 0, no evidence
WARNING in cs_visit, returning 2 for cancer_stage_visit: Progressive disease
WARNING in cs_visit, returning 2 for cancer_stage_visit: Angioimmunoblastic T-cell lymphoma
WARNING in cs_visit, returning 2 for cancer_stage_visit: recurrent pilocytic astrocytoma s/p - chemotherapy x2, now on observation
WARNING in cs_visit, returning 2 for cancer_stage_visit: Grade II oligodendroglioma
Null values: 124 out of 219
-------- summary by question group ---------
219 entries processed on 2024-06-21 12:56
           age   gender     ecog ned_time       cs cs_visit
count  218.000  217.000  218.000  162.000  185.000   95.000
mean    61.032    1.355    0.826   40.352    2.919    0.453
std     17.257    0.480    0.532   47.314    0.932    0.665
min     20.000    1.000    0.000    0.000    1.000    0.000
25%     50.250    1.000    1.000    8.250    2.000    0.000
50%     65.000    1.000    1.000   30.000    3.000    0.000
75%     74.000    2.000    1.000   51.500    4.000    1.000
max     89.000    2.000    2.000  317.000    4.000    2.000
-------- summary by question ---------
219 entries processed on 2024-06-21 12:56
            b1       b2       b3
count  219.000  218.000  219.000
mean     1.338    0.830    0.822
std      1.151    0.981    0.977
min      0.000    0.000    0.000
25%      0.000    0.000    0.000
50%      1.000    1.000    1.000
75%      2.000    1.000    1.000
max      4.000    4.000    4.000
            c1       c2       c3       c4       c5       c6       c7       c8       c9
count  219.000  217.000  215.000   217.00  217.000  212.000  212.000  211.000  212.000
mean     1.717    1.415    2.070     2.70    2.622    1.726    2.080    1.118    2.750
std      0.944    1.020    1.119     1.25    1.267    1.328    1.306    1.187    1.168
min      0.000    0.000    0.000     0.00    0.000    0.000    0.000    0.000    0.000
25%      1.000    1.000    1.000     2.00    2.000    0.000    1.000    0.000    2.000
50%      2.000    1.000    2.000     3.00    3.000    2.000    2.000    1.000    3.000
75%      2.000    2.000    3.000     4.00    4.000    3.000    3.000    2.000    4.000
max      4.000    4.000    4.000     4.00    4.000    4.000    4.000    4.000    4.000
            d1       d2       d3       d4       d5       d6       d7
count  214.000  215.000  213.000  213.000  210.000  216.000  216.000
mean     2.505    2.381    2.399    1.061    2.390    2.681    2.722
std      0.690    0.757    0.697    0.967    0.649    0.496    0.543
min      0.000    0.000    0.000    0.000    0.000    1.000    0.000
25%      2.000    2.000    2.000    0.000    2.000    2.000    3.000
50%      3.000    3.000    2.000    1.000    2.000    3.000    3.000
75%      3.000    3.000    3.000    2.000    3.000    3.000    3.000
max      3.000    3.000    3.000    3.000    3.000    3.000    3.000
            e1       e2       e3       e4       e5       e6       e7
count  211.000  212.000  212.000  212.000  211.000  211.000  212.000
mean     2.289    2.127    1.623    2.118    3.028    1.630    1.514
std      1.045    1.001    0.876    1.131    1.099    1.031    0.873
min      1.000    1.000    1.000    1.000    1.000    1.000    1.000
25%      1.000    1.000    1.000    1.000    2.000    1.000    1.000
50%      2.000    2.000    1.000    2.000    3.000    1.000    1.000
75%      3.000    3.000    2.000    3.000    4.000    2.000    2.000
max      4.000    4.000    4.000    4.000    4.000    4.000    4.000
tidy data: merged.shape=(5694, 17)
writing data/tidy.csv