report

processing date: 2024-06-21 12:56
reading data/data.csv, df.shape=(240, 59)
duplicated record_ids
5      4ce196bbdff36bf4da47b365a8433a16
10     0ba8ee0a0749ef417117a7bbbfea36ae
14     260b69dec89fd672a0ef4b415cab1e05
22     dd9bafda99effda2f73e9aa54ee5441d
25     79fb5958917a2748a9df5c1128e013df
32     65ef887f715f8e2f5f1743bdf0e22674
41     50132229c6f4a8088cc03113436e9d62
49     90f691d8a80a7a0771f4fc50620cec83
58     cc9d73a375f2bb847dfae3f900a3b356
65     2374b05459a6787a8860a7e9674d7a84
71     2d65c3d011f0e383240fbf710f7b76c2
74     276ffcbcd7c9774d786571571b127e4f
106    75c42d0c8f8d19a87dd7e55a8adb6aaf
119    c903ea2f39cf07d3b0c4b96f2516bf0d
130    a73216979646725244e7faa3db08ea1d
134    1c5c4c703c25611a04a8557579b49ce7
137    51219526fc66f7d3aa94fa75ff83fa17
139    829b032c4f5e0a790da4453267842464
142    9be0659ff7e603150cd841c67d6e418c
144    a0998c640bb39c4d66d74771863b0ab0
187    6a3b0bb0617eeb3637ea43b6a27d4a70
Name: record_id, dtype: object 

240 rows read with 219 unique values
219 after removing duplicates
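
The counts above are consistent with dropping every repeated record_id after its first occurrence: 21 duplicated ids are listed, and 240 - 21 = 219. A minimal sketch of such a step follows, assuming pandas and a keep-first policy. The function name drop_duplicate_records is hypothetical and is not taken from the script that produced this report.

```python
import pandas as pd


def drop_duplicate_records(df, key="record_id"):
    """Report and drop rows whose key repeats, keeping the first occurrence.

    Assumption: keep-first is the intended policy; the report does not say.
    """
    # True for the second and later occurrences of each key value
    dup_mask = df[key].duplicated(keep="first")
    print(f"{len(df)} rows read with {df[key].nunique()} unique values")
    deduped = df.loc[~dup_mask].reset_index(drop=True)
    print(f"{len(deduped)} after removing duplicates")
    return deduped
```

Keeping the first occurrence is only one option; if later rows are corrections of earlier ones, keep="last" may be the better choice.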
reading data/color.csv

Creating "ned" from "ned_time"

WARNING in ned -- confirming "5+ yrs" for: 91.0
WARNING in ned -- confirming "5+ yrs" for: 82.0
WARNING in ned -- confirming "5+ yrs" for: 68.0
WARNING in ned -- confirming "5+ yrs" for: 115.0
WARNING in ned -- confirming "5+ yrs" for: 121.0
WARNING in ned -- confirming "5+ yrs" for: 92.0
WARNING in ned -- confirming "5+ yrs" for: 94.0
WARNING in ned -- confirming "5+ yrs" for: 67.0
WARNING in ned -- confirming "5+ yrs" for: 81.0
WARNING in ned -- confirming "5+ yrs" for: 77.0
WARNING in ned -- confirming "5+ yrs" for: 76.0
WARNING in ned -- confirming "5+ yrs" for: 68.0
WARNING in ned -- confirming "5+ yrs" for: 68.0
WARNING in ned -- confirming "5+ yrs" for: 317.0
WARNING in ned -- confirming "5+ yrs" for: 125.0
WARNING in ned -- confirming "5+ yrs" for: 88.0
WARNING in ned -- confirming "5+ yrs" for: 168.0
WARNING in ned -- confirming "5+ yrs" for: 147.0
WARNING in ned -- confirming "5+ yrs" for: 311.0
WARNING in ned -- confirming "5+ yrs" for: 102.0
WARNING in ned -- confirming "5+ yrs" for: 126.0
WARNING in ned -- confirming "5+ yrs" for: 110.0
WARNING in ned -- confirming "5+ yrs" for: 66.0
WARNING in ned -- confirming "5+ yrs" for: 73.0
WARNING in ned -- confirming "5+ yrs" for: 80.0
WARNING in ned -- confirming "5+ yrs" for: 168.0
WARNING in ned -- confirming "5+ yrs" for: 117.0
WARNING in ned -- confirming "5+ yrs" for: 121.0
WARNING in ned -- confirming "5+ yrs" for: 119.0
WARNING in ned -- confirming "5+ yrs" for: 82.0
WARNING in ned -- confirming "5+ yrs" for: 79.0
WARNING in ned -- confirming "5+ yrs" for: 88.0
WARNING in ned -- confirming "5+ yrs" for: 74.0
WARNING in ned -- confirming "5+ yrs" for: 64.0
WARNING in ned -- confirming "5+ yrs" for: 92.0
Null values: 57 out of 219
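
All 35 warnings above concern ned_time values of 64.0 or more, and the 57 nulls match the 219 - 162 missing ned_time values in the summary table. A minimal sketch of a binning rule with that behaviour follows. It assumes ned_time is in months and that the cutoff is 60 months; the function name ned_category and every label other than "5+ yrs" are hypothetical, since the report does not show them.

```python
import math


def ned_category(months):
    """Map a ned_time value to a coarse category (sketch only).

    Assumptions: values are months; cutoff for "5+ yrs" is 60 months;
    the other labels are placeholders, not the script's real categories.
    """
    if months is None or (isinstance(months, float) and math.isnan(months)):
        return None  # missing input stays missing
    if months > 60:
        # Large values are flagged so they can be checked by hand
        print(f'WARNING in ned -- confirming "5+ yrs" for: {months}')
        return "5+ yrs"
    if months >= 12:
        return "1-5 yrs"  # hypothetical label
    return "<1 yr"  # hypothetical label
```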

Creating "age_group" from "age"

WARNING in age_group -- missing data -- returning None for: nan
Null values: 1 out of 219

Creating "cs_visit" from "cancer_stage_visit"

WARNING in cs_visit, returning 2 for cancer_stage_visit: Responding
WARNING in cs_visit, returning 2 for cancer_stage_visit: Responding
WARNING in cs_visit, returning 2 for cancer_stage_visit: Local or regional recurrence/relapse
WARNING in cs_visit, returning 2 for cancer_stage_visit: Progressive disease
WARNING in cs_visit, returning 2 for cancer_stage_visit: 0, no evidence
WARNING in cs_visit, returning 2 for cancer_stage_visit: Progressive disease
WARNING in cs_visit, returning 2 for cancer_stage_visit: Angioimmunoblastic T-cell lymphoma
WARNING in cs_visit, returning 2 for cancer_stage_visit: recurrent pilocytic astrocytoma s/p - chemotherapy  x2, now on observation
WARNING in cs_visit, returning 2 for cancer_stage_visit: Grade II oligodendroglioma
Null values: 124 out of 219

summary stats


-------- summary by question group ---------

219 entries processed on 2024-06-21 12:56 

           age   gender     ecog  ned_time       cs  cs_visit
count  218.000  217.000  218.000   162.000  185.000    95.000
mean    61.032    1.355    0.826    40.352    2.919     0.453
std     17.257    0.480    0.532    47.314    0.932     0.665
min     20.000    1.000    0.000     0.000    1.000     0.000
25%     50.250    1.000    1.000     8.250    2.000     0.000
50%     65.000    1.000    1.000    30.000    3.000     0.000
75%     74.000    2.000    1.000    51.500    4.000     1.000
max     89.000    2.000    2.000   317.000    4.000     2.000

-------- summary by question ---------

219 entries processed on 2024-06-21 12:56

             b1       b2       b3
count  219.000  218.000  219.000
mean     1.338    0.830    0.822
std      1.151    0.981    0.977
min      0.000    0.000    0.000
25%      0.000    0.000    0.000
50%      1.000    1.000    1.000
75%      2.000    1.000    1.000
max      4.000    4.000    4.000

             c1       c2       c3       c4       c5       c6       c7       c8       c9
count  219.000  217.000  215.000  217.000  217.000  212.000  212.000  211.000  212.000
mean     1.717    1.415    2.070    2.700    2.622    1.726    2.080    1.118    2.750
std      0.944    1.020    1.119    1.250    1.267    1.328    1.306    1.187    1.168
min      0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000    0.000
25%      1.000    1.000    1.000    2.000    2.000    0.000    1.000    0.000    2.000
50%      2.000    1.000    2.000    3.000    3.000    2.000    2.000    1.000    3.000
75%      2.000    2.000    3.000    4.000    4.000    3.000    3.000    2.000    4.000
max      4.000    4.000    4.000    4.000    4.000    4.000    4.000    4.000    4.000

             d1       d2       d3       d4       d5       d6       d7
count  214.000  215.000  213.000  213.000  210.000  216.000  216.000
mean     2.505    2.381    2.399    1.061    2.390    2.681    2.722
std      0.690    0.757    0.697    0.967    0.649    0.496    0.543
min      0.000    0.000    0.000    0.000    0.000    1.000    0.000
25%      2.000    2.000    2.000    0.000    2.000    2.000    3.000
50%      3.000    3.000    2.000    1.000    2.000    3.000    3.000
75%      3.000    3.000    3.000    2.000    3.000    3.000    3.000
max      3.000    3.000    3.000    3.000    3.000    3.000    3.000

             e1       e2       e3       e4       e5       e6       e7
count  211.000  212.000  212.000  212.000  211.000  211.000  212.000
mean     2.289    2.127    1.623    2.118    3.028    1.630    1.514
std      1.045    1.001    0.876    1.131    1.099    1.031    0.873
min      1.000    1.000    1.000    1.000    1.000    1.000    1.000
25%      1.000    1.000    1.000    1.000    2.000    1.000    1.000
50%      2.000    2.000    1.000    2.000    3.000    1.000    1.000
75%      3.000    3.000    2.000    3.000    4.000    2.000    2.000
max      4.000    4.000    4.000    4.000    4.000    4.000    4.000

tidy data: merged.shape=(5694, 17)
writing data/tidy.csv
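
The tidy row count is consistent with one row per record per question: the summary lists 3 + 9 + 7 + 7 = 26 question columns (b1-b3, c1-c9, d1-d7, e1-e7), and 219 x 26 = 5694. A minimal sketch of such a wide-to-long reshape follows, assuming pandas. The function name to_tidy and the column names "question" and "response" are hypothetical; the merge that yields the 17 columns is not reproduced here.

```python
import pandas as pd


def to_tidy(df, id_vars, question_cols):
    """Reshape wide survey data to one row per (record, question).

    Sketch only: output column names are placeholders.
    """
    return df.melt(
        id_vars=id_vars,           # columns repeated on every output row
        value_vars=question_cols,  # columns unpivoted into rows
        var_name="question",
        value_name="response",
    )
```

Missing answers are kept as rows with a null response, so the row count stays at records x questions.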