Overview

Dataset Statistics

Number of Variables 10
Number of Rows 76272
Missing Cells 201168
Missing Cells (%) 26.4%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 31.4 MB
Average Row Size in Memory 431.7 B
Variable Types
  • Categorical: 6
  • Numerical: 4
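
The missing-cell totals above can be cross-checked against the per-variable missing counts listed further down in this report. The sketch below is illustrative only: it hard-codes the counts from the report, and the names `missing_by_variable`, `n_rows`, and `missing_pct` are chosen here, not taken from the profiling tool.

```python
# Per-variable missing counts, copied from the Variables section of this report.
missing_by_variable = {
    "Associated_Codes": 0,
    "Cash_Discount": 26307,
    "DeIdentified_Max_Allowed": 31524,
    "Deidentified_Min_Allowed": 38176,
    "description": 18252,
    "Gross_Charge": 9,
    "iobSelection": 26307,
    "payer": 26307,
    "Payer_Allowed_Amount": 34286,
    "Filename": 0,
}
n_rows = 76272
n_variables = len(missing_by_variable)

# Missing Cells is the sum over variables; the percentage is taken over all cells.
total_missing = sum(missing_by_variable.values())
missing_pct = 100 * total_missing / (n_rows * n_variables)

print(total_missing, round(missing_pct, 1))
```

The three variables with exactly 26307 missing values (Cash_Discount, iobSelection, payer) may be missing on the same rows, but the report itself does not confirm that.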

Dataset Insights

Cash_Discount has 26307 (34.49%) missing values
DeIdentified_Max_Allowed has 31524 (41.33%) missing values
Deidentified_Min_Allowed has 38176 (50.05%) missing values
description has 18252 (23.93%) missing values
iobSelection has 26307 (34.49%) missing values
payer has 26307 (34.49%) missing values
Payer_Allowed_Amount has 34286 (44.95%) missing values
Cash_Discount is skewed
DeIdentified_Max_Allowed is skewed
Deidentified_Min_Allowed is skewed
Payer_Allowed_Amount is skewed
Associated_Codes has a high cardinality: 3185 distinct values
description has a high cardinality: 10873 distinct values
Gross_Charge has a high cardinality: 23308 distinct values
Filename has constant value "wakemed"
Filename has constant length 7
Payer_Allowed_Amount has 29125 (38.19%) zeros

Variables

Associated_Codes

categorical

Approximate Distinct Count 3185
Approximate Unique (%) 4.2%
Missing 0
Missing (%) 0.0%
Memory Size 5.2 MB

Length

Mean 7.1343
Standard Deviation 6.6073
Median 5
Minimum 3
Maximum 143

Sample

1st row 0100
2nd row 0100
3rd row 0100
4th row 0100
5th row 0100

Letter

Count 6473
Lowercase Letter 0
Space Separator 526
Uppercase Letter 6473
Dash Punctuation 1575
Decimal Number 504351
  • Associated_Codes contains many words: 3202 words

Cash_Discount

numerical

Approximate Distinct Count 3214
Approximate Unique (%) 6.4%
Missing 26307
Missing (%) 34.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 780.7 KB
Mean 9475.2687
Minimum 0
Maximum 531895.93
Zeros 320
Zeros (%) 0.4%
Negatives 0
Negatives (%) 0.0%
  • Cash_Discount is skewed right (γ1 = 7.2575)

Quantile Statistics

Minimum 0
5th Percentile 32.27
Q1 152.25
Median 1496.82
Q3 8352.76
95th Percentile 36664.53
Maximum 531895.93
Range 531895.93
IQR 8200.51

Descriptive Statistics

Mean 9475.2687
Standard Deviation 25179.4871
Variance 6.3401e+08
Sum 4.7343e+08
Skewness 7.2575
Kurtosis 88.4157
Coefficient of Variation 2.6574
  • Cash_Discount is not normally distributed (p-value 1.406499048734433e-24)
  • Cash_Discount has 5046 outliers
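
Several of the Cash_Discount figures above are derived from others, so they can be checked with a few lines of arithmetic. This is a minimal sketch using values copied from the report. The outlier fence at the end assumes Tukey's 1.5 × IQR rule, which is a common convention; the report does not state which rule produced its outlier count.

```python
# Values copied from the Cash_Discount section of this report.
n_rows, n_missing = 76272, 26307
q1, q3 = 152.25, 8352.76
mean, std = 9475.2687, 25179.4871

iqr = q3 - q1                         # reported IQR: 8200.51
cv = std / mean                       # reported Coefficient of Variation: 2.6574
variance = std ** 2                   # reported Variance: 6.3401e+08
total = mean * (n_rows - n_missing)   # reported Sum: 4.7343e+08

# Assumption: Tukey's fences. Values above this would count as high outliers
# under that rule; the profiling tool may use a different definition.
upper_fence = q3 + 1.5 * iqr

print(round(iqr, 2), round(cv, 4), f"{variance:.4e}", f"{total:.4e}")
```

The same relationships (IQR = Q3 − Q1, CV = standard deviation / mean, Sum = mean × non-missing count) apply to the other numerical variables in this report.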

DeIdentified_Max_Allowed

numerical

Approximate Distinct Count 2651
Approximate Unique (%) 5.9%
Missing 31524
Missing (%) 41.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 699.2 KB
Mean 4985.4533
Minimum 0
Maximum 202120.4534
Zeros 666
Zeros (%) 0.9%
Negatives 0
Negatives (%) 0.0%
  • DeIdentified_Max_Allowed is skewed right (γ1 = 5.9694)

Quantile Statistics

Minimum 0
5th Percentile 20.08
Q1 85.87
Median 1037.38
Q3 5329.62
95th Percentile 22586.38
Maximum 202120.4534
Range 202120.4534
IQR 5243.75

Descriptive Statistics

Mean 4985.4533
Standard Deviation 11484.5468
Variance 1.3189e+08
Sum 2.2309e+08
Skewness 5.9694
Kurtosis 58.3198
Coefficient of Variation 2.3036
  • DeIdentified_Max_Allowed is not normally distributed (p-value 5.809314524603803e-24)
  • DeIdentified_Max_Allowed has 3694 outliers

Deidentified_Min_Allowed

numerical

Approximate Distinct Count 2043
Approximate Unique (%) 5.4%
Missing 38176
Missing (%) 50.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 595.2 KB
Mean 4656.4835
Minimum 0
Maximum 180018.82
Zeros 1240
Zeros (%) 1.6%
Negatives 0
Negatives (%) 0.0%
  • Deidentified_Min_Allowed is skewed right (γ1 = 6.4206)

Quantile Statistics

Minimum 0
5th Percentile 5.11
Q1 46.63
Median 699.49
Q3 5384.78
95th Percentile 18250.65
Maximum 180018.82
Range 180018.82
IQR 5338.15

Descriptive Statistics

Mean 4656.4835
Standard Deviation 11096.7215
Variance 1.2314e+08
Sum 1.7739e+08
Skewness 6.4206
Kurtosis 61.7873
Coefficient of Variation 2.3831
  • Deidentified_Min_Allowed is not normally distributed (p-value 4.612676267902044e-24)
  • Deidentified_Min_Allowed has 2994 outliers

description

categorical

Approximate Distinct Count 10873
Approximate Unique (%) 18.7%
Missing 18252
Missing (%) 23.9%
Memory Size 5.2 MB

Length

Mean 29.4657
Standard Deviation 23.7025
Median 27
Minimum 3
Maximum 124

Sample

1st row Room & Board
2nd row Room & Board
3rd row Room & Board
4th row Room & Board
5th row Room & Board

Letter

Count 1260926
Lowercase Letter 575574
Space Separator 217416
Uppercase Letter 685352
Dash Punctuation 28110
Decimal Number 164827
  • description contains many words: 9884 words

Gross_Charge

categorical

Approximate Distinct Count 23308
Approximate Unique (%) 30.6%
Missing 9
Missing (%) 0.0%
Memory Size 6.6 MB

Length

Mean 25.3307
Standard Deviation 22.4983
Median 10
Minimum 3
Maximum 115

Sample

1st row 28788.621
2nd row 28788.621
3rd row 28788.621
4th row 28788.621
5th row 28788.621

Letter

Count 1387115
Lowercase Letter 767985
Space Separator 228959
Uppercase Letter 619130
Dash Punctuation 23452
Decimal Number 243518
  • Gross_Charge contains many words: 20930 words

iobSelection

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 26307
Missing (%) 34.5%
Memory Size 3.6 MB
  • The largest value (Outpatient) is over 20.52 times larger than the second largest value (Inpatient)

Length

Mean 9.9535
Standard Deviation 0.2105
Median 10
Minimum 9
Maximum 10

Sample

1st row Inpatient
2nd row Inpatient
3rd row Inpatient
4th row Inpatient
5th row Inpatient

Letter

Count 497328
Lowercase Letter 447363
Space Separator 0
Uppercase Letter 49965
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Outpatient, Inpatient) take over 50.0%
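
The letter count and the non-missing row count are enough to recover the two category frequencies for iobSelection, which gives a check on the reported 20.52 ratio. The sketch below assumes every non-missing value is exactly "Outpatient" (10 letters) or "Inpatient" (9 letters), as the length minimum and maximum suggest; the variable names are illustrative.

```python
# Values copied from the iobSelection section of this report.
n_rows, n_missing = 76272, 26307
total_letters = 497328  # "Count" under Letter

n_values = n_rows - n_missing  # non-missing values

# Solve: 10 * outpatient + 9 * inpatient = total_letters
#        outpatient + inpatient          = n_values
inpatient = 10 * n_values - total_letters
outpatient = n_values - inpatient

ratio = outpatient / inpatient
mean_length = total_letters / n_values

print(outpatient, inpatient, round(ratio, 2), round(mean_length, 4))
```

Under that assumption the uppercase letter count (49965) also equals the number of non-missing values, one capital letter per value.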

payer

categorical

Approximate Distinct Count 18
Approximate Unique (%) 0.0%
Missing 26307
Missing (%) 34.5%
Memory Size 4.0 MB

Length

Mean 19.1063
Standard Deviation 8.6594
Median 16
Minimum 8
Maximum 36

Sample

1st row AETNA COMMERCIAL
2nd row AETNA PPO
3rd row BCBS COMMERCIAL
4th row BCBS PPO
5th row CIGNA COMMERCIAL

Letter

Count 866874
Lowercase Letter 0
Space Separator 87771
Uppercase Letter 866874
Dash Punctuation 0
Decimal Number 0

Payer_Allowed_Amount

numerical

Approximate Distinct Count 9325
Approximate Unique (%) 22.2%
Missing 34286
Missing (%) 45.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 656.0 KB
Mean 1527.5666
Minimum 0
Maximum 180018.82
Zeros 29125
Zeros (%) 38.2%
Negatives 0
Negatives (%) 0.0%
  • Payer_Allowed_Amount is skewed right (γ1 = 8.1191)

Quantile Statistics

Minimum 0
5th Percentile 0
Q1 0
Median 0
Q3 21.395
95th Percentile 8218.8725
Maximum 180018.82
Range 180018.82
IQR 21.395

Descriptive Statistics

Mean 1527.5666
Standard Deviation 6871.4798
Variance 4.7217e+07
Sum 6.4136e+07
Skewness 8.1191
Kurtosis 91.6821
Coefficient of Variation 4.4983
  • Payer_Allowed_Amount is not normally distributed (p-value 4.53916768600298e-25)
  • Payer_Allowed_Amount has 9106 outliers
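
The zero median and zero Q1 for Payer_Allowed_Amount follow from the zero count once missing rows are excluded. The reported 38.2% is measured against all rows; against non-missing values the share of zeros is much higher. A short sketch with numbers copied from the report (variable names are illustrative):

```python
# Values copied from the Payer_Allowed_Amount section of this report.
n_rows, n_missing, n_zeros = 76272, 34286, 29125

n_values = n_rows - n_missing          # non-missing values
zero_share = n_zeros / n_values        # share of zeros among non-missing values

# With no negatives, zeros are the smallest values, so any quantile below
# zero_share is 0 and any quantile above it is positive.
median_is_zero = zero_share > 0.50
q3_is_positive = zero_share < 0.75

print(n_values, round(100 * zero_share, 1), median_is_zero, q3_is_positive)
```

Roughly 69% of the populated values are zero, which is consistent with the reported quantiles: 5th percentile, Q1, and median are all 0, while Q3 is 21.395.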

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 5.2 MB

Length

Mean 7
Standard Deviation 0
Median 7
Minimum 7
Maximum 7

Sample

1st row wakemed
2nd row wakemed
3rd row wakemed
4th row wakemed
5th row wakemed

Letter

Count 533904
Lowercase Letter 533904
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • Filename has words of constant length

Interactions

Correlations

Missing Values