Overview

Dataset Statistics

Number of Variables 14
Number of Rows 792
Missing Cells 2749
Missing Cells (%) 24.8%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 371.8 KB
Average Row Size in Memory 480.7 B
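The dataset-level figures come from simple counts. Missing cells are summed over all 14 × 792 = 11088 cells, and duplicate rows are rows whose full set of values has already appeared. Below is a minimal stdlib sketch. It assumes rows are held as a list of dicts with None marking a missing cell. The report does not say how the data was loaded, and `dataset_stats` is a hypothetical helper.

```python
from typing import Any


def dataset_stats(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Count missing cells and duplicate rows in a list of records.

    Assumes every record has the same keys and that None marks a missing cell.
    """
    columns = list(rows[0]) if rows else []
    n_cells = len(rows) * len(columns)
    missing = sum(1 for row in rows for col in columns if row[col] is None)

    seen: set[tuple] = set()
    duplicates = 0
    for row in rows:
        key = tuple(row[col] for col in columns)  # the row's full set of values
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / len(rows) if rows else 0.0,
    }
```

Plugging in the report's totals gives 2749 / 11088 ≈ 24.8% missing. Likewise, 371.8 KB × 1024 / 792 ≈ 480.7 B per row, so the tool appears to count 1 KB as 1024 bytes.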
Variable Types
  • Categorical: 5
  • Numerical: 9

Dataset Insights

  • Gross Charge and Humana have similar distributions
  • Gross Charge has 648 (81.82%) missing values
  • Aetna has 400 (50.51%) missing values
  • BCBS has 30 (3.79%) missing values
  • Cigna has 327 (41.29%) missing values
  • UHC has 237 (29.92%) missing values
  • Medcost has 498 (62.88%) missing values
  • Humana has 609 (76.89%) missing values
  • Gross Charge is skewed
  • Aetna is skewed
  • BCBS is skewed
  • Cigna is skewed
  • UHC is skewed
  • Medcost is skewed
  • Humana is skewed
  • De-Identified Minimum is skewed
  • De-Identified Maximum is skewed
  • CPT/MS-DRG has a high cardinality: 656 distinct values
  • Procedure Description has a high cardinality: 747 distinct values
  • system has constant value "NHRMC"
  • system has constant length 5
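The insight lines are rule-based flags computed from per-column summaries. Below is a minimal sketch for categorical columns, assuming None marks a missing cell. The high-cardinality cut-off is a hypothetical value, because the report does not publish its thresholds. Skewness and similarity checks are left out.

```python
from typing import Optional, Sequence

# Hypothetical threshold: the report does not state the one it uses.
HIGH_CARDINALITY_MIN_DISTINCT = 50


def insight_flags(name: str, values: Sequence[Optional[str]]) -> list[str]:
    """Return report-style insight strings for one categorical column."""
    flags: list[str] = []
    present = [v for v in values if v is not None]

    n_missing = len(values) - len(present)
    if n_missing:
        pct = 100 * n_missing / len(values)
        flags.append(f"{name} has {n_missing} ({pct:.2f}%) missing values")

    distinct = set(present)
    if len(distinct) == 1:
        flags.append(f'{name} has constant value "{next(iter(distinct))}"')
    elif len(distinct) >= HIGH_CARDINALITY_MIN_DISTINCT:
        flags.append(f"{name} has a high cardinality: {len(distinct)} distinct values")

    lengths = {len(v) for v in present}
    if len(lengths) == 1:
        flags.append(f"{name} has constant length {lengths.pop()}")
    return flags
```

For example, `insight_flags("system", ["NHRMC"] * 792)` yields the two `system` insights listed above.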

Variables

CPT/MS-DRG

categorical

Approximate Distinct Count 656
Approximate Unique (%) 82.8%
Missing 0
Missing (%) 0.0%
Memory Size 52.7 KB

Length

Mean 3.1465
Standard Deviation 0.5611
Median 3
Minimum 1
Maximum 4

Sample

1st row 1
2nd row 3
3rd row 4
4th row 12
5th row 13

Letter

Count 7
Lowercase Letter 0
Space Separator 0
Uppercase Letter 7
Dash Punctuation 0
Decimal Number 2485
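The row labels in the Letter tables match Unicode general-category names: Ll, Zs, Lu, Pd and Nd. The figures also indicate that "Count" is the total of upper- and lowercase letters. For CPT/MS-DRG, 7 uppercase letters plus 2485 digits give 2492 characters, which matches the mean length 3.1465 × 792. Below is a sketch of the breakdown using `unicodedata.category`. It assumes the tool classifies characters in the same way; `letter_table` is a hypothetical helper.

```python
import unicodedata
from collections import Counter
from typing import Iterable

# Unicode general categories shown in the report's "Letter" tables.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def letter_table(values: Iterable[str]) -> dict[str, int]:
    """Count characters per Unicode general category across all values."""
    counts: Counter = Counter()
    for value in values:
        counts.update(unicodedata.category(ch) for ch in value)
    table = {"Count": counts["Ll"] + counts["Lu"]}  # letters only
    for code, label in CATEGORY_NAMES.items():
        table[label] = counts[code]
    return table
```

Characters outside these five categories, such as the "." in ".csv" (category Po), are not reported. That is why the Filename categories sum to less than its total length.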

Patient Type

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 57.4 KB
  • The most frequent value (Inpatient) occurs over 3.21 times as often as the second most frequent value (Outpatient)

Length

Mean 9.2374
Standard Deviation 0.4257
Median 9
Minimum 9
Maximum 10
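The Length block summarises the number of characters per value. Here the row split can be derived from the mean: 9.2374 × 792 ≈ 7316 characters, which is 188 more than 9 × 792. That implies 604 "Inpatient" rows (9 characters) and 188 "Outpatient" rows (10 characters). Below is a minimal sketch with the `statistics` module. It assumes a sample (n - 1) standard deviation, which is what reproduces 0.4257.

```python
import statistics
from typing import Sequence


def length_stats(values: Sequence[str]) -> dict[str, float]:
    """Summarise string lengths; needs at least two values for stdev."""
    lengths = [len(v) for v in values]
    return {
        "Mean": statistics.mean(lengths),
        "Standard Deviation": statistics.stdev(lengths),  # sample (n - 1) SD
        "Median": statistics.median(lengths),
        "Minimum": min(lengths),
        "Maximum": max(lengths),
    }
```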

Sample

1st row Inpatient
2nd row Inpatient
3rd row Inpatient
4th row Inpatient
5th row Inpatient

Letter

Count 7316
Lowercase Letter 6524
Space Separator 0
Uppercase Letter 792
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Inpatient, Outpatient) account for 100.0% of rows
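The "3.21 times" figure compares how often values occur, not the values themselves: 604 / 188 ≈ 3.213. Below is a sketch using `collections.Counter`. `top_two_ratio` is a hypothetical helper, and it assumes at least two distinct values.

```python
from collections import Counter
from typing import Hashable, Iterable


def top_two_ratio(values: Iterable[Hashable]) -> float:
    """How many times more often the most frequent value occurs than the runner-up.

    Raises ValueError if there are fewer than two distinct values.
    """
    (_, first), (_, second) = Counter(values).most_common(2)
    return first / second
```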

Gross Charge

numerical

Approximate Distinct Count 144
Approximate Unique (%) 100.0%
Missing 648
Missing (%) 81.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 2.2 KB
Mean 15574.6052
Minimum 125.55
Maximum 216838.06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Gross Charge is skewed right (γ1 = 5.3931)

Quantile Statistics

Minimum 125.55
5th Percentile 735.2126
Q1 3733.0512
Median 10193.1492
Q3 18117.8691
95th Percentile 32935.026
Maximum 216838.06
Range 216712.51
IQR 14384.8178
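Range is Maximum minus Minimum (216838.06 - 125.55 = 216712.51), and IQR is Q3 minus Q1. The report does not state how it interpolates percentiles. This sketch assumes linear interpolation between order statistics, which is NumPy's default for `percentile`, so results on the real data may differ slightly. Missing cells are dropped first.

```python
import math
from typing import Optional, Sequence


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 1]) of already-sorted data."""
    h = (len(sorted_values) - 1) * q
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (h - lo) * (sorted_values[hi] - sorted_values[lo])


def quantile_stats(values: Sequence[Optional[float]]) -> dict[str, float]:
    data = sorted(v for v in values if v is not None)  # drop missing cells
    q1, q3 = percentile(data, 0.25), percentile(data, 0.75)
    return {
        "Minimum": data[0],
        "5th Percentile": percentile(data, 0.05),
        "Q1": q1,
        "Median": percentile(data, 0.5),
        "Q3": q3,
        "95th Percentile": percentile(data, 0.95),
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "IQR": q3 - q1,
    }
```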

Descriptive Statistics

Mean 15574.6052
Standard Deviation 23874.9706
Variance 5.7001e+08
Sum 2.2427e+06
Skewness 5.3931
Kurtosis 37.4234
Coefficient of Variation 1.5329
  • Gross Charge is not normally distributed (p-value 5.053227745443528e-13)
  • Gross Charge has 7 outliers
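Several descriptive figures follow directly from one another. Variance is the square of the standard deviation. The coefficient of variation is Standard Deviation / Mean, for example 23874.9706 / 15574.6052 ≈ 1.5329. The report does not say whether skewness and kurtosis are bias-corrected, or whether kurtosis is reported as excess kurtosis. This sketch assumes the moment-based γ1 and excess kurtosis, together with a sample (n - 1) standard deviation.

```python
import math
import statistics
from typing import Sequence


def descriptive_stats(values: Sequence[float]) -> dict[str, float]:
    """Mean, spread and shape measures; needs at least two distinct values."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample (n - 1) standard deviation
    # Central moments with a 1/n divisor, used for the moment-based shape measures.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "Mean": mean,
        "Standard Deviation": sd,
        "Variance": sd ** 2,
        "Sum": math.fsum(values),
        "Skewness": m3 / m2 ** 1.5,  # gamma_1, not bias-corrected
        "Kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis (normal = 0)
        "Coefficient of Variation": sd / mean,
    }
```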

Procedure Description

categorical

Approximate Distinct Count 747
Approximate Unique (%) 94.3%
Missing 0
Missing (%) 0.0%
Memory Size 85.2 KB

Length

Mean 45.1326
Standard Deviation 19.5322
Median 42
Minimum 8
Maximum 100

Sample

1st row Heart Transplant O...
2nd row Ecmo Or Tracheosto...
3rd row Tracheostomy With ...
4th row Tracheostomy For F...
5th row Tracheostomy For F...

Letter

Count 30737
Lowercase Letter 22813
Space Separator 4318
Uppercase Letter 7924
Dash Punctuation 101
Decimal Number 193

Aetna

numerical

Approximate Distinct Count 325
Approximate Unique (%) 82.9%
Missing 400
Missing (%) 50.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 6.1 KB
Mean 21616.2838
Minimum 97
Maximum 567266
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Aetna is skewed right (γ1 = 6.9902)
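"Approximate Unique (%)" is evidently computed over non-missing values rather than over all 792 rows. For Aetna, 325 / (792 - 400) = 325 / 392 ≈ 82.9%, whereas 325 / 792 would be about 41.0%. The same reading explains the 100.0% for Gross Charge and Humana, whose 144 and 183 non-missing values are all distinct. Below is a sketch of that calculation. It uses an exact rather than an approximate distinct count, and `unique_pct` is a hypothetical helper.

```python
from typing import Hashable, Optional, Sequence


def unique_pct(values: Sequence[Optional[Hashable]]) -> float:
    """Distinct non-missing values as a percentage of non-missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return 100 * len(set(present)) / len(present)
```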

Quantile Statistics

Minimum 97
5th Percentile 577.95
Q1 4502.25
Median 8025.75
Q3 21750
95th Percentile 83815.6
Maximum 567266
Range 567169
IQR 17247.75

Descriptive Statistics

Mean 21616.2838
Standard Deviation 43618.1075
Variance 1.9025e+09
Sum 8.4736e+06
Skewness 6.9902
Kurtosis 70.6246
Coefficient of Variation 2.0178
  • Aetna is not normally distributed (p-value 1.6338012617065157e-23)
  • Aetna has 45 outliers
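The report does not define an outlier. A common convention is Tukey's fences, which flag values more than 1.5 × IQR below Q1 or above Q3. This sketch assumes that rule, so it is not guaranteed to reproduce the counts above. With Aetna's quartiles (Q1 = 4502.25, Q3 = 21750, IQR = 17247.75), the fences are -21369.375 and 47621.625. Only the upper fence can apply to these non-negative charges.

```python
from typing import Sequence


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Lower and upper outlier fences k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(values: Sequence[float], q1: float, q3: float, k: float = 1.5) -> int:
    """Number of values strictly outside the Tukey fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return sum(1 for v in values if v < lower or v > upper)
```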

BCBS

numerical

Approximate Distinct Count 678
Approximate Unique (%) 89.0%
Missing 30
Missing (%) 3.8%
Infinite 0
Infinite (%) 0.0%
Memory Size 11.9 KB
Mean 18188.5898
Minimum 49
Maximum 193683
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • BCBS is skewed right (γ1 = 3.4045)

Quantile Statistics

Minimum 49
5th Percentile 494.3
Q1 6016.5
Median 11409
Q3 21978.75
95th Percentile 62380.65
Maximum 193683
Range 193634
IQR 15962.25

Descriptive Statistics

Mean 18188.5898
Standard Deviation 22417.338
Variance 5.0254e+08
Sum 1.386e+07
Skewness 3.4045
Kurtosis 15.6762
Coefficient of Variation 1.2325
  • BCBS is not normally distributed (p-value 2.88401507318874e-10)
  • BCBS has 56 outliers
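The report does not name the normality test behind its p-values. As a stand-in, this sketch uses the Jarque-Bera test. Its statistic, JB = n/6 · (S² + K²/4) with S the skewness and K the excess kurtosis, is asymptotically chi-square with 2 degrees of freedom, whose survival function is exp(-x/2). The p-values will therefore not match the report's, but a heavily right-skewed column like BCBS would also be rejected.

```python
import math
from typing import Sequence


def jarque_bera(values: Sequence[float]) -> tuple[float, float]:
    """Return (JB statistic, asymptotic p-value) for a normality check."""
    n = len(values)
    mean = math.fsum(values) / n
    m2 = math.fsum((x - mean) ** 2 for x in values) / n
    m3 = math.fsum((x - mean) ** 3 for x in values) / n
    m4 = math.fsum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3
    jb = n / 6 * (skew ** 2 + excess_kurt ** 2 / 4)
    # Chi-square with 2 degrees of freedom has survival function exp(-x / 2).
    return jb, math.exp(-jb / 2)
```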

Cigna

numerical

Approximate Distinct Count 446
Approximate Unique (%) 95.9%
Missing 327
Missing (%) 41.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 7.3 KB
Mean 18108.3312
Minimum 60
Maximum 365997
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Cigna is skewed right (γ1 = 5.4272)

Quantile Statistics

Minimum 60
5th Percentile 373.2
Q1 3625
Median 9800
Q3 20859
95th Percentile 63749.6
Maximum 365997
Range 365937
IQR 17234

Descriptive Statistics

Mean 18108.3312
Standard Deviation 28790.6787
Variance 8.289e+08
Sum 8.4204e+06
Skewness 5.4272
Kurtosis 49.2301
Coefficient of Variation 1.5899
  • Cigna is not normally distributed (p-value 8.131124264494371e-19)
  • Cigna has 46 outliers

UHC

numerical

Approximate Distinct Count 511
Approximate Unique (%) 92.1%
Missing 237
Missing (%) 29.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 8.7 KB
Mean 22258.4955
Minimum 47
Maximum 371175
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • UHC is skewed right (γ1 = 4.5433)

Quantile Statistics

Minimum 47
5th Percentile 438.7
Q1 4157
Median 13373
Q3 28030
95th Percentile 76608
Maximum 371175
Range 371128
IQR 23873

Descriptive Statistics

Mean 22258.4955
Standard Deviation 32037.4781
Variance 1.0264e+09
Sum 1.2353e+07
Skewness 4.5433
Kurtosis 33.7281
Coefficient of Variation 1.4393
  • UHC is not normally distributed (p-value 1.641365078269059e-17)
  • UHC has 38 outliers

Medcost

numerical

Approximate Distinct Count 290
Approximate Unique (%) 98.6%
Missing 498
Missing (%) 62.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 4.6 KB
Mean 14876.3435
Minimum 90
Maximum 240757
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Medcost is skewed right (γ1 = 5.1977)

Quantile Statistics

Minimum 90
5th Percentile 367.55
Q1 2425.75
Median 7961
Q3 15476.75
95th Percentile 51197.05
Maximum 240757
Range 240667
IQR 13051

Descriptive Statistics

Mean 14876.3435
Standard Deviation 26285.0903
Variance 6.9091e+08
Sum 4.3736e+06
Skewness 5.1977
Kurtosis 34.345
Coefficient of Variation 1.7669
  • Medcost is not normally distributed (p-value 9.442936925692483e-17)
  • Medcost has 23 outliers

Humana

numerical

Approximate Distinct Count 183
Approximate Unique (%) 100.0%
Missing 609
Missing (%) 76.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 2.9 KB
Mean 12524.7104
Minimum 102
Maximum 175639
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Humana is skewed right (γ1 = 4.6509)

Quantile Statistics

Minimum 102
5th Percentile 318
Q1 2408
Median 7271
Q3 14621
95th Percentile 40360
Maximum 175639
Range 175537
IQR 12213

Descriptive Statistics

Mean 12524.7104
Standard Deviation 19692.7263
Variance 3.878e+08
Sum 2.292e+06
Skewness 4.6509
Kurtosis 28.9455
Coefficient of Variation 1.5723
  • Humana is not normally distributed (p-value 1.4247518240379277e-16)
  • Humana has 12 outliers
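The insights flag Gross Charge and Humana as having similar distributions without saying how similarity is judged. One common measure is the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the two empirical CDFs. It is 0 for identical samples and 1 for samples that do not overlap. The sketch below is an illustration, not the report's own criterion.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    xs, ys = sorted(a), sorted(b)
    d = 0.0
    # The ECDFs are step functions, so the largest gap occurs at a sample point.
    for point in xs + ys:
        fa = bisect_right(xs, point) / len(xs)  # fraction of a that is <= point
        fb = bisect_right(ys, point) / len(ys)  # fraction of b that is <= point
        d = max(d, abs(fa - fb))
    return d
```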

De-Identified Minimum

numerical

Approximate Distinct Count 710
Approximate Unique (%) 89.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.4 KB
Mean 16487.8179
Minimum 47
Maximum 432352
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • De-Identified Minimum is skewed right (γ1 = 7.1281)

Quantile Statistics

Minimum 47
5th Percentile 349.45
Q1 4306.5
Median 9633.5
Q3 19435.25
95th Percentile 60419.85
Maximum 432352
Range 432305
IQR 15128.75

Descriptive Statistics

Mean 16487.8179
Standard Deviation 25674.6163
Variance 6.5919e+08
Sum 1.3058e+07
Skewness 7.1281
Kurtosis 91.6283
Coefficient of Variation 1.5572
  • De-Identified Minimum is not normally distributed (p-value 1.1759459734588933e-19)
  • De-Identified Minimum has 63 outliers

De-Identified Maximum

numerical

Approximate Distinct Count 784
Approximate Unique (%) 99.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 12.4 KB
Mean 31569.3731
Minimum 102
Maximum 567266
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • De-Identified Maximum is skewed right (γ1 = 5.0602)

Quantile Statistics

Minimum 102
5th Percentile 1111.4
Q1 10044.75
Median 19171
Q3 35460.25
95th Percentile 104131.55
Maximum 567266
Range 567164
IQR 25415.5

Descriptive Statistics

Mean 31569.3731
Standard Deviation 44186.6803
Variance 1.9525e+09
Sum 2.5003e+07
Skewness 5.0602
Kurtosis 41.3507
Coefficient of Variation 1.3997
  • De-Identified Maximum is not normally distributed (p-value 2.5653096991247994e-15)
  • De-Identified Maximum has 76 outliers
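The column names suggest that De-Identified Minimum and Maximum are the lowest and highest payer rates in each row. The report does not say so, but the figures are consistent with that reading. The overall minimum (47) equals the UHC minimum, and the overall maximum (567266) equals the Aetna maximum. Both columns also have no missing values, even though every payer column does. Below is a sketch under that assumption; the payer tuple and `deidentified_range` are hypothetical.

```python
from typing import Mapping, Optional, Sequence

# Hypothetical: payer columns assumed to feed the de-identified min/max.
PAYERS = ("Aetna", "BCBS", "Cigna", "UHC", "Medcost", "Humana")


def deidentified_range(
    row: Mapping[str, Optional[float]], payers: Sequence[str] = PAYERS
) -> tuple[Optional[float], Optional[float]]:
    """Row-wise min and max over the payer rates that are present (None = missing)."""
    rates = [row[p] for p in payers if row.get(p) is not None]
    if not rates:
        return None, None
    return min(rates), max(rates)
```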

Filename

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 97.5 KB
  • The most frequent value (340141-new-hanover-regional-medical-center-standard-charges.csv) occurs 4.5 times as often as the second most frequent value (341307-pender-memorial-hospital-standard-charges.csv)

Length

Mean 61
Standard Deviation 4.2453
Median 63
Minimum 52
Maximum 63

Sample

1st row 340141-new-hanover...
2nd row 340141-new-hanover...
3rd row 340141-new-hanover...
4th row 340141-new-hanover...
5th row 340141-new-hanover...

Letter

Count 37512
Lowercase Letter 37512
Space Separator 0
Uppercase Letter 0
Dash Punctuation 5256
Decimal Number 4752
  • The top 2 categories (340141-new-hanover-regional-medical-center-standard-charges.csv, 341307-pender-memorial-hospital-standard-charges.csv) account for 100.0% of rows
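The two filenames differ in composition, so the row split can be recovered exactly from the Letter table. The New Hanover filename has 49 lowercase letters and the Pender filename has 40. Solving 49a + 40b = 37512 with a + b = 792 gives a = 648 and b = 144, so the ratio is exactly 4.5. The mean length agrees: (63 · 648 + 52 · 144) / 792 = 61. A quick check:

```python
NHRMC_FILE = "340141-new-hanover-regional-medical-center-standard-charges.csv"
PENDER_FILE = "341307-pender-memorial-hospital-standard-charges.csv"
TOTAL_ROWS = 792
TOTAL_LOWERCASE = 37512  # "Lowercase Letter" figure for Filename


def lowercase_count(text: str) -> int:
    """Number of lowercase letters in a string."""
    return sum(1 for ch in text if ch.islower())


def split_rows() -> tuple[int, int]:
    """Solve a + b = TOTAL_ROWS and la*a + lb*b = TOTAL_LOWERCASE for (a, b)."""
    la, lb = lowercase_count(NHRMC_FILE), lowercase_count(PENDER_FILE)
    a, remainder = divmod(TOTAL_LOWERCASE - lb * TOTAL_ROWS, la - lb)
    if remainder:
        raise ValueError("letter counts are inconsistent with exactly two filenames")
    return a, TOTAL_ROWS - a
```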

system

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 54.1 KB

Length

Mean 5
Standard Deviation 0
Median 5
Minimum 5
Maximum 5

Sample

1st row NHRMC
2nd row NHRMC
3rd row NHRMC
4th row NHRMC
5th row NHRMC

Letter

Count 3960
Lowercase Letter 0
Space Separator 0
Uppercase Letter 3960
Dash Punctuation 0
Decimal Number 0
  • system has words of constant length

Interactions

Correlations

Missing Values