Overview

Dataset Statistics

Number of Variables 21
Number of Rows 2.7529e+06 (about 2.75 million)
Missing Cells 2998
Missing Cells (%) 0.0%
Duplicate Rows 0
Duplicate Rows (%) 0.0%
Total Size in Memory 2.7 GB
Average Row Size in Memory 1.0 KB
Variable Types
  • Categorical: 12
  • Numerical: 9
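
The overview totals can be cross-checked against the per-variable figures reported further down. The following is a minimal Python sketch, assuming the rounded row count shown above and using the Missing counts of the five variables that report any. The dictionary and function names are illustrative and not part of the profiling tool.

```python
# Missing counts copied from the per-variable sections of this report.
MISSING_BY_VARIABLE = {
    "PRIMARY_CODE_DESCRIPTION": 2,
    "PRIMARY_REV_CODE": 11,
    "PRIMARY_REV_CODE_DESCRIPTION": 11,
    "SUPPORTING_SERVICE_CODE": 1486,
    "SUPPORTING_SERVICE_CODE_DESCRIPTION": 1488,
}

N_ROWS = 2.7529e6   # rounded value from the overview, so the percentage is approximate
N_VARIABLES = 21


def missing_cell_summary(missing_by_variable, n_rows, n_variables):
    """Return (total missing cells, missing cells as a percentage of all cells)."""
    total = sum(missing_by_variable.values())
    percent = 100.0 * total / (n_rows * n_variables)
    return total, percent


total, percent = missing_cell_summary(MISSING_BY_VARIABLE, N_ROWS, N_VARIABLES)
print(total)              # 2998, matching "Missing Cells"
print(f"{percent:.1f}%")  # 0.0%, matching "Missing Cells (%)"
```

The percentage rounds to 0.0% because 2998 missing cells are spread over roughly 57.8 million cells.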

Dataset Insights

  • PRIMARY_REV_CODE and SUPPORTING_REV_CODE have similar distributions [Similar Distribution]
  • PRIMARY_REV_CODE is skewed [Skewed]
  • SUPPORTING_REV_CODE is skewed [Skewed]
  • PERCENT_OCCURRENCE_WITHIN_PRIMARY_CODE is skewed [Skewed]
  • PRICE is skewed [Skewed]
  • GROSS_CHARGES is skewed [Skewed]
  • CASH_PRICE is skewed [Skewed]
  • PAYER_NEGOTIATED_RATE is skewed [Skewed]
  • MIN_NEGOTIATED_RATE is skewed [Skewed]
  • MAX_NEGOTIATED_RATE is skewed [Skewed]
  • PAYER_NAME has a high cardinality: 883 distinct values [High Cardinality]
  • PRIMARY_CODE has a high cardinality: 1425 distinct values [High Cardinality]
  • PRIMARY_CODE_DESCRIPTION has a high cardinality: 1423 distinct values [High Cardinality]
  • PRIMARY_REV_CODE_DESCRIPTION has a high cardinality: 81 distinct values [High Cardinality]
  • SUPPORTING_SERVICE_CODE has a high cardinality: 1466 distinct values [High Cardinality]
  • SUPPORTING_SERVICE_CODE_DESCRIPTION has a high cardinality: 1422 distinct values [High Cardinality]
  • LOCATION has constant value "Northern Regional Medical Center" [Constant]
  • BILL_TYPE has constant value "INSTITUTIONAL" [Constant]
  • Filename has constant value "northern-regional-hospital_standardcharges.csv" [Constant]
  • LOCATION has constant length 32 [Constant Length]
  • BILL_TYPE has constant length 13 [Constant Length]
  • Filename has constant length 46 [Constant Length]
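
The three "Constant" insights flag columns that carry no information for analysis. One way to find such columns programmatically is sketched below in plain Python, assuming the rows are available as dictionaries (for example from `csv.DictReader`). The function name `constant_columns` is hypothetical, and the two sample rows are abbreviated from the Sample sections of this report.

```python
def constant_columns(rows):
    """Return the names of columns whose non-missing values are all identical."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            if value is None or value == "":
                continue  # treat empty cells as missing, not as a distinct value
            seen.setdefault(name, set()).add(value)
    return sorted(name for name, values in seen.items() if len(values) == 1)


sample_rows = [
    {"PAYER_GROUP": "Medicare",
     "LOCATION": "Northern Regional Medical Center",
     "BILL_TYPE": "INSTITUTIONAL"},
    {"PAYER_GROUP": "Cigna",
     "LOCATION": "Northern Regional Medical Center",
     "BILL_TYPE": "INSTITUTIONAL"},
]

print(constant_columns(sample_rows))  # ['BILL_TYPE', 'LOCATION']
```

Going by the Memory Size figures in the variable sections, LOCATION (254.7 MB), BILL_TYPE (204.8 MB), and Filename (291.4 MB) together account for about 750.9 MB of the 2.7 GB total, so dropping them would be a cheap first step before further analysis.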

Variables

PAYER_GROUP

categorical

Approximate Distinct Count 14
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 201.4 MB
  • The largest value (Blue Cross) is over 1.62 times larger than the second largest value (Other Commercial)

Length

Mean 11.6984
Standard Deviation 4.0672
Median 12
Minimum 5
Maximum 21

Sample

1st row Medicare
2nd row Medicare
3rd row Cigna
4th row Blue Cross
5th row Auto Insurance

Letter

Count 30093059
Lowercase Letter 25228243
Space Separator 2111891
Uppercase Letter 4864816
Dash Punctuation 0
Decimal Number 0
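
The labels in the Letter block correspond to Unicode general categories (Ll, Lu, Zs, Pd, Nd), and in this report Count equals the sum of the lowercase and uppercase figures (25228243 + 4864816 = 30093059). A rough Python sketch of how such counts could be reproduced for any string column follows. The helper name `char_class_counts` is illustrative, and whether the profiling tool computes the numbers this way is an assumption.

```python
import unicodedata
from collections import Counter

# Unicode general category codes mapped to the labels used in this report.
LABELS = {
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Lu": "Uppercase Letter",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def char_class_counts(values):
    """Count characters per Unicode general category across all strings."""
    categories = Counter(unicodedata.category(ch) for v in values for ch in v)
    counts = {label: categories.get(code, 0) for code, label in LABELS.items()}
    # Assumption: "Count" is the total number of letters (lowercase + uppercase).
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts


counts = char_class_counts(["Blue Cross", "CIGNA/WEB-TPA"])
print(counts["Uppercase Letter"])  # 13
print(counts["Lowercase Letter"])  # 7
print(counts["Dash Punctuation"])  # 1
print(counts["Count"])             # 20
```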

PAYER_NAME

categorical

Approximate Distinct Count 883
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 219.1 MB
  • The largest value (MEDICARE) is over 2.06 times larger than the second largest value (CAROLINA ACCESS MCD NC)

Length

Mean 18.4427
Standard Deviation 6.936
Median 18
Minimum 3
Maximum 30

Sample

1st row MEDICARE
2nd row MEDICARE
3rd row CIGNA/WEB-TPA
4th row ANTHEM BLUE CROSS ...
5th row STATE FARM

Letter

Count 44727683
Lowercase Letter 215075
Space Separator 5386143
Uppercase Letter 44512608
Dash Punctuation 73497
Decimal Number 51977

LOCATION

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 254.7 MB

Length

Mean 32
Standard Deviation 0
Median 32
Minimum 32
Maximum 32

Sample

1st row Northern Regional ...
2nd row Northern Regional ...
3rd row Northern Regional ...
4th row Northern Regional ...
5th row Northern Regional ...

Letter

Count 79834825
Lowercase Letter 68823125
Space Separator 8258775
Uppercase Letter 11011700
Dash Punctuation 0
Decimal Number 0
  • LOCATION has words of constant length

BILL_TYPE

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 204.8 MB

Length

Mean 13
Standard Deviation 0
Median 13
Minimum 13
Maximum 13

Sample

1st row INSTITUTIONAL
2nd row INSTITUTIONAL
3rd row INSTITUTIONAL
4th row INSTITUTIONAL
5th row INSTITUTIONAL

Letter

Count 35788025
Lowercase Letter 0
Space Separator 0
Uppercase Letter 35788025
Dash Punctuation 0
Decimal Number 0
  • BILL_TYPE has words of constant length

PT_SUMMARY

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 202.2 MB

Length

Mean 12.0229
Standard Deviation 1.9999
Median 14
Minimum 10
Maximum 14

Sample

1st row Outpatient
2nd row Outpatient
3rd row Emergency Room
4th row Outpatient
5th row Emergency Room

Letter

Count 31705958
Lowercase Letter 27560797
Space Separator 1392236
Uppercase Letter 4145161
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Emergency Room, Outpatient) account for over 50.0% of rows

PACKAGE_TYPE

categorical

Approximate Distinct Count 3
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 204.5 MB
  • The largest value (NON PACKAGED) is over 2.19 times larger than the second largest value (PACKAGED SUBSET)

Length

Mean 12.8818
Standard Deviation 1.4805
Median 15
Minimum 8
Maximum 15

Sample

1st row PACKAGED SUBSET
2nd row PACKAGED SUBSET
3rd row NON PACKAGED
4th row NON PACKAGED
5th row NON PACKAGED

Letter

Count 32741692
Lowercase Letter 0
Space Separator 2721069
Uppercase Letter 32741692
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (NON PACKAGED, PACKAGED SUBSET) account for over 50.0% of rows

PRIMARY_CODE

categorical

Approximate Distinct Count 1425
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 183.8 MB

Length

Mean 5
Standard Deviation 0.003998
Median 5
Minimum 3
Maximum 5

Sample

1st row 11102
2nd row 52601
3rd row 00140
4th row 00140
5th row 00140

Letter

Count 612153
Lowercase Letter 0
Space Separator 0
Uppercase Letter 612153
Dash Punctuation 0
Decimal Number 13152450
  • PRIMARY_CODE contains many words: 1425 words

PRIMARY_CODE_DESCRIPTION

categorical

Approximate Distinct Count 1423
Approximate Unique (%) 0.1%
Missing 2
Missing (%) 0.0%
Memory Size 320.1 MB

Length

Mean 56.9118
Standard Deviation 25.7917
Median 72
Minimum 20
Maximum 254

Sample

1st row Tangential biopsy ...
2nd row Electro-removal of...
3rd row Anesthesia for pro...
4th row Anesthesia for pro...
5th row Anesthesia for pro...

Letter

Count 113372615
Lowercase Letter 99604756
Space Separator 20417038
Uppercase Letter 13767859
Dash Punctuation 476983
Decimal Number 14122658
  • PRIMARY_CODE_DESCRIPTION contains many words: 3283 words
  • The most frequent word (cpt) occurs over 3.56 times as often as the second most frequent word (hcpcs)

PRIMARY_REV_CODE

numerical

Approximate Distinct Count 81
Approximate Unique (%) 0.0%
Missing 11
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 400.4887
Minimum 250
Maximum 985
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PRIMARY_REV_CODE is skewed right (γ1 = 1.44)

Quantile Statistics

Minimum 250
5-th Percentile 275
Q1 301
Median 312
Q3 450
95-th Percentile 636
Maximum 985
Range 735
IQR 149

Descriptive Statistics

Mean 400.4887
Standard Deviation 153.7715
Variance 23645.6893
Sum 1.1025e+09
Skewness 1.44
Kurtosis 1.2061
Coefficient of Variation 0.384
  • PRIMARY_REV_CODE is not normally distributed (p-value 1.008828983306467e-20)
  • PRIMARY_REV_CODE has 111093 outliers
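
Several of the figures above follow directly from others: Range is Maximum minus Minimum, IQR is Q3 minus Q1, and Coefficient of Variation is Standard Deviation divided by Mean. The sketch below recomputes them for PRIMARY_REV_CODE and also derives outlier fences under the common 1.5 × IQR rule. That rule is an assumption here, since the report does not state how outliers are counted, and the function name is illustrative.

```python
def derived_stats(minimum, q1, q3, maximum, mean, std, k=1.5):
    """Recompute range, IQR, CV, and Tukey-style fences from summary figures."""
    iqr = q3 - q1
    return {
        "range": maximum - minimum,
        "iqr": iqr,
        "cv": std / mean,
        "lower_fence": q1 - k * iqr,  # values below this would count as outliers
        "upper_fence": q3 + k * iqr,  # values above this would count as outliers
    }


# Figures copied from the PRIMARY_REV_CODE section.
stats = derived_stats(minimum=250, q1=301, q3=450, maximum=985,
                      mean=400.4887, std=153.7715)

print(stats["range"], stats["iqr"])                # 735 149
print(round(stats["cv"], 3))                       # 0.384
print(stats["lower_fence"], stats["upper_fence"])  # 77.5 673.5
```

Under that assumption the lower fence (77.5) sits below the minimum of 250, so any outliers would lie above 673.5.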

PRIMARY_REV_CODE_DESCRIPTION

categorical

Approximate Distinct Count 81
Approximate Unique (%) 0.0%
Missing 11
Missing (%) 0.0%
Memory Size 271.8 MB

Length

Mean 38.5285
Standard Deviation 17.7869
Median 40
Minimum 3
Maximum 91

Sample

1st row Treatment or Obser...
2nd row Operating Room Ser...
3rd row Professional Fees ...
4th row Professional Fees ...
5th row Professional Fees ...

Letter

Count 91895472
Lowercase Letter 80978855
Space Separator 10881703
Uppercase Letter 10916617
Dash Punctuation 2766112
Decimal Number 0

SUPPORTING_REV_CODE

numerical

Approximate Distinct Count 85
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 400.5356
Minimum 250
Maximum 985
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • SUPPORTING_REV_CODE is skewed right (γ1 = 1.4401)

Quantile Statistics

Minimum 250
5-th Percentile 276
Q1 301
Median 312
Q3 450
95-th Percentile 636
Maximum 985
Range 735
IQR 149

Descriptive Statistics

Mean 400.5356
Standard Deviation 153.858
Variance 23672.2918
Sum 1.1026e+09
Skewness 1.4401
Kurtosis 1.2066
Coefficient of Variation 0.3841
  • SUPPORTING_REV_CODE is not normally distributed (p-value 1.0076950589171226e-20)
  • SUPPORTING_REV_CODE has 111585 outliers

PERCENT_OCCURRENCE_WITHIN_PRIMARY_CODE

numerical

Approximate Distinct Count 7252
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 82.6979
Minimum 0.01
Maximum 100
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PERCENT_OCCURRENCE_WITHIN_PRIMARY_CODE is skewed left (γ1 = -1.5896)

Quantile Statistics

Minimum 0.01
5-th Percentile 3.57
Q1 100
Median 100
Q3 100
95-th Percentile 100
Maximum 100
Range 99.99
IQR 0

Descriptive Statistics

Mean 82.6979
Standard Deviation 34.0298
Variance 1158.0269
Sum 2.2766e+08
Skewness -1.5896
Kurtosis 0.7421
Coefficient of Variation 0.4115
  • PERCENT_OCCURRENCE_WITHIN_PRIMARY_CODE is not normally distributed (p-value 5.406964365968387e-25)
  • PERCENT_OCCURRENCE_WITHIN_PRIMARY_CODE has 611073 outliers

PRICE

numerical

Approximate Distinct Count 6725
Approximate Unique (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 355.5332
Minimum 0.01
Maximum 23360
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PRICE is skewed right (γ1 = 12.1758)

Quantile Statistics

Minimum 0.01
5-th Percentile 9.4914
Q1 41.35
Median 88.74
Q3 260.13
95-th Percentile 1975
Maximum 23360
Range 23359.99
IQR 218.78

Descriptive Statistics

Mean 355.5332
Standard Deviation 954.3373
Variance 910759.6301
Sum 9.7876e+08
Skewness 12.1758
Kurtosis 243.7291
Coefficient of Variation 2.6842
  • PRICE is not normally distributed (p-value 6.603893900522938e-25)
  • PRICE has 366166 outliers

GROSS_CHARGES

numerical

Approximate Distinct Count 10797
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 1437.0515
Minimum 0.01
Maximum 57697
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • GROSS_CHARGES is skewed right (γ1 = 3.8461)

Quantile Statistics

Minimum 0.01
5-th Percentile 19
Q1 66
Median 261
Q3 1536.85
95-th Percentile 7521.7
Maximum 57697
Range 57696.99
IQR 1470.85

Descriptive Statistics

Mean 1437.0515
Standard Deviation 2832.2572
Variance 8.0217e+06
Sum 3.9561e+09
Skewness 3.8461
Kurtosis 22.4808
Coefficient of Variation 1.9709
  • GROSS_CHARGES is not normally distributed (p-value 2.0810650332828073e-24)
  • GROSS_CHARGES has 328004 outliers

CASH_PRICE

numerical

Approximate Distinct Count 10794
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 977.195
Minimum 0.0068
Maximum 39233.96
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • CASH_PRICE is skewed right (γ1 = 3.8461)

Quantile Statistics

Minimum 0.0068
5-th Percentile 12.92
Q1 44.88
Median 177.48
Q3 1045.058
95-th Percentile 5114.756
Maximum 39233.96
Range 39233.9532
IQR 1000.178

Descriptive Statistics

Mean 977.195
Standard Deviation 1925.9349
Variance 3.7092e+06
Sum 2.6901e+09
Skewness 3.8461
Kurtosis 22.4808
Coefficient of Variation 1.9709
  • CASH_PRICE is not normally distributed (p-value 2.0810650332828073e-24)
  • CASH_PRICE has 328004 outliers
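
CASH_PRICE reports the same skewness (3.8461), kurtosis (22.4808), coefficient of variation (1.9709), normality p-value, and outlier count (328004) as GROSS_CHARGES. Those quantities do not change when a variable is multiplied by a positive constant, so one plausible reading is that CASH_PRICE is a fixed fraction of GROSS_CHARGES. The sketch below checks only the summary figures printed in this report, not row-level data. The dictionary and function names are illustrative.

```python
import math

# Summary figures copied from the GROSS_CHARGES and CASH_PRICE sections.
GROSS_CHARGES = {"min": 0.01, "p5": 19, "q1": 66, "median": 261,
                 "q3": 1536.85, "p95": 7521.7, "max": 57697,
                 "mean": 1437.0515, "std": 2832.2572}
CASH_PRICE = {"min": 0.0068, "p5": 12.92, "q1": 44.88, "median": 177.48,
              "q3": 1045.058, "p95": 5114.756, "max": 39233.96,
              "mean": 977.195, "std": 1925.9349}


def stat_ratios(numerator, denominator):
    """Divide each summary statistic of one variable by the other's."""
    return {name: numerator[name] / denominator[name] for name in numerator}


ratios = stat_ratios(CASH_PRICE, GROSS_CHARGES)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")

consistent = all(math.isclose(v, 0.68, rel_tol=1e-4) for v in ratios.values())
print(consistent)  # True
```

Every ratio comes out at about 0.68, which is consistent with a uniform 32% discount off gross charges, though confirming that would require comparing the two columns row by row.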

PAYER_NEGOTIATED_RATE

numerical

Approximate Distinct Count 17051
Approximate Unique (%) 0.6%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 479.8035
Minimum 0.001942
Maximum 18335.37
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • PAYER_NEGOTIATED_RATE is skewed right (γ1 = 6.0855)

Quantile Statistics

Minimum 0.001942
5-th Percentile 8.5132
Q1 31.5707
Median 99.45
Q3 387.2868
95-th Percentile 2199.38
Maximum 18335.37
Range 18335.3681
IQR 355.7161

Descriptive Statistics

Mean 479.8035
Standard Deviation 1181.6915
Variance 1.3964e+06
Sum 1.3209e+09
Skewness 6.0855
Kurtosis 50.0124
Coefficient of Variation 2.4629
  • PAYER_NEGOTIATED_RATE is not normally distributed (p-value 1.2854740371086676e-24)
  • PAYER_NEGOTIATED_RATE has 368398 outliers

MIN_NEGOTIATED_RATE

numerical

Approximate Distinct Count 1407
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 62.4037
Minimum 0.001942
Maximum 14631.34
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • MIN_NEGOTIATED_RATE is skewed right (γ1 = 19.3675)

Quantile Statistics

Minimum 0.001942
5-th Percentile 1.2
Q1 6.98
Median 13.7852
Q3 44.268
95-th Percentile 305.7989
Maximum 14631.34
Range 14631.3381
IQR 37.288

Descriptive Statistics

Mean 62.4037
Standard Deviation 226.1467
Variance 51142.3422
Sum 1.7179e+08
Skewness 19.3675
Kurtosis 683.0076
Coefficient of Variation 3.6239
  • MIN_NEGOTIATED_RATE is not normally distributed (p-value 4.603260988683077e-25)
  • MIN_NEGOTIATED_RATE has 331154 outliers

MAX_NEGOTIATED_RATE

numerical

Approximate Distinct Count 1493
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 42.0 MB
Mean 3720.7096
Minimum 0.007459
Maximum 32980.06
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • MAX_NEGOTIATED_RATE is skewed right (γ1 = 1.4968)

Quantile Statistics

Minimum 0.007459
5-th Percentile 14.2102
Q1 50.7241
Median 253.44
Q3 6033.09
95-th Percentile 16591.35
Maximum 32980.06
Range 32980.0525
IQR 5982.3659

Descriptive Statistics

Mean 3720.7096
Standard Deviation 5930.1806
Variance 3.5167e+07
Sum 1.0243e+10
Skewness 1.4968
Kurtosis 0.9006
Coefficient of Variation 1.5938
  • MAX_NEGOTIATED_RATE is not normally distributed (p-value 2.1108302080863746e-24)
  • MAX_NEGOTIATED_RATE has 279938 outliers

SUPPORTING_SERVICE_CODE

categorical

Approximate Distinct Count 1466
Approximate Unique (%) 0.1%
Missing 1486
Missing (%) 0.1%
Memory Size 183.8 MB

Length

Mean 5.0476
Standard Deviation 0.305
Median 5
Minimum 5
Maximum 7

Sample

1st row 00140
2nd row 00140
3rd row 00140
4th row 00140
5th row 00140

Letter

Count 611827
Lowercase Letter 0
Space Separator 0
Uppercase Letter 611827
Dash Punctuation 0
Decimal Number 13210898
  • SUPPORTING_SERVICE_CODE contains many words: 1466 words

SUPPORTING_SERVICE_CODE_DESCRIPTION

categorical

Approximate Distinct Count 1422
Approximate Unique (%) 0.1%
Missing 1488
Missing (%) 0.1%
Memory Size 319.9 MB

Length

Mean 56.909
Standard Deviation 25.7908
Median 68
Minimum 20
Maximum 254

Sample

1st row Anesthesia for pro...
2nd row Anesthesia for pro...
3rd row Anesthesia for pro...
4th row Anesthesia for pro...
5th row Anesthesia for pro...

Letter

Count 113305157
Lowercase Letter 99544395
Space Separator 20404362
Uppercase Letter 13760762
Dash Punctuation 476821
Decimal Number 14115140
  • SUPPORTING_SERVICE_CODE_DESCRIPTION contains many words: 3279 words
  • The most frequent word (cpt) occurs over 3.56 times as often as the second most frequent word (hcpcs)

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 291.4 MB

Length

Mean 46
Standard Deviation 0
Median 46
Minimum 46
Maximum 46

Sample

1st row northern-regional-...
2nd row northern-regional-...
3rd row northern-regional-...
4th row northern-regional-...
5th row northern-regional-...

Letter

Count 115622850
Lowercase Letter 115622850
Space Separator 0
Uppercase Letter 0
Dash Punctuation 5505850
Decimal Number 0
  • Filename has words of constant length

Interactions

Correlations

Missing Values