Overview

Dataset Statistics

Number of Variables 18
Number of Rows 713978
Missing Cells 1.0564e+06
Missing Cells (%) 8.2%
Duplicate Rows 12956
Duplicate Rows (%) 1.8%
Total Size in Memory 661.7 MB
Average Row Size in Memory 971.7 B
Variable Types
  • Categorical: 15
  • Numerical: 3
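
The percentages above follow directly from the raw counts. A minimal sketch in plain Python, using only figures reported in this block (the variable names are illustrative and not part of the report), reproduces the arithmetic:

```python
# Figures copied from the Dataset Statistics block above.
n_rows = 713_978
n_vars = 18
missing_cells = 1.0564e6   # reported in rounded scientific notation
duplicate_rows = 12_956

total_cells = n_rows * n_vars                    # every row has one cell per variable
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_rows

print(f"total cells:    {total_cells}")
print(f"missing cells:  {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```

Because the missing-cell count is itself rounded, the recomputed percentage can only be expected to match the report to about one decimal place.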

Dataset Insights

BCBS has 13256 (1.86%) missing values [Missing]
BCBS MEDICARE has 9560 (1.34%) missing values [Missing]
AETNA has 14017 (1.96%) missing values [Missing]
AETNA MEDICARE has 9560 (1.34%) missing values [Missing]
CIGNA has 13256 (1.86%) missing values [Missing]
CIGNA MEDICARE has 713978 (100.0%) missing values [Missing]
UHC has 14017 (1.96%) missing values [Missing]
UHC MEDICARE has 9561 (1.34%) missing values [Missing]
MEDCOST has 14017 (1.96%) missing values [Missing]
CPT/MS-DRG has 202968 (28.43%) missing values [Missing]
Rev Code has 13716 (1.92%) missing values [Missing]
Gross Charge has 13716 (1.92%) missing values [Missing]
Self Pay has 14118 (1.98%) missing values [Missing]
De-identified Minimum is skewed [Skewed]
De-identified Maximum is skewed [Skewed]
Gross Charge is skewed [Skewed]
Dataset has 12956 (1.81%) duplicate rows [Duplicates]
BCBS has a high cardinality: 86092 distinct values [High Cardinality]
BCBS MEDICARE has a high cardinality: 20286 distinct values [High Cardinality]
AETNA has a high cardinality: 22266 distinct values [High Cardinality]
AETNA MEDICARE has a high cardinality: 21015 distinct values [High Cardinality]
CIGNA has a high cardinality: 44590 distinct values [High Cardinality]
UHC has a high cardinality: 28617 distinct values [High Cardinality]
UHC MEDICARE has a high cardinality: 21140 distinct values [High Cardinality]
MEDCOST has a high cardinality: 27954 distinct values [High Cardinality]
CPT/MS-DRG has a high cardinality: 4641 distinct values [High Cardinality]
Rev Code has a high cardinality: 43134 distinct values [High Cardinality]
Procedure Description has a high cardinality: 43597 distinct values [High Cardinality]
Self Pay has a high cardinality: 16275 distinct values [High Cardinality]
system has constant value "VIDANT" [Constant]
Rev Code has constant length 7 [Constant Length]
system has constant length 6 [Constant Length]
CIGNA MEDICARE has all distinct values [Unique]
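
The payer columns are typed as categorical and flagged as high cardinality, and their samples and top categories show why: they mix dollar amounts (for example 456.09 or 0) with text notes such as INCLUDED IN DRG RATE and NOT PAID SEPARATELY. A minimal sketch of one way to separate the two kinds of cell, assuming each cell holds either a plain number or a free-text note (the helper `split_rate` is hypothetical and not part of the report or of any library):

```python
import math

def split_rate(value):
    """Split one payer cell into (amount, note).

    amount is a float when the cell parses as a finite number, otherwise None;
    note carries the original text when the cell is not numeric.
    """
    if value is None:
        return None, None
    text = str(value).strip()
    if not text:
        return None, None
    try:
        amount = float(text)
    except ValueError:
        return None, text          # e.g. "INCLUDED IN DRG RATE"
    if not math.isfinite(amount):
        return None, None          # strings like "nan" are treated as missing
    return amount, None

# Example cells taken from the samples and top categories in this report.
cells = ["456.09", "0", "INCLUDED IN DRG RATE", "NOT PAID SEPARATELY", None]
for cell in cells:
    print(repr(cell), "->", split_rate(cell))
```

Keeping the note alongside the amount preserves the information that a rate is bundled rather than silently turning it into a missing value.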

Variables

BCBS

categorical

Approximate Distinct Count 86092
Approximate Unique (%) 12.3%
Missing 13256
Missing (%) 1.9%
Memory Size 49.8 MB
  • The largest value (INCLUDED IN DRG RATE) is over 2.32 times larger than the second largest value (0)

Length

Mean 9.4586
Standard Deviation 5.1707
Median 8
Minimum 1
Maximum 36

Sample

1st row 456.09
2nd row 707.27
3rd row 456.09
4th row 687.44
5th row 313.975

Letter

Count 693964
Lowercase Letter 0
Space Separator 122413
Uppercase Letter 693964
Dash Punctuation 225
Decimal Number 5190916
  • BCBS contains many words: 80006 words

BCBS MEDICARE

categorical

Approximate Distinct Count 20286
Approximate Unique (%) 2.9%
Missing 9560
Missing (%) 1.3%
Memory Size 54.6 MB

Length

Mean 16.2646
Standard Deviation 5.6915
Median 20
Minimum 1
Maximum 36

Sample

1st row INCLUDED IN DRG RA...
2nd row INCLUDED IN DRG RA...
3rd row INCLUDED IN DRG RA...
4th row INCLUDED IN DRG RA...
5th row INCLUDED IN DRG RA...

Letter

Count 9026591
Lowercase Letter 0
Space Separator 1452109
Uppercase Letter 9026591
Dash Punctuation 135
Decimal Number 852584
  • The top 2 categories (INCLUDED IN DRG RATE, NOT PAID SEPARATELY) take over 50.0%
  • BCBS MEDICARE contains many words: 18989 words

AETNA

categorical

Approximate Distinct Count 22266
Approximate Unique (%) 3.2%
Missing 14017
Missing (%) 2.0%
Memory Size 48.1 MB
  • The largest value (0) is over 2.44 times larger than the second largest value (1800)

Length

Mean 7.0777
Standard Deviation 4.8475
Median 6
Minimum 1
Maximum 36

Sample

1st row 565.8
2nd row 877.4
3rd row 565.8
4th row 852.8
5th row 389.5

Letter

Count 6720
Lowercase Letter 0
Space Separator 1440
Uppercase Letter 6720
Dash Punctuation 240
Decimal Number 4386815
  • AETNA contains many words: 20553 words
  • The largest value (0) is over 2.37 times larger than the second largest value (1800)

AETNA MEDICARE

categorical

Approximate Distinct Count 21015
Approximate Unique (%) 3.0%
Missing 9560
Missing (%) 1.3%
Memory Size 54.9 MB

Length

Mean 16.7421
Standard Deviation 5.0407
Median 20
Minimum 1
Maximum 36

Sample

1st row INCLUDED IN DRG RA...
2nd row INCLUDED IN DRG RA...
3rd row INCLUDED IN DRG RA...
4th row INCLUDED IN DRG RA...
5th row INCLUDED IN DRG RA...

Letter

Count 9025551
Lowercase Letter 0
Space Separator 1452020
Uppercase Letter 9025551
Dash Punctuation 135
Decimal Number 1184653
  • The top 2 categories (INCLUDED IN DRG RATE, NOT PAID SEPARATELY) take over 50.0%
  • AETNA MEDICARE contains many words: 19876 words

CIGNA

categorical

Approximate Distinct Count 44590
Approximate Unique (%) 6.4%
Missing 13256
Missing (%) 1.9%
Memory Size 49.6 MB
  • The largest value (INCLUDED IN CASE/ RATE) is over 2.32 times larger than the second largest value (0)

Length

Mean 9.2379
Standard Deviation 5.977
Median 7
Minimum 1
Maximum 39

Sample

1st row 512.67
2nd row 795.01
3rd row 512.67
4th row 772.72
5th row 352.925

Letter

Count 748501
Lowercase Letter 287
Space Separator 127389
Uppercase Letter 748214
Dash Punctuation 237
Decimal Number 4970978
  • CIGNA contains many words: 41330 words

CIGNA MEDICARE

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 46.3 MB

Length

Mean 3
Standard Deviation 0
Median 3
Minimum 3
Maximum 3

Sample

1st row nan
2nd row nan
3rd row nan
4th row nan
5th row nan

Letter

Count 2141934
Lowercase Letter 2141934
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 0
  • CIGNA MEDICARE has words of constant length
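
The figures in this block (0 missing, 1 distinct value, constant length 3, every sample shown as nan) sit oddly next to the insight that CIGNA MEDICARE is 100% missing. One plausible reading, offered here as an assumption rather than something the report states, is that the missing cells were converted to the literal string "nan" before the text statistics were computed. The lowercase-letter count is consistent with that:

```python
n_rows = 713_978                  # Number of Rows from Dataset Statistics
placeholder = str(float("nan"))   # Python renders a float NaN as the string 'nan'

# If every row holds the 3-character string 'nan', the column contains
# n_rows * 3 lowercase letters and nothing else.
lowercase_letters = n_rows * len(placeholder)

print(placeholder, len(placeholder))
print(lowercase_letters)
```

The product equals the reported Lowercase Letter count of 2141934, so the column is best treated as entirely empty despite the "Missing 0" line.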

UHC

categorical

Approximate Distinct Count 28617
Approximate Unique (%) 4.1%
Missing 14017
Missing (%) 2.0%
Memory Size 49.4 MB
  • The largest value (INCLUDED IN RATE) is over 7.68 times larger than the second largest value (0)

Length

Mean 8.9371
Standard Deviation 5.5059
Median 17
Minimum 1
Maximum 36

Sample

1st row 607.2
2nd row 941.6
3rd row 607.2
4th row 915.2
5th row 418.0

Letter

Count 1623524
Lowercase Letter 0
Space Separator 347927
Uppercase Letter 1623524
Dash Punctuation 195
Decimal Number 3772862
  • UHC contains many words: 26334 words

UHC MEDICARE

categorical

Approximate Distinct Count 21140
Approximate Unique (%) 3.0%
Missing 9561
Missing (%) 1.3%
Memory Size 54.9 MB

Length

Mean 16.7402
Standard Deviation 5.1131
Median 20
Minimum 1
Maximum 36

Sample

1st row INCLUDED IN DRG RA...
2nd row INCLUDED IN DRG RA...
3rd row INCLUDED IN DRG RA...
4th row INCLUDED IN DRG RA...
5th row INCLUDED IN DRG RA...

Letter

Count 9026667
Lowercase Letter 0
Space Separator 1452113
Uppercase Letter 9026667
Dash Punctuation 135
Decimal Number 1182607
  • The top 2 categories (INCLUDED IN DRG RATE, NOT PAID SEPARATELY) take over 50.0%
  • UHC MEDICARE contains many words: 20032 words

MEDCOST

categorical

Approximate Distinct Count 27954
Approximate Unique (%) 4.0%
Missing 14017
Missing (%) 2.0%
Memory Size 48.5 MB
  • The largest value (0) is over 2.46 times larger than the second largest value (1600)

Length

Mean 7.6983
Standard Deviation 5.1734
Median 6
Minimum 1
Maximum 36

Sample

1st row 517.5
2nd row 802.5
3rd row 517.5
4th row 780.0
5th row 356.25

Letter

Count 6720
Lowercase Letter 0
Space Separator 1440
Uppercase Letter 6720
Dash Punctuation 240
Decimal Number 4820631
  • MEDCOST contains many words: 25769 words
  • The largest value (0) is over 2.44 times larger than the second largest value (1600)

De-identified Minimum

numerical

Approximate Distinct Count 70342
Approximate Unique (%) 9.9%
Missing 298
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 10.9 MB
Mean 1867.8695
Minimum 0
Maximum 469129.8873
Zeros 30484
Zeros (%) 4.3%
Negatives 0
Negatives (%) 0.0%
  • De-identified Minimum is skewed right (γ1 = 17.2547)

Quantile Statistics

Minimum 0
5-th Percentile 6.97
Q1 93.21
Median 418
Q3 1477.06
95-th Percentile 8531.25
Maximum 469129.8873
Range 469129.8873
IQR 1383.85

Descriptive Statistics

Mean 1867.8695
Standard Deviation 6184.9261
Variance 3.8253e+07
Sum 1.3331e+09
Skewness 17.2547
Kurtosis 647.251
Coefficient of Variation 3.3112
  • De-identified Minimum is not normally distributed (p-value 4.376952109830472e-25)
  • De-identified Minimum has 78349 outliers
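
Several of the statistics above are derived from others, which gives a quick consistency check. The sketch below recomputes the IQR, coefficient of variation, and variance from the reported quartiles, mean, and standard deviation; the variable names are illustrative only:

```python
# Values copied from the De-identified Minimum block above.
q1, q3 = 93.21, 1477.06
mean, std = 1867.8695, 6184.9261

iqr = q3 - q1            # interquartile range
cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the square of the standard deviation

print(f"IQR      {iqr:.2f}")
print(f"CV       {cv:.4f}")
print(f"Variance {variance:.4e}")
```

A coefficient of variation above 3, together with a mean (1867.87) more than four times the median (418), is in line with the strong right skew reported for this variable.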

De-identified Maximum

numerical

Approximate Distinct Count 44514
Approximate Unique (%) 6.2%
Missing 298
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 10.9 MB
Mean 3595.6889
Minimum 0
Maximum 724724.92
Zeros 21155
Zeros (%) 3.0%
Negatives 0
Negatives (%) 0.0%
  • De-identified Maximum is skewed right (γ1 = 12.9684)

Quantile Statistics

Minimum 0
5-th Percentile 31.9967
Q1 217.35
Median 973.36
Q3 3087.5
95-th Percentile 15701.4
Maximum 724724.92
Range 724724.92
IQR 2870.15

Descriptive Statistics

Mean 3595.6889
Standard Deviation 9727.7321
Variance 9.4629e+07
Sum 2.5662e+09
Skewness 12.9684
Kurtosis 415.8919
Coefficient of Variation 2.7054
  • De-identified Maximum is not normally distributed (p-value 4.44324908493944e-25)
  • De-identified Maximum has 74700 outliers

CPT/MS-DRG

categorical

Approximate Distinct Count 4641
Approximate Unique (%) 0.9%
Missing 202968
Missing (%) 28.4%
Memory Size 34.1 MB
  • The largest value (C1713) is over 7.37 times larger than the second largest value (J3490)

Length

Mean 5.0001
Standard Deviation 0.01187
Median 5
Minimum 5
Maximum 7

Sample

1st row 10004
2nd row 10005
3rd row 10006
4th row 10021
5th row 10021

Letter

Count 473402
Lowercase Letter 18
Space Separator 36
Uppercase Letter 473384
Dash Punctuation 0
Decimal Number 2081648
  • CPT/MS-DRG contains many words: 4639 words
  • The largest value (c1713) is over 7.37 times larger than the second largest value (j3490)

Rev Code

categorical

Approximate Distinct Count 43134
Approximate Unique (%) 6.2%
Missing 13716
Missing (%) 1.9%
Memory Size 48.1 MB

Length

Mean 7
Standard Deviation 0
Median 7
Minimum 7
Maximum 7

Sample

1st row 1101000
2nd row 1101003
3rd row 1121000
4th row 1140000
5th row 1711000

Letter

Count 414
Lowercase Letter 0
Space Separator 0
Uppercase Letter 414
Dash Punctuation 0
Decimal Number 4901420
  • Rev Code contains many words: 43134 words
  • Rev Code has words of constant length

Procedure Description

categorical

Approximate Distinct Count 43597
Approximate Unique (%) 6.1%
Missing 18
Missing (%) 0.0%
Memory Size 65.2 MB

Length

Mean 30.7972
Standard Deviation 5.999
Median 31
Minimum 3
Maximum 118

Sample

1st row HB-MEDICAL GENERAL...
2nd row HB-MEDICAL GENERAL...
3rd row HB-LD/OB GENERAL R...
4th row HB-PSYCH ADULT PRI...
5th row HB-NEO I NEWBORN N...

Letter

Count 15579552
Lowercase Letter 0
Space Separator 2722952
Uppercase Letter 15579552
Dash Punctuation 760494
Decimal Number 2338058
  • Procedure Description contains many words: 28720 words
  • The largest value (hbscrew) is over 2.76 times larger than the second largest value (hbcath)

Gross Charge

numerical

Approximate Distinct Count 6395
Approximate Unique (%) 0.9%
Missing 13716
Missing (%) 1.9%
Infinite 0
Infinite (%) 0.0%
Memory Size 10.7 MB
Mean 3270.5624
Minimum 0
Maximum 324757
Zeros 20846
Zeros (%) 2.9%
Negatives 0
Negatives (%) 0.0%
  • Gross Charge is skewed right (γ1 = 8.9132)

Quantile Statistics

Minimum 0
5-th Percentile 32
Q1 220
Median 961
Q3 3116
95-th Percentile 13468
Maximum 324757
Range 324757
IQR 2896

Descriptive Statistics

Mean 3270.5624
Standard Deviation 8004.1176
Variance 6.4066e+07
Sum 2.2903e+09
Skewness 8.9132
Kurtosis 172.5416
Coefficient of Variation 2.4473
  • Gross Charge is not normally distributed (p-value 5.758100810687769e-25)
  • Gross Charge has 65848 outliers
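
The report does not say how outliers are defined. A common convention, assumed here purely for illustration, is Tukey's rule: a value is an outlier if it lies more than 1.5 × IQR below Q1 or above Q3. The helper `tukey_fences` is hypothetical:

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences under Tukey's k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles copied from the Gross Charge block above.
lower, upper = tukey_fences(220, 3116)
print(lower, upper)

# Share of non-missing rows that the report flags as outliers.
n_rows, missing, outliers = 713_978, 13_716, 65_848
share = 100 * outliers / (n_rows - missing)
print(f"{share:.1f}% of non-missing rows flagged")
```

Under this assumption the lower fence is negative, so with a minimum of 0 only the upper fence can flag anything. The upper fence also falls below the reported 95th percentile (13468), which fits with more than 5% of rows being flagged.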

Self Pay

categorical

Approximate Distinct Count 16275
Approximate Unique (%) 2.3%
Missing 14118
Missing (%) 2.0%
Memory Size 47.3 MB
  • The largest value (0) is over 2.21 times larger than the second largest value (1500)

Length

Mean 5.9252
Standard Deviation 3.0366
Median 6
Minimum 1
Maximum 36

Sample

1st row 517.5
2nd row 802.5
3rd row 517.5
4th row 780.0
5th row 356.25

Letter

Count 6720
Lowercase Letter 0
Space Separator 1440
Uppercase Letter 6720
Dash Punctuation 240
Decimal Number 3594962
  • Self Pay contains many words: 15306 words
  • The largest value (0) is over 2.12 times larger than the second largest value (1500)

Patient Type

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 50.7 MB

Length

Mean 9.5
Standard Deviation 0.5
Median 10
Minimum 9
Maximum 10

Sample

1st row INPATIENT
2nd row INPATIENT
3rd row INPATIENT
4th row INPATIENT
5th row INPATIENT

Letter

Count 6782791
Lowercase Letter 0
Space Separator 0
Uppercase Letter 6782791
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (INPATIENT, OUTPATIENT) take over 50.0%
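
The report does not list the counts per category, but they can be recovered from the letter count. Assuming the column contains only the two values INPATIENT (9 letters) and OUTPATIENT (10 letters), as the distinct count of 2 and the zero missing count suggest, the two counts satisfy a pair of linear equations:

```python
n_rows = 713_978          # Number of Rows
letters = 6_782_791       # Uppercase Letter count for Patient Type

len_in = len("INPATIENT")     # 9
len_out = len("OUTPATIENT")   # 10

# inpatient + outpatient = n_rows
# len_in * inpatient + len_out * outpatient = letters
# Subtracting len_in times the first equation from the second leaves
# (len_out - len_in) * outpatient, and len_out - len_in is 1 here.
outpatient = (letters - len_in * n_rows) // (len_out - len_in)
inpatient = n_rows - outpatient

mean_length = (len_in * inpatient + len_out * outpatient) / n_rows
print(inpatient, outpatient, mean_length)
```

Under that assumption the rows split evenly between the two categories, which matches the reported mean length of 9.5 and standard deviation of 0.5.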

system

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 48.3 MB

Length

Mean 6
Standard Deviation 0
Median 6
Minimum 6
Maximum 6

Sample

1st row VIDANT
2nd row VIDANT
3rd row VIDANT
4th row VIDANT
5th row VIDANT

Letter

Count 4283868
Lowercase Letter 0
Space Separator 0
Uppercase Letter 4283868
Dash Punctuation 0
Decimal Number 0
  • system has words of constant length

Interactions

Correlations

Missing Values