Overview

Dataset Statistics

Number of Variables 22
Number of Rows 1.1811e+06
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 277909
Duplicate Rows (%) 23.5%
Total Size in Memory 1.5 GB
Average Row Size in Memory 1.3 KB
Variable Types
  • Categorical: 20
  • Numerical: 2
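
The row count above is shown only in scientific notation. The exact figure can be recovered from numbers reported elsewhere in this report: the system column has a constant length of 7 and a total letter count of 8267938. The following is a minimal check, assuming every row contributes exactly 7 letters to that total; the variable names are illustrative.

```python
# Figures copied from this report.
system_letter_count = 8267938   # "Letter > Count" for the system column
system_value_length = 7         # system has constant length 7
duplicate_rows = 277909         # "Duplicate Rows"

# Exact row count implied by the constant-length column.
rows, remainder = divmod(system_letter_count, system_value_length)

# Share of rows that are duplicates.
duplicate_pct = 100 * duplicate_rows / rows

print(rows, remainder)           # 1181134 0
print(round(duplicate_pct, 2))   # 23.53
```

The result agrees with the 23.5% and 23.53% duplicate shares quoted in this overview.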

Dataset Insights

  • Skewed: DRG is skewed
  • Skewed: Rev Code is skewed
  • Duplicates: Dataset has 277909 (23.53%) duplicate rows
  • High Cardinality: CPT has 4865 distinct values
  • High Cardinality: NDC has 9690 distinct values
  • High Cardinality: Procedure Description has 282592 distinct values
  • High Cardinality: Gross Charge has 50971 distinct values
  • High Cardinality: Self Pay has 48814 distinct values
  • High Cardinality: De-identified Minimum has 68992 distinct values
  • High Cardinality: De-identified Maximum has 80232 distinct values
  • High Cardinality: Aetna has 68267 distinct values
  • High Cardinality: Aetna Medicare has 4153 distinct values
  • High Cardinality: BCBS has 73648 distinct values
  • High Cardinality: BCBS Medicare has 3780 distinct values
  • High Cardinality: Cigna has 73479 distinct values
  • High Cardinality: Humana has 69622 distinct values
  • High Cardinality: Humana Medicare has 24051 distinct values
  • High Cardinality: UHC has 74063 distinct values
  • High Cardinality: UHC Medicare has 3933 distinct values
  • High Cardinality: Medcost has 70752 distinct values
  • Constant: system has constant value "BAPTIST"
  • Constant Length: system has constant length 7
  • Negatives: DRG has 1137174 (96.28%) negatives

Variables

Patient Type

categorical

Approximate Distinct Count 2
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 84.5 MB
  • The most frequent value (Outpatient) occurs about 614.82 times as often as the second most frequent value (Inpatient)

Length

Mean 9.9984
Standard Deviation 0.04026
Median 10
Minimum 9
Maximum 10

Sample

1st row Inpatient
2nd row Inpatient
3rd row Inpatient
4th row Inpatient
5th row Inpatient

Letter

Count 11809422
Lowercase Letter 10628288
Space Separator 0
Uppercase Letter 1181134
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (Outpatient, Inpatient) account for over 50.0% of all values
  • The most frequent word (outpatient) occurs about 614.82 times as often as the second most frequent word (inpatient)
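
The 614.82 ratio can be reproduced from the letter counts listed above. This is a sketch under two assumptions that the report does not state outright: only the values "Outpatient" (10 letters) and "Inpatient" (9 letters) occur, and each value contains exactly one uppercase letter, so the uppercase count equals the row count.

```python
# Figures copied from the Patient Type "Letter" block.
total_letters = 11809422    # Count
rows = 1181134              # Uppercase Letter (one capital per value, assumed)

# Solve: 10 * outpatient + 9 * inpatient = total_letters
#        outpatient + inpatient          = rows
inpatient = 10 * rows - total_letters
outpatient = rows - inpatient

print(inpatient, outpatient)              # 1918 1179216
print(round(outpatient / inpatient, 2))   # 614.82
print(round(total_letters / rows, 4))     # 9.9984
```

The last line matches the reported mean length of 9.9984, which supports the two-value assumption.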

DRG

numerical

Approximate Distinct Count 726
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 18.0 MB
Mean -0.1788
Minimum -1
Maximum 989
Zeros 42053
Zeros (%) 3.6%
Negatives 1137174
Negatives (%) 96.3%
  • DRG is skewed right (γ1 = 32.1245)

Quantile Statistics

Minimum -1
5th Percentile -1
Q1 -1
Median -1
Q3 -1
95th Percentile -1
Maximum 989
Range 990
IQR 0

Descriptive Statistics

Mean -0.1788
Standard Deviation 22.301
Variance 497.3363
Sum -211235
Skewness 32.1245
Kurtosis 1108.084
Coefficient of Variation -124.6977
  • DRG is not normally distributed (p-value ≈ 4.23e-25)
  • DRG has 43960 outliers
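
The DRG figures above are dominated by the value -1, which appears to act as a placeholder for rows without a DRG code. That reading is an interpretation and is not stated in the report. The sketch below recomputes the reported mean and coefficient of variation, then estimates the mean of the remaining rows, assuming that every negative value is exactly -1 and that the row count is the one implied by the constant-length system column.

```python
# Figures copied from this report.
rows = 8267938 // 7     # system letter count / constant length 7
negatives = 1137174     # DRG "Negatives"
total_sum = -211235     # DRG "Sum"
std_dev = 22.301        # DRG "Standard Deviation"

mean = total_sum / rows
cv = std_dev / mean     # coefficient of variation = std / mean

# Assumption: every negative value is exactly -1 (the reported minimum).
coded_rows = rows - negatives       # rows with a value >= 0
coded_sum = total_sum + negatives   # add back the -1 contributions

print(round(mean, 4))                    # -0.1788
print(coded_rows)                        # 43960
print(round(coded_sum / coded_rows, 2))  # 21.06
```

Under these assumptions the number of non-placeholder rows, 43960, equals the reported outlier count, which suggests the outlier flag is marking every row that carries a DRG value rather than unusual codes.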

Rev Code

numerical

Approximate Distinct Count 168
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Memory Size 18.0 MB
Mean 282.213
Minimum -1
Maximum 999
Zeros 0
Zeros (%) 0.0%
Negatives 1910
Negatives (%) 0.2%
  • Rev Code is skewed right (γ1 = 7.6492)

Quantile Statistics

Minimum -1
5th Percentile 272
Q1 276
Median 278
Q3 278
95th Percentile 278
Maximum 999
Range 1000
IQR 2

Descriptive Statistics

Mean 282.213
Standard Deviation 50.4172
Variance 2541.8982
Sum 3.3333e+08
Skewness 7.6492
Kurtosis 71.8931
Coefficient of Variation 0.1786
  • Rev Code is not normally distributed (p-value ≈ 4.28e-25)
  • Rev Code has 306285 outliers
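
The report does not say which rule it uses to count outliers. A common choice is the 1.5 × IQR fence, and the sketch below assumes that rule purely for illustration. The helper name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Rev Code: Q1 = 276, Q3 = 278, IQR = 2
print(iqr_fences(276, 278))   # (273.0, 281.0)

# DRG: Q1 = Q3 = -1, IQR = 0, so any value other than -1 falls outside
print(iqr_fences(-1, -1))     # (-1.0, -1.0)
```

With such a narrow interquartile range, any Rev Code below 273 or above 281 would be counted as an outlier under this rule, which would explain a large outlier count even for ordinary codes.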

CPT

categorical

Approximate Distinct Count 4865
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory Size 78.3 MB
  • The most frequent value (C1713) occurs about 2.61 times as often as the second most frequent value (-1)

Length

Mean 4.5142
Standard Deviation 1.1052
Median 5
Minimum 1
Maximum 5

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 967166
Lowercase Letter 0
Space Separator 0
Uppercase Letter 967166
Dash Punctuation 191280
Decimal Number 4173380
  • The top 2 categories (C1713, -1) account for over 50.0% of all values
  • CPT contains many words: 4865 words
  • The most frequent word (c1713) occurs about 2.61 times as often as the second most frequent word (1)

NDC

categorical

Approximate Distinct Count 9690
Approximate Unique (%) 0.8%
Missing 0
Missing (%) 0.0%
Memory Size 75.7 MB
  • The most frequent value (-1) occurs about 21878.92 times as often as the second most frequent value (00338-0049-04)

Length

Mean 2.2007
Standard Deviation 1.4722
Median 2
Minimum 2
Maximum 13

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 1
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1
Dash Punctuation 1202689
Decimal Number 1396638
  • The top 2 categories (-1, 00338-0049-04) account for over 50.0% of all values
  • NDC contains many words: 9690 words
  • The most frequent word (1) occurs about 21878.92 times as often as the second most frequent word (00338004904)

Procedure Description

categorical

Approximate Distinct Count 282592
Approximate Unique (%) 23.9%
Missing 0
Missing (%) 0.0%
Memory Size 144.2 MB
  • The most frequent value (-1) occurs about 2569.58 times as often as the second most frequent value (GRAFT ENDOVASCULAR ZENITH H)

Length

Mean 63.0456
Standard Deviation 41.8615
Median 84
Minimum 2
Maximum 243

Sample

1st row HEART TRANSPLANT O...
2nd row HEART TRANSPLANT O...
3rd row ECMO OR TRACHEOSTO...
4th row TRACHEOSTOMY WITH ...
5th row SIMULTANEOUS PANCR...

Letter

Count 57248810
Lowercase Letter 535589
Space Separator 11506978
Uppercase Letter 56713221
Dash Punctuation 480071
Decimal Number 4665929
  • Procedure Description contains many words: 35030 words
  • The most frequent word (mm) occurs about 4.34 times as often as the second most frequent word (1)

Gross Charge

categorical

Approximate Distinct Count 50971
Approximate Unique (%) 4.3%
Missing 0
Missing (%) 0.0%
Memory Size 80.4 MB

Length

Mean 6.3661
Standard Deviation 1.5134
Median 8
Minimum 1
Maximum 18

Sample

1st row 703,522.14
2nd row 565,469.16
3rd row 550,057.02
4th row 406,331.11
5th row 386,239.81

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 6190234
  • Gross Charge contains many words: 35405 words
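
Gross Charge and the other price columns are profiled as categorical because their values are strings with thousands separators, such as 703,522.14. Some of the payer columns also contain -1 entries. The following is a minimal sketch of how such strings could be converted to numbers before computing numeric statistics. The function name `parse_charge` is hypothetical, and treating -1 as "no price" is an assumption rather than something the report states.

```python
from decimal import Decimal, InvalidOperation

def parse_charge(text, placeholder="-1"):
    """Convert a string such as '703,522.14' to a Decimal.

    Returns None for the placeholder, empty strings, and anything
    that is not a finite number.
    """
    cleaned = text.strip().replace(",", "")  # drop thousands separators
    if cleaned == "" or cleaned == placeholder:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    return value if value.is_finite() else None

samples = ["703,522.14", "565,469.16", "-1", "N/A"]
print([parse_charge(s) for s in samples])
```

Decimal is used here instead of float so that currency amounts keep their exact two-decimal representation.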

Self Pay

categorical

Approximate Distinct Count 48814
Approximate Unique (%) 4.1%
Missing 0
Missing (%) 0.0%
Memory Size 80.3 MB

Length

Mean 6.3146
Standard Deviation 1.6283
Median 7
Minimum 1
Maximum 18

Sample

1st row 351,761.07
2nd row 282,734.58
3rd row 275,028.51
4th row 203,165.56
5th row 193,119.91

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 0
Decimal Number 6215028
  • Self Pay contains many words: 40689 words
  • The most frequent word (175245) occurs about 2.1 times as often as the second most frequent word (189849)

De-identified Minimum

categorical

Approximate Distinct Count 68992
Approximate Unique (%) 5.8%
Missing 0
Missing (%) 0.0%
Memory Size 79.6 MB

Length

Mean 5.6502
Standard Deviation 1.3537
Median 6
Minimum 1
Maximum 12

Sample

1st row 83,135.00
2nd row 83,135.00
3rd row 123,645.40
4th row 78,121.58
5th row 37,029.33

Letter

Count 78
Lowercase Letter 0
Space Separator 0
Uppercase Letter 78
Dash Punctuation 475
Decimal Number 5530559
  • De-identified Minimum contains many words: 60376 words

De-identified Maximum

categorical

Approximate Distinct Count 80232
Approximate Unique (%) 6.8%
Missing 0
Missing (%) 0.0%
Memory Size 80.7 MB

Length

Mean 6.6414
Standard Deviation 1.6308
Median 8
Minimum 1
Maximum 13

Sample

1st row 597,993.82
2nd row 480,648.79
3rd row 467,548.47
4th row 345,381.44
5th row 328,303.84

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 475
Decimal Number 6277704
  • De-identified Maximum contains many words: 65529 words

Aetna

categorical

Approximate Distinct Count 68267
Approximate Unique (%) 5.8%
Missing 0
Missing (%) 0.0%
Memory Size 79.3 MB
  • The most frequent value (0) occurs about 15.99 times as often as the second most frequent value (1,987)

Length

Mean 5.4421
Standard Deviation 2.5763
Median 8
Minimum 1
Maximum 13

Sample

1st row 337,690.63
2nd row 271,425.20
3rd row 264,027.37
4th row 195,038.93
5th row 185,395.11

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 1
Decimal Number 5148691
  • Aetna contains many words: 64244 words
  • The most frequent word (0) occurs about 15.99 times as often as the second most frequent word (1987)

Aetna Medicare

categorical

Approximate Distinct Count 4153
Approximate Unique (%) 0.4%
Missing 0
Missing (%) 0.0%
Memory Size 75.5 MB

Length

Mean 2.0163
Standard Deviation 1.1313
Median 3
Minimum 1
Maximum 13

Sample

1st row 188,196.94
2nd row 104,567.95
3rd row 124,202.56
4th row 78,460.76
5th row 37,171.74

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 75
Decimal Number 1812973
  • The top 2 categories (0, 0.0) account for over 50.0% of all values
  • Aetna Medicare contains many words: 3871 words

BCBS

categorical

Approximate Distinct Count 73648
Approximate Unique (%) 6.2%
Missing 0
Missing (%) 0.0%
Memory Size 80.0 MB

Length

Mean 6.0296
Standard Deviation 1.4615
Median 7
Minimum 1
Maximum 18

Sample

1st row 396,417.73
2nd row 218,031.67
3rd row 222,255.00
4th row 309,810.00
5th row 74,271.18

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 2
Decimal Number 5891005
  • BCBS contains many words: 66700 words

BCBS Medicare

categorical

Approximate Distinct Count 3780
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 75.5 MB

Length

Mean 2.01
Standard Deviation 1.1262
Median 3
Minimum 1
Maximum 13

Sample

1st row 188,196.94
2nd row 104,567.95
3rd row 124,202.56
4th row 78,460.76
5th row 37,171.74

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 76
Decimal Number 1808071
  • The top 2 categories (0, 0.0) account for over 50.0% of all values
  • BCBS Medicare contains many words: 3523 words

Cigna

categorical

Approximate Distinct Count 73479
Approximate Unique (%) 6.2%
Missing 0
Missing (%) 0.0%
Memory Size 80.1 MB

Length

Mean 6.1393
Standard Deviation 1.4323
Median 7
Minimum 1
Maximum 18

Sample

1st row 402,616.28
2nd row 221,440.90
3rd row 263,977.68
4th row 164,881.80
5th row 75,432.52

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 2
Decimal Number 5989840
  • Cigna contains many words: 67467 words

Humana

categorical

Approximate Distinct Count 69622
Approximate Unique (%) 5.9%
Missing 0
Missing (%) 0.0%
Memory Size 79.7 MB
  • The most frequent value (-1) occurs about 22.73 times as often as the second most frequent value (3,084.31)

Length

Mean 5.779
Standard Deviation 2.297
Median 8
Minimum 1
Maximum 18

Sample

1st row 415,078.06
2nd row 333,626.80
3rd row 324,533.64
4th row 239,735.35
5th row 227,881.49

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 282448
Decimal Number 5421208
  • Humana contains many words: 66713 words
  • The most frequent word (1) occurs about 22.73 times as often as the second most frequent word (308431)

Humana Medicare

categorical

Approximate Distinct Count 24051
Approximate Unique (%) 2.0%
Missing 0
Missing (%) 0.0%
Memory Size 76.9 MB
  • The most frequent value (0.0) occurs about 1.62 times as often as the second most frequent value (0)

Length

Mean 3.2891
Standard Deviation 2.0737
Median 3
Minimum 1
Maximum 12

Sample

1st row 187,334.80
2nd row 104,104.36
3rd row 123,645.40
4th row 78,121.58
5th row 37,029.33

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 76
Decimal Number 2954929
  • The top 2 categories (0.0, 0) account for over 50.0% of all values
  • Humana Medicare contains many words: 23019 words
  • The most frequent word (00) occurs about 1.62 times as often as the second most frequent word (0)

UHC

categorical

Approximate Distinct Count 74063
Approximate Unique (%) 6.3%
Missing 0
Missing (%) 0.0%
Memory Size 80.1 MB

Length

Mean 6.1452
Standard Deviation 1.4555
Median 7
Minimum 1
Maximum 18

Sample

1st row 492,916.70
2nd row 330,262.08
3rd row 296,463.79
4th row 184,802.54
5th row 88,025.46

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 7
Decimal Number 5998031
  • UHC contains many words: 67755 words

UHC Medicare

categorical

Approximate Distinct Count 3933
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 75.5 MB

Length

Mean 2.0027
Standard Deviation 1.1173
Median 3
Minimum 1
Maximum 13

Sample

1st row 188,196.94
2nd row 104,567.95
3rd row 124,202.56
4th row 78,460.76
5th row 37,171.74

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 76
Decimal Number 1801196
  • The top 2 categories (0, 0.0) account for over 50.0% of all values
  • UHC Medicare contains many words: 3768 words

Medcost

categorical

Approximate Distinct Count 70752
Approximate Unique (%) 6.0%
Missing 0
Missing (%) 0.0%
Memory Size 80.1 MB
  • The most frequent value (-1) occurs about 3.44 times as often as the second most frequent value (2,509.51)

Length

Mean 6.0865
Standard Deviation 1.6044
Median 7
Minimum 1
Maximum 18

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 42772
Decimal Number 5908619
  • Medcost contains many words: 65694 words
  • The most frequent word (1) occurs about 3.44 times as often as the second most frequent word (250951)

Filename

categorical

Approximate Distinct Count 5
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 147.8 MB

Length

Mean 66.1969
Standard Deviation 7.0244
Median 68
Minimum 58
Maximum 77

Sample

1st row North-Carolina-Bap...
2nd row North-Carolina-Bap...
3rd row North-Carolina-Bap...
4th row North-Carolina-Bap...
5th row North-Carolina-Bap...

Letter

Count 58415015
Lowercase Letter 46568434
Space Separator 0
Uppercase Letter 11846581
Dash Punctuation 7122045
Decimal Number 9106936

system

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 81.1 MB

Length

Mean 7
Standard Deviation 0
Median 7
Minimum 7
Maximum 7

Sample

1st row BAPTIST
2nd row BAPTIST
3rd row BAPTIST
4th row BAPTIST
5th row BAPTIST

Letter

Count 8267938
Lowercase Letter 0
Space Separator 0
Uppercase Letter 8267938
Dash Punctuation 0
Decimal Number 0
  • system has words of constant length
