Overview

Dataset Statistics

Number of Variables 18
Number of Rows 273851
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 195695
Duplicate Rows (%) 71.5%
Total Size in Memory 305.8 MB
Average Row Size in Memory 1.1 KB
Variable Types
  • Categorical: 18
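
The duplicate-row figures above can be checked with a short sketch. It assumes a "duplicate" is any row that repeats an earlier row, with the first occurrence not counted; the report does not state its definition, and `duplicate_stats` and the toy rows are hypothetical.

```python
from collections import Counter


def duplicate_stats(rows):
    """Count rows that repeat an earlier row (first occurrences are not counted)."""
    counts = Counter(tuple(r) for r in rows)
    n_rows = sum(counts.values())
    n_dup = n_rows - len(counts)
    pct = 100.0 * n_dup / n_rows if n_rows else 0.0
    return n_dup, pct


# Toy rows (hypothetical): one row appears three times, so two rows are duplicates.
toy = [("1", "-1"), ("1", "-1"), ("2", "5"), ("1", "-1")]
n_dup, pct = duplicate_stats(toy)

# The report's own headline figures: 195695 duplicates out of 273851 rows.
report_pct = 100.0 * 195695 / 273851
print(n_dup, pct, round(report_pct, 2))
```

Under that definition the report's counts give 71.46%, which agrees with the rounded 71.5% shown in the statistics.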

Dataset Insights

Dataset has 195695 (71.46%) duplicate rows [Duplicates]
CPT/MS-DRG has a high cardinality: 14442 distinct values [High Cardinality]
Procedure Description has a high cardinality: 762 distinct values [High Cardinality]
Gross Charge has a high cardinality: 33477 distinct values [High Cardinality]
AETNA MEDICARE has a high cardinality: 34370 distinct values [High Cardinality]
BCBS MEDICARE has a high cardinality: 34722 distinct values [High Cardinality]
HUMANA MEDICARE has a high cardinality: 35077 distinct values [High Cardinality]
UHC MEDICARE has a high cardinality: 35145 distinct values [High Cardinality]
AETNA has a high cardinality: 18439 distinct values [High Cardinality]
BCBS has a high cardinality: 43495 distinct values [High Cardinality]
CIGNA has a high cardinality: 43545 distinct values [High Cardinality]
Medcost has a high cardinality: 42525 distinct values [High Cardinality]
UHC has a high cardinality: 44180 distinct values [High Cardinality]
Tricare has a high cardinality: 34493 distinct values [High Cardinality]
Self Pay has a high cardinality: 19121 distinct values [High Cardinality]
De-identified Minimum has a high cardinality: 1356 distinct values [High Cardinality]
De-identified Maximum has a high cardinality: 47701 distinct values [High Cardinality]
system has constant value "DUKE" [Constant]
system has constant length 4 [Constant Length]
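
The insight labels above (high cardinality, constant value, constant length) can be approximated with a minimal sketch. The cutoff of 50 distinct values is an assumption chosen for illustration, not a threshold stated by the report, and `column_insights` is a hypothetical helper.

```python
def column_insights(name, values, high_cardinality_threshold=50):
    """Return insight strings for one column of string values.

    The threshold is an assumed cutoff, not one taken from the report.
    """
    distinct = set(values)
    insights = []
    if len(distinct) == 1:
        only = next(iter(distinct))
        insights.append(f'{name} has constant value "{only}"')
    elif len(distinct) > high_cardinality_threshold:
        insights.append(
            f"{name} has a high cardinality: {len(distinct)} distinct values"
        )
    # Constant length: every value has the same number of characters.
    if len({len(v) for v in values}) == 1:
        insights.append(f"{name} has constant length {len(values[0])}")
    return insights


system_insights = column_insights("system", ["DUKE"] * 4)
code_insights = column_insights("code", [str(i) for i in range(100)])
print(system_insights)
print(code_insights)
```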

Variables

CPT/MS-DRG

categorical

Approximate Distinct Count 14442
Approximate Unique (%) 5.3%
Missing 0
Missing (%) 0.0%
Memory Size 18.6 MB
  • The most frequent value (C1713) occurs over 2.17 times as often as the second most frequent value (27800169)

Length

Mean 6.356
Standard Deviation 2.1337
Median 5
Minimum 1
Maximum 47

Sample

1st row 1
2nd row 2
3rd row 3
4th row 4
5th row 5

Letter

Count 178945
Lowercase Letter 0
Space Separator 7349
Uppercase Letter 178945
Dash Punctuation 24493
Decimal Number 1522477
  • CPT/MS-DRG contains many words: 14358 words
  • The most frequent word (c1713) occurs over 2.17 times as often as the second most frequent word (27800169)

Procedure Description

categorical

Approximate Distinct Count 762
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 17.6 MB
  • The most frequent value (-1) occurs over 90522.67 times as often as the second most frequent value (1 - HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM WITH MCC)

Length

Mean 2.4614
Standard Deviation 5.3857
Median 2
Minimum 2
Maximum 148

Sample

1st row 1 - HEART TRANSPLA...
2nd row 2 - HEART TRANSPLA...
3rd row 3 - ECMO OR TRACHE...
4th row 4 - TRACHEOSTOMY W...
5th row 5 - LIVER TRANSPLA...

Letter

Count 101484
Lowercase Letter 0
Space Separator 18576
Uppercase Letter 101484
Dash Punctuation 273962
Decimal Number 278219
  • The top 2 categories (-1, 1 - HEART TRANSPLANT OR IMPLANT OF HEART ASSIST SYSTEM WITH MCC) account for over 50.0% of rows
  • Procedure Description contains many words: 1270 words
  • The most frequent word (1) occurs over 171.78 times as often as the second most frequent word (with)
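
The frequency notes above compare category counts, not numeric magnitudes. A minimal sketch of that comparison follows, assuming the ratio is the top count divided by the runner-up count and the share is taken over all rows; `top_category_insights` and the toy values are hypothetical.

```python
from collections import Counter


def top_category_insights(values):
    """Return the two most frequent values, their count ratio, and their combined share.

    Requires at least two distinct values.
    """
    counts = Counter(values)
    (top, top_n), (second, second_n) = counts.most_common(2)
    ratio = top_n / second_n
    top2_share = (top_n + second_n) / len(values)
    return top, second, ratio, top2_share


# Toy column (hypothetical): "-1" dominates, as it does in several columns here.
toy = ["-1"] * 6 + ["A"] * 3 + ["B"]
top, second, ratio, share = top_category_insights(toy)
print(top, second, ratio, share)
```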

Gross Charge

categorical

Approximate Distinct Count 33477
Approximate Unique (%) 12.2%
Missing 0
Missing (%) 0.0%
Memory Size 19.0 MB

Length

Mean 7.7979
Standard Deviation 1.8982
Median 10
Minimum 1
Maximum 12

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 181196
Uppercase Letter 0
Dash Punctuation 2283
Decimal Number 1520011
  • Gross Charge contains many words: 19598 words
  • The most frequent word (71288) occurs over 1.53 times as often as the second most frequent word (476140)

AETNA MEDICARE

categorical

Approximate Distinct Count 34370
Approximate Unique (%) 12.6%
Missing 0
Missing (%) 0.0%
Memory Size 18.5 MB

Length

Mean 5.9844
Standard Deviation 1.1433
Median 6
Minimum 2
Maximum 10

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 2283
Decimal Number 1316910
  • AETNA MEDICARE contains many words: 34370 words

BCBS MEDICARE

categorical

Approximate Distinct Count 34722
Approximate Unique (%) 12.7%
Missing 0
Missing (%) 0.0%
Memory Size 18.6 MB

Length

Mean 6.1603
Standard Deviation 4.1571
Median 6
Minimum 2
Maximum 100

Sample

1st row Average payment of...
2nd row -1
3rd row Average payment of...
4th row -1
5th row Average payment of...

Letter

Count 29920
Lowercase Letter 29376
Space Separator 7072
Uppercase Letter 544
Dash Punctuation 1739
Decimal Number 1325021
  • BCBS MEDICARE contains many words: 35225 words

HUMANA MEDICARE

categorical

Approximate Distinct Count 35077
Approximate Unique (%) 12.8%
Missing 0
Missing (%) 0.0%
Memory Size 18.6 MB

Length

Mean 6.2197
Standard Deviation 4.6087
Median 6
Minimum 2
Maximum 100

Sample

1st row Average payment of...
2nd row Average payment of...
3rd row Average payment of...
4th row Average payment of...
5th row Average payment of...

Letter

Count 37510
Lowercase Letter 36828
Space Separator 8866
Uppercase Letter 682
Dash Punctuation 1601
Decimal Number 1330151
  • HUMANA MEDICARE contains many words: 35744 words

UHC MEDICARE

categorical

Approximate Distinct Count 35145
Approximate Unique (%) 12.8%
Missing 0
Missing (%) 0.0%
Memory Size 18.6 MB

Length

Mean 6.3049
Standard Deviation 5.4408
Median 6
Minimum 2
Maximum 100

Sample

1st row Average payment of...
2nd row -1
3rd row Average payment of...
4th row Average payment of...
5th row Average payment of...

Letter

Count 53185
Lowercase Letter 52218
Space Separator 12571
Uppercase Letter 967
Dash Punctuation 1316
Decimal Number 1332933
  • UHC MEDICARE contains many words: 36349 words

AETNA

categorical

Approximate Distinct Count 18439
Approximate Unique (%) 6.7%
Missing 0
Missing (%) 0.0%
Memory Size 18.8 MB
  • The most frequent value (440.56) occurs over 1.53 times as often as the second most frequent value (2,942.55)

Length

Mean 7.0119
Standard Deviation 1.3104
Median 8
Minimum 2
Maximum 12

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 2283
Decimal Number 1501743
  • AETNA contains many words: 18439 words
  • The most frequent word (44056) occurs over 1.53 times as often as the second most frequent word (294255)

BCBS

categorical

Approximate Distinct Count 43495
Approximate Unique (%) 15.9%
Missing 0
Missing (%) 0.0%
Memory Size 18.8 MB

Length

Mean 7.0276
Standard Deviation 1.3113
Median 8
Minimum 2
Maximum 12

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 2283
Decimal Number 1504555
  • BCBS contains many words: 43495 words

CIGNA

categorical

Approximate Distinct Count 43545
Approximate Unique (%) 15.9%
Missing 0
Missing (%) 0.0%
Memory Size 18.8 MB
  • The most frequent value (448.40) occurs over 1.82 times as often as the second most frequent value (-1)

Length

Mean 6.989
Standard Deviation 1.3152
Median 8
Minimum 2
Maximum 12

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 2283
Decimal Number 1497360
  • CIGNA contains many words: 43545 words
  • The most frequent word (44840) occurs over 1.82 times as often as the second most frequent word (1)

Medcost

categorical

Approximate Distinct Count 42525
Approximate Unique (%) 15.5%
Missing 0
Missing (%) 0.0%
Memory Size 18.8 MB

Length

Mean 6.9364
Standard Deviation 2.4741
Median 8
Minimum 2
Maximum 100

Sample

1st row -1
2nd row -1
3rd row Average payment of...
4th row -1
5th row -1

Letter

Count 8305
Lowercase Letter 8154
Space Separator 1963
Uppercase Letter 151
Dash Punctuation 2132
Decimal Number 1481008
  • Medcost contains many words: 42616 words

UHC

categorical

Approximate Distinct Count 44180
Approximate Unique (%) 16.1%
Missing 0
Missing (%) 0.0%
Memory Size 18.8 MB

Length

Mean 7.1357
Standard Deviation 1.3157
Median 8
Minimum 2
Maximum 12

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 0
Uppercase Letter 0
Dash Punctuation 2283
Decimal Number 1524980
  • UHC contains many words: 44180 words

Tricare

categorical

Approximate Distinct Count 34493
Approximate Unique (%) 12.6%
Missing 0
Missing (%) 0.0%
Memory Size 18.6 MB

Length

Mean 6.1268
Standard Deviation 4.0744
Median 6
Minimum 2
Maximum 100

Sample

1st row Average payment of...
2nd row Average payment of...
3rd row Average payment of...
4th row Average payment of...
5th row Average payment of...

Letter

Count 28765
Lowercase Letter 28242
Space Separator 6799
Uppercase Letter 523
Dash Punctuation 1760
Decimal Number 1318701
  • Tricare contains many words: 35038 words

Self Pay

categorical

Approximate Distinct Count 19121
Approximate Unique (%) 7.0%
Missing 0
Missing (%) 0.0%
Memory Size 18.8 MB
  • The most frequent value (199.61) occurs over 1.53 times as often as the second most frequent value (1,333.19)

Length

Mean 7.1753
Standard Deviation 7.2944
Median 6
Minimum 2
Maximum 102

Sample

1st row Average payment of...
2nd row Average payment of...
3rd row Average payment of...
4th row Average payment of...
5th row Average payment of...

Letter

Count 99003
Lowercase Letter 97200
Space Separator 23400
Uppercase Letter 1803
Dash Punctuation 482
Decimal Number 1453908
  • Self Pay contains many words: 22216 words
  • The most frequent word (19961) occurs over 1.53 times as often as the second most frequent word (133319)

De-identified Minimum

categorical

Approximate Distinct Count 1356
Approximate Unique (%) 0.5%
Missing 0
Missing (%) 0.0%
Memory Size 18.3 MB
  • The most frequent value ( - ) occurs over 9074.57 times as often as the second most frequent value ( 5,358 )

Length

Mean 5.0124
Standard Deviation 0.1748
Median 5
Minimum 4
Maximum 9

Sample

1st row 82,344
2nd row 117,661
3rd row 26,986
4th row 52,237
5th row 82,932

Letter

Count 0
Lowercase Letter 0
Space Separator 1092176
Uppercase Letter 0
Dash Punctuation 272237
Decimal Number 6785
  • The top 2 categories ( - , 5,358 ) account for over 50.0% of rows
  • De-identified Minimum contains many words: 1355 words
  • The most frequent word (5358) occurs over 1.67 times as often as the second most frequent word (3139)
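
Every charge column is typed as categorical because its values are strings such as " 5,358 ", "2,942.55", "-1", " - " and free text beginning "Average payment of...". A minimal sketch of converting them to numbers follows. Treating "-1" and "-" as not-available markers is an assumption, since the report does not say what they mean, and `parse_charge` is a hypothetical helper.

```python
def parse_charge(text, sentinels=("-1", "-", "")):
    """Return a float for numeric strings, or None for sentinels and free text.

    The sentinel set is an assumption about how missing prices are encoded.
    """
    cleaned = text.strip()
    if cleaned in sentinels:
        return None
    try:
        # Drop thousands separators before conversion.
        return float(cleaned.replace(",", ""))
    except ValueError:
        # Free-text cells (for example "Average payment of...") are not numeric.
        return None


samples = [" 5,358 ", "2,942.55", "-1", " - ", "Average payment of..."]
parsed = [parse_charge(s) for s in samples]
print(parsed)
```

Returning None for free text keeps those rows distinguishable from real prices; whether they should be parsed further depends on their full wording, which is truncated in this report.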

De-identified Maximum

categorical

Approximate Distinct Count 47701
Approximate Unique (%) 17.4%
Missing 0
Missing (%) 0.0%
Memory Size 19.0 MB

Length

Mean 7.7862
Standard Deviation 1.6562
Median 10
Minimum 1
Maximum 12

Sample

1st row 2,561,560
2nd row 748,301
3rd row 2,590,858
4th row 1,609,887
5th row 792,107

Letter

Count 0
Lowercase Letter 0
Space Separator 187100
Uppercase Letter 0
Dash Punctuation 669
Decimal Number 1518674
  • De-identified Maximum contains many words: 44952 words

Filename

categorical

Approximate Distinct Count 6
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 27.0 MB

Length

Mean 38.3336
Standard Deviation 0.4715
Median 38
Minimum 38
Maximum 39

Sample

1st row 56-2070036_DUH_sta...
2nd row 56-2070036_DUH_sta...
3rd row 56-2070036_DUH_sta...
4th row 56-2070036_DUH_sta...
5th row 56-2070036_DUH_sta...

Letter

Count 6663783
Lowercase Letter 5842230
Space Separator 0
Uppercase Letter 821553
Dash Punctuation 273851
Decimal Number 2464659
  • The top 2 categories (56-2070036_DUH_standardcharges_cdm.csv, 56-2070036_DRaH_standardcharges_cdm.csv) account for over 50.0% of rows

system

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 18.0 MB

Length

Mean 4
Standard Deviation 0
Median 4
Minimum 4
Maximum 4

Sample

1st row DUKE
2nd row DUKE
3rd row DUKE
4th row DUKE
5th row DUKE

Letter

Count 1095404
Lowercase Letter 0
Space Separator 0
Uppercase Letter 1095404
Dash Punctuation 0
Decimal Number 0
  • system has words of constant length
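
The Length and Letter blocks in each variable section can be reproduced with a short sketch. It assumes the Letter rows correspond to Unicode general categories (Lu, Ll, Zs, Pd, Nd) and uses the population standard deviation; the report states neither, and both helper functions are hypothetical.

```python
import unicodedata
from collections import Counter
from statistics import mean, median, pstdev


def length_stats(values):
    """Summarise string lengths (population standard deviation is an assumption)."""
    lengths = [len(v) for v in values]
    return {
        "mean": mean(lengths),
        "std": pstdev(lengths),
        "median": median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


def category_counts(values):
    """Count characters by Unicode general category, e.g. Lu, Ll, Nd, Pd, Zs."""
    counts = Counter()
    for value in values:
        for ch in value:
            counts[unicodedata.category(ch)] += 1
    return counts


system_stats = length_stats(["DUKE"] * 5)
system_cats = category_counts(["DUKE"] * 5)
filename_cats = category_counts(["56-2070036_DUH_standardcharges_cdm.csv"])
print(system_stats, dict(system_cats), dict(filename_cats))
```

With 273851 rows of "DUKE", four uppercase letters per row gives 4 × 273851 = 1095404, matching the Uppercase Letter count reported for system.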

Missing Values