Overview

Dataset Statistics

Number of Variables 21
Number of Rows 3.6853e+06 (about 3.69 million)
Missing Cells 0
Missing Cells (%) 0.0%
Duplicate Rows 824529
Duplicate Rows (%) 22.4%
Total Size in Memory 4.8 GB
Average Row Size in Memory 1.4 KB
Variable Types
  • Categorical: 21
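
The statistics above report zero missing cells, yet the sample rows in the variable sections show placeholder strings such as -1 and - where a value would be expected. The sketch below shows one way to measure how much of each column is placeholder text. The placeholder set and the helper name `placeholder_share` are assumptions made for illustration; the report itself does not define which strings mean "no value".

```python
from typing import Dict, List

# Assumed placeholder strings, taken from the sample rows in this report.
PLACEHOLDERS = {"-1", "-", ""}

def placeholder_share(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Return, per column, the fraction of cells that hold a placeholder.

    Assumes every row has the same keys as the first row.
    """
    if not rows:
        return {}
    hits = {name: 0 for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            # Compare after stripping, since some cells carry padding spaces.
            if value.strip() in PLACEHOLDERS:
                hits[name] += 1
    return {name: count / len(rows) for name, count in hits.items()}

# Tiny made-up table using column names and value shapes from the report.
demo_rows = [
    {"GROSS CHARGE": "-1", "DE-IDENTIFIED MAXIMUM": " - "},
    {"GROSS CHARGE": "1752.45", "DE-IDENTIFIED MAXIMUM": "211,296"},
    {"GROSS CHARGE": "-1", "DE-IDENTIFIED MAXIMUM": "70,554"},
    {"GROSS CHARGE": "12.00", "DE-IDENTIFIED MAXIMUM": "-"},
]
shares = placeholder_share(demo_rows)
```

A column with a high placeholder share is effectively sparse even though the report lists it as having no missing values.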

Dataset Insights

Dataset has 824529 (22.37%) duplicate rows [Duplicates]
CPT/MS-DRG has a high cardinality: 43737 distinct values [High Cardinality]
PROCEDURE DESCRIPTION has a high cardinality: 464108 distinct values [High Cardinality]
GROSS CHARGE has a high cardinality: 148403 distinct values [High Cardinality]
SELF PAY has a high cardinality: 104879 distinct values [High Cardinality]
DE-IDENTIFIED MAXIMUM has a high cardinality: 196177 distinct values [High Cardinality]
DE-IDENTIFIED MINIMUM has a high cardinality: 155443 distinct values [High Cardinality]
AETNA has a high cardinality: 155117 distinct values [High Cardinality]
AETNA MEDICARE has a high cardinality: 74375 distinct values [High Cardinality]
BCBS has a high cardinality: 222110 distinct values [High Cardinality]
BCBS MEDICARE has a high cardinality: 65541 distinct values [High Cardinality]
HUMANA has a high cardinality: 104349 distinct values [High Cardinality]
HUMANA MEDICARE has a high cardinality: 62512 distinct values [High Cardinality]
CIGNA has a high cardinality: 194097 distinct values [High Cardinality]
CIGNA MEDICARE has a high cardinality: 9452 distinct values [High Cardinality]
MEDCOST has a high cardinality: 167413 distinct values [High Cardinality]
TRICARE has a high cardinality: 35910 distinct values [High Cardinality]
UHC has a high cardinality: 165480 distinct values [High Cardinality]
UHC MEDICARE has a high cardinality: 67629 distinct values [High Cardinality]
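
The duplicate and cardinality figures above can be reproduced on a small scale. The sketch below assumes the duplicate count follows the keep-first convention (every repeat after the first occurrence of a row counts once), which is a guess about how the report was computed. Under that reading, 824529 duplicates out of roughly 3.6853e+06 rows gives the 22.37% shown. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def duplicate_row_stats(rows: List[Sequence[str]]) -> Tuple[int, float]:
    """Return (duplicate count, duplicate percent) using keep-first counting."""
    occurrences = Counter(tuple(row) for row in rows)
    # Each distinct row contributes (occurrences - 1) duplicates.
    duplicates = sum(n - 1 for n in occurrences.values())
    percent = 100.0 * duplicates / len(rows) if rows else 0.0
    return duplicates, percent

def distinct_counts(rows: List[Sequence[str]], header: Sequence[str]) -> Dict[str, int]:
    """Return the exact number of distinct values in each column."""
    return {name: len({row[i] for row in rows}) for i, name in enumerate(header)}

# Made-up rows; only the column names and values come from the report.
header = ["SYSTEM", "PATIENT TYPE"]
rows = [
    ("DUKE", "Outpatient"),
    ("DUKE", "Outpatient"),
    ("CONE", "Inpatient"),
    ("DUKE", "Outpatient"),
]
dup_count, dup_percent = duplicate_row_stats(rows)
cardinality = distinct_counts(rows, header)
```

The report labels its distinct counts as approximate, so an exact count like this one may differ slightly from the figures listed.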

Variables

CPT/MS-DRG

categorical

Approximate Distinct Count 43737
Approximate Unique (%) 1.2%
Missing 0
Missing (%) 0.0%
Memory Size 258.0 MB
  • The largest value (-1) is over 3.78 times larger than the second largest value (HCPCS C1713)

Length

Mean 5.6839
Standard Deviation 4.1029
Median 7
Minimum 1
Maximum 143

Sample

1st row 1
2nd row 2
3rd row 3
4th row 4
5th row 5

Letter

Count 6579418
Lowercase Letter 42379
Space Separator 1148576
Uppercase Letter 6537039
Dash Punctuation 1669478
Decimal Number 11083808
  • The top 2 categories (-1, HCPCS C1713) take over 50.0%
  • CPT/MS-DRG contains many words: 25392 words
  • The largest value (1) is over 1.99 times larger than the second largest value (c1713)
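
The "largest value" and "top 2 categories" notes recur in every variable section. They read most naturally as statements about frequency: how many times more often the most common value occurs than the runner-up, and what share of all rows the two together cover. That interpretation is an assumption, and the helper names below are hypothetical; the sketch only illustrates the arithmetic.

```python
from collections import Counter
from typing import List, Optional, Tuple

def top_two_ratio(values: List[str]) -> Optional[Tuple[str, str, float]]:
    """Return (most common, runner-up, count ratio), or None if fewer than two distinct values."""
    top = Counter(values).most_common(2)
    if len(top) < 2:
        return None
    (first, n_first), (second, n_second) = top
    return first, second, n_first / n_second

def top_two_share(values: List[str]) -> float:
    """Return the fraction of all values covered by the two most common ones."""
    if not values:
        return 0.0
    top = Counter(values).most_common(2)
    return sum(count for _, count in top) / len(values)

# Made-up column: six placeholders, two repeats of one code, one other code.
codes = ["-1"] * 6 + ["HCPCS C1713"] * 2 + ["99213"]
ratio = top_two_ratio(codes)
share = top_two_share(codes)
```

In this toy column the placeholder occurs three times as often as the runner-up, and the two values together cover eight of nine rows.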

PROCEDURE DESCRIPTION

categorical

Approximate Distinct Count 464108
Approximate Unique (%) 12.6%
Missing 0
Missing (%) 0.0%
Memory Size 364.1 MB
  • The largest value (-1) is over 651.92 times larger than the second largest value (IMPLANT BEARING DEPUY)

Length

Mean 38.571
Standard Deviation 37.7789
Median 81
Minimum 2
Maximum 16297

Sample

1st row 1 - HEART TRANSPLA...
2nd row 2 - HEART TRANSPLA...
3rd row 3 - ECMO OR TRACHE...
4th row 4 - TRACHEOSTOMY W...
5th row 5 - LIVER TRANSPLA...

Letter

Count 104902884
Lowercase Letter 9319978
Space Separator 20545034
Uppercase Letter 95582906
Dash Punctuation 1897248
Decimal Number 12568665
  • PROCEDURE DESCRIPTION contains many words: 145982 words
  • The largest value (mm) is over 2.18 times larger than the second largest value (1)

PATIENT TYPE

categorical

Approximate Distinct Count 13
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 304.9 MB
  • The largest value (Outpatient) is over 3.08 times larger than the second largest value (-1)

Length

Mean 21.7503
Standard Deviation 20.8956
Median 10
Minimum 2
Maximum 62

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 66311294
Lowercase Letter 53771206
Space Separator 10648
Uppercase Letter 12540088
Dash Punctuation 433208
Decimal Number 10155992
  • The largest value (outpatient) is over 3.39 times larger than the second largest value (inpatient)

SYSTEM

categorical

Approximate Distinct Count 12
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 248.1 MB

Length

Mean 5.5954
Standard Deviation 1.287
Median 6
Minimum 3
Maximum 8

Sample

1st row DUKE
2nd row DUKE
3rd row DUKE
4th row DUKE
5th row DUKE

Letter

Count 20620455
Lowercase Letter 0
Space Separator 0
Uppercase Letter 20620455
Dash Punctuation 0
Decimal Number 0
  • The top 2 categories (BAPTIST, CONE) take over 50.0%
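
The labels in each "Letter" block (Lowercase Letter, Uppercase Letter, Space Separator, Dash Punctuation, Decimal Number) match the names of Unicode general categories, and in the sections above "Count" equals the lowercase and uppercase totals added together. A minimal sketch of such a tally using Python's `unicodedata` module follows. Whether the report was produced this way is not stated, and the function name is hypothetical.

```python
import unicodedata
from typing import Dict, Iterable

# Unicode general category codes mapped to the labels used in the report.
LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def category_counts(values: Iterable[str]) -> Dict[str, int]:
    """Tally characters by Unicode category across all strings in a column."""
    counts = {label: 0 for label in LABELS.values()}
    for value in values:
        for ch in value:
            label = LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    # "Count" is taken to be the total number of letters (assumption from the report's figures).
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts

# Sample strings copied from the visible (truncated) sample rows.
tally = category_counts(["DUKE", "56-2070036_DRaH", "Average payment of"])
```

Characters in other categories, such as the underscore (connector punctuation), are ignored by this tally.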

GROSS CHARGE

categorical

Approximate Distinct Count 148403
Approximate Unique (%) 4.0%
Missing 0
Missing (%) 0.0%
Memory Size 252.1 MB

Length

Mean 6.7404
Standard Deviation 1.9382
Median 9
Minimum 1
Maximum 47

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 285
Lowercase Letter 173
Space Separator 1620102
Uppercase Letter 112
Dash Punctuation 24558
Decimal Number 18602514
  • GROSS CHARGE contains many words: 88400 words

SELF PAY

categorical

Approximate Distinct Count 104879
Approximate Unique (%) 2.8%
Missing 0
Missing (%) 0.0%
Memory Size 250.1 MB
  • The largest value (-1) is over 17.13 times larger than the second largest value (1752.45)

Length

Mean 6.1495
Standard Deviation 3.1255
Median 7
Minimum 1
Maximum 102

Sample

1st row -1
2nd row -1
3rd row Average payment of...
4th row Average payment of...
5th row -1

Letter

Count 105723
Lowercase Letter 97200
Space Separator 1123322
Uppercase Letter 8523
Dash Punctuation 431217
Decimal Number 17321399
  • SELF PAY contains many words: 89502 words
  • The largest value (1) is over 10.69 times larger than the second largest value (175245)

DE-IDENTIFIED MAXIMUM

categorical

Approximate Distinct Count 196177
Approximate Unique (%) 5.3%
Missing 0
Missing (%) 0.0%
Memory Size 259.8 MB
  • The largest value ( Included in DRG Rate ) is over 1.87 times larger than the second largest value (-1)

Length

Mean 8.9296
Standard Deviation 5.9446
Median 10
Minimum 1
Maximum 22

Sample

1st row -
2nd row -
3rd row 211,296
4th row 70,554
5th row -

Letter

Count 9132536
Lowercase Letter 6446496
Space Separator 3740890
Uppercase Letter 2686040
Dash Punctuation 288056
Decimal Number 15916185
  • DE-IDENTIFIED MAXIMUM contains many words: 159049 words
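
Every variable is typed as categorical, although the samples show that the price columns mix numbers written with thousands separators (211,296), placeholders (-, -1), and free text ( Included in DRG Rate ). A minimal sketch of a converter that recovers the numeric values is shown below. Treating -1 and - as "no value" is an assumption, and `parse_price` is a hypothetical helper, not part of the report.

```python
import math
from typing import Optional

# Strings assumed to mean "no price available" (assumption, based on the samples).
NO_VALUE = {"", "-", "-1"}

def parse_price(text: str) -> Optional[float]:
    """Convert a price cell to a float, or return None if it is not a usable number."""
    cleaned = text.strip().replace(",", "")
    if cleaned in NO_VALUE:
        return None
    try:
        number = float(cleaned)
    except ValueError:
        # Free text such as "Included in DRG Rate" is not a price.
        return None
    # Reject "nan" and "inf", which float() would otherwise accept.
    return number if math.isfinite(number) else None

cells = ["211,296", "70,554", "-", " Included in DRG Rate ", "-1", "1752.45"]
prices = [parse_price(cell) for cell in cells]
```

Applying a converter like this before profiling would let the price columns be summarized as numbers rather than as high-cardinality categories.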

DE-IDENTIFIED MINIMUM

categorical

Approximate Distinct Count 155443
Approximate Unique (%) 4.2%
Missing 0
Missing (%) 0.0%
Memory Size 257.0 MB
  • The largest value ( Included in DRG Rate ) is over 1.87 times larger than the second largest value (-1)

Length

Mean 8.1267
Standard Deviation 6.1503
Median 8
Minimum 1
Maximum 22

Sample

1st row -
2nd row -
3rd row 103,190
4th row 62,876
5th row -

Letter

Count 9132614
Lowercase Letter 6446496
Space Separator 4645966
Uppercase Letter 2686118
Dash Punctuation 559624
Decimal Number 12869242
  • DE-IDENTIFIED MINIMUM contains many words: 123747 words

AETNA

categorical

Approximate Distinct Count 155117
Approximate Unique (%) 4.2%
Missing 0
Missing (%) 0.0%
Memory Size 250.2 MB
  • The largest value (-1) is over 4.92 times larger than the second largest value (0)

Length

Mean 6.1962
Standard Deviation 5.0162
Median 9
Minimum 1
Maximum 36

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 617296
Lowercase Letter 610544
Space Separator 1729373
Uppercase Letter 6752
Dash Punctuation 966672
Decimal Number 16237870
  • AETNA contains many words: 129956 words
  • The largest value (1) is over 4.92 times larger than the second largest value (0)

AETNA MEDICARE

categorical

Approximate Distinct Count 74375
Approximate Unique (%) 2.0%
Missing 0
Missing (%) 0.0%
Memory Size 247.0 MB
  • The largest value (-1) is over 2.12 times larger than the second largest value (0)

Length

Mean 5.274
Standard Deviation 6.169
Median 7
Minimum 1
Maximum 36

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 9025551
Lowercase Letter 0
Space Separator 1520146
Uppercase Letter 9025551
Dash Punctuation 1345823
Decimal Number 6352709
  • The top 2 categories (-1, 0) take over 50.0%
  • AETNA MEDICARE contains many words: 66061 words
  • The largest value (1) is over 2.13 times larger than the second largest value (0)

BCBS

categorical

Approximate Distinct Count 222110
Approximate Unique (%) 6.0%
Missing 0
Missing (%) 0.0%
Memory Size 250.6 MB
  • The largest value (-1) is over 8.95 times larger than the second largest value (blank)

Length

Mean 6.3134
Standard Deviation 4.8778
Median 8
Minimum 1
Maximum 36

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 1111362
Lowercase Letter 417380
Space Separator 1479728
Uppercase Letter 693982
Dash Punctuation 1055076
Decimal Number 16619998
  • BCBS contains many words: 198461 words
  • The largest value (1) is over 26.2 times larger than the second largest value (in)

BCBS MEDICARE

categorical

Approximate Distinct Count 65541
Approximate Unique (%) 1.8%
Missing 0
Missing (%) 0.0%
Memory Size 246.6 MB
  • The largest value (-1) is over 2.28 times larger than the second largest value (0)

Length

Mean 5.1627
Standard Deviation 6.2442
Median 7
Minimum 1
Maximum 100

Sample

1st row -1
2nd row -1
3rd row Average payment of...
4th row Average payment of...
5th row -1

Letter

Count 9056511
Lowercase Letter 29376
Space Separator 1584059
Uppercase Letter 9027135
Dash Punctuation 1449854
Decimal Number 5798419
  • The top 2 categories (-1, 0) take over 50.0%
  • BCBS MEDICARE contains many words: 60365 words
  • The largest value (1) is over 2.28 times larger than the second largest value (0)

HUMANA

categorical

Approximate Distinct Count 104349
Approximate Unique (%) 2.8%
Missing 0
Missing (%) 0.0%
Memory Size 242.6 MB
  • The largest value (-1) is over 91.54 times larger than the second largest value ( ** )

Length

Mean 4.0183
Standard Deviation 2.7102
Median 6
Minimum 1
Maximum 18

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 32
Lowercase Letter 0
Space Separator 708194
Uppercase Letter 32
Dash Punctuation 2267820
Decimal Number 9869781
  • The top 2 categories (-1, ** ) take over 50.0%
  • HUMANA contains many words: 93459 words
  • The largest value (1) is over 182.52 times larger than the second largest value (308431)

HUMANA MEDICARE

categorical

Approximate Distinct Count 62512
Approximate Unique (%) 1.7%
Missing 0
Missing (%) 0.0%
Memory Size 238.5 MB
  • The largest value (-1) is over 3.97 times larger than the second largest value (0.0)

Length

Mean 2.8533
Standard Deviation 2.2332
Median 3
Minimum 1
Maximum 100

Sample

1st row -1
2nd row -1
3rd row Average payment of...
4th row -1
5th row -1

Letter

Count 37510
Lowercase Letter 36828
Space Separator 133860
Uppercase Letter 682
Dash Punctuation 2151995
Decimal Number 6803414
  • The top 2 categories (-1, 0.0) take over 50.0%
  • HUMANA MEDICARE contains many words: 59415 words
  • The largest value (1) is over 3.97 times larger than the second largest value (00)

CIGNA

categorical

Approximate Distinct Count 194097
Approximate Unique (%) 5.3%
Missing 0
Missing (%) 0.0%
Memory Size 249.2 MB
  • The largest value (-1) is over 24.7 times larger than the second largest value (INCLUDED IN CASE/ RATE)

Length

Mean 5.9085
Standard Deviation 3.8879
Median 8
Minimum 1
Maximum 39

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 784677
Lowercase Letter 287
Space Separator 884726
Uppercase Letter 784390
Dash Punctuation 994696
Decimal Number 15979836
  • CIGNA contains many words: 161638 words
  • The largest value (1) is over 24.25 times larger than the second largest value (case)

CIGNA MEDICARE

categorical

Approximate Distinct Count 9452
Approximate Unique (%) 0.3%
Missing 0
Missing (%) 0.0%
Memory Size 235.9 MB
  • The largest value (-1) is over 452.44 times larger than the second largest value ( * )

Length

Mean 2.1271
Standard Deviation 0.9486
Median 2
Minimum 2
Maximum 20

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 0
Lowercase Letter 0
Space Separator 124878
Uppercase Letter 0
Dash Punctuation 3607322
Decimal Number 3973844
  • The top 2 categories (-1, * ) take over 50.0%
  • CIGNA MEDICARE contains many words: 9279 words
  • The largest value (1) is over 2370.12 times larger than the second largest value (11544)

MEDCOST

categorical

Approximate Distinct Count 167413
Approximate Unique (%) 4.5%
Missing 0
Missing (%) 0.0%
Memory Size 251.3 MB
  • The largest value (-1) is over 42.91 times larger than the second largest value ( ** )

Length

Mean 6.4989
Standard Deviation 5.0408
Median 9
Minimum 1
Maximum 100

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 656301
Lowercase Letter 649372
Space Separator 1674620
Uppercase Letter 6929
Dash Punctuation 1058493
Decimal Number 17337335
  • MEDCOST contains many words: 145912 words
  • The largest value (1) is over 55.55 times larger than the second largest value (0)

TRICARE

categorical

Approximate Distinct Count 35910
Approximate Unique (%) 1.0%
Missing 0
Missing (%) 0.0%
Memory Size 236.6 MB
  • The largest value (-1) is over 9.5 times larger than the second largest value (blank)

Length

Mean 2.3124
Standard Deviation 1.57
Median 2
Minimum 2
Maximum 100

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 28765
Lowercase Letter 28242
Space Separator 657991
Uppercase Letter 523
Dash Punctuation 3085226
Decimal Number 4424438
  • The top 2 categories (-1, blank) take over 50.0%
  • TRICARE contains many words: 36185 words
  • The largest value (1) is over 1361.53 times larger than the second largest value (9888)

UHC

categorical

Approximate Distinct Count 165480
Approximate Unique (%) 4.5%
Missing 0
Missing (%) 0.0%
Memory Size 247.4 MB
  • The largest value (-1) is over 3.71 times larger than the second largest value (blank)

Length

Mean 5.374
Standard Deviation 3.7271
Median 8
Minimum 1
Maximum 36

Sample

1st row -1
2nd row -1
3rd row -1
4th row -1
5th row -1

Letter

Count 1636600
Lowercase Letter 0
Space Separator 1057772
Uppercase Letter 1636600
Dash Punctuation 1009732
Decimal Number 13615087
  • UHC contains many words: 141790 words
  • The largest value (1) is over 8.73 times larger than the second largest value (in)

UHC MEDICARE

categorical

Approximate Distinct Count 67629
Approximate Unique (%) 1.8%
Missing 0
Missing (%) 0.0%
Memory Size 246.9 MB
  • The largest value (-1) is over 2.27 times larger than the second largest value (0)

Length

Mean 5.2309
Standard Deviation 6.3709
Median 7
Minimum 1
Maximum 100

Sample

1st row -1
2nd row -1
3rd row -1
4th row Average payment of...
5th row -1

Letter

Count 9079852
Lowercase Letter 52218
Space Separator 1519318
Uppercase Letter 9027634
Dash Punctuation 1449177
Decimal Number 6118411
  • The top 2 categories (-1, 0) take over 50.0%
  • UHC MEDICARE contains many words: 65007 words
  • The largest value (1) is over 2.27 times larger than the second largest value (0)

FILENAME

categorical

Approximate Distinct Count 35
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 334.8 MB
  • The largest value (-1) is over 1.8 times larger than the second largest value (Inpatient)

Length

Mean 30.267
Standard Deviation 28.6999
Median 10
Minimum 2
Maximum 77

Sample

1st row 56-2070036_DRaH_st...
2nd row 56-2070036_DRaH_st...
3rd row 56-2070036_DRaH_st...
4th row 56-2070036_DRaH_st...
5th row 56-2070036_DRaH_st...

Letter

Count 83973910
Lowercase Letter 69479365
Space Separator 0
Uppercase Letter 14494545
Dash Punctuation 8567052
Decimal Number 13873697
  • The largest value (1) is over 1.8 times larger than the second largest value (outpatient)

Missing Values