Overview

Dataset Statistics

Number of Variables 6
Number of Rows 83439
Missing Cells 754
Missing Cells (%) 0.2%
Duplicate Rows 37
Duplicate Rows (%) 0.0%
Total Size in Memory 31.9 MB
Average Row Size in Memory 400.4 B
Variable Types
  • Categorical: 5
  • Numerical: 1
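
The dataset-level percentages above can be reproduced from the counts in this report. The sketch below is only a consistency check: the per-variable missing counts are copied from the Variables section, and the variable names in the code are illustrative, not part of the profiling tool.

```python
# Counts copied from the report; the Python names are illustrative only.
n_rows = 83439
n_vars = 6
duplicate_rows = 37
missing_per_variable = {
    "Procedure ID": 0,
    "HCPCS/CPT Code": 230,
    "Description": 26,
    "Gross Charge": 218,
    "Discounted Cash Price (Gross Charges)": 280,
    "Filename": 0,
}

# Missing Cells is the sum of the per-variable missing counts.
missing_cells = sum(missing_per_variable.values())

# Percentages are taken over all cells (rows x variables) and over all rows.
total_cells = n_rows * n_vars
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_rows

print(missing_cells)            # 754
print(f"{missing_pct:.1f}%")    # 0.2%
print(f"{duplicate_pct:.1f}%")  # 0.0%
```

The 37 duplicate rows show as 0.0% only because of rounding to one decimal place; the unrounded share is about 0.04%.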

Dataset Insights

Discounted Cash Price (Gross Charges) is skewed
Procedure ID has a high cardinality: 83289 distinct values
HCPCS/CPT Code has a high cardinality: 1631 distinct values
Description has a high cardinality: 74993 distinct values
Gross Charge has a high cardinality: 12237 distinct values
Filename has constant value "832048950_highlands-cashiers-hospital_standardcharges.csv"
Filename has constant length 57

Variables

Procedure ID

categorical

Approximate Distinct Count 83289
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Memory Size 5.6 MB

Length

Mean 6.0023
Standard Deviation 1.5223
Median 6
Minimum 2
Maximum 115

Sample

1st row 297
2nd row 317
3rd row 334
4th row 374
5th row 380

Letter

Count 5276
Lowercase Letter 4315
Space Separator 574
Uppercase Letter 961
Dash Punctuation 36
Decimal Number 494828
  • Procedure ID contains many words: 83314 words

HCPCS/CPT Code

categorical

Approximate Distinct Count 1631
Approximate Unique (%) 2.0%
Missing 230
Missing (%) 0.3%
Memory Size 6.3 MB
  • The largest value (0C1713 ) is over 1.57 times larger than the second largest value (0C1776 )

Length

Mean 14.0373
Standard Deviation 4.3944
Median 14
Minimum 4
Maximum 815

Sample

1st row 0J0130
2nd row
3rd row
4th row
5th row

Letter

Count 64551
Lowercase Letter 156
Space Separator 777216
Uppercase Letter 64395
Dash Punctuation 195
Decimal Number 325738
  • The top 2 categories (0C1713 , 0C1776 ) take over 50.0%
  • HCPCS/CPT Code contains many words: 1773 words
  • The largest value (0c1713) is over 1.57 times larger than the second largest value (0c1776)

Description

categorical

Approximate Distinct Count 74993
Approximate Unique (%) 89.9%
Missing 26
Missing (%) 0.0%
Memory Size 7.1 MB

Length

Mean 23.9512
Standard Deviation 0.8253
Median 24
Minimum 3
Maximum 24

Sample

1st row ABCIXIMAB 10 MG IN...
2nd row ACARBOSE 50MG TAB ...
3rd row ACEBUTOLOL 200MG T...
4th row ACETAMIN/COD 12.5M...
5th row ACETAMIN/COD 5ML L...

Letter

Count 1196299
Lowercase Letter 439
Space Separator 455685
Uppercase Letter 1195860
Dash Punctuation 7242
Decimal Number 296733
  • Description contains many words: 33252 words
  • The largest value (screw) is over 1.65 times larger than the second largest value (bn)

Gross Charge

categorical

Approximate Distinct Count 12237
Approximate Unique (%) 14.7%
Missing 218
Missing (%) 0.3%
Memory Size 5.7 MB

Length

Mean 6.6577
Standard Deviation 0.8705
Median 7
Minimum 3
Maximum 13

Sample

1st row 9353.93
2nd row 5.40
3rd row 5.40
4th row 13.68
5th row 10.80

Letter

Count 178
Lowercase Letter 84
Space Separator 83
Uppercase Letter 94
Dash Punctuation 0
Decimal Number 470532
  • Gross Charge contains many words: 12237 words
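
Gross Charge is typed as categorical even though its sample values are numeric, and the letter counts above suggest that a small number of cells contain text. A minimal sketch of one way to coerce such a column to numbers follows. The helper name `parse_charge`, the removal of thousands separators, and the `"N/A"` input are assumptions made for illustration; the report does not show what the non-numeric cells contain.

```python
import math
from typing import Optional


def parse_charge(raw: Optional[str]) -> Optional[float]:
    """Return the charge as a float, or None for empty or non-numeric cells."""
    if raw is None:
        return None
    # Assumption: surrounding whitespace and thousands separators are noise.
    text = raw.strip().replace(",", "")
    if not text:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; treat those as missing too.
    return value if math.isfinite(value) else None


# The first five values are the sample rows from the report;
# "N/A" is a hypothetical non-numeric cell.
samples = ["9353.93", "5.40", "5.40", "13.68", "10.80", "N/A"]
parsed = [parse_charge(s) for s in samples]
print(parsed)  # [9353.93, 5.4, 5.4, 13.68, 10.8, None]
```

Returning None rather than raising keeps unparseable cells countable as missing, so they can be inspected separately instead of being silently dropped.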

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 12190
Approximate Unique (%) 14.7%
Missing 280
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.3 MB
Mean 4764.7042
Minimum 0.05
Maximum 279731.65
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 5.9339)

Quantile Statistics

Minimum 0.05
5-th Percentile 31.16
Q1 457.68
Median 2181.2
Q3 5833.15
95-th Percentile 18711
Maximum 279731.65
Range 279731.6
IQR 5375.47
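
Range and IQR follow directly from the quantiles listed above. The sketch below recomputes them and also derives the conventional 1.5 × IQR (Tukey) fences. The fences are an assumption added for illustration: the report does not state which rule produces its outlier count.

```python
# Quantile values copied from the report.
minimum, maximum = 0.05, 279731.65
q1, q3 = 457.68, 5833.15

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1

# Assumption: Tukey's rule, which flags values beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"{value_range:.1f}")  # 279731.6
print(f"{iqr:.2f}")          # 5375.47
print(f"{lower_fence:.3f}")  # -7605.525
print(f"{upper_fence:.3f}")  # 13896.355
```

Under this rule the lower fence is negative while the minimum is 0.05, so any flagged values would lie on the high side only, which is consistent with the right skew.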

Descriptive Statistics

Mean 4764.7042
Standard Deviation 7567.0532
Variance 5.726e+07
Sum 3.9623e+08
Skewness 5.9339
Kurtosis 101.0593
Coefficient of Variation 1.5881
  • Discounted Cash Price (Gross Charges) is not normally distributed (p-value 5.13345586625414e-24)
  • Discounted Cash Price (Gross Charges) has 6771 outliers
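
Several of the descriptive statistics above are derived from one another. As a rough cross-check, the sketch below recomputes the variance, the coefficient of variation and the sum from the reported mean, standard deviation and row counts. It assumes the mean is taken over non-missing values only; the variable names are illustrative.

```python
# Values copied from the report.
mean = 4764.7042
std = 7567.0532
n_rows = 83439
n_missing = 280

n_valid = n_rows - n_missing  # rows that contribute to the statistics

variance = std ** 2           # Variance = Standard Deviation squared
coeff_var = std / mean        # Coefficient of Variation = SD / Mean
total = mean * n_valid        # Sum = Mean * number of non-missing values

print(n_valid)              # 83159
print(f"{variance:.3e}")    # 5.726e+07
print(f"{coeff_var:.4f}")   # 1.5881
print(f"{total:.4e}")       # 3.9623e+08
```

All three recomputed values match the report to the precision shown, which supports reading the mean and sum as computed over the 83159 non-missing rows.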

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9.7 MB

Length

Mean 57
Standard Deviation 0
Median 57
Minimum 57
Maximum 57

Sample

1st row 832048950_highland...
2nd row 832048950_highland...
3rd row 832048950_highland...
4th row 832048950_highland...
5th row 832048950_highland...

Letter

Count 3587877
Lowercase Letter 3587877
Space Separator 0
Uppercase Letter 0
Dash Punctuation 166878
Decimal Number 750951
  • Filename has words of constant length
