Overview

Dataset Statistics

Number of Variables 6
Number of Rows 15234
Missing Cells 164
Missing Cells (%) 0.2%
Duplicate Rows 6
Duplicate Rows (%) 0.0%
Total Size in Memory 5.8 MB
Average Row Size in Memory 400.7 B
Variable Types
  • Categorical: 5
  • Numerical: 1
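The overview percentages follow from the per-variable counts in the Variables section. Here is a minimal arithmetic check in Python. It uses only counts copied from this report; the dictionary is a transcription, not output of the profiling tool.

```python
# Per-variable missing counts, transcribed from the Variables section of this report.
missing_by_column = {
    "Procedure ID": 0,
    "HCPCS/CPT Code": 44,
    "Description": 7,
    "Gross Charge": 44,
    "Discounted Cash Price (Gross Charges)": 69,
    "Filename": 0,
}
n_rows = 15234
n_cols = len(missing_by_column)  # 6 variables

missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_cols)  # share of all 91404 cells
duplicate_pct = 100 * 6 / n_rows                       # 6 duplicate rows

print(missing_cells)            # 164
print(f"{missing_pct:.1f}%")    # 0.2%
print(f"{duplicate_pct:.1f}%")  # 0.0%
```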

Dataset Insights

Discounted Cash Price (Gross Charges) is right-skewed [Skewed]
Procedure ID has high cardinality: 15195 distinct values [High Cardinality]
HCPCS/CPT Code has high cardinality: 1906 distinct values [High Cardinality]
Description has high cardinality: 10360 distinct values [High Cardinality]
Gross Charge has high cardinality: 5638 distinct values [High Cardinality]
Filename has a constant value: "832047931_asheville-specialty-hospital_standardcharges.csv" [Constant]
Filename has a constant length of 58 characters [Constant Length]
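The sketch below shows how flags like these can be derived from a single text column, using roughly the wording of the insights above. The report does not state its cut-off for "High Cardinality". The threshold of 50 distinct values is a hypothetical choice for illustration, and `column_insights` is a hypothetical helper, not the profiling tool's API.

```python
from typing import List, Optional, Sequence

# HYPOTHETICAL: the report does not state its "High Cardinality" cut-off;
# 50 distinct values is an assumed threshold for illustration only.
HIGH_CARDINALITY_THRESHOLD = 50


def column_insights(name: str, values: Sequence[Optional[str]]) -> List[str]:
    """Derive report-style insight lines for one text column (missing = None)."""
    present = [v for v in values if v is not None]
    distinct = set(present)
    insights = []
    if len(distinct) == 1:
        value = next(iter(distinct))
        insights.append(f'{name} has a constant value: "{value}" [Constant]')
    elif len(distinct) > HIGH_CARDINALITY_THRESHOLD:
        insights.append(
            f"{name} has high cardinality: {len(distinct)} distinct values [High Cardinality]"
        )
    lengths = {len(v) for v in present}
    if len(lengths) == 1:
        insights.append(
            f"{name} has a constant length of {lengths.pop()} characters [Constant Length]"
        )
    return insights


fname = "832047931_asheville-specialty-hospital_standardcharges.csv"
for line in column_insights("Filename", [fname] * 3):
    print(line)
```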

Variables

Procedure ID

categorical

Approximate Distinct Count 15195
Approximate Unique (%) 99.7%
Missing 0
Missing (%) 0.0%
Memory Size 1.0 MB

Length

Mean 5.6472
Standard Deviation 0.9704
Median 6
Minimum 2
Maximum 39
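Each Length block summarizes the character lengths of a column's non-missing values. The following is a sketch with Python's `statistics` module. The report does not say whether its standard deviation is the sample or population form, so the sample form used here is an assumption.

```python
import statistics
from typing import Iterable, Optional


def length_summary(values: Iterable[Optional[str]]) -> dict:
    """Character-length statistics over non-missing string values."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "Mean": statistics.fmean(lengths),
        # Sample standard deviation (n - 1); the report's choice is not stated.
        "Standard Deviation": statistics.stdev(lengths),
        "Median": statistics.median(lengths),
        "Minimum": min(lengths),
        "Maximum": max(lengths),
    }


# The five sample Procedure IDs listed in this section are all 3 characters long.
print(length_summary(["297", "317", "334", "374", "380"]))
```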

Sample

1st row 297
2nd row 317
3rd row 334
4th row 374
5th row 380

Letter

Count 927
Lowercase Letter 767
Space Separator 66
Uppercase Letter 160
Dash Punctuation 0
Decimal Number 85028
  • Procedure ID contains 15203 distinct words
  • The most frequent word (other) occurs over 2.0 times as often as the second most frequent word (inpatient)
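Each Letter block counts characters by Unicode general category. The sketch below uses Python's standard `unicodedata` module. Mapping the report's labels to the category codes Ll, Lu, Zs, Pd, and Nd is an assumption based on the label names. "Count" is treated as lowercase plus uppercase letters because the figures in this report add up that way (767 + 160 = 927).

```python
import unicodedata
from collections import Counter
from typing import Iterable, Optional

# Assumed mapping from Unicode general-category codes to the report's labels.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def character_categories(values: Iterable[Optional[str]]) -> Counter:
    """Count characters per category across all non-missing values."""
    counts = Counter()
    for value in values:
        if value is None:
            continue
        for ch in value:
            label = CATEGORY_LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    # "Count" = all letters, i.e. lowercase + uppercase (matches this report's totals).
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts


print(character_categories(["297", "other 12", "A-1", None]))
```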

HCPCS/CPT Code

categorical

Approximate Distinct Count 1906
Approximate Unique (%) 12.6%
Missing 44
Missing (%) 0.3%
Memory Size 1.1 MB
  • The most frequent value (blank) occurs over 2.03 times as often as the second most frequent value ("0C1713 ")

Length

Mean 14.1919
Standard Deviation 10.1974
Median 14
Minimum 4
Maximum 815

Sample

1st row 0J0130
2nd row (blank)
3rd row (blank)
4th row (blank)
5th row (blank)
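Rows 2 to 5 of this sample appear blank, and the insights for this variable rank a blank value first. The blank value is over 2.03 times as frequent as the runner-up, and the two together exceed 50%, so blanks make up roughly a third of the rows or more. That is far more than the 44 values counted as Missing, so blank codes are evidently not treated as missing. If they should be in downstream analysis, here is a minimal normalization sketch. `normalize_code` is a hypothetical helper. Note that stripping also merges padded codes such as "0C1713 " with any unpadded "0C1713".

```python
from typing import Optional


def normalize_code(code: Optional[str]) -> Optional[str]:
    """Strip surrounding whitespace; map blank or whitespace-only codes to None."""
    if code is None:
        return None
    stripped = code.strip()
    return stripped or None


codes = ["0J0130", " ", "0C1713 ", None]
print([normalize_code(c) for c in codes])  # ['0J0130', None, '0C1713', None]
```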

Letter

Count 8507
Lowercase Letter 56
Space Separator 152316
Uppercase Letter 8451
Dash Punctuation 190
Decimal Number 54284
  • The top 2 categories (blank and "0C1713 ") account for over 50.0% of values
  • HCPCS/CPT Code contains 2031 distinct words
  • The most frequent word (0c1713) occurs over 4.21 times as often as the second most frequent word (0v2788)
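The ratio and share figures in these insights come from value counts. The following sketch shows one way to compute them. Excluding missing values from the denominator is an assumption. The sample list is made up for illustration and is not the real distribution.

```python
from collections import Counter
from typing import Iterable, Optional


def top_two_summary(values: Iterable[Optional[str]]) -> dict:
    """Compare the two most frequent values; requires at least two distinct values."""
    counts = Counter(v for v in values if v is not None)  # assumed: missing excluded
    (top, n1), (second, n2) = counts.most_common(2)
    total = sum(counts.values())
    return {
        "top": top,
        "second": second,
        "ratio": n1 / n2,                           # the "over N times" figure
        "top2_share_pct": 100 * (n1 + n2) / total,  # the "top 2 categories" share
    }


# Illustrative values only.
sample = [" "] * 4 + ["0C1713 "] * 2 + ["0V2788 "]
print(top_two_summary(sample))
```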

Description

categorical

Approximate Distinct Count 10360
Approximate Unique (%) 68.0%
Missing 7
Missing (%) 0.0%
Memory Size 1.3 MB

Length

Mean 23.935
Standard Deviation 0.9794
Median 24
Minimum 3
Maximum 24

Sample

1st row ABCIXIMAB 10 MG IN...
2nd row ACARBOSE 50MG TAB ...
3rd row ACEBUTOLOL 200MG T...
4th row ACETAMIN/COD 12.5M...
5th row ACETAMIN/COD 5ML L...

Letter

Count 229906
Lowercase Letter 93
Space Separator 92538
Uppercase Letter 229813
Dash Punctuation 854
Decimal Number 34091
  • Description contains 8598 distinct words
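The word figures in these insights are counts of distinct words, not totals. For example, Gross Charge has 15190 non-missing values but only 5639 words. The report does not document its tokenizer. The sketch below assumes lowercasing and punctuation removal, which is consistent with Gross Charge words such as 41169 for the value 411.69. The inputs are illustrative.

```python
import re
from typing import Iterable, Optional, Set


def distinct_words(values: Iterable[Optional[str]]) -> Set[str]:
    """Distinct lowercase words after removing punctuation (assumed tokenization)."""
    words: Set[str] = set()
    for value in values:
        if value is None:
            continue
        cleaned = re.sub(r"[^\w\s]", "", value.lower())  # "411.69" -> "41169"
        words.update(cleaned.split())
    return words


print(sorted(distinct_words(["ACARBOSE 50MG TAB", "ACEBUTOLOL 200MG TAB", "411.69", None])))
```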

Gross Charge

categorical

Approximate Distinct Count 5638
Approximate Unique (%) 37.1%
Missing 44
Missing (%) 0.3%
Memory Size 1.0 MB
  • The most frequent value (411.69) occurs over 1.51 times as often as the second most frequent value (2020.00)

Length

Mean 6.1581
Standard Deviation 0.9768
Median 6
Minimum 3
Maximum 13

Sample

1st row 9353.93
2nd row 5.67
3rd row 5.67
4th row 13.68
5th row 10.80

Letter

Count 54
Lowercase Letter 24
Space Separator 22
Uppercase Letter 30
Dash Punctuation 0
Decimal Number 78250
  • Gross Charge contains 5639 distinct words
  • The most frequent word (41169, the word form of 411.69 without its decimal point) occurs over 1.51 times as often as the second most frequent word (202000, from 2020.00)
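Gross Charge is profiled as categorical text even though its sample values are prices, and its Letter counts show that a few entries contain letters. The strings therefore need parsing before any numeric analysis. The sketch below uses Python's `decimal` module, since exact decimal arithmetic suits currency. Two choices are assumptions: treating commas as thousands separators, and mapping non-numeric entries to None. `parse_charge` is a hypothetical helper, and "N/A" is an illustrative input.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_charge(text: Optional[str]) -> Optional[Decimal]:
    """Parse a charge string such as "9353.93"; return None if it is not a finite number."""
    if text is None:
        return None
    try:
        # Assumed: commas, if present, are thousands separators.
        value = Decimal(text.strip().replace(",", ""))
    except InvalidOperation:
        return None
    return value if value.is_finite() else None  # reject "NaN" and "Infinity"


print([parse_charge(s) for s in ["9353.93", "5.67", " 13.68 ", "N/A", None]])
# [Decimal('9353.93'), Decimal('5.67'), Decimal('13.68'), None, None]
```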

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 5620
Approximate Unique (%) 37.1%
Missing 69
Missing (%) 0.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 237.0 KB
Mean 2750.929
Minimum 0.02
Maximum 279731.65
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 11.4243)

Quantile Statistics

Minimum 0.02
5th Percentile 5.67
Q1 123.06
Median 541.37
Q3 2020
95th Percentile 10515
Maximum 279731.65
Range 279731.63
IQR 1896.94
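These quantile statistics can be sketched with the standard library. The report does not say which interpolation rule it uses. The sketch assumes `statistics.quantiles` with `method="inclusive"`, which interpolates linearly between order statistics, as NumPy's default percentile method does. Range and IQR are plain differences (279731.65 - 0.02 and 2020 - 123.06), which the last line checks against the reported values.

```python
import statistics
from typing import Dict, Iterable


def quantile_summary(data: Iterable[float]) -> Dict[str, float]:
    """Quantile statistics in this report's layout (linear interpolation assumed)."""
    xs = sorted(data)
    # 99 cut points: cuts[k] is the (k + 1)-th percentile.
    cuts = statistics.quantiles(xs, n=100, method="inclusive")
    q1, median, q3 = cuts[24], cuts[49], cuts[74]
    return {
        "Minimum": xs[0],
        "5th Percentile": cuts[4],
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "95th Percentile": cuts[94],
        "Maximum": xs[-1],
        "Range": xs[-1] - xs[0],
        "IQR": q3 - q1,
    }


print(quantile_summary(range(1, 102)))  # toy data: 1, 2, ..., 101
print(round(279731.65 - 0.02, 2), round(2020 - 123.06, 2))  # 279731.63 1896.94
```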

Descriptive Statistics

Mean 2750.929
Standard Deviation 7795.1828
Variance 6.0765e+07
Sum 4.1718e+07
Skewness 11.4243
Kurtosis 237.8086
Coefficient of Variation 2.8337
  • Discounted Cash Price (Gross Charges) is not normally distributed (normality test p-value ≈ 9.02e-25)
  • Discounted Cash Price (Gross Charges) has 2412 outliers
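Below is a sketch of the descriptive statistics and an outlier count. The report defines neither its estimators nor its outlier rule, so the code rests on two assumptions:

- skewness and kurtosis use population moment ratios, with kurtosis reported as excess kurtosis;
- outliers are values outside Tukey's fences, Q1 - 1.5·IQR and Q3 + 1.5·IQR.

The report does not confirm whether its 2412 outliers follow that rule. Two reported figures can be cross-checked directly: the coefficient of variation is SD / mean = 7795.1828 / 2750.929 ≈ 2.8337, and the variance is SD² ≈ 6.0765e+07.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def moment_summary(xs: Sequence[float]) -> Dict[str, float]:
    """Descriptive statistics; population moment ratios for shape are an assumption."""
    n = len(xs)
    mean = statistics.fmean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "Mean": mean,
        "Standard Deviation": sd,
        "Variance": sd ** 2,
        "Sum": math.fsum(xs),
        "Skewness": m3 / m2 ** 1.5,    # g1
        "Kurtosis": m4 / m2 ** 2 - 3,  # excess kurtosis g2 (assumed convention)
        "Coefficient of Variation": sd / mean,
    }


def tukey_fences(q1: float, q3: float, k: float = 1.5) -> Tuple[float, float]:
    """Assumed outlier rule: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def count_outliers(xs: Sequence[float], q1: float, q3: float, k: float = 1.5) -> int:
    """Number of values falling outside the Tukey fences."""
    low, high = tukey_fences(q1, q3, k)
    return sum(1 for x in xs if x < low or x > high)


# Cross-checks using only figures printed in this report.
print(round(7795.1828 / 2750.929, 4))  # 2.8337 (Coefficient of Variation)
print(f"{7795.1828 ** 2:.4e}")         # 6.0765e+07 (Variance)
low, high = tukey_fences(123.06, 2020)
print(round(low, 2), round(high, 2))   # -2722.35 4865.41
```

Under this assumed rule the lower fence is negative. With a minimum price of 0.02, only values above about 4865.41 could be flagged.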

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 1.8 MB

Length

Mean 58
Standard Deviation 0
Median 58
Minimum 58
Maximum 58

Sample

1st row 832047931_ashevill...
2nd row 832047931_ashevill...
3rd row 832047931_ashevill...
4th row 832047931_ashevill...
5th row 832047931_ashevill...

Letter

Count 670296
Lowercase Letter 670296
Space Separator 0
Uppercase Letter 0
Dash Punctuation 30468
Decimal Number 137106
  • Filename has words of constant length
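Because Filename is constant, every figure in its Length and Letter blocks is a per-row count multiplied by the 15234 rows. Here is a quick check in Python using the filename given in the insights:

```python
fname = "832047931_asheville-specialty-hospital_standardcharges.csv"
n_rows = 15234  # the same filename appears on every row

per_row = {
    "Lowercase Letter": sum(ch.islower() for ch in fname),  # 44 letters
    "Dash Punctuation": fname.count("-"),                   # 2 hyphens
    "Decimal Number": sum(ch.isdigit() for ch in fname),    # 9 digits
}
totals = {label: count * n_rows for label, count in per_row.items()}

print(len(fname))  # 58
print(totals)  # {'Lowercase Letter': 670296, 'Dash Punctuation': 30468, 'Decimal Number': 137106}
```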

Interactions

Correlations

Missing Values

[Charts for these three sections are not reproduced in this text version.]