Overview

Dataset Statistics

Number of Variables 6
Number of Rows 85988
Missing Cells 950
Missing Cells (%) 0.2%
Duplicate Rows 37
Duplicate Rows (%) 0.0%
Total Size in Memory 32.0 MB
Average Row Size in Memory 390.4 B
Variable Types
  • Categorical: 5
  • Numerical: 1
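
The overview percentages can be reproduced from the counts given elsewhere in this report. The following is a minimal consistency check, not part of the original report; the variable names are illustrative, and the per-column missing counts are copied from the Variables section.

```python
# Counts copied from this report; keys are the column names as reported.
n_rows = 85988
n_columns = 6
duplicate_rows = 37
missing_by_column = {
    "Procedure ID": 0,
    "HCPCS/CPT Code": 304,
    "Description": 25,
    "Gross Charge": 264,
    "Discounted Cash Price (Gross Charges)": 357,
    "Filename": 0,
}

# Missing cells are summed over columns and divided by the total cell count.
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / (n_rows * n_columns)
duplicate_pct = 100 * duplicate_rows / n_rows

print(missing_cells)            # 950
print(f"{missing_pct:.1f}%")    # 0.2%
print(f"{duplicate_pct:.1f}%")  # 0.0%
```

The 37 duplicate rows are about 0.04% of the data, which is why the report rounds the share to 0.0%.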

Dataset Insights

Discounted Cash Price (Gross Charges) is skewed
Procedure ID has a high cardinality: 85803 distinct values
HCPCS/CPT Code has a high cardinality: 1773 distinct values
Description has a high cardinality: 76157 distinct values
Gross Charge has a high cardinality: 11539 distinct values
Filename has constant value "832048888_mcdowell-hospital_standardcharges.csv"
Filename has constant length 47

Variables

Procedure ID

categorical

Approximate Distinct Count 85803
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Memory Size 5.8 MB

Length

Mean 6.0084
Standard Deviation 1.5539
Median 6
Minimum 2
Maximum 115

Sample

1st row 297
2nd row 317
3rd row 334
4th row 374
5th row 380

Letter

Count 6304
Lowercase Letter 5159
Space Separator 653
Uppercase Letter 1145
Dash Punctuation 43
Decimal Number 509539
  • Procedure ID contains many words: 85830 words

HCPCS/CPT Code

categorical

Approximate Distinct Count 1773
Approximate Unique (%) 2.1%
Missing 304
Missing (%) 0.4%
Memory Size 6.5 MB
  • The largest value (0C1713 ) is over 1.64 times larger than the second largest value (0C1776 )

Length

Mean 14.0331
Standard Deviation 4.2981
Median 14
Minimum 4
Maximum 815

Sample

1st row 0J0130
2nd row
3rd row
4th row
5th row

Letter

Count 67445
Lowercase Letter 156
Space Separator 796016
Uppercase Letter 67289
Dash Punctuation 196
Decimal Number 338464
  • The top 2 categories (0C1713 , 0C1776 ) take over 50.0%
  • HCPCS/CPT Code contains many words: 1899 words
  • The largest value (0c1713) is over 1.64 times larger than the second largest value (0c1776)
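
The Space Separator count above (796016 across 85684 non-missing values, roughly nine per value) suggests that the codes are padded with whitespace. That would also explain the trailing space in "0C1713 " and the mean length of about 14 for codes that are only a few characters long. The sketch below shows one way to normalize such values before grouping. It assumes the padding carries no meaning, and `normalize_code` is a hypothetical helper, not something the report provides.

```python
def normalize_code(raw):
    """Return a trimmed, upper-cased code, or None for blank or missing input."""
    if raw is None:
        return None
    code = raw.strip().upper()
    # An all-whitespace cell becomes an empty string after stripping; treat it as missing.
    return code or None


print(normalize_code("0C1713 "))  # 0C1713
print(normalize_code("0c1776"))   # 0C1776
print(normalize_code("      "))   # None
```

If all-whitespace cells are treated as missing in this way, the missing count for this column may turn out higher than the 304 reported above.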

Description

categorical

Approximate Distinct Count 76157
Approximate Unique (%) 88.6%
Missing 25
Missing (%) 0.0%
Memory Size 7.3 MB
  • The largest value (57% of BC) is over 1.7 times larger than the second largest value (OMNIFIT SERIES I ACET. I)

Length

Mean 23.9387
Standard Deviation 0.9333
Median 24
Minimum 3
Maximum 24

Sample

1st row ABCIXIMAB 10 MG IN...
2nd row ACARBOSE 50MG TAB ...
3rd row ACEBUTOLOL 200MG T...
4th row ACETAMIN/COD 12.5M...
5th row ACETAMIN/COD 5ML L...

Letter

Count 1228991
Lowercase Letter 579
Space Separator 473946
Uppercase Letter 1228412
Dash Punctuation 7137
Decimal Number 304526
  • Description contains many words: 33209 words
  • The largest value (screw) is over 1.51 times larger than the second largest value (bn)

Gross Charge

categorical

Approximate Distinct Count 11539
Approximate Unique (%) 13.5%
Missing 264
Missing (%) 0.3%
Memory Size 5.9 MB

Length

Mean 6.6448
Standard Deviation 0.8641
Median 7
Minimum 3
Maximum 13

Sample

1st row 9353.93
2nd row 5.40
3rd row 5.40
4th row 13.68
5th row 10.80

Letter

Count 297
Lowercase Letter 140
Space Separator 142
Uppercase Letter 157
Dash Punctuation 0
Decimal Number 483415
  • Gross Charge contains many words: 11537 words
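
Gross Charge is typed as categorical even though the sampled values are numeric. The Letter count of 297 hints that a small number of entries contain text, which would stop the column from being inferred as numerical. The following is a minimal sketch of a tolerant parser under that assumption. `parse_charge` is a hypothetical helper, and the non-numeric samples are invented for illustration; the report does not show the actual offending values.

```python
import math


def parse_charge(raw):
    """Return the charge as a float, or None when the text is not a finite number."""
    if raw is None:
        return None
    try:
        value = float(raw.strip())
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; reject those as non-prices.
    return value if math.isfinite(value) else None


# The first three values come from the Sample above; the last two are hypothetical.
samples = ["9353.93", "5.40", "13.68", "N/A", ""]
print([parse_charge(s) for s in samples])
# [9353.93, 5.4, 13.68, None, None]
```

Counting the `None` results separately from the 264 cells already reported as missing would show how many rows hold text rather than a price.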

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 11498
Approximate Unique (%) 13.4%
Missing 357
Missing (%) 0.4%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.3 MB
Mean 4606.1086
Minimum 0.05
Maximum 279731.65
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 6.1577)

Quantile Statistics

Minimum 0.05
5-th Percentile 31.16
Q1 448.7
Median 2181.2
Q3 5636.84
95-th Percentile 17820
Maximum 279731.65
Range 279731.6
IQR 5188.14
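
Range and IQR follow directly from the quantiles listed above. The sketch below recomputes them and also derives Tukey's 1.5 × IQR fences. The fences are an assumption added for illustration: the report does not state which rule it uses to count outliers.

```python
# Quantiles copied from the Quantile Statistics above.
minimum, maximum = 0.05, 279731.65
q1, q3 = 448.7, 5636.84

value_range = maximum - minimum
iqr = q3 - q1

# Tukey's 1.5 * IQR fences: an assumed rule, not confirmed by the report.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"Range = {value_range:.2f}")                            # Range = 279731.60
print(f"IQR = {iqr:.2f}")                                      # IQR = 5188.14
print(f"Fences = ({lower_fence:.2f}, {upper_fence:.2f})")      # Fences = (-7333.51, 13419.05)
```

Because the lower fence is negative and the minimum price is 0.05, any outliers under this rule would all lie on the high side, which is consistent with the right skew.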

Descriptive Statistics

Mean 4606.1086
Standard Deviation 7412.365
Variance 5.4943e+07
Sum 3.9443e+08
Skewness 6.1577
Kurtosis 107.2333
Coefficient of Variation 1.6092
  • Discounted Cash Price (Gross Charges) is not normally distributed (p-value 4.620182803759254e-24)
  • Discounted Cash Price (Gross Charges) has 6479 outliers
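
Several of the descriptive statistics above are derived from one another, so they can be cross-checked. This is a small sketch using only figures from this report; the variable names are illustrative.

```python
# Figures copied from this report.
n_rows = 85988
n_missing = 357
mean = 4606.1086
std = 7412.365

n_valid = n_rows - n_missing  # rows that contribute to the statistics
cv = std / mean               # coefficient of variation
variance = std ** 2
total = mean * n_valid        # sum of all non-missing values

print(f"CV = {cv:.4f}")              # CV = 1.6092
print(f"Variance = {variance:.4e}")  # Variance = 5.4943e+07
print(f"Sum = {total:.4e}")          # Sum = 3.9443e+08
```

All three results agree with the reported values. This indicates that the mean and sum are computed over the 85631 non-missing rows rather than over all 85988.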

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9.2 MB

Length

Mean 47
Standard Deviation 0
Median 47
Minimum 47
Maximum 47

Sample

1st row 832048888_mcdowell...
2nd row 832048888_mcdowell...
3rd row 832048888_mcdowell...
4th row 832048888_mcdowell...
5th row 832048888_mcdowell...

Letter

Count 2923592
Lowercase Letter 2923592
Space Separator 0
Uppercase Letter 0
Dash Punctuation 85988
Decimal Number 773892
  • Filename has words of constant length

Interactions

Correlations

Missing Values