Overview

Dataset Statistics

Number of Variables 6
Number of Rows 91729
Missing Cells 1022
Missing Cells (%) 0.2%
Duplicate Rows 46
Duplicate Rows (%) 0.1%
Total Size in Memory 34.1 MB
Average Row Size in Memory 389.4 B
Variable Types
  • Categorical: 5
  • Numerical: 1
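
The two percentages above follow from the raw counts. A minimal Python sketch using only the figures reported in this block (the variable names are illustrative and not part of the report):

```python
# Counts copied from the Dataset Statistics block above.
n_variables = 6
n_rows = 91729
missing_cells = 1022
duplicate_rows = 46

# Missing cells are measured against every cell in the table (rows x variables).
total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are measured against the row count.
duplicate_pct = 100 * duplicate_rows / n_rows

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.2f}%")
print(f"duplicate rows: {duplicate_pct:.2f}%")
```

Rounded to one decimal place, both results match the reported 0.2% and 0.1%.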

Dataset Insights

Discounted Cash Price (Gross Charges) is skewed
Procedure ID has a high cardinality: 91474 distinct values
HCPCS/CPT Code has a high cardinality: 2197 distinct values
Description has a high cardinality: 77707 distinct values
Gross Charge has a high cardinality: 12921 distinct values
Filename has constant value "832048706_mission-hospital_standardcharges.csv"
Filename has constant length 46

Variables

Procedure ID

categorical

Approximate Distinct Count 91474
Approximate Unique (%) 99.7%
Missing 0
Missing (%) 0.0%
Memory Size 6.2 MB

Length

Mean 5.9845
Standard Deviation 1.4303
Median 6
Minimum 2
Maximum 115

Sample

1st row 297
2nd row 317
3rd row 334
4th row 374
5th row 380

Letter

Count 6155
Lowercase Letter 4988
Space Separator 607
Uppercase Letter 1167
Dash Punctuation 35
Decimal Number 542050
  • Procedure ID contains many words: 91513 words
  • The largest value (other) is over 1.59 times larger than the second largest value (outpatient)

HCPCS/CPT Code

categorical

Approximate Distinct Count 2197
Approximate Unique (%) 2.4%
Missing 291
Missing (%) 0.3%
Memory Size 6.9 MB
  • The largest value ("0C1713 ", with trailing whitespace) is over 1.58 times larger than the second largest value (a whitespace-only entry)

Length

Mean 14.0512
Standard Deviation 4.3446
Median 14
Minimum 4
Maximum 815

Sample

1st row 0J0130
2nd row
3rd row
4th row
5th row

Letter

Count 69966
Lowercase Letter 196
Space Separator 853226
Uppercase Letter 69770
Dash Punctuation 315
Decimal Number 360777
  • The top 2 categories ("0C1713 " and a whitespace-only entry) take over 50.0%
  • HCPCS/CPT Code contains many words: 2424 words
  • The largest value (0c1713) is over 1.63 times larger than the second largest value (0c1776)
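
The character counts above (far more space separators than letters or digits, and a mean length of about 14) suggest that many codes carry padding whitespace, which can split one code into several categories. A minimal sketch of one possible cleanup in Python follows. `normalize_code` is a hypothetical helper, and treating whitespace-only entries as missing is an assumption, not something the report states:

```python
from typing import Optional


def normalize_code(raw: Optional[str]) -> Optional[str]:
    """Trim padding and unify case; return None for empty entries."""
    if raw is None:
        return None
    code = raw.strip().upper()
    # Assumption: a whitespace-only entry carries no code and counts as missing.
    return code or None


print(normalize_code("0C1713 "))
print(normalize_code("   "))
```

Applying such a step before profiling would likely change the distinct count and the missing count for this variable, so the figures above describe the raw column only.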

Description

categorical

Approximate Distinct Count 77707
Approximate Unique (%) 84.8%
Missing 34
Missing (%) 0.0%
Memory Size 7.8 MB

Length

Mean 23.9379
Standard Deviation 0.9419
Median 24
Minimum 3
Maximum 24

Sample

1st row ABCIXIMAB 10 MG IN...
2nd row ACARBOSE 50MG TAB ...
3rd row ACEBUTOLOL 200MG T...
4th row ACETAMIN/COD 12.5M...
5th row ACETAMIN/COD 5ML L...

Letter

Count 1313253
Lowercase Letter 459
Space Separator 515661
Uppercase Letter 1312794
Dash Punctuation 7514
Decimal Number 312328
  • Description contains many words: 33874 words

Gross Charge

categorical

Approximate Distinct Count 12921
Approximate Unique (%) 14.1%
Missing 303
Missing (%) 0.3%
Memory Size 6.2 MB
  • The largest value (411.69) is over 1.53 times larger than the second largest value (3739.20)

Length

Mean 6.64
Standard Deviation 0.8633
Median 7
Minimum 3
Maximum 15

Sample

1st row 9353.93
2nd row 5.40
3rd row 5.40
4th row 13.68
5th row 10.80

Letter

Count 206
Lowercase Letter 106
Space Separator 89
Uppercase Letter 100
Dash Punctuation 0
Decimal Number 515256
  • Gross Charge contains many words: 12921 words
  • The largest value (41169) is over 1.53 times larger than the second largest value (373920)
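
Gross Charge is typed as categorical even though its samples look numeric, and the 206 letter characters counted above suggest a small number of non-numeric entries. A minimal sketch of converting such a column to numbers follows. `parse_charge` is a hypothetical helper, and the handling of currency symbols and thousands separators is an assumption about what the stray entries might contain:

```python
import math
from typing import Optional


def parse_charge(text: Optional[str]) -> Optional[float]:
    """Convert a charge string to float; return None when it is not numeric."""
    if text is None:
        return None
    # Assumption: "$" and "," are formatting only and safe to drop.
    cleaned = text.strip().replace("$", "").replace(",", "")
    try:
        value = float(cleaned)
    except ValueError:
        # Free-text entries are treated as missing.
        return None
    # float() also accepts "nan" and "inf"; reject those as non-numeric.
    return value if math.isfinite(value) else None


print(parse_charge("9353.93"))
print(parse_charge("N/A"))
```

Returning None rather than raising keeps the unparseable entries countable, so they can be reviewed before being counted as missing.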

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 12851
Approximate Unique (%) 14.1%
Missing 394
Missing (%) 0.4%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.4 MB
Mean 4582.2157
Minimum 0.02
Maximum 279731.65
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 5.9891)

Quantile Statistics

Minimum 0.02
5th Percentile 31.16
Q1 411.69
Median 2081.99
Q3 5635.91
95th Percentile 18069.48
Maximum 279731.65
Range 279731.63
IQR 5224.22
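
Range and IQR are simple differences of the quantiles listed above. The sketch below recomputes them in Python and also derives Tukey fences at 1.5 times the IQR. The fences are an illustration only: the report does not say which outlier rule it uses, so the 1.5 multiplier is an assumption.

```python
# Quantiles copied from the Quantile Statistics block above.
minimum, q1, q3, maximum = 0.02, 411.69, 5635.91, 279731.65

value_range = maximum - minimum
iqr = q3 - q1

# Tukey's rule (an assumption; the report does not name its outlier rule).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"range = {value_range:.2f}")
print(f"IQR = {iqr:.2f}")
print(f"fences = ({lower_fence:.2f}, {upper_fence:.2f})")
```

The lower fence is negative while the minimum is 0.02, so under this rule only high prices can be flagged. The upper fence (about 13472.24) lies below the 95th percentile (18069.48), so more than 5% of the values would exceed it.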

Descriptive Statistics

Mean 4582.2157
Standard Deviation 7470.3188
Variance 5.5806e+07
Sum 4.1852e+08
Skewness 5.9891
Kurtosis 99.888
Coefficient of Variation 1.6303
  • Discounted Cash Price (Gross Charges) is not normally distributed (p-value ≈ 3.87e-24)
  • Discounted Cash Price (Gross Charges) has 6940 outliers
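
The descriptive statistics above can be cross-checked against each other using only the reported figures. A minimal Python sketch, assuming the sum is taken over non-missing values (the variable names are illustrative):

```python
# Figures copied from the statistics reported for this variable.
mean = 4582.2157
std = 7470.3188
n_rows = 91729
n_missing = 394

# Variance is the square of the standard deviation.
variance = std ** 2

# Coefficient of variation is the standard deviation relative to the mean.
cv = std / mean

# Assumption: the sum covers non-missing values only, so sum = mean x count.
non_missing = n_rows - n_missing
total = mean * non_missing

print(f"variance = {variance:.4e}")
print(f"coefficient of variation = {cv:.4f}")
print(f"sum = {total:.4e}")
```

A coefficient of variation above 1 means the spread exceeds the mean, which is consistent with the strong right skew reported for this variable.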

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9.7 MB

Length

Mean 46
Standard Deviation 0
Median 46
Minimum 46
Maximum 46

Sample

1st row 832048706_mission-...
2nd row 832048706_mission-...
3rd row 832048706_mission-...
4th row 832048706_mission-...
5th row 832048706_mission-...

Letter

Count 3027057
Lowercase Letter 3027057
Space Separator 0
Uppercase Letter 0
Dash Punctuation 91729
Decimal Number 825561
  • Filename has words of constant length

Interactions

Correlations

Missing Values