Overview

Dataset Statistics

Number of Variables 6
Number of Rows 84658
Missing Cells 1070
Missing Cells (%) 0.2%
Duplicate Rows 40
Duplicate Rows (%) 0.0%
Total Size in Memory 32.6 MB
Average Row Size in Memory 403.4 B
Variable Types
  • Categorical: 5
  • Numerical: 1
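
The percentage figures above can be reproduced from the counts. A minimal Python sketch of the arithmetic, assuming Missing Cells (%) is missing cells divided by rows times variables and Duplicate Rows (%) is duplicate rows divided by rows (the report does not state its formulas). The per-variable missing counts are copied from the Variables section.

```python
# Counts copied from the Dataset Statistics block.
n_variables = 6
n_rows = 84658
missing_cells = 1070
duplicate_rows = 40

# Per-variable "Missing" counts as listed under Variables.
missing_by_variable = {
    "Procedure ID": 0,
    "HCPCS/CPT Code": 338,
    "Description": 28,
    "Gross Charge": 300,
    "Discounted Cash Price (Gross Charges)": 404,
    "Filename": 0,
}

# Assumed definitions; the report does not spell them out.
total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_rows

print(f"sum of per-variable missing: {sum(missing_by_variable.values())}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```

Under these assumptions the per-variable counts add up to the reported 1070 missing cells, and both percentages round to the values shown.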

Dataset Insights

Discounted Cash Price (Gross Charges) is skewed
Procedure ID has a high cardinality: 84443 distinct values
HCPCS/CPT Code has a high cardinality: 1705 distinct values
Description has a high cardinality: 74154 distinct values
Gross Charge has a high cardinality: 12446 distinct values
Filename has constant value "832048854_transylvania-regional-hospital_standardcharges.csv"
Filename has constant length 60

Variables

Procedure ID

categorical

Approximate Distinct Count 84443
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Memory Size 5.7 MB

Length

Mean 6.0138
Standard Deviation 1.5994
Median 6
Minimum 2
Maximum 115

Sample

1st row 297
2nd row 317
3rd row 334
4th row 374
5th row 380

Letter

Count 6986
Lowercase Letter 5715
Space Separator 705
Uppercase Letter 1271
Dash Punctuation 45
Decimal Number 501255
  • Procedure ID contains many words: 84480 words
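
The labels in the Letter blocks match the names of Unicode general categories (Ll, Lu, Zs, Pd, Nd), and in every block of this report Count equals Lowercase Letter plus Uppercase Letter (here 5715 + 1271 = 6986). A rough sketch of how such a tally could be computed, assuming that mapping. The name letter_stats is a hypothetical helper, not the profiling tool's API, and the sample strings are made up.

```python
import unicodedata
from collections import Counter

# Unicode general categories matching the labels used in the report.
CATEGORY_LABELS = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}


def letter_stats(values):
    """Tally characters by Unicode category across an iterable of strings."""
    counts = Counter({label: 0 for label in CATEGORY_LABELS.values()})
    for value in values:
        for ch in value:
            label = CATEGORY_LABELS.get(unicodedata.category(ch))
            if label is not None:
                counts[label] += 1
    # Assumption: "Count" is lowercase plus uppercase letters.
    counts["Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return dict(counts)


# Hypothetical sample values, not taken from the dataset.
print(letter_stats(["297", "AB-12 x"]))
```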

HCPCS/CPT Code

categorical

Approximate Distinct Count 1705
Approximate Unique (%) 2.0%
Missing 338
Missing (%) 0.4%
Memory Size 6.4 MB
  • The largest value (0C1713 ) is over 1.57 times larger than the second largest value (0C1776 )

Length

Mean 14.0384
Standard Deviation 4.3701
Median 14
Minimum 4
Maximum 815

Sample

1st row 0J0130
2nd row
3rd row
4th row
5th row

Letter

Count 66053
Lowercase Letter 164
Space Separator 783353
Uppercase Letter 65889
Dash Punctuation 201
Decimal Number 333749
  • The top 2 categories (0C1713 , 0C1776 ) take over 50.0%
  • HCPCS/CPT Code contains many words: 1854 words
  • The largest value (0c1713) is over 1.57 times larger than the second largest value (0c1776)
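
The Length and Letter figures for this column point to heavy space padding: 783353 space separators, a mean length of about 14 characters, and top values displayed with a trailing space (0C1713 ). A minimal cleaning sketch, assuming the padding carries no meaning and that codes should compare case-insensitively. The name normalize_code and the raw values below are illustrative only.

```python
def normalize_code(raw):
    """Strip padding and unify case; return None for blank or missing values."""
    if raw is None:
        return None
    code = raw.strip().upper()
    return code or None


# Hypothetical raw values illustrating the padding suggested by the report.
raw_codes = ["0C1713 ", "0c1713", "   ", None, "0C1776        "]
print([normalize_code(c) for c in raw_codes])
```

Treating whitespace-only strings as missing would likely raise the missing count above the reported 338, so the two figures should not be expected to match after cleaning.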

Description

categorical

Approximate Distinct Count 74154
Approximate Unique (%) 87.6%
Missing 28
Missing (%) 0.0%
Memory Size 7.2 MB
  • The largest value (49.67% of BC) is over 1.73 times larger than the second largest value (OMNIFIT SERIES I ACET. I)

Length

Mean 23.932
Standard Deviation 0.9762
Median 24
Minimum 3
Maximum 24

Sample

1st row ABCIXIMAB 10 MG IN...
2nd row ACARBOSE 50MG TAB ...
3rd row ACEBUTOLOL 200MG T...
4th row ACETAMIN/COD 12.5M...
5th row ACETAMIN/COD 5ML L...

Letter

Count 1207624
Lowercase Letter 609
Space Separator 469203
Uppercase Letter 1207015
Dash Punctuation 6961
Decimal Number 299208
  • Description contains many words: 32724 words
  • The largest value (screw) is over 1.51 times larger than the second largest value (bn)

Gross Charge

categorical

Approximate Distinct Count 12446
Approximate Unique (%) 14.8%
Missing 300
Missing (%) 0.4%
Memory Size 5.8 MB

Length

Mean 6.6588
Standard Deviation 0.8747
Median 7
Minimum 3
Maximum 13

Sample

1st row 9353.93
2nd row 5.40
3rd row 5.40
4th row 13.68
5th row 10.80

Letter

Count 311
Lowercase Letter 150
Space Separator 149
Uppercase Letter 161
Dash Punctuation 0
Decimal Number 476826
  • Gross Charge contains many words: 12445 words
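
Gross Charge is typed as categorical even though the sample rows are decimal amounts, and the Letter block reports 311 letters and 149 spaces, which suggests a small number of non-numeric entries. A hedged sketch of coercing the column to numbers while keeping track of what fails. The comma handling is an assumption about thousands separators, and parse_charge and the sample values (including "Not Applicable") are hypothetical.

```python
import math


def parse_charge(raw):
    """Return the charge as a float, or None when the text is not a finite number."""
    if raw is None:
        return None
    # Assumption: commas, if present, are thousands separators.
    text = raw.strip().replace(",", "")
    try:
        value = float(text)
    except ValueError:
        return None
    # Reject "nan" and "inf", which float() would otherwise accept.
    return value if math.isfinite(value) else None


# Hypothetical raw values; only the numeric ones mirror the Sample block.
raw_charges = ["9353.93", "5.40", "Not Applicable", " 13.68 ", None]
parsed = [parse_charge(r) for r in raw_charges]
rejected = [r for r, p in zip(raw_charges, parsed) if r is not None and p is None]
print(parsed)
print(rejected)
```

Listing the rejected strings before dropping them makes it possible to decide whether they are placeholders or amounts in an unexpected format.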

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 12393
Approximate Unique (%) 14.7%
Missing 404
Missing (%) 0.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.3 MB
Mean 4739.5344
Minimum 0.05
Maximum 279731.65
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 5.9559)

Quantile Statistics

Minimum 0.05
5-th Percentile 31.16
Q1 458.3225
Median 2181.2
Q3 5858.08
95-th Percentile 18562.5
Maximum 279731.65
Range 279731.6
IQR 5399.7575
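
Range and IQR follow directly from the quantiles listed above. The sketch below reproduces them and also computes Tukey-style fences at 1.5 times the IQR. The fence rule is an assumption added for illustration; the report does not say how it identifies outliers.

```python
# Quantiles copied from the Quantile Statistics block.
minimum = 0.05
q1 = 458.3225
q3 = 5858.08
maximum = 279731.65

value_range = maximum - minimum   # reported as 279731.6
iqr = q3 - q1                     # reported as 5399.7575

# Assumed outlier rule (Tukey fences); not confirmed by the report.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"range: {value_range:.2f}")
print(f"IQR: {iqr:.4f}")
print(f"fences: {lower_fence:.2f} to {upper_fence:.2f}")
```

Under that assumed rule the lower fence is negative, so with a minimum of 0.05 any flagged values would sit on the high side, which fits the right skew reported for this variable.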

Descriptive Statistics

Mean 4739.5344
Standard Deviation 7528.0936
Variance 5.6672e+07
Sum 3.9932e+08
Skewness 5.9559
Kurtosis 101.9016
Coefficient of Variation 1.5884
  • Discounted Cash Price (Gross Charges) is not normally distributed (p-value 5.2851e-24)
  • Discounted Cash Price (Gross Charges) has 6742 outliers
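
Several of the descriptive statistics are functions of one another, so they can be cross-checked. A short sketch, assuming Variance is the square of Standard Deviation, Coefficient of Variation is Standard Deviation divided by Mean, and Sum is taken over non-missing rows only.

```python
# Figures copied from this variable's blocks in the report.
mean = 4739.5344
std = 7528.0936
n_rows = 84658
n_missing = 404
n_outliers = 6742

n_valid = n_rows - n_missing                 # rows with a value
variance = std ** 2                          # reported as 5.6672e+07
cv = std / mean                              # reported as 1.5884
total = mean * n_valid                       # reported as 3.9932e+08
outlier_share = 100 * n_outliers / n_valid   # share of valid rows flagged

print(f"variance: {variance:.4e}")
print(f"coefficient of variation: {cv:.4f}")
print(f"sum: {total:.4e}")
print(f"outlier share: {outlier_share:.1f}%")
```

With these assumptions the recomputed values agree with the report to the displayed precision, and the 6742 outliers correspond to roughly 8% of the non-missing rows.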

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 10.1 MB

Length

Mean 60
Standard Deviation 0
Median 60
Minimum 60
Maximum 60

Sample

1st row 832048854_transylv...
2nd row 832048854_transylv...
3rd row 832048854_transylv...
4th row 832048854_transylv...
5th row 832048854_transylv...

Letter

Count 3894268
Lowercase Letter 3894268
Space Separator 0
Uppercase Letter 0
Dash Punctuation 169316
Decimal Number 761922
  • Filename has words of constant length
