Overview

Dataset Statistics

Number of Variables 6
Number of Rows 85833
Missing Cells 804
Missing Cells (%) 0.2%
Duplicate Rows 40
Duplicate Rows (%) 0.0%
Total Size in Memory 32.9 MB
Average Row Size in Memory 401.4 B
Variable Types
  • Categorical: 5
  • Numerical: 1
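
The overview percentages can be cross-checked against the per-variable missing counts listed under Variables. The sketch below assumes that "Missing Cells (%)" means missing cells divided by rows times variables. The report does not state this definition, but it reproduces the figures shown.

```python
# Figures copied from this report; the formulas are an assumed reading of it.
n_rows = 85833
n_vars = 6
duplicate_rows = 40
missing_per_variable = {
    "Procedure ID": 0,
    "HCPCS/CPT Code": 248,
    "Description": 27,
    "Gross Charge": 232,
    "Discounted Cash Price (Gross Charges)": 297,
    "Filename": 0,
}

total_cells = n_rows * n_vars
missing_cells = sum(missing_per_variable.values())
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_rows

print(missing_cells)            # 804
print(round(missing_pct, 1))    # 0.2
print(round(duplicate_pct, 1))  # 0.0
```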

Dataset Insights

Discounted Cash Price (Gross Charges) is skewed
Procedure ID has a high cardinality: 85673 distinct values
HCPCS/CPT Code has a high cardinality: 1611 distinct values
Description has a high cardinality: 76007 distinct values
Gross Charge has a high cardinality: 11477 distinct values
Filename has constant value "832048759_blue-ridge-regional-hospital_standardcharges.csv"
Filename has constant length 58

Variables

Procedure ID

categorical

Approximate Distinct Count 85673
Approximate Unique (%) 99.8%
Missing 0
Missing (%) 0.0%
Memory Size 5.8 MB

Length

Mean 6.0037
Standard Deviation 1.509
Median 6
Minimum 2
Maximum 115

Sample

1st row 297
2nd row 317
3rd row 334
4th row 374
5th row 380

Letter

Count 5505
Lowercase Letter 4496
Space Separator 598
Uppercase Letter 1009
Dash Punctuation 37
Decimal Number 509065
  • Procedure ID contains many words: 85706 words

HCPCS/CPT Code

categorical

Approximate Distinct Count 1611
Approximate Unique (%) 1.9%
Missing 248
Missing (%) 0.3%
Memory Size 6.5 MB
  • The largest value (0C1713 ) is over 1.64 times larger than the second largest value (0C1776 )

Length

Mean 14.033
Standard Deviation 4.3006
Median 14
Minimum 4
Maximum 815

Sample

1st row 0J0130
2nd row
3rd row
4th row
5th row

Letter

Count 67022
Lowercase Letter 164
Space Separator 795558
Uppercase Letter 66858
Dash Punctuation 190
Decimal Number 337953
  • The top 2 categories (0C1713 , 0C1776 ) take over 50.0%
  • HCPCS/CPT Code contains many words: 1737 words
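
The top value is reported with a trailing space (`0C1713 `), and the column holds more space characters (795558) than letters and digits combined, which suggests the codes are whitespace-padded. A minimal sketch of normalizing them before grouping follows. `normalize_code` is a hypothetical helper, and treating blank strings as missing is an assumption rather than something the report prescribes.

```python
def normalize_code(raw):
    """Strip padding and upper-case a code; return None if nothing is left."""
    if raw is None:
        return None
    cleaned = raw.strip().upper()
    return cleaned or None


# "0C1713 " appears in the report; the other inputs are illustrative.
samples = ["0C1713 ", "0c1776", "   ", None]
print([normalize_code(s) for s in samples])
# → ['0C1713', '0C1776', None, None]
```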

Description

categorical

Approximate Distinct Count 76007
Approximate Unique (%) 88.6%
Missing 27
Missing (%) 0.0%
Memory Size 7.3 MB

Length

Mean 23.9499
Standard Deviation 0.8353
Median 24
Minimum 3
Maximum 24

Sample

1st row ABCIXIMAB 10 MG IN...
2nd row ACARBOSE 50MG TAB ...
3rd row ACEBUTOLOL 200MG T...
4th row ACETAMIN/COD 12.5M...
5th row ACETAMIN/COD 5ML L...

Letter

Count 1226510
Lowercase Letter 477
Space Separator 473988
Uppercase Letter 1226033
Dash Punctuation 7098
Decimal Number 304259
  • Description contains many words: 33165 words
  • The largest value (screw) is over 1.51 times larger than the second largest value (bn)

Gross Charge

categorical

Approximate Distinct Count 11477
Approximate Unique (%) 13.4%
Missing 232
Missing (%) 0.3%
Memory Size 5.8 MB

Length

Mean 6.6444
Standard Deviation 0.8651
Median 7
Minimum 3
Maximum 13

Sample

1st row 9353.93
2nd row 5.40
3rd row 5.40
4th row 13.68
5th row 10.80

Letter

Count 199
Lowercase Letter 94
Space Separator 93
Uppercase Letter 105
Dash Punctuation 0
Decimal Number 482835
  • Gross Charge contains many words: 11475 words
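
Gross Charge is typed as categorical even though its sample values are numeric. The 199 letters counted above hint that a small number of entries are not plain numbers. The sketch below shows one way to coerce such strings to floats. `parse_charge` is a hypothetical helper, the `"N/A"` input is illustrative rather than taken from the data, and stripping thousands separators is an assumption.

```python
import math


def parse_charge(raw):
    """Parse a charge string into a finite float, or return None."""
    if raw is None:
        return None
    # Assumption: commas, if present, are thousands separators.
    text = raw.strip().replace(",", "")
    try:
        value = float(text)
    except ValueError:
        return None
    # float() also accepts "nan" and "inf"; treat those as unparseable.
    return value if math.isfinite(value) else None


samples = ["9353.93", "5.40", " 13.68 ", "N/A"]
print([parse_charge(s) for s in samples])
# → [9353.93, 5.4, 13.68, None]
```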

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 11435
Approximate Unique (%) 13.4%
Missing 297
Missing (%) 0.3%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.3 MB
Mean 4606.8703
Minimum 0.02
Maximum 279731.65
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 6.1553)

Quantile Statistics

Minimum 0.02
5th Percentile 31.16
Q1 452.9475
Median 2181.2
Q3 5636.84
95th Percentile 17820
Maximum 279731.65
Range 279731.63
IQR 5183.8925
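
The quartiles above are enough to compute the IQR and the usual 1.5 × IQR fences. The report does not say which rule it uses to count outliers, so the Tukey fences below are an assumption, shown only to give the quantiles a concrete interpretation.

```python
# Quartiles copied from the report.
q1 = 452.9475
q3 = 5636.84

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # assumed Tukey rule, not confirmed by the report
upper_fence = q3 + 1.5 * iqr

print(round(iqr, 4))          # 5183.8925
print(round(lower_fence, 2))  # -7322.89
print(round(upper_fence, 2))  # 13412.68
```

Under this rule the lower fence is negative, so no low-end value (minimum 0.02) would be flagged. The 95th percentile (17820) lies above the upper fence, so more than 5% of the values would count as high outliers.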

Descriptive Statistics

Mean 4606.8703
Standard Deviation 7404.1009
Variance 5.4821e+07
Sum 3.9405e+08
Skewness 6.1553
Kurtosis 107.5218
Coefficient of Variation 1.6072
  • Discounted Cash Price (Gross Charges) is not normally distributed (p-value 4.6453e-24)
  • Discounted Cash Price (Gross Charges) has 6468 outliers
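
Several of the descriptive statistics follow from the mean, the standard deviation, and the non-missing row count. The sketch below recomputes them as a consistency check. It assumes the sum is taken over non-missing rows only, which the report does not state.

```python
# Figures copied from the report.
mean = 4606.8703
std = 7404.1009
n_rows = 85833
n_missing = 297

variance = std ** 2                  # variance is the squared standard deviation
cv = std / mean                      # coefficient of variation
total = mean * (n_rows - n_missing)  # assumes missing rows are excluded

print(f"{variance:.4e}")  # 5.4821e+07
print(f"{cv:.4f}")        # 1.6072
print(f"{total:.4e}")     # 3.9405e+08
```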

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 10.1 MB

Length

Mean 58
Standard Deviation 0
Median 58
Minimum 58
Maximum 58

Sample

1st row 832048759_blue-rid...
2nd row 832048759_blue-rid...
3rd row 832048759_blue-rid...
4th row 832048759_blue-rid...
5th row 832048759_blue-rid...

Letter

Count 3690819
Lowercase Letter 3690819
Space Separator 0
Uppercase Letter 0
Dash Punctuation 257499
Decimal Number 772497
  • Filename has words of constant length
