Overview

Dataset Statistics

Number of Variables 6
Number of Rows 85199
Missing Cells 1113
Missing Cells (%) 0.2%
Duplicate Rows 40
Duplicate Rows (%) 0.0%
Total Size in Memory 32.0 MB
Average Row Size in Memory 393.3 B
Variable Types
  • Categorical: 5
  • Numerical: 1
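As a check on the figures above, the per-column missing counts (0 + 352 + 30 + 312 + 419 + 0) sum to 1113. Dividing by the cell count gives 1113 / (85199 × 6) ≈ 0.22%, which rounds to the reported 0.2%. Below is a minimal standard-library sketch of how such overview numbers can be computed. Three things are assumptions: the name `overview_stats`, the list-of-row-dicts input, and the rule that only `None` counts as missing. Duplicates are counted pandas-style, so every copy after the first counts as one duplicate row.

```python
from collections import Counter

def overview_stats(rows, columns):
    """Overview counts for a list of row dicts (hypothetical helper).

    Assumption: a cell is missing when its value is None or the key is absent.
    """
    n_rows = len(rows)
    n_cells = n_rows * len(columns)
    missing = sum(1 for row in rows for col in columns if row.get(col) is None)
    # Every copy of a row beyond its first occurrence counts as a duplicate.
    counts = Counter(tuple(row.get(col) for col in columns) for row in rows)
    duplicates = sum(c - 1 for c in counts.values())
    return {
        "rows": n_rows,
        "missing_cells": missing,
        "missing_pct": 100 * missing / n_cells if n_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_pct": 100 * duplicates / n_rows if n_rows else 0.0,
    }
```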

Dataset Insights

Discounted Cash Price (Gross Charges) is right-skewed (skewness 6.1268)
Procedure ID has high cardinality: 84970 distinct values
HCPCS/CPT Code has high cardinality: 1648 distinct values
Description has high cardinality: 75232 distinct values
Gross Charge has high cardinality: 11494 distinct values
Filename has a constant value: "832053115_angel-medical-center_standardcharges.csv"
Filename has a constant length of 50 characters
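The helper below shows one way to produce flags like these. It is hypothetical: `column_flags` is an invented name, and its default `high_cardinality_threshold` of 50 distinct values is an assumption, because the report does not state its cutoff. An absolute count fits these insights better than a ratio would, since HCPCS/CPT Code is flagged even though only 1.9% of its values are unique.

```python
def column_flags(values, high_cardinality_threshold=50):
    """Return (distinct_count, flags) for one column.

    The default threshold of 50 distinct values is a hypothetical choice,
    not a documented setting of the report.
    """
    present = [v for v in values if v is not None]  # ignore missing cells
    distinct = len(set(present))
    flags = []
    if distinct == 1:
        flags.append("Constant")
    if distinct > high_cardinality_threshold:
        flags.append("High Cardinality")
    # Constant length: every present value has the same string length.
    if present and len({len(str(v)) for v in present}) == 1:
        flags.append("Constant Length")
    return distinct, flags
```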

Variables

Procedure ID

categorical

Approximate Distinct Count 84970
Approximate Unique (%) 99.7%
Missing 0
Missing (%) 0.0%
Memory Size 5.8 MB

Length

Mean 6.0158
Standard Deviation 1.5921
Median 6
Minimum 2
Maximum 115
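The Length block summarizes the string length of each non-missing value. A minimal standard-library sketch follows, in which `length_stats` is a hypothetical name. The report does not say whether it uses the sample (n − 1) or the population standard deviation. The sample version used here is therefore an assumption, though at 85199 rows the difference is negligible.

```python
import statistics

def length_stats(values):
    """Summarize string lengths of non-missing values (None = missing)."""
    lengths = [len(v) for v in values if v is not None]
    return {
        "mean": statistics.fmean(lengths),
        "std": statistics.stdev(lengths),  # sample standard deviation (n - 1)
        "median": statistics.median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }
```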

Sample

1st row 297
2nd row 317
3rd row 334
4th row 374
5th row 380

Letter

Letter Count 7193
Lowercase Letter 5892
Space Separator 719
Uppercase Letter 1301
Dash Punctuation 41
Decimal Number 504447
  • Procedure ID contains 85005 distinct words
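The other labels in the Letter block are Unicode general-category names: Ll, Lu, Zs, Pd and Nd. The letter count at the top of the block equals Lowercase Letter + Uppercase Letter (5892 + 1301 = 7193). Below is a minimal sketch built on Python's `unicodedata.category`. The function name, the extra "Letter Count" key, and the restriction to these five categories are assumptions. Characters such as `_` (connector punctuation) and `.` are not counted. This agrees with the Filename column, which has 36 lowercase letters, 9 digits and 2 dashes per row.

```python
import unicodedata
from collections import Counter

# Unicode general-category codes behind the report's labels.
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Nd": "Decimal Number",
}

def char_category_counts(values):
    """Count characters per Unicode category across non-missing strings."""
    counts = Counter()
    for v in values:
        if v is None:
            continue
        for ch in v:
            cat = unicodedata.category(ch)
            if cat in CATEGORY_NAMES:  # other categories are ignored
                counts[CATEGORY_NAMES[cat]] += 1
    counts["Letter Count"] = counts["Lowercase Letter"] + counts["Uppercase Letter"]
    return counts
```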

HCPCS/CPT Code

categorical

Approximate Distinct Count 1648
Approximate Unique (%) 1.9%
Missing 352
Missing (%) 0.4%
Memory Size 6.4 MB
  • The most frequent value (0C1713) occurs over 1.61 times as often as the second most frequent value (0C1776)

Length

Mean 14.0381
Standard Deviation 4.357
Median 14
Minimum 4
Maximum 815

Sample

1st row 0J0130
2nd row
3rd row
4th row
5th row

Letter

Letter Count 66513
Lowercase Letter 175
Space Separator 788930
Uppercase Letter 66338
Dash Punctuation 200
Decimal Number 335088
  • The top 2 categories (0C1713, 0C1776) account for over 50.0% of values
  • HCPCS/CPT Code contains 1796 distinct words
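The HCPCS/CPT Code values appear to be padded with spaces. The column has 788930 Space Separator characters against a mean length of about 14, while codes such as 0C1713 are 6 characters long. The sketch below normalizes codes before computing the top-value share and the first-to-second frequency ratio. Stripping whitespace, upper-casing, and treating whitespace-only values as missing are all assumptions, so its counts may differ from the report's. `top_value_summary` is a hypothetical name.

```python
from collections import Counter

def top_value_summary(values, k=2):
    """Return the top-k normalized codes, their combined share (%), and how
    many times more often the first occurs than the second."""
    # Assumption: padding and letter case are formatting noise.
    cleaned = [v.strip().upper() for v in values if v is not None and v.strip()]
    if not cleaned:
        return [], 0.0, float("nan")
    top = Counter(cleaned).most_common(k)
    share = 100 * sum(count for _, count in top) / len(cleaned)
    ratio = top[0][1] / top[1][1] if len(top) > 1 else float("inf")
    return top, share, ratio
```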

Description

categorical

Approximate Distinct Count 75232
Approximate Unique (%) 88.3%
Missing 30
Missing (%) 0.0%
Memory Size 7.2 MB
  • The most frequent value (44.67% of BC) occurs over 1.68 times as often as the second most frequent value (OMNIFIT SERIES I ACET. I)

Length

Mean 23.9299
Standard Deviation 0.9932
Median 24
Minimum 3
Maximum 24

Sample

1st row ABCIXIMAB 10 MG IN...
2nd row ACARBOSE 50MG TAB ...
3rd row ACEBUTOLOL 200MG T...
4th row ACETAMIN/COD 12.5M...
5th row ACETAMIN/COD 5ML L...

Letter

Letter Count 1215131
Lowercase Letter 647
Space Separator 470184
Uppercase Letter 1214484
Dash Punctuation 6939
Decimal Number 302865
  • Description contains 32970 distinct words
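The word insights appear to count distinct words rather than total words. Gross Charge, for example, reports exactly as many words (11494) as it has distinct values. The minimal sketch below assumes case-sensitive whitespace tokenization. The report's actual tokenizer is not documented, and `distinct_words` is a hypothetical name.

```python
def distinct_words(values):
    """Count distinct whitespace-separated tokens across non-missing values."""
    words = set()
    for v in values:
        if v is not None:
            words.update(v.split())  # split() also collapses runs of spaces
    return len(words)
```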

Gross Charge

categorical

Approximate Distinct Count 11494
Approximate Unique (%) 13.5%
Missing 312
Missing (%) 0.4%
Memory Size 5.8 MB

Length

Mean 6.6492
Standard Deviation 0.8736
Median 7
Minimum 3
Maximum 13

Sample

1st row 9353.93
2nd row 5.40
3rd row 5.40
4th row 13.68
5th row 10.80

Letter

Letter Count 323
Lowercase Letter 160
Space Separator 153
Uppercase Letter 163
Dash Punctuation 0
Decimal Number 478982
  • Gross Charge contains 11494 distinct words
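Gross Charge is profiled as categorical even though its sample values are prices. It contains 323 letters and 153 spaces in total, so some entries are evidently not plain numbers. The sketch below shows one way to separate numeric from non-numeric entries before analysis. `parse_charges` is hypothetical. Rejecting `nan` and `inf` strings is a design choice, not something the report specifies.

```python
import math

def parse_charges(values):
    """Split raw charge strings into parsed floats and rejected strings."""
    parsed, rejected = [], []
    for v in values:
        if v is None:  # missing cell
            continue
        try:
            x = float(v.strip())
        except ValueError:
            rejected.append(v)  # text that is not a number
            continue
        # float() accepts "nan" and "inf"; treat them as non-prices.
        if math.isfinite(x):
            parsed.append(x)
        else:
            rejected.append(v)
    return parsed, rejected
```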

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 11442
Approximate Unique (%) 13.5%
Missing 419
Missing (%) 0.5%
Infinite 0
Infinite (%) 0.0%
Memory Size 1.3 MB
Mean 4639.7885
Minimum 0.05
Maximum 279731.65
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 6.1268)
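The moment-based statistics for this column are the mean, skewness, kurtosis and coefficient of variation. The report's figures are internally consistent: 7432.3374² ≈ 5.524e+07 (the variance) and 7432.3374 / 4639.7885 ≈ 1.6019 (the coefficient of variation). The report does not say which estimators it uses. This sketch assumes three of them: the sample (n − 1) standard deviation, the moment skewness γ1 = m3 / m2^(3/2), and the excess kurtosis m4 / m2² − 3. `moment_summary` is a hypothetical name.

```python
import math

def moment_summary(data):
    """Mean, spread and shape statistics for finite, non-missing values.

    Assumes at least two values that are not all equal.
    """
    xs = [x for x in data if x is not None and math.isfinite(x)]
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    # Central moments with divisor n (used for skewness and kurtosis).
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    variance = sum(d ** 2 for d in dev) / (n - 1)  # sample variance
    std = math.sqrt(variance)
    return {
        "mean": mean,
        "std": std,
        "variance": variance,
        "sum": sum(xs),
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,
        "cv": std / mean,
    }
```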

Quantile Statistics

Minimum 0.05
5-th Percentile 31.16
Q1 448.6125
Median 2181.2
Q3 5671.12
95-th Percentile 17853.74
Maximum 279731.65
Range 279731.6
IQR 5222.5075
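The quantile rows fit together as expected. Range = 279731.65 − 0.05 = 279731.60, and IQR = Q3 − Q1 = 5671.12 − 448.6125 = 5222.5075. Q1 has four decimals while the prices have two, which suggests interpolation between order statistics. The sketch below assumes linear interpolation, via `statistics.quantiles` with `method="inclusive"`, which is the same rule as NumPy's default percentile. The report may have used a different rule.

```python
import statistics

def quantile_summary(data):
    """Report-style quantiles with linear interpolation (needs >= 2 values)."""
    xs = sorted(x for x in data if x is not None)
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    # n=20 yields the 19 cut points 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(xs, n=20, method="inclusive")
    return {
        "min": xs[0], "p5": cuts[0], "q1": q1, "median": median,
        "q3": q3, "p95": cuts[-1], "max": xs[-1],
        "range": xs[-1] - xs[0], "iqr": q3 - q1,
    }
```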

Descriptive Statistics

Mean 4639.7885
Standard Deviation 7432.3374
Variance 5.524e+07
Sum 3.9336e+08
Skewness 6.1268
Kurtosis 106.7097
Coefficient of Variation 1.6019
  • Discounted Cash Price (Gross Charges) is not normally distributed (p-value ≈ 4.95e-24)
  • Discounted Cash Price (Gross Charges) has 6399 outliers
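The report does not say which rule produced the 6399 outliers. A common choice is Tukey's fences at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. With the quartiles above, the upper fence is 5671.12 + 1.5 × 5222.5075 ≈ 13504.88 and the lower fence is negative, so under that rule only high prices would be flagged. Below is a minimal sketch that assumes this rule and the same linear-interpolation quartiles. `iqr_outliers` is a hypothetical name.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    xs = [x for x in data if x is not None]
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]
```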

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.0%
Missing 0
Missing (%) 0.0%
Memory Size 9.3 MB

Length

Mean 50
Standard Deviation 0
Median 50
Minimum 50
Maximum 50

Sample

1st row 832053115_angel-me...
2nd row 832053115_angel-me...
3rd row 832053115_angel-me...
4th row 832053115_angel-me...
5th row 832053115_angel-me...

Letter

Letter Count 3067164
Lowercase Letter 3067164
Space Separator 0
Uppercase Letter 0
Dash Punctuation 170398
Decimal Number 766791
  • Filename has words of constant length
