Overview

Dataset Statistics

Number of Variables 6
Number of Rows 1964
Missing Cells 422
Missing Cells (%) 3.6%
Duplicate Rows 20
Duplicate Rows (%) 1.0%
Total Size in Memory 774.4 KB
Average Row Size in Memory 403.8 B
Variable Types
  • Categorical: 5
  • Numerical: 1
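
The percentages and the average row size above can be reproduced from the raw counts. The following is a minimal sketch of that arithmetic, not the profiling tool's own code. It assumes the missing-cell rate is taken over all cells (rows × variables), the duplicate rate over all rows, and 1 KB = 1024 bytes. The variable names are illustrative.

```python
# Raw figures copied from the Dataset Statistics listing.
n_variables = 6
n_rows = 1964
missing_cells = 422
duplicate_rows = 20
total_size_kb = 774.4

# Assumption: the missing-cell rate is computed over all cells (rows x variables).
total_cells = n_rows * n_variables
missing_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate rate is computed over all rows.
duplicate_pct = 100 * duplicate_rows / n_rows

# Assumption: 1 KB = 1024 bytes.
avg_row_bytes = total_size_kb * 1024 / n_rows

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Average row size: {avg_row_bytes:.1f} B")
```

Under those assumptions the results round to the reported 3.6%, 1.0% and 403.8 B.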

Dataset Insights

HCPCS/CPT Code has 124 (6.31%) missing values [Missing]
Gross Charge has 118 (6.01%) missing values [Missing]
Discounted Cash Price (Gross Charges) has 162 (8.25%) missing values [Missing]
Discounted Cash Price (Gross Charges) is skewed [Skewed]
Dataset has 20 (1.02%) duplicate rows [Duplicates]
Procedure ID has a high cardinality: 1868 distinct values [High Cardinality]
HCPCS/CPT Code has a high cardinality: 733 distinct values [High Cardinality]
Description has a high cardinality: 1828 distinct values [High Cardinality]
Gross Charge has a high cardinality: 1105 distinct values [High Cardinality]
Filename has constant value "301114775_carepartners-rehabilitation-hospital_standardcharges.csv" [Constant]
Filename has constant length 66 [Constant Length]

Variables

Procedure ID

categorical

Approximate Distinct Count 1868
Approximate Unique (%) 95.1%
Missing 0
Missing (%) 0.0%
Memory Size 136.2 KB

Length

Mean 5.9954
Standard Deviation 3.9985
Median 5
Minimum 2
Maximum 69

Sample

1st row 346
2nd row 494
3rd row 845
4th row 886
5th row 1196

Letter

Count 2357
Lowercase Letter 1961
Space Separator 188
Uppercase Letter 396
Dash Punctuation 2
Decimal Number 9206
  • Procedure ID contains many words: 1882 words
  • The largest value (other) is over 1.65 times larger than the second largest value (outpatient)

HCPCS/CPT Code

categorical

Approximate Distinct Count 733
Approximate Unique (%) 39.8%
Missing 124
Missing (%) 6.3%
Memory Size 144.7 KB
  • The largest value ( ) is over 54.78 times larger than the second largest value (Coding)

Length

Mean 15.5364
Standard Deviation 29.2778
Median 14
Minimum 4
Maximum 815

Sample

1st row
2nd row
3rd row
4th row
5th row

Letter

Count 898
Lowercase Letter 111
Space Separator 20490
Uppercase Letter 787
Dash Punctuation 190
Decimal Number 6729
  • The top 2 categories ( , Coding) take over 50.0%

Description

categorical

Approximate Distinct Count 1828
Approximate Unique (%) 93.9%
Missing 18
Missing (%) 0.9%
Memory Size 167.0 KB
  • The largest value (Rate) is over 1.64 times larger than the second largest value (100% of MCR)

Length

Mean 22.8587
Standard Deviation 4.087
Median 24
Minimum 3
Maximum 24

Sample

1st row ACETAMIN/BUTALBITL...
2nd row ACETAZOLAMIDE 250M...
3rd row ACYCLOVIR 200MG CA...
4th row ACYCLOVIR 5% TOP 1...
5th row ALENDRONATE 70MG T...

Letter

Count 28514
Lowercase Letter 190
Space Separator 11473
Uppercase Letter 28324
Dash Punctuation 44
Decimal Number 3587
  • Description contains many words: 2317 words
  • The largest value (tab) is over 3.7 times larger than the second largest value (mg)

Gross Charge

categorical

Approximate Distinct Count 1105
Approximate Unique (%) 59.9%
Missing 118
Missing (%) 6.0%
Memory Size 126.4 KB

Length

Mean 5.1138
Standard Deviation 1.1655
Median 5
Minimum 3
Maximum 13

Sample

1st row 6.83
2nd row 10.37
3rd row 6.30
4th row 47.86
5th row 6.90

Letter

Count 97
Lowercase Letter 44
Space Separator 42
Uppercase Letter 53
Dash Punctuation 0
Decimal Number 7416
  • Gross Charge contains many words: 1100 words
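
Gross Charge is typed as categorical even though the sampled values look like decimal amounts, and the character counts include 97 letters, so at least some entries are not plain numbers. The sketch below shows one way such a column might be coerced before numeric analysis. `parse_charge` is a hypothetical helper, and the handling of dollar signs and thousands separators is an assumption about what the non-numeric text might look like. Of the example strings, only "6.83" and "47.86" come from the sample above. The others are invented for illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_charge(raw: Optional[str]) -> Optional[Decimal]:
    """Return the charge as a Decimal, or None when the text is not numeric."""
    if raw is None:
        return None
    # Assumption: amounts may carry padding, a dollar sign, or thousands separators.
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        value = Decimal(cleaned)
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal would otherwise parse without error.
    return value if value.is_finite() else None


# "6.83" and "47.86" are sampled values; the rest are made-up edge cases.
examples = ["6.83", "47.86", " 1,250.00 ", "N/A", None]
for text in examples:
    print(repr(text), "->", parse_charge(text))
```

Returning None for unparseable text keeps those rows visible as missing rather than silently dropping them, which makes it easy to count how many entries failed to parse.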

Discounted Cash Price (Gross Charges)

numerical

Approximate Distinct Count 1073
Approximate Unique (%) 59.5%
Missing 162
Missing (%) 8.2%
Infinite 0
Infinite (%) 0.0%
Memory Size 28.2 KB
Mean 338.4672
Minimum 0.12
Maximum 47687
Zeros 0
Zeros (%) 0.0%
Negatives 0
Negatives (%) 0.0%
  • Discounted Cash Price (Gross Charges) is skewed right (γ1 = 16.8956)

Quantile Statistics

Minimum 0.12
5-th Percentile 6.19
Q1 7.2125
Median 24.11
Q3 130
95-th Percentile 1365
Maximum 47687
Range 47686.88
IQR 122.7875
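
The range and IQR follow directly from the quantiles above. The sketch below recomputes them and derives outlier fences using Tukey's 1.5 × IQR rule. That rule is an assumption: the report does not state which method produced its outlier count. The helper `is_outlier` is hypothetical.

```python
# Quantiles copied from the Quantile Statistics listing.
minimum = 0.12
q1 = 7.2125
q3 = 130.0
maximum = 47687.0

value_range = maximum - minimum  # reported as 47686.88
iqr = q3 - q1                    # reported as 122.7875

# Assumption: Tukey's rule with a 1.5 x IQR multiplier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr


def is_outlier(price: float) -> bool:
    """Flag a price that falls outside the Tukey fences."""
    return price < lower_fence or price > upper_fence


print(f"Range: {value_range:.2f}")
print(f"IQR: {iqr:.4f}")
print(f"Fences: [{lower_fence:.5f}, {upper_fence:.5f}]")
```

Because the lower fence is negative and the minimum price is 0.12, this rule could only flag high prices here. The 95th percentile (1365) lies well above the upper fence.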

Descriptive Statistics

Mean 338.4672
Standard Deviation 1851.267
Variance 3.4272e+06
Sum 609917.98
Skewness 16.8956
Kurtosis 353.5823
Coefficient of Variation 5.4696
  • Discounted Cash Price (Gross Charges) is not normally distributed (p-value 4.471404020862789e-25)
  • Discounted Cash Price (Gross Charges) has 257 outliers
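
The descriptive statistics above are mutually consistent, which can be checked with a few lines of arithmetic. This is a sketch under one assumption: the mean is computed over non-missing values only (1964 rows minus 162 missing). The variable names are illustrative.

```python
# Figures copied from the report for Discounted Cash Price (Gross Charges).
n_rows = 1964
n_missing = 162
total = 609917.98
std_dev = 1851.267

# Assumption: statistics are computed over non-missing values only.
n_valid = n_rows - n_missing

mean = total / n_valid             # reported as 338.4672
variance = std_dev ** 2            # reported as 3.4272e+06
coeff_variation = std_dev / mean   # reported as 5.4696

print(f"Valid values: {n_valid}")
print(f"Mean: {mean:.4f}")
print(f"Variance: {variance:.4e}")
print(f"Coefficient of variation: {coeff_variation:.4f}")
```

A coefficient of variation above 5 means the standard deviation is more than five times the mean, which fits the strong right skew reported for this variable.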

Filename

categorical

Approximate Distinct Count 1
Approximate Unique (%) 0.1%
Missing 0
Missing (%) 0.0%
Memory Size 251.3 KB

Length

Mean 66
Standard Deviation 0
Median 66
Minimum 66
Maximum 66

Sample

1st row 301114775_carepart...
2nd row 301114775_carepart...
3rd row 301114775_carepart...
4th row 301114775_carepart...
5th row 301114775_carepart...

Letter

Count 102128
Lowercase Letter 102128
Space Separator 0
Uppercase Letter 0
Dash Punctuation 3928
Decimal Number 17676
  • Filename has words of constant length

Interactions

Correlations

Missing Values