HIPAA Deep Dive: PHI 18 identifiers, Security Rule
safeguards, BAA requirements, breach notification, and de-identification
HIPAA Compliance
PHI · Security Rule · BAA · Breach Notification · De-identification
PHI (Protected Health Information) = any health information that can identify a patient AND
relates to their health condition, care, or payment. Health data becomes PHI when ANY of these 18 identifiers is present
alongside it. Remove all 18 (with no actual knowledge that what remains could identify someone) → data is de-identified → no longer subject to HIPAA.
Identifier 01
Name
Full name, first name alone if combined with health data
Never log patient names in application logs
Identifier 02
Geographic Data
Street address, city, ZIP (first 3 digits of ZIP may be OK if the area sharing those 3 digits has a population > 20,000)
Full ZIP codes in query params = PHI leak risk
Identifier 03
Dates (except year)
DOB, admission date, discharge date, date of death, age if > 89 years
Store as year only in de-identified datasets
Identifier 04
Phone Numbers
Any telephone number: home, cell, work (fax numbers are listed separately as Identifier 05)
Mask in UI: show only last 4 digits where possible
Identifier 05
Fax Numbers
Fax number associated with patient or their provider
Fax still widely used in healthcare — treat as PHI
Identifier 06
Email Addresses
Any email associated with the patient
Encrypt emails containing health info. Use Direct Secure Messaging for clinical email.
Identifier 07
Social Security Numbers
Full SSN or partial (last 4 digits can still be PHI in context)
Never store plaintext. Encrypt or tokenize; a plain hash of a 9-digit SSN is easy to brute-force. Audit access.
Identifier 08
Medical Record Numbers (MRN)
Any facility-assigned patient identifier — MRN, encounter ID, account number
MRNs in URLs or logs = PHI exposure. Use internal UUIDs instead.
Identifier 09
Health Plan Beneficiary Numbers
Insurance member ID, group number, Medicare/Medicaid ID
Common in 270/271 eligibility and 837 claims — encrypt in transit and at rest
Identifier 10
Account Numbers
Hospital billing account numbers, bank account if linked to health payment
RCM systems: mask account numbers in logs and error messages
Identifier 11
Certificate / License Numbers
Driver's license, medical license — if linked to patient health info
Identity verification workflows: treat as sensitive PII + PHI
Identifier 12
Vehicle Identifiers / Serial Numbers
VIN, license plate — can identify location of care
Rare in clinical systems but relevant in transport/ambulance data
Identifier 13
Device Identifiers
Implant serial numbers (pacemaker, hip), medical device IDs
IoT/wearable data: device ID + health reading = PHI
Identifier 14
Web URLs
URL if it identifies a patient (e.g. patient portal URL with patient ID)
Never put patient IDs in GET query params. Use POST body or session token.
Identifier 15
IP Addresses
Patient's IP address if linked to health data (e.g. portal login)
Web access logs with health context = ePHI. Protect server logs.
Identifier 16
Biometric Identifiers
Fingerprints, retinal scans, voiceprints used for patient identification
Biometric auth systems in healthcare: store hashes, not raw biometrics
Identifier 17
Full-Face Photos
Photographs that could identify the patient — clinical photos, ID photos
DICOM images often embed patient name in metadata — scrub before sharing
Identifier 18
Any Other Unique Identifying Number
Any other number or code not explicitly listed but uniquely identifying a person
Catch-all: if it can re-identify a patient when combined with health data, it's PHI
Dev rule of thumb:
If a field can answer "which patient is this?" AND the record contains
health data → it's PHI. Treat any combination of identifier + health condition as PHI by default. When in doubt,
protect it.
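The rule of thumb above can be sketched as a simple field check. This is a minimal illustration, assuming a flat record; the field names in both sets are hypothetical and would need to be mapped to a real schema.

```python
# Hypothetical field names; a real schema needs its own mapping.
IDENTIFIER_FIELDS = {"name", "mrn", "ssn", "email", "phone", "ip_address", "zip"}
HEALTH_FIELDS = {"diagnosis", "icd10", "lab_result", "medication"}


def looks_like_phi(record: dict) -> bool:
    """Return True when a record holds both an identifier and health data."""
    populated = {key for key, value in record.items() if value not in (None, "")}
    has_identifier = bool(populated & IDENTIFIER_FIELDS)
    has_health_data = bool(populated & HEALTH_FIELDS)
    return has_identifier and has_health_data


print(looks_like_phi({"mrn": "MRN-123", "icd10": "E11.9"}))  # True
print(looks_like_phi({"icd10": "E11.9"}))                    # False
```

A check like this only catches known field names. Free-text fields can still carry identifiers, so "when in doubt, protect it" remains the safer default.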
The HIPAA Security Rule applies specifically to ePHI (electronic PHI). It
requires three categories of safeguards. Each requirement is either REQUIRED
(must implement) or ADDRESSABLE
(implement if reasonable and appropriate, or document why not).
📋
Administrative Safeguards
Policies, training, and workforce management — ~50% of Security Rule
Required
Security Officer — designate one person responsible for HIPAA security
policy. In startups this is often the CTO or founder.
Required
Workforce Training: all employees who touch ePHI must be trained on
security policies. Document completion. Repeat periodically (annual refreshers are common practice; the rule sets no interval).
Required
Access Management — formal process for granting/revoking access to ePHI
systems. Role-based access control (RBAC). Audit log of who was granted what.
Required
Contingency Plan — data backup, disaster recovery, emergency access.
RTO/RPO documented. Test the backup.
Addressable
Workforce Clearance — background checks for staff with ePHI access.
Addressable
Security Reminders — periodic security awareness updates (emails,
training refreshers).
🏢
Physical Safeguards
Physical access to systems and devices storing ePHI
Required
Facility Access Controls — locked server rooms, badge access, visitor
logs. Cloud vendors handle this for hosted systems.
Required
Workstation Use Policy — where and how workstations with ePHI access are
used. Auto-lock screens after inactivity.
Required
Device & Media Controls — procedures for disposal of hardware and media
containing ePHI. Wipe drives, shred documents.
Required
Workstation Security — physical protections like cable locks, privacy
screens for laptops in clinical areas.
💻
Technical Safeguards — Most Relevant to Developers
The code and infrastructure controls you build and configure
Required
Access Control: unique user IDs, no shared logins, automatic logoff,
encryption/decryption. Implement: RBAC, JWT with short expiry, MFA for ePHI systems. Example: one user_id per person, never a shared 'admin/admin' login.
Required
Audit Controls: record and examine activity in systems containing ePHI.
Log: who accessed, what record, when, from where. Immutable logs. Retain ≥6 years (the HIPAA documentation retention period, commonly applied to audit logs). Example: every SELECT ... WHERE patient_id = X is logged.
Required
Integrity Controls — ensure ePHI is not improperly altered or destroyed.
Checksums, digital signatures, version history, database transactions with rollback.
Required
Transmission Security: protect ePHI transmitted over networks. Practical minimum:
TLS 1.2+, with TLS 1.3 preferred (HIPAA itself names no protocol version). All APIs must use HTTPS. No ePHI in unencrypted email. No ePHI in HTTP GET params.
Addressable
Encryption at Rest: encrypt databases, file systems, backups containing
ePHI. Practically required: hard to justify not doing this. Use AES-256 (e.g. AES-256-GCM) with keys managed in AWS KMS / Azure Key Vault.
Addressable
Automatic Logoff: terminate sessions after a period of inactivity.
Implement as idle timeout in your session management. HIPAA sets no fixed duration; 15 minutes is a common choice in clinical settings.
Required
Authentication — verify identity before granting access. In practice:
MFA is expected for any system with ePHI. FIDO2/WebAuthn for strongest security.
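One way to approach the "immutable logs" idea under Audit Controls is a hash chain, where each entry commits to the one before it. The sketch below is an illustration under stated assumptions, not a prescribed HIPAA mechanism: the entry fields and function names are hypothetical, the log is an in-memory list, and the chain makes tampering detectable rather than impossible.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def append_audit_entry(log: list, user_id: str, patient_ref: str,
                       action: str, source_ip: str) -> dict:
    """Append a who/what/when/where entry that commits to the previous entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "patient_ref": patient_ref,  # internal opaque ID, not an MRN or name
        "action": action,
        "source_ip": source_ip,
        "prev_hash": log[-1]["entry_hash"] if log else GENESIS_HASH,
    }
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every hash; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        if entry["prev_hash"] != prev_hash:
            return False
        if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


audit_log = []
append_audit_entry(audit_log, "u-42", "a3f1c2d4", "read", "10.0.0.5")
append_audit_entry(audit_log, "u-42", "a3f1c2d4", "update", "10.0.0.5")
print(verify_chain(audit_log))  # True
audit_log[0]["action"] = "delete"
print(verify_chain(audit_log))  # False
```

In a real deployment the entries would go to append-only storage that application credentials cannot rewrite. The chain only helps an auditor notice edits after the fact.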
Quick dev checklist:
✅ HTTPS everywhere (TLS 1.2+ minimum)
✅ Encrypt DB at rest (AES-256)
✅ Unique user IDs + MFA
✅ Immutable audit logs (who, what, when)
✅ Role-based access control (RBAC)
✅ Session timeout (≤15 min idle)
✅ No PHI in URLs / query strings
✅ No PHI in application error logs
✅ Encrypted backups + tested restore
✅ Signed BAA with every cloud vendor
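The session timeout item in the checklist comes down to one comparison at request time. A minimal sketch, assuming the 15-minute value from the checklist and a hypothetical helper name; how the last-activity timestamp is stored depends on your session framework.

```python
from datetime import datetime, timedelta, timezone

# Assumption: 15 minutes, mirroring the checklist; tune it to your own risk assessment.
IDLE_TIMEOUT = timedelta(minutes=15)


def is_session_expired(last_activity: datetime, now: datetime,
                       idle_timeout: timedelta = IDLE_TIMEOUT) -> bool:
    """Return True once the session has been idle for the full timeout."""
    return now - last_activity >= idle_timeout


last_seen = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(is_session_expired(last_seen, last_seen + timedelta(minutes=5)))   # False
print(is_session_expired(last_seen, last_seen + timedelta(minutes=15)))  # True
```

The check belongs on the server. A client-side timer alone can be bypassed, so the server should refuse an expired session regardless of what the UI shows.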
A Business Associate Agreement (BAA) is a legally required contract between a Covered
Entity (CE) and any Business Associate (BA) — any vendor or contractor who creates,
receives, maintains, or transmits PHI on your behalf. Without a signed BAA, both parties are in violation of
HIPAA.
1
You build a healthcare app that handles patient data → You are a Business
Associate (BA) or possibly a Covered Entity (CE)
2
You store ePHI on AWS S3 → AWS is your sub-BA. You must sign AWS's BAA
(available in AWS console). Same for Azure, GCP, Snowflake, Databricks.
3
You use Twilio to send appointment reminders with patient info → Twilio must sign
a BAA. Using a service without a BAA = HIPAA violation even if the data is encrypted.
4
A hospital deploys your software → They (the CE) must sign a BAA with you
before you can access any of their patient data.
5
BAA must specify: what PHI is involved, permitted uses, security obligations, breach
reporting requirements, and how PHI is returned/destroyed at contract end.
✅ BAA Available — HIPAA-eligible vendors
AWS (HIPAA-eligible services — S3, RDS, Lambda, etc.)
Microsoft Azure (HIPAA/HITECH compliance)
Google Cloud Platform (GCP HIPAA BAA)
Snowflake (sign in web UI)
Twilio (available, must request)
SendGrid / Mailgun (with restrictions)
Auth0 / Okta (available)
Datadog, PagerDuty (available)
❌ No BAA — DO NOT use for ePHI
Slack (free/standard plan — no BAA)
Google Workspace (personal accounts)
Trello, Notion (standard plans)
GitHub (public repos — obviously)
ChatGPT / Claude API (without enterprise agreement)
Most analytics tools (Mixpanel, Amplitude — no BAA)
Zapier (standard plan)
Any free-tier SaaS tool
⚠️ Common dev mistakes:
Logging patient data to Datadog / Splunk without a BAA ·
Sending PHI in Slack messages ·
Using ChatGPT/Claude to analyze patient records without enterprise BAA ·
Storing test data with real patient records in dev/staging ·
Emailing PHI via Gmail
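Several of the mistakes above come from code paths that send data to a vendor without anyone checking for a BAA. One possible guard is sketched below. The registry contents, the exception class, and the function name are all hypothetical; a real registry would be maintained by whoever tracks signed agreements.

```python
# Hypothetical registry of vendors with a signed BAA on file.
VENDORS_WITH_SIGNED_BAA = {"aws", "twilio"}


class MissingBaaError(RuntimeError):
    """Raised when PHI is about to be sent to a vendor without a BAA."""


def assert_baa_on_file(vendor: str, payload_contains_phi: bool) -> None:
    """Refuse to proceed if the payload has PHI and the vendor has no BAA."""
    if payload_contains_phi and vendor.lower() not in VENDORS_WITH_SIGNED_BAA:
        raise MissingBaaError(
            f"No signed BAA on file for {vendor!r}; refusing to send PHI"
        )


assert_baa_on_file("AWS", payload_contains_phi=True)  # passes silently
try:
    assert_baa_on_file("slack", payload_contains_phi=True)
except MissingBaaError as exc:
    print(exc)
```

A guard like this is a backstop in code. It does not replace the contractual review itself.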
A breach = unauthorized acquisition, access, use, or disclosure of PHI that compromises its
security or privacy. HIPAA's Breach Notification Rule requires specific actions within strict timeframes. There is
a presumption of breach — you must prove it's NOT a breach, not the other way around.
Day 0
Breach Occurs or Is Discovered
Unauthorized access to PHI detected. Examples: database exposed publicly, ransomware,
employee snooping on celebrity patient records, wrong patient record sent to another provider, laptop stolen.
Day 1–10 (as soon as possible)
Internal Investigation & Containment
Contain the breach. Assess scope: how many records, which identifiers, what health data.
Apply the 4-factor risk assessment: (1) nature and extent of the PHI involved, (2) who accessed or received it, (3) whether it was
actually acquired or viewed, (4) how far the risk has been mitigated. If the assessment shows a low probability of compromise → not a reportable
breach.
Within 60 days of DISCOVERY
🔴 Notify Affected Individuals (Required)
Written notice by first-class mail (or email if patient consented). Must include:
description of breach, types of PHI involved, steps individuals can take, what you're doing to
investigate/mitigate, contact info. If 10+ individuals have outdated contact info → substitute notice (website
or media).
Within 60 days of DISCOVERY
🔴 Notify HHS (Required)
Report to HHS via online portal. <500 individuals: can be logged and submitted annually, within
60 days after the end of the calendar year of discovery (around March 1). ≥500 individuals: must notify HHS within 60 days AND notify prominent
media outlets in affected state/region.
Immediately (if BA)
Notify Your Covered Entity
If you're a Business Associate, your BAA specifies how quickly you must notify the CE.
The rule's outer limit is without unreasonable delay and no later than 60 days after discovery; BAAs often set a shorter window. The CE's 60-day clock starts from when they discover it
(or when you notify them).
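The timeline above reduces to date arithmetic. The sketch below computes the outer deadlines only, assuming the 60-day window and the 500-individual threshold described above; the function and key names are hypothetical. Notices are due "without unreasonable delay", so these dates are the latest allowed, not targets.

```python
from datetime import date, timedelta

NOTICE_WINDOW = timedelta(days=60)


def breach_deadlines(discovered: date, affected_individuals: int) -> dict:
    """Return the latest allowed notification dates for a breach."""
    individual_notice = discovered + NOTICE_WINDOW
    if affected_individuals >= 500:
        # Large breach: HHS is notified on the same clock as individuals.
        hhs_notice = individual_notice
    else:
        # Small breach: annual submission, 60 days after the calendar year ends.
        hhs_notice = date(discovered.year, 12, 31) + NOTICE_WINDOW
    return {
        "individual_notice_by": individual_notice,
        "hhs_notice_by": hhs_notice,
    }


large = breach_deadlines(date(2024, 3, 15), affected_individuals=1200)
print(large["individual_notice_by"], large["hhs_notice_by"])  # 2024-05-14 2024-05-14

small = breach_deadlines(date(2024, 3, 15), affected_individuals=40)
print(small["hhs_notice_by"])  # 2025-03-01
```

For a small breach discovered in 2023 the same arithmetic gives 2024-02-29, because 2024 is a leap year. That is why computing the date is safer than hard-coding March 1.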
Civil Monetary Penalties (CMPs)
Tier 1 — Did Not Know
$100 – $50,000 / violation
Unaware of the violation even with reasonable diligence. Cap:
$25,000/year per category.
Tier 2 — Reasonable Cause
$1,000 – $50,000 / violation
Knew or should have known but not willful neglect. Cap:
$100,000/year.
Tier 3 — Willful Neglect, Corrected
$10,000 – $50,000 / violation
Willful neglect but violation corrected within 30 days. Cap:
$250,000/year.
Tier 4 — Willful Neglect, Not Corrected
$50,000 – $1,900,000 / violation
Willful neglect not corrected. Highest penalties. Criminal referral
possible. Dollar amounts in all tiers are adjusted annually for inflation.
Real examples: Anthem (2015) — $16M
settlement. UCLA Health — $865K. Small practices — $25K–$250K. Each patient record = potentially one "violation."
De-identification removes or transforms PHI so that data is no longer subject to HIPAA.
De-identified data can be freely shared, used for research, analytics, AI training, or published. Two official
methods under HIPAA:
Method 1: Safe Harbor
Remove ALL 18 identifiers listed in the Privacy Rule. Also: no actual knowledge that the
remaining information could identify an individual.
What gets removed:
Name · DOB · ZIP · MRN · SSN · Phone · Email · IP · Device ID · Photo
What remains:
Year only · 3-digit ZIP* · Age (if ≤89) · ICD-10 code · Lab values
*ZIP first 3 digits only if population > 20,000 in that region
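The Safe Harbor transformations above (drop direct identifiers, keep year only, cap age at 89, truncate ZIP) can be sketched for a flat record. This is an illustration under assumptions: the field names are hypothetical, only structured fields are handled (free text would need separate scrubbing), and `low_population_zip3` is a caller-supplied set of 3-digit prefixes covering 20,000 or fewer people. The value used below is a placeholder for illustration, not an authoritative list.

```python
from datetime import date

# Hypothetical field names for direct identifiers that are dropped outright.
DIRECT_IDENTIFIERS = {"name", "mrn", "ssn", "phone", "email",
                      "ip_address", "device_id", "photo"}


def safe_harbor_view(record: dict, low_population_zip3: set, as_of: date) -> dict:
    """Return a copy with identifiers removed and dates/ZIP generalized."""
    dropped = DIRECT_IDENTIFIERS | {"dob", "zip"}
    out = {k: v for k, v in record.items() if k not in dropped}

    dob = record.get("dob")
    if dob is not None:
        had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
        age = as_of.year - dob.year - (0 if had_birthday else 1)
        if age > 89:
            out["age"] = "90+"  # no birth year: it would reveal the age
        else:
            out["age"] = age
            out["birth_year"] = dob.year

    zip_code = record.get("zip")
    if zip_code:
        zip3 = zip_code[:3]
        out["zip3"] = "000" if zip3 in low_population_zip3 else zip3
    return out


record = {"name": "Jane Doe", "mrn": "MRN-123", "dob": date(1980, 6, 15),
          "zip": "94110", "icd10": "E11.9"}
print(safe_harbor_view(record, low_population_zip3={"036"}, as_of=date(2024, 1, 1)))
# {'icd10': 'E11.9', 'age': 43, 'birth_year': 1980, 'zip3': '941'}
```

Other date fields such as admission or discharge dates would need the same year-only treatment. They are omitted here to keep the sketch short.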
Method 2: Expert Determination
A statistician or expert applies methods to determine the risk of re-identification is "very
small." Allows more data to remain than Safe Harbor — including some dates and geographic detail.
Common techniques: k-anonymity (each record indistinguishable from at least k-1 others on its quasi-identifiers),
l-diversity, differential privacy (add calibrated statistical noise),
tokenization (replace PHI with a reversible token), data masking.
Must be documented and defensible. Expert signs off on the methodology.
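As one concrete instance of the techniques listed above, the k of a dataset can be measured by grouping rows on their quasi-identifiers and taking the smallest group. The sketch below assumes rows are dictionaries; the column names are hypothetical. Measuring k is only an input to an expert's determination, not a substitute for it.

```python
from collections import Counter


def k_anonymity(rows: list, quasi_identifiers: list) -> int:
    """Return the size of the smallest group sharing the same quasi-identifiers."""
    if not rows:
        return 0
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


rows = [
    {"birth_year": 1980, "zip3": "941", "dx": "E11.9"},
    {"birth_year": 1980, "zip3": "941", "dx": "I10"},
    {"birth_year": 1975, "zip3": "100", "dx": "J45"},
]
print(k_anonymity(rows, ["birth_year", "zip3"]))  # 1
```

The result is 1 because the third row is unique on birth year and ZIP prefix. Generalizing further (for example, decade of birth) or suppressing that row would raise k.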
Developer patterns for handling PHI safely:
Tokenization
Replace MRN "MRN-123" with opaque UUID. Store
mapping in separate secured vault. API returns token, not raw PHI.
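A minimal sketch of this tokenization pattern, assuming an in-memory dictionary as a stand-in for the separately secured vault. The class and method names are hypothetical; a real vault would be an access-controlled, encrypted store with audited lookups.

```python
import uuid


class TokenVault:
    """In-memory stand-in for a separately secured token mapping store."""

    def __init__(self) -> None:
        self._token_by_value = {}
        self._value_by_token = {}

    def tokenize(self, value: str) -> str:
        """Return a stable opaque token for the value, creating one if needed."""
        if value not in self._token_by_value:
            token = str(uuid.uuid4())  # random, carries no information about the value
            self._token_by_value[value] = token
            self._value_by_token[token] = value
        return self._token_by_value[value]

    def detokenize(self, token: str) -> str:
        """Reverse lookup; in practice restricted to authorized callers."""
        return self._value_by_token[token]


vault = TokenVault()
token = vault.tokenize("MRN-123")
print(token == vault.tokenize("MRN-123"))     # True: same value, same token
print(vault.detokenize(token) == "MRN-123")   # True
```

Because the token is random and not derived from the MRN, it reveals nothing without access to the vault. That is the reason for keeping the mapping separate from the application database.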
Test Data
NEVER use real patient records in dev/staging. Use
synthetic data generators (Synthea) or properly de-identified datasets.
Logging
Scrub PHI from all logs before writing. Regex-strip SSN
patterns, email addresses, MRNs. Use log masking middleware.
AI / LLM
De-identify before sending to any LLM API, or use
on-prem models or enterprise agreements that include a BAA. Never send identifiable records to public APIs.
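The Logging pattern above can be sketched with the standard library's `logging.Filter`. The regexes are assumptions: the SSN pattern covers only the dashed `123-45-6789` form, and the MRN pattern assumes the "MRN-123" format used in the tokenization example. Pattern matching will miss names and other free-text identifiers, so treat it as a safety net, not as de-identification.

```python
import logging
import re

# Assumed formats; extend or replace for the identifiers your system really uses.
PHI_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
    (re.compile(r"\bMRN-\d+\b"), "[MRN]"),
]


def scrub(text: str) -> str:
    """Replace every match of a known PHI pattern with a placeholder."""
    for pattern, placeholder in PHI_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


class PhiScrubFilter(logging.Filter):
    """Scrub the fully formatted message before any handler writes it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = scrub(record.getMessage())
        record.args = None  # message is already formatted
        return True


print(scrub("Lookup failed for MRN-123, ssn 123-45-6789, contact jane.doe@example.com"))
# Lookup failed for [MRN], ssn [SSN], contact [EMAIL]
```

Attaching the filter to a handler (`handler.addFilter(PhiScrubFilter())`) applies it to every record that handler emits, including records from third-party libraries.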