What is ETHOS

The ETHOS (European Typology of Homelessness and Housing Exclusion) study, carried out by Hogeschool Utrecht, aims to establish an objective count of homeless persons in the Netherlands and its regions.

The Challenge

A group of specialized professionals performed a point-in-time registration of homeless persons using a questionnaire. For each individual, personal characteristics were recorded without any unique identifier (e.g., the BSN, the Dutch citizen service number) to preserve privacy.

Because multiple professionals from different organizations were involved and the records carry no unique identifier, the same person may have been registered more than once, so there is a high probability of duplicate records in the dataset.

The Solution: DEDUA Algorithm

The HU Data Science Pool developed DEDUA (DEDuplication Algorithm), which:

  • Identifies possible duplicates by analyzing patterns in personal characteristics
  • Flags records with insufficient data that may compromise matching accuracy
  • Recommends which records to retain based on data completeness when duplicates are found
  • Preserves privacy by operating without unique identifiers
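
The four behaviours listed above can be illustrated with a minimal Python sketch. This is not the DEDUA implementation: the field names, both thresholds, and the agreement rule are assumptions chosen only for illustration, and the example records are made up. The record_id is a row number in the dataset, not a personal identifier.

```python
from itertools import combinations

# Hypothetical questionnaire fields; the real ETHOS questionnaire may differ.
FIELDS = ("initials", "birth_year", "gender", "municipality", "nationality")
MIN_FILLED = 3       # assumed: fewer filled fields means too little data to match
MIN_AGREEMENT = 0.8  # assumed: share of jointly filled fields that must agree


def filled(record):
    """Return the set of fields that carry a usable value."""
    return {f for f in FIELDS if record.get(f) not in (None, "")}


def is_possible_duplicate(a, b):
    """Compare two records on the fields that both have filled in."""
    shared = filled(a) & filled(b)
    if len(shared) < MIN_FILLED:
        return False
    agree = sum(a[f] == b[f] for f in shared)
    return agree / len(shared) >= MIN_AGREEMENT


def deduplicate(records):
    """Return (insufficient_ids, pairs); each pair is (keep_id, drop_id)."""
    # Flag records with too little data to be matched reliably.
    insufficient = [r["record_id"] for r in records if len(filled(r)) < MIN_FILLED]
    pairs = []
    for a, b in combinations(records, 2):
        if is_possible_duplicate(a, b):
            # Recommend keeping the more complete record of the two.
            keep, drop = sorted((a, b), key=lambda r: len(filled(r)), reverse=True)
            pairs.append((keep["record_id"], drop["record_id"]))
    return insufficient, pairs


# Made-up example records, for illustration only.
records = [
    {"record_id": 1, "initials": "JD", "birth_year": 1980, "gender": "M",
     "municipality": "Utrecht", "nationality": "NL"},
    {"record_id": 2, "initials": "JD", "birth_year": 1980, "gender": "M",
     "municipality": "Utrecht", "nationality": ""},
    {"record_id": 3, "initials": "", "birth_year": None, "gender": "F",
     "municipality": "", "nationality": ""},
    {"record_id": 4, "initials": "AB", "birth_year": 1975, "gender": "F",
     "municipality": "Amersfoort", "nationality": "NL"},
]

print(deduplicate(records))  # ([3], [(1, 2)])
```

In this sketch record 3 is flagged as having insufficient data, and records 1 and 2 are reported as a possible duplicate pair with record 1 recommended for retention because it is more complete. Comparing every pair of records is quadratic in the number of records, which is acceptable for a sketch but would likely need blocking or indexing on larger datasets.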

Project Evolution

The ETHOS project has progressed through multiple rounds of data collection and refinement:

Rounds 1-3: Foundation & Learning

Over three completed rounds, the team developed and refined the DEDUA algorithm, establishing best practices for identifying duplicates while maintaining privacy. Each round provided valuable insights into data quality challenges and deduplication accuracy.

Round 4: Automation & Enhanced Protocols (Ongoing)

The current round focuses on:

  • Improved data cleaning protocols to ensure higher quality input data
  • Process automation to streamline the deduplication workflow
  • Enhanced algorithm efficiency based on lessons from previous rounds
  • Scalability improvements for larger datasets and more organizations

Impact

Through iterative development across multiple rounds, the DEDUA algorithm continues to improve the accuracy of homeless population counts in the Netherlands, providing crucial data for policy-making and resource allocation while respecting individual privacy.