Data Engineer

Ripjar
Cheltenham

About Ripjar


Ripjar specialises in the development of software and data products that help governments and organisations combat serious financial crime. Our technology is used to identify criminal activity such as money laundering and terrorist financing, enabling organisations to enforce sanctions at scale to help combat rogue entities and state actors.


Data infuses everything Ripjar does. We work with a wide variety of datasets of all scales, including an ever‑growing archive of billions of news articles covering most languages going back over 30 years, sanctions and watchlist data provided by governments, and vast organisation and ownership datasets.


About the Role


We see a Data Engineer as a software engineer who specialises in distributed data systems. You’ll join the Data Engineering team, whose primary responsibility is the development and operation of the Data Collection Hub: a platform that ingests data from many sources, processes and enriches it, and distributes it to multiple downstream systems.


We’re looking for someone with 2+ years of industry experience building and operating production software who enjoys working across data pipelines, distributed systems, and operational reliability.


What you’ll do



  • Engineer distributed ingestion services that reliably pull data from diverse sources, handle messy real‑world edge cases, and deliver clean, well‑structured outputs to multiple downstream products.
  • Build high‑throughput processing components (batch and/or near‑real‑time) with a focus on performance, scalability, and predictable cost, using strong profiling and measurement practices.
  • Design and evolve data contracts (schemas, validation rules, versioning, backward compatibility) so downstream teams can build with confidence.
  • Own production quality: write maintainable code, strong unit/integration tests, and add the observability you need (metrics/logs/tracing) to diagnose issues quickly.
  • Improve platform reliability by hardening pipelines against partial failures, retries, rate limits, data drift, and infrastructure issues—then codify those learnings into better tooling and guardrails.
  • Contribute to CI/CD and developer experience: faster builds, better test signal, safer releases, and automated operational checks.
  • Participate in design reviews, code reviews, incident retrospectives, and iterative delivery—making pragmatic trade‑offs and documenting them clearly.
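To make the data-contract responsibility above concrete, here is a purely illustrative sketch (not Ripjar's actual code) of a minimal versioned record schema with validation in Python; the field names, version policy, and `Article` type are all assumptions for the example:

```python
from dataclasses import dataclass

SCHEMA_VERSION = 2  # bumped only for backward-compatible additions

@dataclass
class Article:
    """A minimal ingested-record contract (illustrative field names)."""
    source: str
    title: str
    language: str = "en"  # v2 addition: defaulted, so v1 producers remain valid
    schema_version: int = SCHEMA_VERSION

def validate(record: dict) -> Article:
    """Reject records that break the contract before they reach downstream systems."""
    for field in ("source", "title"):
        if not record.get(field):
            raise ValueError(f"missing required field: {field}")
    if record.get("schema_version", 1) > SCHEMA_VERSION:
        raise ValueError("record from a newer, unknown schema version")
    return Article(
        source=record["source"],
        title=record["title"],
        language=record.get("language", "en"),
        schema_version=record.get("schema_version", 1),
    )
```

The key design point is that new fields are added with defaults, so older producers keep working while downstream consumers can rely on every validated record having a complete, typed shape.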

Technology Stack



  • Languages: Predominantly Python and Node.js
  • Distributed/data platforms: HDFS, HBase, Spark, plus increasing use of Kubernetes and cloud services
  • Storage/search: MongoDB, OpenSearch
  • Orchestration: Airflow, Dagster, NiFi
  • Tooling: GitHub, GitHub Actions, Rundeck, Jira, Confluence
  • Deployment/config: Ansible (physical), Terraform / Argo CD / Helm (Kubernetes)
  • Development environment: MacBook (typical)

Essential:



  • 2+ years building and operating production software systems
  • Fluency in at least one programming language (Python/Node.js a plus)
  • Experience debugging moderately complex systems and improving reliability/performance
  • Strong fundamentals: data structures, testing, version control, Linux basics

Nice to have:



  • Spark/PySpark experience
  • Hadoop ecosystem exposure (HDFS/HBase)
  • Workflow orchestration (Airflow/Dagster/NiFi)
  • Search/indexing (OpenSearch, MongoDB)
  • Kubernetes and infrastructure‑as‑code
  • Degree in Computer Science or numerical degree


Benefits


  • Competitive salary, depending on experience


  • 25 days annual leave plus your birthday off, in addition to bank holidays, rising to 30 days after 5 years of service
  • Remote working
  • Private family healthcare
  • 35-hour working week
  • Employee Assistance Programme
  • Company contributions to your pension
  • Pension salary sacrifice
  • Enhanced maternity/paternity pay
  • The latest tech, including a top-of-the-range MacBook Pro




