You’re not a real Data Engineer if you use no or low code tools

I tend to read a lot of posts online to keep up with developments in the data industry. This usually consists of keeping an eye on the data engineering subreddit, people I follow on Twitter (no, I’m not calling it “X”, that’s stupid), my LinkedIn network and data topics on Medium. I find this gives me a good blend of opinions and content to keep up to date with things (even if my reading list on Medium is getting wildly out of hand). One thing I find interesting is there seems to be a sense of elitism from some in the community who are quite vocal about what they perceive a Data Engineer to be (or not to be).

Data Engineer vs Analytics Engineer

Maybe this should be re-titled “what does a Data Engineer actually do?”. A recent post on Reddit shared a table comparing the responsibilities of Data Engineers, Analytics Engineers and Data Analysts. A few highlights were:

  • Build custom data ingestion integrations. (Data Engineer)
  • Develop and deploy machine learning endpoints. (Data Engineer)
  • Build and maintain the data platform. (Data Engineer)
  • Data warehouse performance optimization. (Data Engineer)
  • Provide clean, transformed data ready for analysts. (Analytics Engineer)
  • Train business users on how to use the data platform and data visualization tools. (Analytics Engineer)
  • Work with business users to understand data requirements. (Data Analyst)

The problem I see is that for some organisations this table might be absolutely correct. For others it might not. Personally, I’ve done all of these things and have never been titled Data Analyst, Analytics Engineer, or even Data Engineer, for that matter. For some, you’re only a Data Engineer if you’re dealing with the E&L (Extract & Load) parts of data processing. For others, if you only do the T part (Transform) then you’re an Analytics Engineer (I think we can thank dbt for that one). The confusion created by splitting out these job titles was brilliantly summed up by one commenter on the DE subreddit who asked “WTF is an Analytics Engineer?”

It’s possible some of this is down to the maturity of the data industry. I think a lot of it is down to the size of the organisation in question. A lot of smaller shops will have people in multi-disciplinary roles, maybe even dealing with the whole data lifecycle. In larger orgs, it might be more like the definitions above. There are no hard and fast rules, and IMO there shouldn’t be. Even compared with IT or dev as a whole, there are a number of newer concepts which are yet to be widely understood. That said, many of the tasks above were being done decades ago by people titled “SQL Developer”, “BI Developer”, “DBA”, or any number of job titles that don’t fully encapsulate what somebody does.

Tools of the trade

I was interested to read a post by Data Engineer Monica Miller about her self-doubt around the fact she’s a Data Engineer who works almost exclusively in SQL and doesn’t know Scala or work heavily with Python. You know what? I don’t know much Python either. I can figure it out if I need to, and I have the experience to know when something is better handled in Python, or SQL, or something else. I’ve spent plenty of time working with no/low-code ETL and ELT tools, from my SQL Server days working with SSIS, to adopting Matillion as one of the few available ELT tools with native Redshift compatibility back in 2016.

As I’ve gone through my career I’ve come to agree with many smart people who have long argued that code is a liability, rather than an asset.

That’s not to say one shouldn’t write code, but do so for the right reasons. Write code that adds specific value for your use case, or because you don’t have the budget to buy the tool(s) that do what you need. Don’t spend time writing code that re-solves problems that are already solved. All you’re doing is adding technical debt that somebody (possibly you) will need to maintain in future. You wouldn’t write your own UDF to perform a SUBSTRING operation, would you?
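To make the SUBSTRING point concrete, here’s a minimal sketch in Python. Both function names are hypothetical and only there for illustration, and I’m assuming SQL-style 1-based positions. The first function is the kind of code you end up owning, testing and debugging forever; the second leans on slicing that the language already gives you.

```python
def handrolled_substring(text: str, start: int, length: int) -> str:
    """Hypothetical re-implementation of a SQL-style SUBSTRING (1-based start).

    Every line here is code somebody has to maintain and test.
    """
    result = []
    for position, char in enumerate(text, start=1):
        # Keep characters whose 1-based position falls inside the window.
        if start <= position < start + length:
            result.append(char)
    return "".join(result)


def builtin_substring(text: str, start: int, length: int) -> str:
    """The same behaviour using Python's built-in slicing."""
    # Convert the 1-based window to 0-based slice bounds, clamped at zero.
    begin = max(start - 1, 0)
    end = max(start - 1 + length, 0)
    return text[begin:end]


if __name__ == "__main__":
    print(handrolled_substring("data engineer", 6, 8))
    print(builtin_substring("data engineer", 6, 8))
```

Both versions return the same thing, but only one of them adds a loop and a boundary condition to your maintenance backlog.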

Use the tool that gets you up and running and delivering value to your business as quickly as possible. There are very few truly “wrong” decisions in this industry. You can change your approach later. Sure, it might cost time and money, but who’s to say your new CTO/CDO won’t bring in a new vendor anyway? There are so many tools out there that it’s impossible to define a specific stack as the only way to do DE. Things change, and the key part of architecture is assessing the landscape and making the most informed decision you can while being aware of as many risks as you can. Nothing is risk-free; it’s about having reasonable mitigations in place should things go wrong.

For some teams that’ll mean doing all their ELT in Python code orchestrated by Airflow; for others it’ll be using Fivetran to land data and dbt to transform it. For others still it’ll be using Matillion or SSIS or other no/low-code tools to visually build pipelines, sprinkling in custom functions where the out-of-the-box stuff doesn’t do what they need. And do you know what? They can all call themselves Data Engineers, because they are delivering data to their business by doing some or all of the ELT/ETL process.

Am I a Data Engineer?

All the elitism on social media does is gatekeep an industry that is massively growing and in need of more talented people who can understand the value in delivering clean, timely data to their business. With the exception of SQL, languages and tools tend to come and go. Fundamental understanding of data concepts such as data modelling, data pipelines, data quality, and so on will always be valuable, regardless of the tools used to deliver.

Do you think you’re a Data Engineer? Does your company call you a Data Engineer? Are you involved in the ELT process? Maybe you’re doing “old-school” ETL on so-called “legacy” on-prem solutions. Want to call yourself a Data Engineer? Go for it.
