How to get the most out of Salesforce data: from Business to Data Science

Project’s evolution Intro to Jobandtalent workflow Jobandtalent is a very ambitious data-driven company that aims to automate any data process. Finding the best way to accomplish this goal is one of the responsibilities of the Data Science team, but the first step is to keep calm and understand these processes in depth. This is why […]

Project’s evolution

Intro to Jobandtalent workflow

Jobandtalent is a very ambitious data-driven company that aims to automate any data process. Finding the best way to accomplish this goal is one of the responsibilities of the Data Science team, but the first step is to keep calm and understand these processes in depth. This is why we need to have a very close relationship with the other departments; Product, Ops, Sales, and Marketing are indeed among our main partners and stakeholders.

This creates a win-win relationship between the Business and Data teams, where limerence can happen:

  • The Business teams keep us up to date with their most relevant projects, convey to us their needs, and allow us to learn from their strategies — and this is super exciting!
  • The Data Science team supports them, either through Exploratory Analysis, Dashboard Systems, ETL Automation, Advanced Analysis, or Artificial Intelligence solutions — and this is super rewarding!

It is important to point out that many times Business teams start proactively with an initial solution that quickly solves their initial problem. This is very positive for Jobandtalent as well as for the Data team, since this solution can meet their initial business needs and they can test, evaluate, mature and use their solution before the Data team steps in to build an automated and scalable one.

Intro to the Salesforce project

One of the best examples is when it comes to integrating key 3rd party products such as Salesforce. For the Data Automation team, it’s not just a matter of integration, it’s much more than that.

In order to get the most out of your CRM data, it is necessary to understand its meaning. Integrating several data sources, generating multiple tables whose information comes from 8 Salesforce objects and other Jobandtalent data, and building 6 dashboards with more than 300 charts, is only possible by working side by side with the Sales Operations team.

Photo by Josh Calabrese on Unsplash

That is why we believe it is important to talk about not only the final Business Intelligence solution but also each of the steps that every team had to take before reaching it.

Business approach

Business context: A lot of data to take advantage of

Salesforce helps us to manage all the information related to our clients and prospects and to record all of our Sales team’s actions in each of the 7 countries in which we operate. This means that Salesforce stores a large volume of data about accounts, opportunities, events (meetings), tasks (calls, emails, LinkedIn messages), contacts, users, etc.

It is not only important, but also imperative to define and analyse the related KPIs in order to make the right business decisions.

To this end, the Sales Operations team performs these main analyses:

  • Compares real vs. target information of calls, meetings, open opportunities, sent proposals, won opportunities, and tracks the fulfilment and its temporal evolution.
  • Calculates ratios regarding opportunities. Follows up the open opportunities according to the stage they are in and the type of opportunity.
  • Calculates conversion rates between calls, meetings and opportunities.
  • Makes predictions of results based on historical data.
  • Tracks the individual fulfilment of targets for each member of the Sales team.

1st Operations solution: Manual Reporting

Before defining the final requirements and involving the rest of the teams, the Operations team began to exploit this data through a first manual approach consisting of:

  1. Creating reports in Salesforce.
  2. Exportation of each report to a CSV file.
  3. Importation of the files into an Excel template containing all the calculations.
  4. Sending of the weekly report by email, through screenshots.

This first manual solution allowed the business team to explore the data and start getting super useful insights. However, there were some drawbacks:

  • Too much time spent in manual data export and import, and adaptation of the template.
  • Tedious manual preparation of various reports and emails.
  • Possible human errors in data manipulation.

2nd Operations solution: Semi-manual Reporting

The Operations team, aware that they were spending too much time on manual actions, decided to make a semi-automation of their first solution, including:

As their project evolved, the following issues emerged:

  • This add-on only allows us to refresh the information every 5 hours.
  • There were more and more tabs on the spreadsheet, and the calculations were becoming too many and too complex. It required a lot of maintenance.
  • Automation only saved time in exporting from Salesforce and importing into the spreadsheet.
  • The team spent a lot of time reviewing the spreadsheet in search of details. A solution was needed that would allow them to easily drill-down into the data.
  • The project was highly collaborative, so too many people accessed this spreadsheet and that affected its loading performance.

Data Science approach

Technology Context: How Data Science helps Business

At this point, the Data Science team stepped in to automate the entire process by an ETL solution, helping the Business team to forget about any manual tasks and focus exclusively on analysis and decision making.

Automated MVP: Programming a local ETL

Before starting on any complex integration, it is necessary to make first contact with the data and start getting familiar with it.

The first objective is to obtain an automated MVP, following the steps below:


  • Use the Salesforce API to retrieve the Salesforce data we need. The Simple Salesforce library allows you to retrieve the data through SOQL queries or directly fetch reports already created.


  • Analyse and understand all the calculations made on the spreadsheet. Work with Operations to resolve any concerns that may exist, before, during, and after the construction of the MVP.
  • Using Python, build a local ETL to replicate Salesforce reports in a Jupyter notebook, performing all the calculations and transformations defined by the Operations team, resulting in several dataframes about Accounts, Opportunities, Meetings, and Calls.


  • With these dataframes, create the test tables in our Data Warehouse to start making the first dashboarding tests in Metabase (read our blog post about Data Visualization tools at Jobandtalent).

Final solution: integrated solution in the production ETL

The time has come to implement the process in our ETL in production. The Data Engineering team adapts and refactors the code until a robust and optimized solution is achieved.

The final solution is the implementation of the following ETL in Airflow:


  • Through Stitch, every hour we dump new Salesforce raw data into our Data Lake.
  • An Airflow task unloads the data from the Data Lake.


  • Creates the dataframes that replicate Salesforce reports.
  • Generates the in-depth reports, which will be the final dataframes containing all the calculated fields.


  • Uploads the final tables to our Data Warehouse.
  • Finishes building and fine-tuning the final dashboards in Metabase.
Airflow DAG: graph view showing tasks and the running status

Next Steps: keep iterating

Small improvements, as an adaptation of small changes

  • Updating dashboards with new values of existing fields.
  • Adding new fields to existing Salesforce tables.
  • Creation of new charts.

Great improvements, as a result of the evolution of the Sales Operations reports:

  • Adaptation of ETL for the integration of Salesforce with other tools, such as SalesLoft.
  • Creation of new tables and new metrics.
  • Construction of new dashboards.


By working together with the Operations and Data Science teams, we have achieved a robust and automated solution that retrieves and calculates relevant information from Salesforce, and currently feeds a total of 6 dashboards with over 300 charts.

In short, the solution adopted has the following advantages:

  • Complete elimination of manual actions. The Operations team focuses exclusively on tracking dashboards and defining strategy.
  • Fast loading of dashboards. Very efficient for all Operations meetings.
  • A robust solution that allows us to have all the critical parts of the code properly tested (something that is not possible in other solutions even with “drag & click” ETL).
  • Clearer visualization of the results, with charts adapted to the nature of data.
  • Higher frequency of execution of ETL’s DAG. The data is updated every N minutes, where we can configure N according to the needs.
  • Drill-down that allows deep-diving into data.
  • Detailed and accessible documentation.
  • The ready-to-use tables available in the Data Warehouse allow Operations to continue experimenting with new reports, which will be automated in the near future.

We are hiring!

body[data-twttr-rendered=”true”] {background-color: transparent;}.twitter-tweet {margin: auto !important;}

function notifyResize(height) {height = height ? height : document.documentElement.offsetHeight; var resized = false; if (window.donkey && donkey.resize) {donkey.resize(height);resized = true;}if (parent && parent._resizeIframe) {var obj = {iframe: window.frameElement, height: height}; parent._resizeIframe(obj); resized = true;}if (window.location && window.location.hash === “#amp=1” && window.parent && window.parent.postMessage) {window.parent.postMessage({sentinel: “amp”, type: “embed-size”, height: height}, “*”);}if (window.webkit && window.webkit.messageHandlers && window.webkit.messageHandlers.resize) {window.webkit.messageHandlers.resize.postMessage(height); resized = true;}return resized;}‘rendered’, function (event) {notifyResize();});‘resize’, function (event) {notifyResize();});if (parent && parent._resizeIframe) {var maxWidth = parseInt(window.frameElement.getAttribute(“width”)); if ( 500 < maxWidth) {window.frameElement.setAttribute("width", "500");}}

If you want to know more about how is work at Jobandtalent, you can read the first impressions of some of our teammates on this blog post or visit our Twitter.

Acknowledgments: thanks to Michele Trevisiol, Cecil Fernandez, and Sergio Espeja for feedback and reviews. Thanks to the Data Science and Operations teams for making this project a success.

How to get the most out of Salesforce data: from Business to Data Science was originally published in Jobandtalent Engineering on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source: Jobandtalent