S3 + MLS → it just works 

Exploring MLS compensation data via S3 and Prospective

Perspective is an open source library built for high-performance streaming data visualization that lets folks quickly dissect their data → ask & answer questions → share

🔒 secure by default → data doesn't leave your network

🌊 performant → stream large data sets

⚖️ scale predictably → offload bursty ($$) server pressure

Introduction

Major League Soccer (MLS) publishes its player compensation information on its website in the Salary Guide. The guide provides background and context on the data, including how guaranteed compensation is calculated:

"The Annual Average Guaranteed Compensation (Guaranteed Comp) number includes a player's base salary and all signing and guaranteed bonuses annualized over the term of the player's contract, including option years."

Ask & Answer

The majority of the compensation information is by year, dating back to 2007; all of it is available in their public S3 bucket(s) as PDF files. We’ll leave extracting insights from PDF files for another day 😅 ... but MLS’ 2024 salary information is available as a CSV!

So using that information we wanted to answer the question:

Which position(s) are clubs spending the most of their salary on? The least?

Clean the data

The data MLS provided required some cleaning and some collapsing of categories for legibility, so we used Prospective’s expression editor, ExprTK. We used computed columns in order to:

▶️ generate number columns from original Base Salary and Guaranteed Comp data provided

For example: Base Salary $89,716.00 → 89716.00


float(
replace_all(substring("Base Salary", 1), ',', '')
)

and Guaranteed Comp $126,383.00 → 126383.00


float(
replace_all(substring("Guaranteed Comp", 1), ',', '')
)
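If you want to sanity-check those two expressions outside the expression editor, here is a rough Python equivalent. It is a sketch, not what Prospective runs under the hood: the helper name `parse_currency` is made up for illustration, and it assumes every value starts with a single `$` and uses `,` as the thousands separator, like the two examples above.

```python
def parse_currency(text: str) -> float:
    """Turn a string like '$89,716.00' into a plain float."""
    # text[1:] drops the leading '$', mirroring substring(..., 1).
    # .replace(",", "") strips thousands separators, mirroring replace_all(..., ',', '').
    return float(text[1:].replace(",", ""))


if __name__ == "__main__":
    print(parse_currency("$89,716.00"))   # Base Salary example
    print(parse_currency("$126,383.00"))  # Guaranteed Comp example
```

Values that do not follow that shape (empty cells, a missing `$`) would need extra handling that this sketch leaves out.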

▶️ collapse the number of buckets for player positions, specifically by creating a new field called Gen Position that maps

  • Right Wing, Left Wing, Left Midfield or Right Midfield → Winger
  • Right-back or Left-back → Full back

var pos := "Position(s)";
if (pos == 'Right Wing' or pos == 'Left Wing' or pos == 'Left Midfield' or pos == 'Right Midfield') {
  'Winger'
} else if (pos == 'Right-back' or pos == 'Left-back') {
  'Full back'
} else {
  pos
}
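The same collapsing logic, sketched in Python for anyone who wants to test it away from the dashboard. The function name `gen_position` and the two set names are ours, invented for this example; the position labels are the ones used in the expression above, and any label not listed passes through unchanged.

```python
# Labels copied from the expression above; the names WINGERS, FULL_BACKS
# and gen_position are hypothetical helpers for this sketch.
WINGERS = {"Right Wing", "Left Wing", "Left Midfield", "Right Midfield"}
FULL_BACKS = {"Right-back", "Left-back"}


def gen_position(position: str) -> str:
    """Collapse wide positions into 'Winger' and 'Full back'."""
    if position in WINGERS:
        return "Winger"
    if position in FULL_BACKS:
        return "Full back"
    # Everything else keeps its original label.
    return position


if __name__ == "__main__":
    for label in ("Left Wing", "Right-back", "Goalkeeper"):
        print(label, "->", gen_position(label))
```

Using sets keeps the mapping easy to extend if more buckets need collapsing later.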

▶️ Combine “First Name” and “Last Name” into a single field called “Name”

concat("First Name", ' ', "Last Name")
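With the computed columns in place, the question from earlier boils down to a group-by on Gen Position. Here is a minimal Python sketch of that idea. Big caveat: the rows below are invented placeholders, not real MLS figures, and `spend_by_position` is a hypothetical helper, not part of Prospective or Perspective.

```python
from collections import defaultdict

# Invented rows for illustration only; these are NOT real MLS numbers.
ROWS = [
    {"Position(s)": "Right Wing", "Guaranteed Comp": "$100,000.00"},
    {"Position(s)": "Left Midfield", "Guaranteed Comp": "$150,000.00"},
    {"Position(s)": "Left-back", "Guaranteed Comp": "$90,000.00"},
    {"Position(s)": "Goalkeeper", "Guaranteed Comp": "$120,000.00"},
]

WINGERS = {"Right Wing", "Left Wing", "Left Midfield", "Right Midfield"}
FULL_BACKS = {"Right-back", "Left-back"}


def gen_position(position):
    """Same collapsing rule as the Gen Position computed column."""
    if position in WINGERS:
        return "Winger"
    if position in FULL_BACKS:
        return "Full back"
    return position


def spend_by_position(rows):
    """Sum Guaranteed Comp per collapsed position, largest total first."""
    totals = defaultdict(float)
    for row in rows:
        # Strip the leading '$' and the thousands separators before converting.
        comp = float(row["Guaranteed Comp"][1:].replace(",", ""))
        totals[gen_position(row["Position(s)"])] += comp
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for position, total in spend_by_position(ROWS):
        print(f"{position}: {total:,.2f}")
```

In the dashboard itself this is a drag-and-drop group-by rather than code; the sketch is only meant to show what the aggregation is doing.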

Global Filter

...and we added a Global Filter to be able to select individual clubs → Prospective - MLS + S3


Future

It’d be awesome to join this with transfer payments data (e.g. from Transfermarkt) to get a more complete picture of spend.

TIL, "transfer payments" are payments from one club to another, for the right to sign a player off their roster.

For example, San Diego FC paid PSV Eindhoven $12M for the rights to sign Chucky Lozano while he was under contract with PSV. This is separate from the salary and bonuses San Diego will be paying Chucky. Messi, on the other hand, joined Inter Miami on a free transfer (because his contract with his previous club had expired).

Next time...

Ad Read

Prospective’s S3 adapter (1) reduces our customers’ data engineering and ETL footprint and (2) directly increases their ability to execute on their core business priorities. We’ve been able to do this because of the early bet we made on the browser as the foundation for software delivery.

If you are looking for a way to make sense of S3, we’d love to chat with you about how we could simplify and enhance your existing user experience. Reach us @ https://prospective.co/meet-eric or via our newsletter