Methodology
Methodological Approach
While the core question of the GSER is “Which city-level startup ecosystems maximize the chances of an entrepreneur to build a very big success?”, the core question of the APEXE Report is “How good are countries at converting their innovation potential into exponential entrepreneurship?” The primary audience is policymakers: the report is intended to focus their attention on the areas they should address to further boost entrepreneurial innovation in their countries.
The APEXE Report takes a different approach from many other indices: rather than just measuring the factor conditions for entrepreneurial innovation (which one might consider “inputs”), or examining metrics relating to entrepreneurial activity and performance, it examines the difference between certain factor conditions and entrepreneurial activity and performance. It also scores relevant startup policies a country has put in place.
In other words, we not only expect that countries with higher levels of traditional innovation will have higher startup performance, but our methodology seeks to measure whether this performance is above or below expectations; we term this Lab-to-Startup Conversion. To this we add a Policy Score, reflecting a number of policy areas related to startups.
To enable a fair comparison across a wide variety of countries, we normalize many metrics. While the GSER effectively normalizes city-level startup ecosystems by defining a standard geographical size for all of them, that is not possible on a country level. We therefore seek to avoid metrics that reward or penalize countries for size alone. Furthermore, for metrics with skewed distributions (such as the number of patents and Ecosystem Value) we either normalize them to the country's population or GDP, or use the log of their values.
As one example, even though it is clearly helpful for firms to be able to sell into a large domestic market (and some other indices consider this as an important factor), we do not include domestic population within the evaluation of local market conditions. This is explained further below. In this way, we attempt to avoid simply ranking the largest or richest countries, or those that have produced the most startups to date, but rather to provide a comparison of which countries are making the most of their potential.
Note: Startup Genome defines startups according to Steve Blank, one of our foundational collaborators and a global tech thought leader, as “temporary organizations in search of a repeatable and scalable business model.” They must be innovative and are meant to scale or fail, which therefore excludes all traditional SMBs/SMEs. Being innovative is a fast-evolving criterion that we continuously keep up to date through AI and machine learning algorithms.
Logic Model
Construction of the report started with a conceptual framework. In order to measure how countries were using their potential, we conceptualized the logic model shown below. In this model, potential is defined by precursor conditions outside the startup ecosystem; we also term this the Traditional Innovation Ecosystem. This potential gives rise to the startup activity and consequent performance (collectively termed the Startup Ecosystem), with this process mediated by governmental policy and programs, called Startup-Relevant Policy (or just Policy).
Based on this model, we define three main blocks, or Factors:
- Innovation Potential is intended to represent the ingredients or components of an ecosystem that are valuable inputs into the entrepreneurial process, but which typically rest outside the remit of a Ministry of Entrepreneurship. As a result, these can only be changed slowly by governments, if at all.
- Startup Ecosystem is intended to represent the activity and output of a country’s startup sector.
- Startup-Relevant Policy (or just Policy) is intended to represent the quality of a country’s startup-relevant policies and government interventions. The components of this are typically within the remit of a Ministry of Entrepreneurship (or, at least, within government as a whole), and thus should be within the power of a government to change, budgetary constraints aside.
We also define a measure which we call Lab-to-Startup Score, which is a function of Startup Ecosystem Score minus Traditional Innovation Ecosystem Score. Effectively, we are measuring whether countries are performing above or below performance expectations built upon their traditional innovation ecosystems and a number of economic and foundational factors. Furthermore, by normalizing both Innovation Potential and Startup Ecosystem Performance metrics by a mix of population size and GDP, we are quantifying to what degree resources across the whole nation — not only that of a few top cities — are being brought to bear in producing economic impact through entrepreneurial innovation.
Defining Sub-Factors
Startup Genome’s primary research spanning more than a decade and five continents was used to create the missing data needed to assess and model startup ecosystems (see article "New Science of Ecosystem Assessment and Methodology"). A number of sub-factors capture, in a normalized fashion, the components that build up startup ecosystem potential (Innovation Potential) and the leading, current, and lagging variables that measure and explain startup ecosystem performance (Startup Genome's Ecosystem Success Factors).
The Innovation Potential Factor was assigned sub-factors relating to Business Foundations, Talent (skills & attitude), Infrastructure, Knowledge and R&D, and Market Access. Policy measures were excluded. The Startup Ecosystem Factor was assigned sub-factors relating to Funding, Programs, Performance, Global Reach, and Talent (experience). Policy measures are again excluded.
The Policy Factor represents policies and government actions which affect sub-factors such as startup Funding, Talent, and Global Reach. Policies acting on other factors will be added over time.
Selection of Metrics
Following the identification of sub-factors, we identified and selected specific metrics and data sources for each.
The guiding principles for metric selection were that the data should be:
- relevant to the sub-factor concerned (and the rationale of the logic model)
- geographically complete (covering as many countries as possible, including those not part of the G20, in order to allow for future expansion of the report and a larger dataset for testing)
- reliable (not containing anomalies, errors, or biases)
- timely (not having a large time lag between the time of measurement and publication)
- likely to maintain publication in the future (in order to allow for future editions)
Invariably, many compromises had to be made in this process. It was often the case that no ideal data source could be found, so we sometimes had to resort to proxy measures (that is, measures which did not directly fit our logic model, but which we had good reason to believe were linked or strongly correlated with the metric we wanted).
In other cases, we had to accept a trade-off between characteristics of the metrics: for example, where data sources had a time-lag, meaning that the data for more recent periods was less complete or more uncertain, we sometimes used slightly older time periods, where the data was more complete – although this risks becoming less representative of the current state of a country, especially for fast-growing ecosystems.
Data Gathering & Imputation of Missing Data
Having selected the desired metrics, we attempted to gather the data. At this stage, it was often discovered that data sources were less complete than expected. Where gaps were large, the dataset was usually rejected in its entirety; where gaps were small (e.g., missing data for a few countries), we had to impute the missing data. This imputation was done on a metric-by-metric basis: in some cases, we could combine different data sources, or use slightly older data from the same source; in other cases, we could use a value for a country which was known to be very similar; in yet other cases, we had to estimate the missing data by taking averages of similar countries. On occasion, we had to amend anomalous data.
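As an illustration of the peer-average imputation described above, the following minimal sketch assumes a pandas DataFrame indexed by country and a hypothetical peer_groups mapping of comparable countries maintained by analysts:

```python
import pandas as pd

def impute_metric(df: pd.DataFrame, metric: str, peer_groups: dict) -> pd.Series:
    """Fill missing values for one metric using the mean of similar countries.

    `df` is indexed by country; `peer_groups` maps a country to a list of
    comparable countries (a hypothetical mapping, not part of the report).
    """
    values = df[metric].copy()
    for country in values[values.isna()].index:
        peers = peer_groups.get(country, [])
        peer_values = df.loc[df.index.intersection(peers), metric].dropna()
        if not peer_values.empty:
            values.loc[country] = peer_values.mean()
    return values
```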
Normalization & Standardization
Given the tremendous variation in country size, normalization is obviously needed in order to make any fair comparison. Even restricting ourselves to the G20 countries, there is more than a 50-fold difference in population between the largest and smallest, and more than a 60-fold difference in the size of their economies. However, it was not always obvious whether a metric should be normalized, and if so, what the normalizing measure should be. This was further complicated by feedback loops in the logic model.
Where metrics varied over several orders of magnitude, we sometimes used the logarithm of the numbers. One principle which guided this choice was that we generally wanted the distribution of sub-factor scores to be roughly normal distributions.
Adjusted Global Reach is a special case of normalization. This metric is intended to represent the degree to which startups in a country expand internationally, as measured by the percentage of startups from that country which open a secondary office abroad. However, it is clear that startups in very small countries, such as Malta, rapidly exhaust their domestic market; this is demonstrated by an inverse relationship between the logarithm of a country’s GDP and the proportion of startups which expand overseas. In this case, rather than using the percentage of secondary offices, we adjust for country size by looking at the difference between the percentage of secondary offices and the number predicted by the inverse function.
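A minimal sketch of this adjustment is shown below, assuming a simple linear fit of the secondary-office share against log GDP; the column names and the linear functional form are illustrative assumptions rather than the report's exact fit:

```python
import numpy as np
import pandas as pd

def adjusted_global_reach(df: pd.DataFrame) -> pd.Series:
    """Adjust the share of startups with a secondary office abroad for country size.

    Fits secondary_office_pct against log(GDP) across all countries, then
    returns the residual: actual share minus the share predicted from GDP.
    A positive value means more international expansion than expected for a
    country of that size. Column names are illustrative.
    """
    log_gdp = np.log(df["gdp"])
    slope, intercept = np.polyfit(log_gdp, df["secondary_office_pct"], deg=1)
    predicted = intercept + slope * log_gdp
    return df["secondary_office_pct"] - predicted
```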
In addition to normalizing by GDP or population, data was also standardized (also called z-score normalization), to scale the data to a uniform mean and uniform standard deviation. This is a common data-scaling technique which is useful when dealing with distributions of different scales, and which preserves the shape of the original distribution.
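A minimal sketch of this pipeline, assuming a DataFrame with one row per country and illustrative column names, might look as follows:

```python
import numpy as np
import pandas as pd

def normalize_and_standardize(df, metric, per=None, use_log=False):
    """Normalize a metric by GDP or population, optionally log-transform it,
    then z-score standardize (mean 0, standard deviation 1)."""
    values = df[metric].astype(float)
    if per is not None:              # e.g. per="gdp" or per="population"
        values = values / df[per]
    if use_log:                      # for heavily skewed metrics
        values = np.log1p(values)
    return (values - values.mean()) / values.std()
```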
Correlation Analysis
We wanted metrics which were not strongly correlated with one another, since correlation would indicate a high degree of redundancy, adding additional complication to the report with little extra information. In order to suggest redundant metrics, we examined correlations between every possible pair of metrics, and where the correlation coefficient was very high (r > 0.9), we considered rejecting one of the pair, and retaining whichever we considered better according to the original selection criteria. (Note: this was performed on normalized data, since unnormalized data – e.g., total STEM researchers vs. coders – would otherwise show many correlations which were simply indicative of the size of the country.) This was further refined when building the regression models (described below).
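As an illustration, the sketch below flags metric pairs whose correlation on the normalized data exceeds the 0.9 threshold mentioned above; the data layout (one column per normalized metric, one row per country) is an assumption:

```python
import pandas as pd

def highly_correlated_pairs(normalized: pd.DataFrame, threshold: float = 0.9):
    """Return metric pairs whose pairwise Pearson correlation exceeds the threshold.

    Each flagged pair is a candidate for dropping one of the two metrics.
    """
    corr = normalized.corr()
    pairs = []
    cols = corr.columns
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) > threshold:
                pairs.append((a, b, float(r)))
    return pairs
```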
Weighting and Aggregation
Not all metrics are of equal importance, and the report aims to assign greater weight to the metrics which play a greater role in ecosystem success. In order to guide this process, we undertook a number of multivariable regression analyses to understand the contribution of each metric to ecosystem success. The process was as follows:
Defining a Combined Performance Model
For the purpose of this ranking, we define successful ecosystems as those which are creating economic value from startups. In many instances, Ecosystem Value normalized by GDP (EV/GDP) is a good measure of success. However, as discussed above, this is imperfect in some situations (e.g., where a country has produced a few high-value exits but has little other activity). We therefore developed a combined model of success (our Performance Model) that captures a wider range of indicators relating to success. This Performance Model is composed of the metrics listed under Performance above, namely a combination of: Ecosystem Value relative to GDP; a normalized count of exits over $50M; a normalized count of exits over $1B; a normalized sum of late-stage funding (LSF); and a normalized count of unicorns. These components were aggregated into a combined score for each country’s ecosystem, representing a weighted, quantified summary of its performance. Note that this Performance Model score contributes towards the final ranking of the report, but is not the final ranking itself; it is introduced in the model as a means of determining what contributes towards success.
In training our regression models (below), we used this Performance Model Score as the dependent variable, and also re-ran the models with the simpler EV/GDP ratio as the dependent variable.
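For illustration, the sketch below shows how such a composite score might be computed from the standardized performance metrics listed above; the column names and the equal placeholder weights are assumptions, since the actual weights are derived in the weighting process described below.

```python
import pandas as pd

# Illustrative only: equal placeholder weights; the report's actual weights
# come from the weighting and aggregation process described below.
PERFORMANCE_WEIGHTS = {
    "ecosystem_value_per_gdp": 0.2,
    "exits_over_50m_per_gdp": 0.2,
    "exits_over_1b_per_gdp": 0.2,
    "late_stage_funding_per_gdp": 0.2,
    "unicorns_per_gdp": 0.2,
}

def performance_model_score(standardized: pd.DataFrame) -> pd.Series:
    """Weighted sum of standardized performance metrics, one score per country."""
    return sum(w * standardized[col] for col, w in PERFORMANCE_WEIGHTS.items())
```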
Regression Models
We trained four different regression models, by using (separately) the Performance Model and the EV/GDP ratio as the dependent variable, and other ecosystem metrics as independent variables:
- Random Forest: A Random Forest is an ensemble learning method that constructs multiple decision trees and combines their predictions to improve accuracy and reduce overfitting. It captures complex interactions between features by averaging multiple decision tree outputs.
- Ordinary Least Squares (OLS): OLS is a linear regression technique that minimizes the sum of the squared differences between observed and predicted values, fitting a straight line through the data points to model relationships between variables.
- Lasso (L1 Regularization): Lasso regression introduces an L1 penalty, which constrains the sum of the absolute values of the model coefficients, promoting sparsity by shrinking some coefficients to zero, effectively performing feature selection.
- Ridge (L2 Regularization): Ridge regression applies an L2 penalty to the model coefficients, which minimizes their squared values. This regularization prevents overfitting by reducing model complexity without eliminating features entirely.
Each model produced regression coefficients for the independent variables, reflecting their contribution to the Performance Model Score. We evaluated the models based on Error rate (the degree to which the model's predictions deviated from the actual Performance Model scores) and R² Score (the proportion of variance in the dependent variable that is explained by the independent variables). We selected the best-performing model based on these criteria, favoring models that assigned meaningful weights to most metrics.
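As an illustration of this step, the following minimal sketch fits the four model types with scikit-learn and reports error and R² for each; the hyperparameters, variable names, and use of in-sample evaluation are assumptions for illustration rather than the report's exact setup.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error, r2_score

def fit_and_compare(X, y):
    """Fit the four candidate regressors and report error and R^2.

    X: standardized metric values (rows = countries); y: Performance Model
    score (or EV/GDP in the alternative run). Hyperparameters are illustrative.
    """
    models = {
        "random_forest": RandomForestRegressor(n_estimators=500, random_state=0),
        "ols": LinearRegression(),
        "lasso": Lasso(alpha=0.1),
        "ridge": Ridge(alpha=1.0),
    }
    results = {}
    for name, model in models.items():
        model.fit(X, y)
        pred = model.predict(X)
        results[name] = {"mse": mean_squared_error(y, pred), "r2": r2_score(y, pred)}
    return models, results
```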
Weight Assignment
To determine the weighting of metrics, we looked at the coefficients of each metric in the regression model. Metrics with higher regression coefficients were assigned higher weight. However, the weighting varied between models (i.e., with the choice of dependent variable) and so the weights were ultimately only a guide; the final decision concerning weights was based on the team’s experience.
The regression models run with all metrics provided an indication of the most significant metrics overall, and the relative weight of metrics within each sub-factor.
Once we had obtained weights for all individual metrics, we performed another round of regression analysis. This time, we used the Performance Model as the dependent variable and all the sub-factor scores as independent variables. The result was a set of regression coefficients for each sub-factor. The regression coefficients from this final model were used to assign weights to each sub-factor.
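As an illustration of this second-round analysis, the sketch below regresses the Performance Model score on the sub-factor scores and converts the coefficients into weights; clipping negative coefficients and rescaling to sum to one are simplifying assumptions, and, as noted above, the final weights also reflected the team's judgment.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def subfactor_weights(subfactor_scores: pd.DataFrame, performance: pd.Series) -> pd.Series:
    """Regress the Performance Model score on sub-factor scores and turn the
    coefficients into weights that sum to 1 (negative coefficients clipped to 0)."""
    model = LinearRegression().fit(subfactor_scores, performance)
    coefs = pd.Series(model.coef_, index=subfactor_scores.columns).clip(lower=0)
    return coefs / coefs.sum()
```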
Based on the methodology, the final weights for each sub-factor were assigned as follows:
- Innovation Potential weights by Sub-Factor
- Business Foundation: 30%
- Rule of Law Index - Regulatory enforcement
- WGI: Regulatory quality
- Risk of state expropriation
- Knowledge and R&D: 20%
- GERD
- Total patent applications
- Industrial design registrations
- Science and engineering journal articles
- Talent (Skills & Attitudes): 20%
- Number of STEM researchers per million
- GEM - Intrapreneurship Data
- English proficiency score
- Market Access and Corporate Fabric: 15%
- Digital skills among population
- GDP growth
- GDP nominal
- Companies on the Forbes Global 2000 list, by GDP
- Infrastructure & Support: 15%
- ICT Development Index (IDI)
- Startup Ecosystems Performance weights by Sub-Factor
- Performance: 40%
- Ecosystem Value by GDP
- Count of exits over $50M by GDP
- Count of exits over $1B by GDP
- LSF amount by GDP
- Number of unicorns by GDP
- Funding: 35%
- Number of early-stage funding rounds by population
- Amount of early-stage funding by GDP
- Number of new active VCs
- Number of VCs with exits
- Talent (Experience): 15%
- Github developers
- Startup experience (number of Series A rounds by population)
- Scaleup experience (exits over $50M)
- Global Reach: 5%
- GC: Inbound Score
- GMR: Outbound Score
- Programs: 5%
- Number of accelerators and incubators
- Policy Score:
- Employee stock options (ESOPs)
- Visa
- Fund-of-Funds by GDP
- Early-stage investor support (individual relief)
- Soft-landing program
Assembling the Components
To compile the APEXE Report, we use the weighted metrics to determine the sub-factor scores, and the weighted sub-factor scores to determine the Factor scores.
To assemble the final ranking, we took a multiple of the Startup Ecosystem Score and subtracted the Innovation Potential score, to provide the Lab-to-Startup score. We then took a multiple of the Lab-to-Startup score and added the Policy score, to produce the APEXE score, on which countries were ranked.
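As a sketch of this assembly, with the two unspecified multiples represented by hypothetical parameters alpha and beta:

```python
def apexe_score(startup_ecosystem: float, innovation_potential: float,
                policy: float, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Lab-to-Startup = alpha * Startup Ecosystem score - Innovation Potential score;
    APEXE = beta * Lab-to-Startup + Policy score.

    alpha and beta stand in for the unspecified multiples mentioned in the text.
    """
    lab_to_startup = alpha * startup_ecosystem - innovation_potential
    return beta * lab_to_startup + policy
```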
Note that most of the methodological steps, including the correlation and regression analyses, were undertaken with as large a dataset as possible, including data from non-G20 countries. This was done in order to increase the power of the analysis. Non-G20 countries were removed at the final stage when the G20 ranking was produced.