detected the commit is purged from history. The main obstacle
to further improvements in this line of research is the lack
of available training data containing known malicious commits in
OSS repositories. We encourage readers who are aware of any
examples to tweet them to us at @saintesmsr #anomalicious.
VII. CONCLUSION
Supply-chain attacks target software development ecosystems
and package repositories, and they have increased in recent
years. We presented Anomalicious, a tool that identified
53.33% of malicious commits in 15 malware-infested repositories
while flagging fewer than 5% of commits for 99% of
repositories in a large-scale experiment with NPM packages.
Our future work will focus on adding contextual factors to the
tool. We also plan to add a mechanism that automatically adjusts
thresholds based on maintainers’ feedback. Another direction
for future work is developing approaches to detect and prevent
account hijacking and typosquatting attacks.
Acknowledgements. We thank Oege de Moor, Bas Alberts,
Adam Baldwin, and Ravi Gadhia from GitHub for encouraging
us to pursue this line of research, for their help in collecting
OSS malware datasets, and for their thoughtful feedback.
REFERENCES
[1] M. Kaczorowski, “Secure at every step: What is software
supply chain security and why does it matter?” 2020. [Online].
Available: https://github.blog/2020-09-02-secure-your-software-supply-chain-and-protect-against-supply-chain-threats-github-blog/
[2] D. Halfin, B. Woodbury, D. Simpson, A. M. Gorzelany,
and Eavena, “Supply chain attacks,” 2019. [Online].
Available: https://docs.microsoft.com/en-us/windows/security/threat-protection/intelligence/supply-chain-malware
[3] M. Ohm, H. Plate, A. Sykosch, and M. Meier, “Backstabber’s knife
collection: A review of open source software supply chain attacks,”
2020.
[4] R. Duan, O. Alrawi, R. P. Kasturi, R. Elder, B. Saltaformaggio, and
W. Lee, “Measuring and preventing supply chain attacks on package
managers,” 2020.
[5] J. Overson, “How two malicious npm packages
targeted & sabotaged others,” 2019. [Online].
Available: https://medium.com/@jsoverson/how-two-malicious-npm-packages-targeted-sabotaged-one-other-fed7199099c8
[6] T. Claburn, “This typosquatting attack on npm went
undetected for 2 weeks,” 2017. [Online]. Available:
https://www.theregister.com/2017/08/02/typosquatting_npm/
[7] D. Grander and L. Tal, “A post-mortem of the malicious event-stream
backdoor,” 2018. [Online]. Available:
https://snyk.io/blog/a-post-mortem-of-the-malicious-event-stream-backdoor/
[8] Merriam-Webster, “Anomalous,” in Merriam-Webster Dictionary, 2020.
[9] A. van der Stock, B. Glas, and T. Gigler, “OWASP Top 10 2017: The ten
most critical web application security risks,” OWASP Foundation,
p. 23, 2017.
[10] M. Zimmermann, C.-A. Staicu, C. Tenny, and M. Pradel, “Small world
with high risks: A study of security threats in the npm ecosystem,” in
Proceedings of the 28th USENIX Conference on Security Symposium,
ser. SEC’19. USA: USENIX Association, 2019, pp. 995–1010.
[11] A. Decan, T. Mens, and E. Constantinou, “On the impact of
security vulnerabilities in the npm package dependency network,” in
Proceedings of the 15th International Conference on Mining Software
Repositories, ser. MSR ’18. New York, NY, USA: Association
for Computing Machinery, 2018, pp. 181–191. [Online]. Available:
https://doi.org/10.1145/3196398.3196401
[12] A. Alali, H. Kagdi, and J. Maletic, “What's a typical commit? A
characterization of open source software repositories,” in 2008 16th
IEEE International Conference on Program Comprehension. IEEE,
Jun. 2008.
[13] L. Dabbish, C. Stuart, J. Tsay, and J. Herbsleb, “Social coding in
GitHub,” in Proceedings of the ACM 2012 conference on Computer
Supported Cooperative Work - CSCW '12. ACM Press, 2012.
[14] R. Goyal, G. Ferreira, C. Kästner, and J. Herbsleb, “Identifying unusual
commits on GitHub,” Journal of Software: Evolution and Process,
vol. 30, no. 1, p. e1893, Sep. 2018.
[15] L. Leite, C. Treude, and F. Figueira Filho, “UEDashboard: Awareness of
unusual events in commit histories,” in Proceedings of the 2015 10th
Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE
2015. New York, NY, USA: Association for Computing Machinery,
2015, pp. 978–981.
[16] C. Treude, L. Leite, and M. Aniche, “Unusual events in GitHub
repositories,” Journal of Systems and Software, vol. 142, pp. 237–247,
Aug. 2018.
[17] C. Rosen, B. Grawi, and E. Shihab, “Commit Guru: Analytics and risk
prediction of software commits,” in Proceedings of the 2015 10th Joint
Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2015.
New York, NY, USA: Association for Computing Machinery, 2015,
pp. 966–969.
[18] MITRE, “CWE-200: Exposure of sensitive information
to an unauthorized actor,” 2020. [Online]. Available:
https://cwe.mitre.org/data/definitions/200.html
[19] A. Muñoz, “The octopus scanner malware: Attacking
the open source supply chain,” 2020. [Online]. Available:
https://securitylab.github.com/research/octopus-scanner-malware-open-source-supply-chain
[20] C. Bird, N. Nagappan, B. Murphy, H. Gall, and P. Devanbu, “Don't touch
my code!” in Proceedings of the 19th ACM SIGSOFT symposium and
the 13th European conference on Foundations of software engineering
- SIGSOFT/FSE '11. ACM Press, 2011.
[21] J. Tsay, L. Dabbish, and J. Herbsleb, “Influence of social and technical
factors for evaluating contribution in GitHub,” in Proceedings of the
36th International Conference on Software Engineering - ICSE 2014.
ACM Press, 2014.
[22] F. Calefato, F. Lanubile, and N. Novielli, “A preliminary analysis on
the effects of propensity to trust in distributed software development,”
in 2017 IEEE 12th International Conference on Global Software
Engineering (ICGSE). IEEE, May 2017.
[23] S. Morrison-Smith and J. Ruiz, “Challenges and barriers in virtual teams:
a literature review,” SN Applied Sciences, vol. 2, no. 6, May 2020.
[24] H. Sapkota, P. K. Murukannaiah, and Y. Wang, “A network-centric
approach for estimating trust between open source software developers,”
PLOS ONE, vol. 14, no. 12, p. e0226281, Dec. 2019.
[25] R. Pham, L. Singer, O. Liskin, F. F. Filho, and K. Schneider, “Creating a
shared understanding of testing culture on a social coding site,” in 2013
35th International Conference on Software Engineering (ICSE). IEEE,
May 2013.
[26] D. Spadini, M. Aniche, and A. Bacchelli, “PyDriller: Python framework
for mining software repositories,” in Proceedings of the 2018 26th
ACM Joint Meeting on European Software Engineering Conference and
Symposium on the Foundations of Software Engineering - ESEC/FSE
2018. New York, New York, USA: ACM Press, 2018, pp. 908–911.
[27] C. Bogart, C. Kästner, and J. Herbsleb, “When it breaks, it breaks: How
ecosystem developers reason about the stability of dependencies,” in
2015 30th IEEE/ACM International Conference on Automated Software
Engineering Workshop (ASEW). IEEE, Nov. 2015.
[28] D. Gonzalez, T. Zimmermann, and P. Godefroid, “100 npm repositories
used for anomalicious commit detection experiment,” 2020. [Online].
Available: https://zenodo.org/record/4097244
[29] Y. Dæmount, “Implement kite promotion? issue #588,”
https://github.com/atom-minimap/minimap/issues/588, 2017.
[30] K. Crawley, “Start-up accused of undermining popular open-source
tools,” https://nakedsecurity.sophos.com/2017/07/27/start-up-accused-of-undermining-popular-open-source-tools/, 2017.
[31] Y. Kamei, E. Shihab, B. Adams, A. E. Hassan, A. Mockus, A. Sinha,
and N. Ubayashi, “A large-scale empirical study of just-in-time quality
assurance,” IEEE Transactions on Software Engineering, vol. 39, no. 6,
pp. 757–773, Jun. 2013.
[32] N. Idika and A. Mathur, “A Survey of Malware Detection Techniques,”
2007.
[33] A. Souri and R. Hosseini, “A state-of-the-art survey of malware
detection approaches using data mining techniques,” Hum. Cent.
Comput. Inf. Sci., vol. 8, Art. no. 3, 2018.