The Good Tech Companies - Here’s How we Made a Real-time Phishing Website Detector for MacOS

Starting point is 00:00:00 This audio is presented by Hacker Noon, where anyone can learn anything about any technology. Here's how we made a real-time phishing website detector for macOS, by Moonlock, by MacPaw. This real-time, on-device anti-phishing solution for macOS takes reference-based detection to a new level, instantly warning Mac users when they're on a phishing website. First, background. How many unique phishing websites were published in 2023? The Anti-Phishing Working Group counted almost 5 million. At the beginning of 2024, MacPaw's cybersecurity division, Moonlock, reported on the AMOS stealer, which relies on fake websites of trustworthy brands to spread malware on Apple computers. Not only do they infect our devices,

Starting point is 00:00:42 but they also collect victims' credentials for malicious purposes. Spoofed websites are dangerous, so my team and I decided to do something about them. The solution I describe below started as a proof-of-concept experiment at Moonlock. We ironed out the wrinkles at MacPaw's Technological R&D and presented the working prototype at STAST 2024. Our position paper described the solution in detail and was originally uploaded to arXiv.org. For a sneak peek under the hood of our anti-phishing app, please read on. What do we have on our hands as of now? Current anti-phishing apps primarily use three detection methods: blacklisting, the classification-based approach, and the reference-based approach. Each method has its

Starting point is 00:01:25 advantages, but all require further improvements. Let's explore each of them. Blacklisting. The blacklist approach is practical and accurate, but it can't keep up with how quickly phishing websites spread. It's not always effective, since new phishing websites might not have been added to the list yet, and attackers often change URLs to dodge detection. For instance, Google Safe Browsing uses lists of known phishing sites. When you try to visit a website, it checks the address against this list. If there's a match, it blocks access and warns you about the danger. But what if the website was published mere minutes ago? It won't be on the list, and the user will be trapped.
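To make the blacklist limitation concrete, here is a minimal Python sketch of an exact-match lookup. It is an illustration only, not how Google Safe Browsing or the prototype is implemented. The host names and the helpers `host_of` and `is_blocklisted` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist of known phishing hosts (illustrative entries only).
KNOWN_PHISHING_HOSTS = {"dhl-parcel-verify.example", "login-secure-bank.example"}


def host_of(url: str) -> str:
    """Return the lowercased host of a URL, or an empty string if it has none."""
    return (urlsplit(url).hostname or "").lower()


def is_blocklisted(url: str, blocklist=KNOWN_PHISHING_HOSTS) -> bool:
    """Exact-match lookup: only hosts that are already on the list are caught."""
    return host_of(url) in blocklist
```

A host that is already listed is flagged, while a look-alike host that went live minutes ago passes the check, which is the gap described above.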

Starting point is 00:02:03 Classification-based approach. In this anti-phishing method, machine learning analyzes web page features like URL structures, HTML content, and metadata to determine whether a website is spoofed or legitimate. Classification is excellent for browser extensions because it learns from user data to spot new phishing sites. The disadvantage here is that machine learning requires complex algorithms and lots of training data, while cybercriminals swiftly invent new obfuscation tactics to evade detection. This makes classification-based approaches less accurate and not ideal for standalone security products. Reference-based approach. Some of the

Starting point is 00:02:41 reference-based solutions are considered state-of-the-art. They use computer vision to analyze web page appearances and effectively detect phishing websites. What we also see, however, is that reference-based solutions could be faster if they weren't processing phishing cases in the cloud. There's a critical time gap between a phishing website going live and the reference-based detection systems adding it to the list. We wanted to shrink this gap to ensure quicker detection and response. How our native macOS anti-phishing app works. Our goal was to warn Mac users about phishing websites as soon as they go live. To achieve this, we took the reference-based approach and improved it. We eliminated cloud processing and proposed doing all computations locally, aiming to cut detection time.

Starting point is 00:03:26 As a bonus, our solution enhances privacy since all user data is processed on the device and doesn't go anywhere else. We built a native macOS app using Swift, incorporating frameworks for screen capturing and machine learning. By converting our models to Core ML format, we ensured smooth performance and minimized the use of system resources. This way, our prototype continuously scans web pages in the background, protecting Mac users from phishing websites without requiring extra interactions. The prototype works independently from browsers. The macOS accessibility framework and accessibility metadata help the app focus on certain regions of interest so it knows where to look for phishing. Here's how it works in a nutshell. First step. Web page analysis. When on a website, our app tries to understand the page layout. It identifies key page elements

Starting point is 00:04:16 like logos, input fields, and buttons. For this task, we chose DETR with ResNet-50 because of its accuracy and performance. In this step, it's important to recognize the placement of the elements on the website, particularly the area with a brand logo and the forms for entering credentials. Second step. Brand attribution. Next, the prototype checks if a detected logo on the website matches any well-known brands. On top of that, it compares the web page URL against a reference list of legitimate websites. If the website is official, we skip further steps. On a side note, we were dismayed to see how many official domains brands use for marketing.

Starting point is 00:04:56 It's no wonder phishing websites are so effective at tricking users. DHL, for example, has several official domains, like dhl.com, express.dhl, mydhli.com, dhlsameday.com, and dhlexpresscommerce.com. Third step. Prevent credential harvesting. We classify the web page into one of two categories: it either requires credentials or it doesn't. This step verifies whether a phishing website is trying to steal personal user information. In the screenshot, our prototype found credential input fields, attributed the page to DHL, and checked the URL against the list of official DHL domains. The user got a phishing warning since the page does not belong to DHL. How accurate is the prototype? Our system maintains or surpasses

Starting point is 00:05:46 baseline accuracy and has faster processing times. We achieved 90.8% accuracy in logo recognition and 98.1% in detecting credential input. The graph below showcases our performance against other anti-phishing solutions and how we compare in precision, recall, and false positive rate. We proudly detected 87.7% of phishing attempts while keeping the false positive rate at a low 3.4%. It's fast and smooth, too. The final metrics demonstrate that our solution runs smoothly in the background without a noticeable loss of performance. CPU use is minimal. With 8 cores in an Apple M1 Mac, our prototype uses just 16% of the available 800% capacity. This consumption level is similar to 3 active Safari tabs or 1 Zoom call. Final thoughts. There are plenty of anti-phishing apps on the market, but most of them process data on external servers. Our prototype shows

Starting point is 00:06:44 that hardware on modern computers allows us to run machine learning models locally, on device. We can use them to combat phishing and not worry about processing speeds and the use of system resources. Fortunately, the Apple ecosystem provides frameworks and tools for optimization. Author, Ivan Petruka, Senior Research Engineer at MacPaw Technological R&D, ex-Moonlock.
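The three steps described in the episode (page analysis, brand attribution, credential check) can be sketched as a small decision function. This is a hedged Python illustration, not the prototype's Swift and Core ML code. The names `PageObservation`, `OFFICIAL_DOMAINS`, `is_official`, and `verdict` are hypothetical, and the model outputs are passed in as plain values instead of being computed. How the prototype handles pages with no recognized brand is not stated in the source, so the "no-brand" branch is an assumption.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical reference list: brand -> official domains (a subset of those named above).
OFFICIAL_DOMAINS = {
    "DHL": {"dhl.com", "dhlsameday.com", "dhlexpresscommerce.com"},
}


@dataclass
class PageObservation:
    url: str
    detected_brand: Optional[str]  # stand-in for the logo-recognition result
    has_credential_fields: bool    # stand-in for the credential-input classifier


def is_official(url: str, brand: str) -> bool:
    """True if the URL's host is an official domain of the brand, or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    domains = OFFICIAL_DOMAINS.get(brand, set())
    return any(host == d or host.endswith("." + d) for d in domains)


def verdict(page: PageObservation) -> str:
    if page.detected_brand is None:
        return "no-brand"  # assumption: nothing to attribute, so no warning
    if is_official(page.url, page.detected_brand):
        return "official"  # official site: further steps are skipped
    if page.has_credential_fields:
        return "phishing-warning"  # brand logo + credential form + unofficial domain
    return "unofficial-no-credentials"
```

For example, a page showing a DHL logo and a login form on a host outside the reference list yields "phishing-warning", while the same page on www.dhl.com yields "official".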
