This Java project demonstrates a simple web scraper using Jsoup. It connects to a specified URL, retrieves the webpage's title, and extracts and prints the content from a specific section of the page.
- src/main/java/org/oxylabs/Main.java: The main class that performs web scraping.
- Connects to a webpage using Jsoup
- Retrieves and prints the webpage title
- Extracts and prints the content of the "About Us" section (or a similar section based on the provided CSS selector)
- Java Development Kit (JDK) 8 or higher
- Maven for dependency management
- Clone the repository:
  git clone https://github.com/Hemanths05/web-scraper-java.git
- Navigate to the project directory:
  cd web-scraper-java
- Add the Jsoup dependency to your pom.xml:
  <dependencies>
    <dependency>
      <groupId>org.jsoup</groupId>
      <artifactId>jsoup</artifactId>
      <version>1.15.3</version> <!-- Check for the latest version -->
    </dependency>
  </dependencies>
- Compile and run the project:
  - Compile the code:
    mvn clean compile
  - Run the application:
    mvn exec:java -Dexec.mainClass="org.oxylabs.Main"
- The Main class connects to the URL https://hemanths05.github.io/portfolio_/ using Jsoup.
- It retrieves the title of the webpage and prints it.
- It selects the content of the section with the CSS class .txtHead and prints it.
- If the section is not found, it prints a message indicating that the section was not found.
- The program handles exceptions and prints the stack trace if any errors occur during the connection or scraping process.
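The flow described above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual Main.java: the class name ScraperSketch, the helper sectionText, and the exact wording of the fallback message are assumptions made for this example. It needs the Jsoup dependency on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;

public class ScraperSketch {

    // Hypothetical helper: returns the text of the first element matching
    // the CSS selector, or a fallback message when nothing matches.
    static String sectionText(Document doc, String cssSelector) {
        Element section = doc.selectFirst(cssSelector);
        return section != null
                ? section.text()
                : "Section not found: " + cssSelector;
    }

    public static void main(String[] args) {
        String url = "https://hemanths05.github.io/portfolio_/";
        try {
            // Fetch and parse the page; get() throws IOException on failure.
            Document doc = Jsoup.connect(url).get();

            // Print the page title, then the selected section's text.
            System.out.println("Title: " + doc.title());
            System.out.println(sectionText(doc, ".txtHead"));
        } catch (IOException e) {
            // Mirror the behavior described above: print the stack trace.
            e.printStackTrace();
        }
    }
}
```

Keeping the selector lookup in its own method makes it possible to try it against an HTML string parsed with Jsoup.parse, without a network connection.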
Contributions are welcome! Feel free to open an issue or submit a pull request.